Leuk: Cox regression

Several authors have discussed Bayesian inference for censored survival data where the integrated baseline hazard function is to be estimated non-parametrically: Kalbfleisch (1978), Kalbfleisch and Prentice (1980), Clayton (1991) and Clayton (1994). Clayton (1994) formulates the Cox model using the counting process notation introduced by Andersen and Gill (1982) and discusses estimation of the baseline hazard and regression parameters using MCMC methods. Although his approach may appear somewhat contrived, it forms the basis for extensions to random effect (frailty) models, time-dependent covariates, smoothed hazards, multiple events and so on. We show below how to implement this formulation of the Cox model in BUGS.

For subjects i = 1,...,n, we observe processes N_i(t) which count the number of failures that have occurred up to time t. The corresponding intensity process I_i(t) is given by

   I_i(t)dt = E(dN_i(t) | F_{t-})

where dN_i(t) is the increment of N_i over the small time interval [t, t+dt), and F_{t-} represents the available data just before time t. If subject i is observed to fail during this time interval, dN_i(t) will take the value 1; otherwise dN_i(t) = 0. Hence E(dN_i(t) | F_{t-}) corresponds to the probability of subject i failing in the interval [t, t+dt). As dt -> 0 (assuming time to be continuous), this probability becomes the instantaneous hazard at time t for subject i. This is assumed to have the proportional hazards form

   I_i(t) = Y_i(t) lambda_0(t) exp(beta z_i)

where Y_i(t) is an observed process taking the value 1 or 0 according to whether or not subject i is observed at time t, and lambda_0(t) exp(beta z_i) is the familiar Cox regression model. Thus we have observed data D = {N_i(t), Y_i(t), z_i; i = 1,...,n} and unknown parameters beta and Lambda_0(t) = Integral(lambda_0(u) du, u in [0, t]), the latter to be estimated non-parametrically.
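
For interpretation, note that under this form the ratio of the intensities for two subjects who are both at risk does not involve the baseline hazard. As a brief aside (not part of the original text), with the centred coding z = -0.5 for treatment and z = +0.5 for placebo used in the code below,

   I_i(t) / I_j(t) = exp(beta (z_i - z_j)),   so   exp(beta (0.5 - (-0.5))) = exp(beta)

is the placebo versus treatment hazard ratio, and beta is the corresponding log hazard ratio.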

The joint posterior distribution for the above model is defined by

   P(beta, Lambda_0() | D) ∝ P(D | beta, Lambda_0()) P(beta) P(Lambda_0())

For BUGS, we need to specify the form of the likelihood P(D | beta, Lambda_0()) and prior distributions for beta and Lambda_0(). Under non-informative censoring, the likelihood of the data is proportional to

   prod_{i=1,...,n} [ prod_{t >= 0} I_i(t)^{dN_i(t)} ] exp(- Integral_{t >= 0} I_i(t) dt)

This is essentially as if the counting process increments dN_i(t) in the time interval [t, t+dt) are independent Poisson random variables with means I_i(t)dt:

   dN_i(t) ~ Poisson(I_i(t)dt)
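To see the equivalence (a step spelled out here for clarity, not in the original text): with mu = I_i(t)dt and k = dN_i(t) taking only the values 0 or 1,

   P(dN_i(t) = k) = mu^k exp(-mu) / k! = (I_i(t)dt)^{dN_i(t)} exp(-I_i(t)dt)

since k! = 1 for k = 0 or 1. The dt^{dN_i(t)} terms do not involve the unknown parameters, so the product of these Poisson probabilities over i and t is proportional to the likelihood given above.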
We may write

   I_i(t)dt = Y_i(t) exp(beta z_i) dLambda_0(t)

where dLambda_0(t) = lambda_0(t)dt is the increment or jump in the integrated baseline hazard function occurring during the time interval [t, t+dt). Since the conjugate prior for the Poisson mean is the gamma distribution, it would be convenient if Lambda_0() were a process in which the increments dLambda_0(t) are distributed according to gamma distributions. We assume the conjugate independent increments prior suggested by Kalbfleisch (1978), namely

   dLambda_0(t) ~ Gamma(c dLambda*_0(t), c)

Here, dLambda*_0(t) can be thought of as a prior guess at the unknown hazard function, with c representing the degree of confidence in this guess. Small values of c correspond to weak prior beliefs. In the example below, we set dLambda*_0(t) = r dt where r is a guess at the failure rate per unit time, and dt is the size of the time interval.
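
The role of c can be read off from the gamma moments (a check added here for clarity; the distribution has shape c dLambda*_0(t) and rate c):

   E[dLambda_0(t)]   = c dLambda*_0(t) / c   = dLambda*_0(t)
   Var[dLambda_0(t)] = c dLambda*_0(t) / c^2 = dLambda*_0(t) / c

so each increment is centred on the prior guess, with a variance that grows as c shrinks. With the values used in the code below (c = 0.001, r = 0.1), an interval of unit width has prior mean 0.1 and prior standard deviation sqrt(0.1 / 0.001) = 10, i.e. the prior is very weak.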
   
The above formulation is appropriate when genuine prior information exists concerning the underlying hazard function. Alternatively, if we wish to reproduce a Cox analysis but with, say, additional hierarchical structure, we may use the multinomial-Poisson trick described in the BUGS manual. This is equivalent to assuming independent increments in the cumulative hazard, each with a `non-informative' prior. This formulation is also shown below.
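
As a rough sketch only (an assumption about what such a specification might look like, not code taken from the manual), the conjugate prior block in the model below could be replaced by vague, independent priors on the increments:

   # Hypothetical alternative: vague independent priors on each hazard
   # increment, in place of the Kalbfleisch conjugate prior used below
      for (j in 1 : T) {
         dL0[j] ~ dgamma(0.01, 0.01)
      }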

The fixed effect regression coefficients beta are assigned a vague prior

   beta ~ Normal(0.0, 0.000001)

where, following the BUGS convention, the second argument is a precision, so this corresponds to a normal distribution with standard deviation 1000.

BUGS language for the Leuk example:

   model
   {
   # Set up data
      for(i in 1:N) {
         for(j in 1:T) {
   # risk set: Y[i, j] = 1 if obs.t[i] >= t[j]
   # (eps is a small positive constant guarding against ties at the grid points)
         Y[i, j] <- step(obs.t[i] - t[j] + eps)
   # counting process jump = 1 if subject i fails in [ t[j], t[j+1] ),
   # i.e. if t[j] <= obs.t[i] < t[j+1] and fail[i] = 1
         dN[i, j] <- Y[i, j] * step(t[j + 1] - obs.t[i] - eps) * fail[i]
         }
      }
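   # Illustrative check with hypothetical values: for grid points t[j] = 12
   # and t[j + 1] = 15, a subject with obs.t[i] = 13 and fail[i] = 1 gives
   # Y[i, j] = step(13 - 12 + eps) = 1 and
   # dN[i, j] = 1 * step(15 - 13 - eps) * 1 = 1,
   # so the failure is counted in the interval [12, 15)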
   # Model
      for(j in 1:T) {
         for(i in 1:N) {
            dN[i, j] ~ dpois(Idt[i, j]) # Likelihood
            Idt[i, j] <- Y[i, j] * exp(beta * Z[i]) * dL0[j]    # Intensity
         }
         dL0[j] ~ dgamma(mu[j], c)   # Kalbfleisch conjugate prior on increments
         mu[j] <- dL0.star[j] * c   # gamma shape; prior mean = mu[j] / c = dL0.star[j]

   # Survivor functions S(t | z) = exp(-Integral{l0(u)du})^exp(beta * z);
   # z = -0.5 codes the treated arm, z = +0.5 the placebo arm
         S.treat[j] <- pow(exp(-sum(dL0[1 : j])), exp(beta * -0.5))
         S.placebo[j] <- pow(exp(-sum(dL0[1 : j])), exp(beta * 0.5))
      }
      c <- 0.001
      r <- 0.1
      for (j in 1 : T) {
         dL0.star[j] <- r * (t[j + 1] - t[j])
      }
      beta ~ dnorm(0.0,0.000001)
   }

Data

Inits for chain 1    Inits for chain 2


Results

A 1000-update burn-in followed by a further 10000 updates gave the parameter estimates:

[leuk1]