GeoBUGS User Manual
^{
}

Version 3.2.3, March 2014

Andrew Thomas
^{1
} Nicky Best
^{2
} Dave Lunn
^{2
}

Richard Arnold
^{3
} David Spiegelhalter
^{4
}

^{1
} Dept of Mathematics & Statistics,

University of St Andrews

St Andrews

Scotland

^{2
} Department of Epidemiology & Public Health,

Imperial College School of Medicine,

Norfolk Place,

London W2 1PG, UK

^{3
} School of Mathematical and Computing Sciences

Victoria University,

P. O. Box 600, Wellington,

New Zealand

^{4
}^{
}MRC Biostatistics Unit,

Institute of Public Health,

Robinson Way,

Cambridge CB2 2SR, UK

e-mail: bugs@mrc-bsu.cam.ac.uk [general]

helsinkiant@gmail.com [technical]

internet: http://www.mrc-bsu.cam.ac.uk/bugs

**Contents
**

**Introduction
**

**Importing map polygon files
**

Splus format

ArcInfo format

Epimap format

ArcView format

Loading a polygon file into GeoBUGS

**Exporting maps from GeoBUGS into Splus format
**

**Producing adjacency matrices
**

**Editing adjacency matrices
**

**Producing maps
**

**User-specified cut-points and shading
**

**Identifying individual areas on a map
**

**Copying and saving maps
**

**Spatial distributions
**

car.normal and car.l1

car.proper

spatial.exp and spatial.disc

spatial.pred and spatial.unipred

pois.conv

mv.car

**Temporal distributions
**

Using car.normal as a random walk prior for

temporal smoothing

**Examples
**

Conditional Autoregressive (CAR) models for disease mapping:

Lip cancer in Scotland

Convolution priors: Lung cancer in a

London Health Authority

Proper CAR model: Lip cancer revisited

Bayesian kriging and spatial prediction: Surface elevation

Poisson-gamma spatial moving average (convolution) model:

Distribution of hickory trees in Duke forest

Poisson-gamma spatial moving average (convolution) model:

Ecological regression of air pollution and respiratory

illness in children

Intrinsic multivariate CAR prior for mapping multiple

diseases:Oral cavity cancer and lung cancer in

West Yorkshire (UK)

Shared component for mapping multiple diseases:

Oral cavity cancer and lung cancer in

West Yorkshire (UK)

Random walk priors for temporal smoothing of

daily air pollution estimates

Appendix 1: Technical details of

Structured Multivariate Gaussian

and Conditional Autoregressive (CAR) models

and hyperprior specification

Joint specification

Conditional specification

Intrinsic CAR model

Proper CAR model

Intrinsic Multivariate CAR model

Appendix 2: Technical details of the

Poisson-gamma Spatial Moving Average

convolution model

References

_{
}**
**Introduction
**
** [top]

GeoBUGS is an add-on module to WinBUGS which provides an interface for:

** *
**
producing maps
of the output from disease mapping and other spatial models

** *
**
creating
and
manipulating
adjacency matrices that are required as input for the

conditional autoregressive models
available in OpenBUGS for carrying out

spatial smoothing.

GeoBUGS contains map files for

** *
** Districts in Scotland (called
** Scotland
**)

northern England (called

A list of the area IDs for each map and the order in which the areas are stored in the map file can be obtained using the export Splus command.

GeoBUGS also has facilities for importing user-defined maps reading polygon formats from Splus

Polygon files can be imported into GeoBUGS from a variety of other packages:

Import files are text files containing:

The different GeoBUGS import formats are designed to follow as closely as possible the format in which Splus, ArcInfo and EpiMap export polygons, respectively. However, some manual editing of the polygon files exported from these various packages is also necessary before they can be read into GeoBUGS.

The following simple map is used to illustrate the different import formats. The map contains 3 areas (labelled area 1, area 2 and area 3); area 1 consists of 2 separate polygons, while areas 2 and 3 consist of one polygon each.

map:3

Xscale: 1000

Yscale: 1000

1 area1

2 area2

3 area3

area1 0 2

area1 1 2

area1 1 3

area1 0 3

NA NA NA

area1 2 1

area1 4 1

area1 4 3

area1 2 3

NA NA NA

area2 0 0

area2 2 0

area2 2 1

area2 0 1

NA NA NA

area3 2 0

area3 3 0

area3 3 1

area3 2 1

END

The first line contains the key word 'map' (lower case) followed by a colon and an integer, N, where N is the number of distinct areas in the map (note that one area can consist of more than one polygon). The 2nd and 3rd lines are

The next part of the import file is a 2 column list giving:

(column 1) the numeric ID of the area - this must be a unique integer between 1 and N; the areas should be labelled in the same order as the corresponding data for that area appears in the model.

(column 2) the area label - this must start with a character, and can be a maximum of 79 alphanumeric characters (no spaces allowed)

The final part of the import file is a 3 column list giving the co-ordinates of the polygons. The format is:

(col 1) the label of the area to which the polygon belongs

(col 2) x-coordinate

(col 3) y-xoordinate

The polygon coordinates can be listed either clockwise or anticlockwise. Polygons should be separated by a row of NA's

The import file should end with the key word: END

Note: Brad Carlin has a link on his web page to an Splus program called poly.S to extract polygons for any state in the United States in the appropriate format for loading into GeoBUGS (http://www.biostat.umn.edu/~brad/software.html)

map:3

Xscale: 1000

Yscale: 1000

1 area1

2 area2

3 area3

regions

99 area1

103 area1

210 area2

211 area3

END

99 0 0

0 2

1 2

1 3

0 3

END

103 0 0

2 1

4 1

4 3

2 3

END

210 0 0

0 0

2 0

2 1

0 1

END

211 0 0

2 0

3 0

3 1

2 1

END

END

The ArcInfo import file is in four parts:

The first 2 parts are the same as the Splus format.

The third part begins with a line containing the key word 'regions' (lower case). Below this is a 2 column list giving:

(column 1) an integer label corresponding to the integer label for one of the polygons listed in the final part of the import file. Each polygon should have a unique integer label, but this can be arbitrary (i.e. labels don't need to start at 1 or be consecutive). If using the ArcInfo command UNGENERATE to export a set of polygons, this is the integer label that ArcInfo automatically attaches to each polygon.

(column 2) the area label to which the polygon with that integer ID belongs. Note, if an area contains more than one polygon, then each polygon ID should be listed on a separate line and given the same area label (e.g., area1 in the above example).

There should be as many rows in this part of the file as there are polygons. This will be equal to or greater than the number of distinct areas in the map.

The final part of the import file gives the co-ordinates of the polygons. The format for each polygon is:

(row 1, column 1) the integer ID for the polygon (this should correspond to one of the integer IDs listed in the previous part of the import file).

(row 1, columns 2 and 3) if the polygon file has been exported directly from ArcInfo, these 2 numbers usually give the centroid of the polygon. However, they are not used by GeoBUGS, so can be arbitrary.

Subsequent rows contain a 2-column list of numbers giving the x- and y-coordinates of the poly. The polygon coordinates can be listed either clockwise or anticlockwise.

Polygons should be separated by a line containing the key word END.

The final row of the import file should also contain the key word END

map:3

Xscale: 1000

Yscale: 1000

1 area1

2 area2

3 area3

area1, 4

0, 2

1, 2

1, 3

0, 3

area1, 4

2, 1

4, 1

4, 3

2, 3

area2, 4

0, 0

2, 0

2, 1

0, 1

area3, 4

2, 0

3, 0

3, 1

2, 1

END

The Epimap import file is in three parts:

The first 2 parts are the same as the Splus format.

The third gives the polygon co-ordinates. The format for each polygon is:

(row 1, column 1) the label of the area to which the polygon belongs.

(row 1, column 2) the number of vertices in the polygon (note the comma separator)

Subsequent rows contain a 2-column list of numbers giving the x- and y-coordinates of the poly, separated by a comma. The polygon coordinates can be listed either clockwise or anticlockwise.

The final row of the import file should contain the key word END

GeoBUGS does not have an option for loading ArcView shape files directly. However, Ms Yue Cui at the University of Minnesota has written programs in Splus and R for converting shape files into the GeoBUGS Splus format so that they can be loaded in GeoBUGS (http://www.biostat.umn.edu/~yuecui/).

Open the polygon file as a separate text file in OpenBUGS and select the appropriate import option from the

Focus the window containing the map in GeoBUGS and select

GeoBUGS includes an option to produce a data file containing the adjacency matrix for any map loaded on the system. This file is in a format required by the car.normal , car.l1

click on

cause the specified region to be highlighted in red on the map; its neighbours (defined to

be any region adjacent to the red region) are highlighted in green. A region and its

neighbours can also be highlighted by positioning the mouse cursor over the required

region on the map and clicking with the left button.

form suitable for loading as data into OpenBUGS for use with the

distributions).

Note: when calculating which areas are adjacent to which others, GeoBUGS includes a

'tolerance' zone of 0.1 metres. This tolerance zone should not lead to spurious neighbours

unless you forget to appropriately scale your distance units in the polygon file using the

Xscale and Yscale options, or your map covers a tiny geographic region (in

which case, artificially re-scaling the distance units for your map should overcome any

problems).

To remove a region from the set of neighbours for a specific area:

red area;

longer be highlighted in green.

To add a region to the set of neighbours for a specific area:

red area

then become highlighted in green.

Once you have finished editing the set of neighbours for each region on your map, create the new adjacency matrix by clicking on the

Note: in order to produce a map of the mean or other summary statistic of the posterior distribution of a stochastic variable, you must have already set a samples or summary monitor for that parameter and have carried out some updates.

To produce a map:

shaded according to the values of the variable will now appear.

which you can select from the

- if you have monitored the variable by setting a

the

summary monitor;

- if you have monitored the variable by setting a

posterior sample), you can select any of the remaining options from the Quantity menu:

then type the required percentile in the box labelled

or equal to the specified threshold, which you must type in the box labelled

equal to the specified threshold, which you must type in the box labelled

When you have selected the quantity you want to map, click the

map.

each category on the map.

Fig 1. GeoBUGS map of SMRs for Scottish Lip Cancer data

GeoBUGS can work with two kinds of cut-points: absolute value cut-points and percentile cut-points. For absolute cut-points, GeoBUGS chooses a default set of breaks based on the absolute value of the variable to be mapped: these are chosen to give equally spaced intervals. For percentile cutpoints, GeoBUGS chooses the 10th, 50th and 90th percentiles of the empirical distribution of the variable to be mapped. The default shading is blue-scale.

To edit the colours for shading the map:

alternative colours that can be selected.

the currently selected map or click on the

on the

To edit the absolute value cutpoints:

from the menu labelled

etc.

To produce maps using cutpoints based on percentiles:

the

10th, 50th and 90th percentiles of the empirical distribution of values to be mapped.

- reselect the

- click on the

now be displayed in the

- click on the

absolute values of the cutpoints.

To use the same set of cutpoints for multiple maps:

displayed in the

Some current limitations:

The index, label and value of an individual area on the map can be found by placing the cursor over the area of interest on the map and clicking with the left mouse button. The index (i.e. ID number i of the area, where i=1,...,Number of areas), area label (given in the polygon file) and value of variable currently being mapped for the selected area will be shown in left of the grey bar at the bottom of the WinBUGS program window.

Maps produced using the GeoBUGS map tool can be copied and pasted into other Microsoft Windows software such as Word and PowerPoint. To select the map, click anywhere on the map window to focus it, then press the Ctrl and Space keys simultaneouly (a blue border should then appear around the figure); then select 'Copy' from the 'Edit' menu (or Crtl-C). Then paste into the appropriate Word or PowerPoint file etc. To save the map as a postscript file, you will need to install a postscript print driver on your PC, then select 'Print' from the 'File' menu, check the 'print to file' box, and then select 'Print'. You can also save the maps as OpenBUGS .odc documents; this will allow you to re-open the map within OpenBUGS and re-edit the cutpoints and colours if you wish.

See appendices for further tecnhical details about the various spatial distributions implemented in GeoBUGS 1.2.

The intrinsic Gaussian CAR prior distribution is specified using the distribution

S[1:N] ~ car.normal(adj[], weights[], num[], tau)

S[1:N] ~ car.l1(adj[], weights[], num[], tau)

where:

The first 3 arguments must be entered as data (it is currently not possible to allow the

for(j in 1:sumNumNneigh) { weights[j] <- 1}

where

Important things to check when using the

intrinsic CAR models in Appendix 1).

area itself does not appear in as one of its list of neighbours in the

and returns an error message if non-symmetric weights are detected.

choice.

constraint on the random effects. This means that you must also include a separate

intercept term in your model, which MUST be assigned an improper uniform prior using the

prior distributions, and not as a likelihood.

The proper Gaussian CAR prior distribution is specified using the distribution

S[1:N] ~ car.proper(mu[], C[], adj[], num[], M[], tau, gamma)

where:

min.bound(C[], adj[], num[], M[])

max.bound(C[], adj[], num[], M[])

where the arguments are the same as the corresponding vectors used as arguments to the

Important things to check when using the

unknown);

clumsy) way of creating the

alternatively, these can be created externally to

proper CAR priors in Appendix 1).

area itself does not appear in as one of its list of neighbours in the

check for this and returns an error message if lack of symmetry is detected.

choice.

the appropriate bounds. Besag, York and Mollie (1991) suggest that gamma needs to be

close to its upper bound before it reflects reasonable spatial dependence, so you may

want to try informative priors to represent this, and be prepared to carry out sensitivity

analysis.

Bayesian Gaussian kriging models (multivariate Gaussian distribution with covariance matrix expressed as a parametric function of distance between pairs of points - e.g. see Diggle, Tawn and Moyeed, 1998 and Appendix 1 ) can be specified using the distributions

S[1:N] ~ spatial.exp(mu[], x[], y[], tau, phi, kappa)

S[1:N] ~ spatial.disc(mu[], x[], y[], tau, alpha)

where:

Two options are available for specifying the form of the covariance matrix: the powered exponential function and the 'disc' function (see section on

The

The

Non-hierarchically centred

for (i in 1:N){

y[i] ~ dnorm(S[i], gamma)

mu[i] <- alpha+beta*z[i]

}

S[1:N] ~ spatial.exp(mu[], x[], y[], tau, phi,1)

Hierarchically centred

for (i in 1:N){

y[i] ~ dnorm(S[i], gamma)

S[i] <- alpha+beta*z[i] + W[i]

mu[i]<-0

}

W[1:N] ~ spatial.exp(mu[], x[], y[], tau, phi,1)

In some simulated examples, the non-hierarchically centred parameterisation has produced incorrect results, while the hierarchically centred parameterisation gives sensible answers. This may be a feature of the single-site updating schemes used in WinBUGS, so interpret your results with care!

(Thanks to Alan Gelfand, Shanshan Wu and Alex Schmidt for noting this problem).

Experience also suggests that there is often very little information in the data about the values of the parameters of the powered exponential (i.e. phi and kappa) or disc (i.e. alpha) functions. We therefore recommend that reasonably informative priors are used, or that the values are fixed a priori, based on either substantive knowledge or exploratory analysis using e.g. variograms.

Spatial interpolation or prediction at arbitrary locations can be carried out using the

Joint prediction:

T[1:P] ~ spatial.pred(mu.T[], x.T[], y.T[], S[])

Single site prediction:

for(j in 1:P) {

T[j] ~ spatial.unipred(mu.T[j], x.T[j], y.T[j], S[])

}

where:

A conjugate Poisson-gamma spatial moving average distribution can be specified for non-negative counts defined on a spatial lattice (i.e. discrete geographical partition), using the distribution

S ~ dpois.conv(mu[])

where

for (j in 1 : J) {

mu[j] <- gamma[j] * k[j]

gamma[j] ~ dgamma(a, b)

}

where

More typically, the distribution will be used for each element of a

for (i in 1:N) {

S[i] ~ dpois.conv(mu[i, ])

for (j in 1 : J) {

mu[i, j] <- gamma[j] * k[i, j]

}

}

where the latent gamma[j] parameters are defined as above, and k[,] is now an N x J matrix with elements k[i,j] representing the 'weight' or contribution of the latent gamma random variable at location j to the expected value of S at location i.

Conditional on mu, the S[i] have independent Poisson distributions with mean = S

Note that the model may be extended to include observed covariates as well as latent variables in the Poisson mean - see MVCAR Example.

The multivariate intrinsic Gaussian CAR prior distribution is specified using the distribution

S[1:p, 1:N] ~ mv.car(adj[], weights[], num[], omega[ , ])

where:

The first 3 arguments must be entered as data (it is currently not possible to allow the

for(j in 1:sumNumNneigh) { weights[j] <- 1}

where

Important things to check when using the

In one dimension, the intrinsic Gaussian CAR distribution reduces to a Gaussian random walk (see e.g.Fahrmeir and Lang, 2001). Assume we have a set of temporally correlated random effects q

q

~ Normal ( (q

~ Normal ( q

where q

q

where C

A second order random walk prior is defined as

q

~ Normal ( (2 q

~ Normal ( (-q

~ Normal ( (-q

~ Normal ( -q

Again this is equivalent to specifying

q

where C

Note that if the observed time points are not equally spaced, it is necessary to include missing values (NA) for the intermediate time points (see the Air Pollution Example ).

Appendix 1: Technical details of Structured Multivariate Gaussian and Conditional Autoregressive (CAR) models and hyperprior specification .

(Note: for clarity of exposition, we parameterise the Normal distribution in terms of the mean and

Assume we have a set of area-specific spatially correlated Gaussian data or random effects S

where

We may assume a parametric form for the elements of the between-area correlation matrix:

S

where d

f(d

The parameter f controls the rate of decline of correlation with distance:

f large - > rapid decay

f small - > slow decay

One possible strategy for specifying a prior for f is to choose a uniform distribution between f

The parameter k controls the amount by which spatial variations in the data are smoothed. Large values of k lead to greater smoothing, with k = 2 corresponding to the Gaussian correlation function (although the resulting covariance matrix is nearly singular). It is often difficult to learn much about this parameter, so unless there is a good reason for believing otherwise, it is usually advisable to set k = 1 a priori.

f(d

= 0 for d

The parameter a controls the rate of decline of correlation with distance:

a large - > slow decay

a small - > rapid decay

The disc function leads to an approximately linear decrease in correlation with increasing distance, with correlation declining to zero at a distance equal to a .

Exploratory analysis using variograms maybe be helpful to help chose an appropriate specification for the correlation function and associated parameters.

By writing the between-area covariance matrix in the following form:

v

g = controls overall strength of spatial dependence

and using standard multivariate normal theory (e.g. Johnson and Kotz, 1972; Besag and Kooperberg, 1995), the joint multivariate Gaussian model can be expressed in the form of a set of conditional distributions

S

(here S

From a modelling perspective, use of the joint formulation requires specification of the elements of the covariance matrix

*

g

*

Besag, York and Mollie (1991) propose an intrinsic version of this CAR model in which the covariance matrix

S

where S.bar

Since the CAR model defined above is improper (the overall mean of the S

We must also specify prior distributions for the overall variance parameter v. As usual in WinBUGS, the

If g is constrained to lie in the interval ( g

M

C

Assume we have a multivariate p-dimensional vector of spatially correlated Gaussian data or random effects in each area,

(here (

where

Gelfand and Vounatsou (2003) discuss a multivariate generalisation of the proper version of CAR distribution, but only the improper intrinsic multivariate CAR is currently implemented in WinBUGS.

Spatial moving average models are a flexible class of models that have been used to describe continuous spatial processes, particularly in geostatistical applications. Such models are constructed by integrating a simple two-dimensional random noise process (for example, a grid of iid Gaussian random variables) with a smoothing kernel that is a function of distance and, possibly, location. The kernel can be thought of as a device to `smear out' the random noise process in two-dimensional space to give a smooth surface.

Spatial moving average models have been developed primarily for continuous spatial processes and are currently not implementable in WinBUGS 1.4. However, Ickstadt and Wolpert (1998) and Best et al (2000b) proposed a discrete version of a gamma moving average process, for use in identity-link Poisson regression models. Suppose we have a set of area-specific spatially correlated Poisson count data (or random effects) S

S

The model for each l

l

Best et al (2000b) assume an isotropic, stationary Gaussian kernel function (although other kernel forms are easily accommodated, such as an adjacency-based kernel - see the 'Forest Example' ):

k

where t can be thought of as a scale factor for the l 's, d

One interpretation of Poisson-gamma moving average model is to view the gamma random variables as representing the location and magnitude of unmeasured risk factors, and the area-specific Poisson means l

The model may be extended to include observed covariates and an offset adjustment (e.g. to account for different populations in different areas). Observed covariates are included as additive terms in the linear predictor l

l

where X

l

p

Note that Best et al (2004) discuss how to standardise the offset N

Estimation of the

Besag, J., York, J. and Mollie, A. (1991). Bayesian image restoration, with two applications in spatial statistics.

Besag, J. and Kooperberg, C.L. (1995). On conditional and intrinsic autoregressions.

Best, N.G., Richardson, S. and Thomson, A. (2004). A comparison of Bayesian spatial models for disease mapping.

Best, N.G., Ickstadt, K.and Wolpert, R.L. and Briggs, D.J. (2000a).

Best, N.G., Ickstadt, K., Wolpert, R.L. and Briggs, D.J. (2000b). Combining models of health and exposure data: the SAVIAH study. In

Clayton, D.G and Kaldor, J. (1987). Empirical Bayes estimates of age-standardized relative risks for use in disease mapping.

Cressie, N.A. and Chan, N.H. (1989). Spatial modeling of regional variables.

Diggle, P.J., Tawn, J.A. and Moyeed, R.A. (1998). Model-based geostatistics.

Fahrmeir, L. and Lang, S. (2001). Bayesian inference for generlaized additive mixed models based on Markov random field priors.

Gelfand, A. and Vounatsou, P. (2003). Proper multivariate conditional autoregressive models for spatial data analysis.

Ickstadt, K. and Wolpert, R.L. (1998). Multiresolution assessment of forest inhomogeneity. In

Johnson, N.L. and Kotz, S. (1972).

Kelsall, J.E. and Wakefield, J.C. (1999). Discussion of "Bayesian models for spatially correlated disease and exposure data", by Best

Knorr-Held L. and Best N.G. (2001). A shared component model for joint and selective clustering of two diseases.

Mollie, A. (1996). Bayesian mapping of disease. In

Richardson, S. (1992). Statistical methods for geographical correlation studies. In

Shaddick, G. and Wakefield, J. (2002). Modelling daily multivariate pollutant data at multiple sites.

Stern, H.S. and Cressie, N.A. (1999). Inference for extremes in disease mapping. In

Wakefield, J.C., Best, N.G., and Waller, L.A. (2000). Bayesian Approaches to Disease Mapping. In

Wolpert, R.L. and Ickstadt, K. (1998). Poisson/Gamma random field models for spatial statistics.