Types of model
Models can be classified by:
A: Type (distribution) of response variable
 Continuous (normal)
 Binary or proportions (Bernoulli or binomial)
 Categorical – nominal (unordered multinomial)
 Categorical – ordinal (ordered multinomial)
 Counts (Poisson)
 Duration or survival

Continuous (normal)
Most introductory texts focus on continuous responses, e.g. exam scores (possibly normalised).
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 2, 3 and 4).

Binary or proportions (Bernoulli or binomial)
Binary or dichotomous responses have two categories, usually coded 0 and 1. Sometimes binary data have been aggregated to give proportions. For example, in a study of unemployment using census data we would not usually have access to individuals’ employment status; instead we would have the unemployment rate (e.g. the proportion unemployed among those eligible to work) for geographical areas. When data are in the form of proportions, information on the number of individuals on which each proportion is based (the denominator) is required.
Logistic regression and related models can be applied to both binary data and proportions. The most commonly used are logit and probit models. In these models a nonlinear transformation of the probability of being in one of the response categories is modelled as a function of explanatory variables.
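As a rough illustration of the nonlinear transformation described above, the sketch below converts a linear predictor into a probability under a logit link and under a probit link. The function names and coefficient values are assumptions made for the example, not estimates from any data set.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, used by the probit link


def inv_logit(eta):
    """Probability implied by linear predictor eta under a logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Probability implied by linear predictor eta under a probit link."""
    return STD_NORMAL.cdf(eta)


# Hypothetical coefficients: an intercept and the effect of one explanatory variable x.
beta0, beta1 = -1.0, 0.8

for x in (-2.0, 0.0, 2.0):
    eta = beta0 + beta1 * x  # linear predictor on the transformed scale
    print(f"x={x:+.1f}  logit link: {inv_logit(eta):.3f}  probit link: {inv_probit(eta):.3f}")
```

The same coefficient values are used for both links only to keep the example short; in practice logit and probit coefficients are on different scales and are not directly comparable.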
Examples
Response            Categories of binary variable   Proportion
Exam performance    Pass, fail                      Pass rate in school
Voting preference   Party A, other party            Proportion voting for party A in area
Unemployment        Unemployed, employed            Unemployment rate in area
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 9).
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 10).

Categorical – nominal (unordered multinomial)
Logistic regression models for binary responses can be extended to handle categorical responses that have more than two categories. The type of model we fit depends on whether these response categories are unordered (i.e. nominal) or ordered. The most commonly applied model for nominal responses is the multinomial logit model.
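A minimal sketch of how a multinomial logit model turns linear predictors into category probabilities is given below. It assumes one category is taken as the reference, with its linear predictor fixed at zero; the function name and the numerical values are hypothetical.

```python
import math


def multinomial_logit_probs(linear_predictors):
    """Category probabilities for a multinomial logit model.

    linear_predictors holds one value per non-reference category; the
    reference category has its linear predictor fixed at zero.
    """
    exp_terms = [math.exp(eta) for eta in linear_predictors]
    denominator = 1.0 + sum(exp_terms)  # the 1.0 is exp(0) for the reference
    reference_prob = 1.0 / denominator
    return [reference_prob] + [term / denominator for term in exp_terms]


# Hypothetical linear predictors for "party A" and "party B",
# each contrasted with the reference category "other".
probs = multinomial_logit_probs([0.4, -0.2])
print(dict(zip(["other", "party A", "party B"], (round(p, 3) for p in probs))))
```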
Examples
Voting preference: party A, party B, other.
Employment status: employed full-time, employed part-time, unemployed, not in labour market.
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 10).
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 19).

Categorical – ordinal (ordered multinomial)
Logistic regression models for binary responses can be extended to handle categorical responses that have more than two categories. The type of model we fit depends on whether these response categories are unordered (i.e. nominal) or whether they can be considered ordered. Ordinal variables may be analysed using either a proportional odds model (also called a cumulative logit model) or a continuation odds model.
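The proportional odds model can be sketched as follows. The parameterisation used here, logit P(y <= k) = threshold_k - effect, is an assumption for the example (software packages differ in the sign convention), and the threshold and effect values are hypothetical.

```python
import math


def inv_logit(eta):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-eta))


def proportional_odds_probs(thresholds, effect):
    """Category probabilities under a cumulative logit model.

    Assumed parameterisation: logit P(y <= k) = thresholds[k] - effect,
    with thresholds in increasing order and the same effect at every
    cut point (the proportional odds assumption).
    """
    cumulative = [inv_logit(alpha - effect) for alpha in thresholds] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)  # P(y = k) = P(y <= k) - P(y <= k - 1)
    return probs


# Five ordered categories (lowest to highest) need four thresholds.
thresholds = [-2.0, -0.5, 0.5, 2.0]
baseline = proportional_odds_probs(thresholds, effect=0.0)
shifted = proportional_odds_probs(thresholds, effect=1.0)
print([round(p, 3) for p in baseline])
print([round(p, 3) for p in shifted])
```

Under this convention a larger effect moves probability away from the lowest category and towards the higher ones.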
Examples
 Exam grade: A, B, C, D, fail.
 Attitudinal scale: strongly agree, agree, neutral, disagree, strongly disagree
 Severity of disease.
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 11).
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 20).

Counts (Poisson)
Sometimes the response variable is a count of individuals in a particular state or the number of events of a particular type. These counts may refer to geographical areas or groups (e.g. defined by age and sex) within areas. In a Poisson model we model the log-rate. This is achieved by including the log of the population size as an explanatory variable and constraining its coefficient to equal one (this term is called an offset). The population size would be the number of individuals who could have been in a particular state or the number at risk of the event of interest. A Poisson model is used rather than a logistic regression model when the population size is large or the event is rare.
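The role of the offset can be written out as follows. The notation is assumed for this sketch: \mu_i is the expected count for area or group i, n_i is the population at risk and x_i is a single explanatory variable. Fixing the coefficient of \log(n_i) at one makes the model for the log-count equivalent to a model for the log-rate.

```latex
\log(\mu_i) = \log(n_i) + \beta_0 + \beta_1 x_i
\quad\Longleftrightarrow\quad
\log\!\left(\frac{\mu_i}{n_i}\right) = \beta_0 + \beta_1 x_i
```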
Examples
Rate                Count                           Population ‘at risk’
Mortality           Number of deaths                Total population (perhaps within a particular age group)
Teenage pregnancy   Number of teenage pregnancies   Number of teenagers
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 12).
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 11).

Duration or survival
A duration or survival time response records the time at which some event occurs. A common feature of such data is right-censoring, where the event has not yet occurred for a proportion of the sample. Survival or event history models allow right-censored observations to be retained in the analysis.
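One way to see how right-censored observations are retained is the discrete-time formulation, in which each individual contributes one record for every interval in which they were at risk. The sketch below assumes time has been grouped into intervals; the function name and the two example individuals are hypothetical.

```python
def person_period_records(person_id, last_interval, event_occurred):
    """Expand one individual's duration into one record per interval at risk.

    last_interval: number of intervals observed (counting from 1).
    event_occurred: True if the event happened in the last observed
    interval, False if the individual was right-censored there.
    """
    records = []
    for interval in range(1, last_interval + 1):
        is_last = interval == last_interval
        records.append({
            "person": person_id,
            "interval": interval,
            "event": 1 if (is_last and event_occurred) else 0,
        })
    return records


# Hypothetical sample: person 1 has the event in interval 3;
# person 2 is censored after interval 2 and still contributes two records.
data = person_period_records(1, 3, True) + person_period_records(2, 2, False)
for row in data:
    print(row)
```

The binary event indicator in the expanded data can then be analysed with models for binary responses, so the censored individual informs the analysis for as long as they were observed.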
Examples
Event          Duration
Death          Age or duration of treatment in a clinical trial
Divorce        Duration of marriage
Unemployment   Duration in employment
Resources
Steele, F. (2005) Multilevel Discrete-time Event History Analysis.
Training materials from workshop on multilevel event history analysis.
Steele, F. (2005) Event History Analysis. National Centre for Research Methods Review Paper NCRM/004.
B: Type of data structure
 Hierarchical
 Hierarchical – repeated measures
 Hierarchical – multivariate
 Non-hierarchical – cross-classified
 Non-hierarchical – multiple membership
Note that a given data set may consist of a combination of these structures. For example, we may have repeated measures on students (a two-level hierarchy) who are nested within a cross-classification of schools and neighbourhoods.

Hierarchical
The simplest and most common data structure is hierarchical. In a two-level structure each lower-level unit is nested within one higher-level unit.
Examples
 Students (level 1) within schools (level 2)
 Individuals (level 1) within areas of residence (level 2)
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 2-12).

Hierarchical – repeated measures
A special case of a hierarchical structure arises when the lowest-level units are repeated measurements taken over time on a higher-level unit. Repeated measures data may be analysed using a growth curve model where the time at each measurement occasion is included as an explanatory variable, e.g. as a polynomial function. Autocorrelation between responses on a given individual can be incorporated using a multilevel time series model. For example, we might assume that the correlation between responses decays as a function of the distance between measurement occasions.
Examples
 Reading scores on students
 Voting intentions of individuals in successive waves of a panel study
 Health outcomes measured over time
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 13).
Goldstein, H. (1998) Random coefficient repeated measure models. In Encyclopedia of Biostatistics (ed. P. Armitage and T. Colton). London: Wiley.
Goldstein, H., Healy, M.J.R. and Rasbash, J. (1994) Multilevel time series models with applications to repeated measures data. Statistics in Medicine, 13: 1643-55.

Hierarchical – multivariate
Multivariate responses arise when there are measurements on more than one variable for individuals, leading to a two-level hierarchical structure with responses at level 1 nested within individuals at level 2.
One way of modelling multivariate responses is to specify a separate regression equation for each response, allowing for correlation between individual-level (and possibly higher-level) residuals across the responses. This type of model is commonly referred to as a multivariate response model. If the responses may be viewed as indicators of one or more unobserved (latent) constructs, a factor model is usually more appropriate. In a factor model, the correlation between responses is assumed to be due to their common dependence on one or more latent variables or factors. A structural equation model is a generalisation of a factor model in which each factor may depend on explanatory variables and possibly other factors.
Examples
 Students’ scores in different subjects
 Scores on IQ test items
 Responses to items in a scale designed to measure attitude towards political and social issues
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 14).
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 16 and 18).

Non-hierarchical – cross-classified
The assumption that data structures are purely hierarchical is often a simplification. Individuals may belong to more than one grouping at a given hierarchical level. For example, one might be interested in exploring school and neighbourhood of residence effects on student attainment. Schools and neighbourhoods are non-nested because a school will typically have students from more than one neighbourhood, and children in a given neighbourhood will not all go to the same school; this leads to a cross-classified structure where children are nested within a cross-classification of school and neighbourhood. Another way of viewing the data structure is in terms of classifications rather than levels. In the above example, we have three classifications – child, school and neighbourhood – and in the multilevel model we would have residuals corresponding to each.
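The three sets of residuals can be sketched in a simple model with no explanatory variables. The notation is assumed for illustration: child i attends school j and lives in neighbourhood k, with u_j the school residual, v_k the neighbourhood residual and e_{i(jk)} the child residual, each taken to be normally distributed with its own variance.

```latex
y_{i(jk)} = \beta_0 + u_j + v_k + e_{i(jk)}, \qquad
u_j \sim N(0, \sigma^2_u), \quad
v_k \sim N(0, \sigma^2_v), \quad
e_{i(jk)} \sim N(0, \sigma^2_e)
```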
Examples
 Students cross-classified by primary and secondary school
 Patients cross-classified by general practitioners and hospitals
 Survey respondents cross-classified by sampling cluster and interviewer
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 18).
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 13).
Rasbash, J. and Goldstein, H. (1994) Efficient analysis of mixed hierarchical and cross-classified random structures using a multilevel model. Journal of Educational and Behavioural Statistics, 19: 337-350.
Raudenbush, S.W. (1993) A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational Statistics, 18: 321-350.

Non-hierarchical – multiple membership
Multiple membership models are used in situations where level 1 units belong to two or more higher-level units. In a longitudinal study of school students, for example, children may change school and thus ‘belong’ to more than one school. In a multiple membership model, a student receives a weighted combination of residuals from all the schools attended, where the weights might be proportional to the length of time spent at the different schools.
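The weighted combination of residuals can be sketched as follows. The school labels, times and residual values are hypothetical, and scaling the weights to sum to one is an assumption of the example.

```python
def membership_weights(time_in_unit):
    """Weights proportional to time spent in each unit, scaled to sum to 1."""
    total = sum(time_in_unit.values())
    return {unit: time / total for unit, time in time_in_unit.items()}


def multiple_membership_effect(weights, residuals):
    """Weighted combination of the residuals of all units attended."""
    return sum(weight * residuals[unit] for unit, weight in weights.items())


# Hypothetical student: 3 years at school "S1" and 1 year at school "S2".
weights = membership_weights({"S1": 3.0, "S2": 1.0})

# Hypothetical school-level residuals; "S3" was never attended and gets no weight.
school_residuals = {"S1": 0.20, "S2": -0.40, "S3": 0.10}

effect = multiple_membership_effect(weights, school_residuals)
print(weights, round(effect, 3))
```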
Examples
 Household dynamics where individuals move between households over time
 Individuals moving between areas over time
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 19).
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 14).
Browne, W.J., Goldstein, H. and Rasbash, J. (2001) Multiple membership multiple classification (MMMC) models. Statistical Modelling, 1: 103-124.
Goldstein, H., Rasbash, J., Browne, W. and Woodhouse, G. (2000) Multilevel models in the study of dynamic household structures. European Journal of Population, 16: 373-387.
C: Variance structure
 Random intercept (variance components) models
 Random slope models
 Complex level 1 variation (heteroskedasticity)
 Autocorrelation (time series models)

Random intercept (variance components) models
The simplest multilevel model has a single residual term for each level (or classification in a non-hierarchical model). For example, in a model for school effects on student attainment there would be student (level 1) residuals and school (level 2) residuals. This has the effect of partitioning the residual variance into a between-school and a within-school component, which is why this model is often referred to as a variance components model. The model is also called a random intercept model because only the intercept term in the regression equation is assumed to vary randomly across schools. The effects of explanatory variables, such as prior attainment, are assumed to be the same for each school. A plot of the predicted school regression lines would therefore show a set of parallel lines, one for each school.
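Written out for student i in school j with a single explanatory variable, the model and its variance partition might look as follows. The symbols are assumptions for this sketch: u_j is the school residual, e_{ij} the student residual, and the two are taken to be independent and normally distributed.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma^2_u), \quad e_{ij} \sim N(0, \sigma^2_e)

\operatorname{Var}(y_{ij} \mid x_{ij}) = \sigma^2_u + \sigma^2_e
```

The line for school j has intercept \beta_0 + u_j and the common slope \beta_1, which is why the predicted lines are parallel.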
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 2-4).

Random slope models
An often unrealistic assumption of the random intercept model is that the effects of explanatory variables are constant across higher-level units. In educational research, for example, we might expect that schools differ in their effect on attainment at age 16 (y) according to students’ prior attainment (x). In other words, rather than being parallel as in the random intercept model, school prediction lines for the relationship between x and y will have different slopes. This is achieved by specifying two residuals at the school level: an intercept residual and a slope residual, which may be correlated. A random slope for x also implies that the between-school variance depends on x. Random slope models are more generally known as random coefficient models.
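To illustrate why the between-school variance depends on x, the sketch below computes the variance of the school-level part of the model, u0 + u1 * x, where u0 is the intercept residual and u1 the slope residual. The function name and the variance and covariance values are hypothetical.

```python
def between_school_variance(x, var_intercept, cov_intercept_slope, var_slope):
    """Variance of (u0 + u1 * x) across schools for a given value of x."""
    return var_intercept + 2.0 * cov_intercept_slope * x + var_slope * x ** 2


# Hypothetical school-level variances and covariance.
params = {"var_intercept": 0.09, "cov_intercept_slope": 0.02, "var_slope": 0.01}

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  between-school variance: {between_school_variance(x, **params):.3f}")
```

With a positive covariance between intercepts and slopes, as assumed here, schools differ more for students with high prior attainment than for those with low prior attainment.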
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 4 and 7).

Complex level 1 variation (heteroskedasticity)
In a random slope model the coefficient of one or more explanatory variables (x) varies randomly across higher-level units. For example, the effect of prior attainment (x) on performance at age 16 (y) may vary across schools. A random slope model also implies that the between-school variance is a quadratic function of x. The variance between students within a school, however, is assumed to be constant (homoskedastic). This assumption may be unreasonable. It is often observed, for example, that boys vary more than girls in their attainment, i.e. there may be heteroskedasticity at the student level. A model which allows the level 1 variance to depend on explanatory variables is called a complex level 1 variance model.
Resources
Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 7)

Autocorrelation (time series models)
In a standard two-level model the pairwise correlation between the responses for two level 1 units selected at random from the same level 2 unit is assumed to be the same whichever pair is selected. This assumption of exchangeability may be questionable for repeated measures data. For example, we may expect the correlation between responses to decay as a function of the distance between measurement occasions, leading to autocorrelated residuals at level 1. Autocorrelation can be incorporated using a multilevel time series model.
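A correlation that decays with the distance between measurement occasions can be sketched as below. The exponential form, the decay rate and the measurement times are assumptions chosen for the example; other decay functions are possible.

```python
import math


def decaying_correlation(times, decay_rate):
    """Correlation matrix with entries exp(-decay_rate * |t - s|)."""
    return [[math.exp(-decay_rate * abs(t - s)) for s in times] for t in times]


# Hypothetical measurement occasions (e.g. years since the first wave).
times = [0.0, 1.0, 2.0, 4.0]
corr = decaying_correlation(times, decay_rate=0.5)

for row in corr:
    print("  ".join(f"{value:.3f}" for value in row))
```

Occasions that are close together are more strongly correlated than occasions far apart, in contrast to the equal correlations implied by the standard two-level model.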
Resources
Goldstein, H., Healy, M.J.R. and Rasbash, J. (1994) Multilevel time series models with applications to repeated measures data. Statistics in Medicine, 13: 1643-55.
D: Other

Measurement error
Many measurements are made with error, especially in the social and biological sciences. If the measurement were to be repeated we would not expect always to get an identical result. In some cases, such as the measurement of height and weight, these errors should be small. In other cases, for example educational tests and attitude measures, this will not usually be true, and a failure to take account of errors in the measurement of explanatory variables may lead to incorrect inferences.
Methods have been developed to adjust for measurement error in multilevel models. Currently models that allow for error in continuous explanatory variables have been implemented in MLwiN. Methods for handling measurement error in categorical variables, usually referred to as misclassification error, are not yet available in MLwiN but are the subject of a current research project.
Resources
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 12).
Goldstein, H. (2003) Multilevel Statistical Models. 3rd edition. London: Arnold. (Chapter 13).
Woodhouse, G., Yang, M., Goldstein, H. and Rasbash, J. (1996) Adjusting for measurement errors in multilevel analysis. Journal of the Royal Statistical Society, Series A, 159: 201-212.

Missing data
In most studies the data are incomplete because some questions were unanswered by a subset of the respondents. Little and Rubin (1987) distinguish between three non-response mechanisms: missing completely at random, missing at random and missing not at random (also called non-ignorable non-response). A variable Y is said to be missing completely at random if missingness does not depend on the value of Y or of any other variable. Missing at random refers to the situation where missingness depends on some other variables X, such that within cells defined by X the data are missing completely at random. Y is missing not at random if missingness depends on the unobserved values of Y itself.
In the case of multivariate or repeated measures data, individuals with missing responses can be incorporated in the analysis without any special procedures, provided they have at least one response and responses can be assumed missing at random. (If MCMC estimation is used, an additional step is required to generate values for the missing data; see Browne (2003) for details.)
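The three mechanisms can be contrasted by writing the probability that Y is missing as a function of an observed variable x and of y itself. The functional forms and the numerical constants below are hypothetical and serve only to show what each mechanism is allowed to depend on.

```python
import math


def inv_logit(eta):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_missing_mcar(x, y):
    """Missing completely at random: depends on neither x nor y."""
    return 0.2


def p_missing_mar(x, y):
    """Missing at random: depends on the observed x only."""
    return inv_logit(-1.5 + 1.0 * x)


def p_missing_mnar(x, y):
    """Missing not at random: depends on the value of y itself."""
    return inv_logit(-1.5 + 1.0 * y)


# Two hypothetical individuals with the same x but different y.
for x, y in ((1.0, 0.0), (1.0, 2.0)):
    print(
        f"x={x}, y={y}: "
        f"MCAR {p_missing_mcar(x, y):.3f}  "
        f"MAR {p_missing_mar(x, y):.3f}  "
        f"MNAR {p_missing_mnar(x, y):.3f}"
    )
```

For the two individuals, who share the same x, the probability of missingness is identical under the first two mechanisms and differs only under missing not at random.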
Missing values on explanatory variables can be ‘filled in’ using multiple imputation, again under a missing at random assumption. Carpenter and Goldstein (2004) describe a procedure for fitting the multilevel model of interest combined with a multilevel model for imputing missing data. Their procedure has been implemented in MLwiN via macros.
Resources
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 16).
Carpenter and Goldstein (2004). Multiple imputation using MLwiN. Multilevel Modelling Newsletter, 16(2).
Goldstein, H. (2003) Multilevel Statistical Models. 3rd edition. London: Arnold. (Chapter 14).
Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: John Wiley.
Support for researchers with incomplete data and MLwiN macros for carrying out multiple imputation are provided at http://www.missingdata.org.uk.

Spatial models
A general definition for spatial data is information collected at a number of sites together with some measure of the location of those sites. If we have multiple observations for each site, we can view the data as having a two-level hierarchical structure with sites at the higher level and fit a standard hierarchical model, with individuals at level 1 and sites at level 2. If any of these sites are contiguous, we may also wish to allow for the effects of neighbouring sites on an individual’s response, i.e. spatial correlation. One way to account for spatial correlation is to fit a multiple membership model in which an individual receives a weighted combination of residuals from all neighbouring sites. Another approach is to use a conditional autoregressive (CAR) model.
Resources
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 15).
Browne, W.J., Goldstein, H. and Rasbash, J. (2001) Multiple membership multiple classification (MMMC) models. Statistical Modelling, 1: 103-124.