Types of model

Models can be classified by:

Type (distribution) of response variable
Type of data structure
Variance structure
Other

A: Type (distribution) of response variable

Continuous (normal)
Binary or proportions (Bernoulli or binomial)
Categorical – nominal (unordered multinomial)
Categorical – ordinal (ordered multinomial)
Counts (Poisson)
Duration or survival

Continuous (normal)

Most introductory texts focus on continuous responses, e.g. exam scores (possibly normalised).

Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 2, 3 and 4).

Binary or proportions (Bernoulli or binomial)

Binary or dichotomous responses have two categories, usually coded 0 and 1. Sometimes binary data have been aggregated to give proportions. For example, in a study of unemployment using census data we would not usually have access to individuals’ employment status; instead we would have the unemployment rate (e.g. the proportion unemployed among those eligible to work) for geographical areas. When data are in the form of proportions, information on the number of individuals on which each proportion is based (the denominator) is required.

Binary data and proportions can both be analysed using generalised linear models for binary responses. The most commonly used of these are the logit (logistic regression) and probit models. In these models a nonlinear transformation of the probability of being in one of the response categories is modelled as a linear function of explanatory variables.
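As a minimal sketch of the nonlinear transformations mentioned above, the following Python code defines the logit and probit transformations and their inverses. The coefficients b0 and b1 are hypothetical values chosen for illustration only; they do not come from any fitted model.

```python
import math
from statistics import NormalDist


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Map a linear predictor back to a probability (logit model)."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit(p):
    """Standard normal quantile of p (the probit transformation)."""
    return NormalDist().inv_cdf(p)


def inverse_probit(eta):
    """Map a linear predictor back to a probability (probit model)."""
    return NormalDist().cdf(eta)


# Hypothetical intercept and slope for a single explanatory variable x.
b0, b1 = -1.0, 0.8
for x in (0.0, 1.0, 2.0):
    eta = b0 + b1 * x  # the linear predictor
    print(x, round(inverse_logit(eta), 3), round(inverse_probit(eta), 3))
```

Both inverse transformations keep the predicted probability between 0 and 1 whatever the value of the linear predictor.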

Examples

response	categories of binary variable	proportion
Exam performance	Pass, fail	Pass rate in school
Voting preference	Party A, other party	Proportion voting for party A in area
Unemployment	Unemployed, employed	Unemployment rate in area

Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 9).

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 10).


Categorical – nominal (unordered multinomial)

Logistic regression models for binary responses can be extended to handle categorical responses that have more than two categories. The type of model we fit depends on whether these response categories are unordered (i.e. nominal) or ordered. The most commonly applied model for nominal responses is the multinomial logit model.
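A minimal sketch of how a multinomial logit model turns linear predictors into category probabilities is given below. It assumes one category is taken as the reference (with its linear predictor fixed at zero); the function name and the example values are hypothetical.

```python
import math


def multinomial_logit_probs(linear_predictors):
    """Category probabilities under a multinomial logit model.

    linear_predictors maps each non-reference category to its linear
    predictor; the reference category has linear predictor 0.
    """
    denominator = 1.0 + sum(math.exp(eta) for eta in linear_predictors.values())
    probs = {"reference": 1.0 / denominator}
    for category, eta in linear_predictors.items():
        probs[category] = math.exp(eta) / denominator
    return probs


# Hypothetical voting example: 'other' is the reference category.
print(multinomial_logit_probs({"party A": 0.5, "party B": -0.2}))
```

The probabilities always sum to one, and each non-reference linear predictor is the log-odds of that category relative to the reference category.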

Examples

Voting preference: party A, party B, other.

Employment status: employed full-time, employed part-time, unemployed, not in labour market.

Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 10).

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 19).

Categorical – ordinal (ordered multinomial)

Logistic regression models for binary responses can be extended to handle categorical responses that have more than two categories. The type of model we fit depends on whether these response categories are unordered (i.e. nominal) or whether they can be considered ordered. Ordinal variables may be analysed using either a proportional odds model (also called a cumulative logit model) or a continuation odds model.
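The proportional odds model can be sketched as follows. This assumes one common parameterisation, in which the logit of the cumulative probability of being in category k or below equals a category-specific threshold minus a linear predictor shared by all categories; other sign conventions are in use, and the thresholds below are hypothetical.

```python
import math


def inverse_logit(value):
    return 1.0 / (1.0 + math.exp(-value))


def proportional_odds_probs(thresholds, eta):
    """Category probabilities for an ordered response with K categories.

    thresholds: increasing cut-points, one fewer than the number of categories.
    eta: linear predictor, shared by every cumulative logit (assumed form).
    """
    # Cumulative probabilities P(y <= k); the last category always reaches 1.
    cumulative = [inverse_logit(a - eta) for a in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(y = k) = P(y <= k) - P(y <= k - 1)
        previous = c
    return probs


# Hypothetical thresholds for a five-category attitudinal scale.
print(proportional_odds_probs([-2.0, -0.5, 0.5, 2.0], eta=0.3))
```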

Examples
- Exam grade: A, B, C, D, fail.
- Attitudinal scale: strongly agree, agree, neutral, disagree, strongly disagree
- Severity of disease.
Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 11).

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 20).

Counts (Poisson)

Sometimes the response variable is a count of individuals in a particular state or the number of events of a particular type. These counts may refer to geographical areas or to groups (e.g. defined by age and sex) within areas. In a Poisson model we model the log of the rate. This is achieved by including the log of the population size as an explanatory variable and constraining its coefficient to equal one (this term is called an offset). The population size is the number of individuals who could have been in the particular state, or the number at risk of the event of interest. A Poisson model is typically used rather than a logistic regression model when the population size is large and the event is rare.
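The role of the offset can be illustrated with a short sketch. The coefficients b0 and b1 and the population sizes are hypothetical; the point is only that fixing the coefficient of log(population) at one makes the model a model for the rate.

```python
import math


def expected_count(population, x, b0, b1):
    """Expected count under a Poisson model with a log(population) offset.

    log(expected count) = log(population) + b0 + b1 * x,
    where the coefficient of log(population) is constrained to equal 1.
    """
    return math.exp(math.log(population) + b0 + b1 * x)


def expected_rate(population, x, b0, b1):
    """Expected count per person at risk; equals exp(b0 + b1 * x)."""
    return expected_count(population, x, b0, b1) / population


# Two hypothetical areas with the same x but very different populations.
print(expected_count(1000, 1.0, -5.0, 0.3), expected_count(50000, 1.0, -5.0, 0.3))
print(expected_rate(1000, 1.0, -5.0, 0.3), expected_rate(50000, 1.0, -5.0, 0.3))
```

The expected counts differ in proportion to the population at risk, while the expected rate is the same in both areas.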

Examples

rate	count	population ‘at risk’
Mortality	Number of deaths	Total population (perhaps within a particular age group)
Teenage pregnancy	Number of teenage pregnancies	Number of teenagers

Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 12).

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 11).

Duration or survival

A duration or survival time response records the time at which some event occurs. A common feature of such data is right censoring where the event has not yet occurred to a proportion of the sample. Survival or event history models allow right-censored observations to be retained in the analysis.
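One way in which right-censored observations can be retained is the discrete-time approach, in which each individual's duration is expanded into one record per time interval. The sketch below assumes durations measured in whole intervals; the function name and record layout are hypothetical.

```python
def person_period_records(duration, event_observed):
    """Expand one individual's duration into discrete-time records.

    Returns a list of (interval, y) pairs, where y = 1 only in the interval
    in which the event occurs. A right-censored individual contributes all
    the intervals for which they were observed, each with y = 0.
    """
    records = []
    for t in range(1, duration + 1):
        y = 1 if (event_observed and t == duration) else 0
        records.append((t, y))
    return records


# A marriage ending in divorce in year 3, and one still intact after 2 years.
print(person_period_records(3, True))   # [(1, 0), (2, 0), (3, 1)]
print(person_period_records(2, False))  # [(1, 0), (2, 0)]
```

The binary indicator y in the expanded data can then be analysed with a model for binary responses, so the censored individual still contributes information for the intervals in which they were at risk.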

Examples

event	duration
Death	Age or duration of treatment in a clinical trial
Divorce	Duration of marriage
Unemployment	Duration in employment

Resources

Steele, F. (2005) Multilevel Discrete-time Event History Analysis.

Training materials from workshop on multilevel event history analysis.

Steele, F. (2005) Event History Analysis. National Centre for Research Methods Review Paper NCRM/004

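B: Type of data structure

Hierarchical
Hierarchical - repeated measures
Hierarchical – multivariate
Non-hierarchical - cross-classified
Non-hierarchical - multiple membership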

Note that a given data set may consist of a combination of these structures. For example, we may have repeated measures on students (a two-level hierarchy) who are nested within a cross-classification of schools and neighbourhoods.

Hierarchical

The simplest and most common data structure is hierarchical. In a two-level structure each lower level unit is nested within one higher-level unit.

Examples
- Students (level 1) within schools (level 2)
- Individuals (level 1) within areas of residence (level 2)
Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 2-12).

Hierarchical - repeated measures

A special case of a hierarchical structure arises when the lowest-level units are repeated measurements taken over time on a higher-level unit, such as an individual. Repeated measures data may be analysed using a growth curve model, where the time at each measurement occasion is included as an explanatory variable, e.g. as a polynomial function. Autocorrelation between responses on a given individual can be incorporated using a multilevel time series model. For example, we might assume that the correlation between responses decays as a function of the distance between measurement occasions.
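As a minimal sketch of the polynomial time terms used in a growth curve model, the following code builds one row of explanatory variables per measurement occasion (level 1) for a given individual (level 2). The function name and the quadratic default are assumptions made for illustration.

```python
def growth_curve_rows(person_id, occasions, degree=2):
    """One row per measurement occasion: (person_id, [1, t, t**2, ...])."""
    return [(person_id, [t ** d for d in range(degree + 1)]) for t in occasions]


# A hypothetical student measured at occasions 0, 1 and 2.
for row in growth_curve_rows("student 1", [0, 1, 2]):
    print(row)
```

Because the time values are stored with each row, individuals do not need to be measured at the same occasions or the same number of times.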

Examples
- Reading scores on students
- Voting intentions of individuals in successive waves of a panel study
- Health outcomes measured over time
Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 13).

Goldstein, H. (1998) Random coefficient repeated measure models. In Encyclopedia of Biostatistics (ed. P. Armitage and T. Colton). London: Wiley.

Goldstein, H., Healy, M.J.R. and Rasbash, J. (1994) Multilevel time series models with applications to repeated measures data. Statistics in Medicine, 13: 1643-55.

Hierarchical – multivariate

Multivariate responses arise when there are measurements on more than one variable for individuals, leading to a two-level hierarchical structure with responses at level 1 nested within individuals at level 2.

One way of modelling multivariate responses is to specify a separate regression equation for each response, allowing for correlation between individual-level (and possibly higher-level) residuals across the responses. This type of model is commonly referred to as a multivariate response model. If the responses may be viewed as indicators of one or more unobserved (latent) constructs, a factor model is usually more appropriate. In a factor model, the correlation between responses is assumed to be due to their common dependence on one or more latent variables or factors. A structural equation model is a generalisation of a factor model in which each factor may depend on explanatory variables and possibly other factors.

Examples
- Students’ scores in different subjects
- Scores on IQ test items
- Responses to items in a scale designed to measure attitude towards political and social issues
Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 14).

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 16 and 18).

Non-hierarchical - cross-classified

The assumption that data structures are purely hierarchical is often a simplification. Individuals may belong to more than one grouping at a given hierarchical level. For example, one might be interested in exploring the effects of school and neighbourhood of residence on student attainment. Schools and neighbourhoods are non-nested because a school will typically have students from more than one neighbourhood, and children in a given neighbourhood will not all go to the same school. This leads to a cross-classified structure in which children are nested within a cross-classification of school and neighbourhood. Another way of viewing the data structure is in terms of classifications rather than levels. In the above example we have three classifications (child, school and neighbourhood), and in the multilevel model we would have residuals corresponding to each.
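Whether two classifications are nested or crossed can be checked directly from the data. The sketch below is one simple way of doing so, using hypothetical neighbourhood and school identifiers for a handful of students.

```python
def is_nested(pairs):
    """True if every lower unit is linked to exactly one higher unit.

    pairs: iterable of (lower_unit, higher_unit), one pair per individual.
    """
    seen = {}
    for lower, higher in pairs:
        # Remember the first higher unit seen for this lower unit.
        if seen.setdefault(lower, higher) != higher:
            return False
    return True


# Hypothetical (neighbourhood, school) identifiers for five students.
students = [("n1", "s1"), ("n1", "s2"), ("n2", "s1"), ("n2", "s2"), ("n3", "s2")]
neighbourhood_in_school = is_nested(students)
school_in_neighbourhood = is_nested((s, n) for n, s in students)
print(neighbourhood_in_school, school_in_neighbourhood)  # False False
```

When neither classification is nested within the other, as here, the structure is cross-classified.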

Examples
- Students cross-classified by primary and secondary school
- Patients cross-classified by general practitioners and hospitals
- Survey respondents cross-classified by sampling cluster and interviewer
Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 18).

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 13).

Rasbash, J. and Goldstein, H. (1994) Efficient analysis of mixed hierarchical and cross classified random structures using a multilevel model. Journal of Educational and Behavioural Statistics, 19: 337-350.

Raudenbush, S.W. (1993) A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational Statistics, 18: 321-350.

Non-hierarchical - multiple membership

Multiple membership models are used in situations where level 1 units belong to two or more higher-level units. In a longitudinal study of school students, for example, children may change school and thus ‘belong’ to more than one school. In a multiple membership model, a student receives a weighted combination of residuals from all the schools attended, where the weights might be proportional to the length of time spent at the different schools.
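The weighted combination of residuals can be sketched as follows, assuming weights proportional to the time spent at each school and scaled to sum to one. The school names, times and residual values are hypothetical.

```python
def multiple_membership_effect(time_in_school, school_residuals):
    """Weighted combination of school residuals for one student.

    time_in_school: dict mapping school -> time spent there.
    school_residuals: dict mapping school -> school-level residual.
    Weights are proportional to time spent and sum to 1 (an assumption).
    """
    total_time = sum(time_in_school.values())
    return sum(
        (time / total_time) * school_residuals[school]
        for school, time in time_in_school.items()
    )


# A student who spent 3 years at school A and 1 year at school B.
effect = multiple_membership_effect({"A": 3, "B": 1}, {"A": 0.4, "B": -0.4})
print(effect)
```

A student who attended only one school receives that school's residual in full, as in a standard hierarchical model.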

Examples
- Household dynamics where individuals move between households over time
- Individuals moving between areas over time
Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 19).

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 14).

Browne, W.J., Goldstein, H. and Rasbash, J. (2001) Multiple membership multiple classification (MMMC) models. Statistical Modelling, 1: 103-124.

Goldstein, H., Rasbash, J., Browne, W. and Woodhouse, G. (2000) Multilevel models in the study of dynamic household structures. European Journal of Population, 16: 373-387.


C: Variance structure

Random intercept (variance components) models
Random slope models
Complex level 1 variation (heteroskedasticity)
Autocorrelation (time series models)

Random intercept (variance components) models

The simplest multilevel model has a single residual term for each level (or classification in a non-hierarchical model). For example, in a model for school effects on student attainment there would be student (level 1) residuals and school (level 2) residuals. This has the effect of partitioning the residual variance into a between-school and within-school component, which is why this model is often referred to as a variance components model. The model is also called a random intercept model because only the intercept term in the regression equation is assumed to vary randomly across schools. The effects of explanatory variables, such as prior attainment, are assumed to be the same for each school. A plot of the predicted school regression lines would therefore show a set of parallel lines, one for each school.
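The two ideas in this paragraph, parallel school lines and the partitioning of variance, can be sketched in a few lines. The parameter values are hypothetical, and the proportion of the residual variance lying between schools is often called the variance partition coefficient.

```python
def predicted_score(x, intercept, slope, school_residual):
    """Random intercept model: only the intercept shifts between schools."""
    return intercept + school_residual + slope * x


def variance_partition_coefficient(between_var, within_var):
    """Proportion of the residual variance that lies between schools."""
    return between_var / (between_var + within_var)


# Two hypothetical schools with residuals +3 and -1: the gap between their
# predicted lines is the same at every value of prior attainment x.
for x in (0, 10):
    gap = predicted_score(x, 50, 2, 3) - predicted_score(x, 50, 2, -1)
    print(x, gap)

print(variance_partition_coefficient(1.0, 3.0))  # 0.25
```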

Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 2-4).

Random slope models

An often unrealistic assumption of the random intercept model is that the effects of explanatory variables are constant across higher-level units. In educational research, for example, we might expect that schools differ in their effect on attainment at age 16 (y) according to students’ prior attainment (x). In other words, rather than being parallel as in the random intercept model, school prediction lines for the relationship between x and y will have different slopes. This is achieved by specifying two residuals at the school level: an intercept residual and a slope residual, which may be correlated. A random slope for x also implies that the between-school variance depends on x. Random slope models are more generally known as random coefficient models.
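The dependence of the between-school variance on x follows from taking the variance of the sum of the intercept residual and x times the slope residual. A minimal sketch, with hypothetical variance and covariance values:

```python
def between_school_variance(x, var_intercept, cov_intercept_slope, var_slope):
    """Variance of (intercept residual + slope residual * x) across schools."""
    return var_intercept + 2.0 * cov_intercept_slope * x + var_slope * x ** 2


# Hypothetical school-level (co)variances.
for x in (-2.0, 0.0, 2.0):
    print(x, between_school_variance(x, 0.10, 0.02, 0.01))
```

At x = 0 the between-school variance reduces to the intercept variance; elsewhere it is a quadratic function of x.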

Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapters 4 and 7).

Complex level 1 variation (heteroskedasticity)

In a random slope model the coefficient of one or more explanatory variables (x) varies randomly across higher-level units. For example, the effect of prior attainment (x) on performance at age 16 (y) may vary across schools. A random slope model also implies that the between-school variance is a quadratic function of x. The variance between students within a school, however, is assumed to be constant (homoskedastic). This assumption may be unreasonable. It is often observed, for example, that boys vary more than girls in their attainment, i.e. there may be heteroskedasticity at the student level. A model which allows the level 1 variance to depend on explanatory variables is called a complex level 1 variance model.

Resources

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User’s Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 7)

Autocorrelation (time series models)

In a standard two-level model the pairwise correlation between the responses for two level 1 units selected at random from the same level 2 unit is assumed to be the same whichever pair is selected. This assumption of exchangeability may be questionable for repeated measures data. For example, we may expect the correlation between responses to decay as a function of the distance between measurement occasions, leading to autocorrelated residuals at level 1. Autocorrelation can be incorporated using a multilevel time series model.
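The contrast between an exchangeable correlation structure and one that decays with distance can be sketched as follows. The decaying form used here (correlation equal to rho raised to the power of the time gap) is an assumption chosen for simplicity; it is only one of many possible decay functions.

```python
def exchangeable_corr(n, rho):
    """Every pair of occasions has the same correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def decaying_corr(times, rho):
    """Correlation rho ** |t_i - t_j|: weaker for occasions further apart."""
    return [[rho ** abs(ti - tj) for tj in times] for ti in times]


times = [0, 1, 3]  # hypothetical, unequally spaced measurement occasions
print(exchangeable_corr(len(times), 0.5))
print(decaying_corr(times, 0.5))
```

Under the exchangeable structure the first and last occasions are as strongly correlated as adjacent ones; under the decaying structure their correlation is much smaller.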

Resources

Goldstein, H., Healy, M.J.R. and Rasbash, J. (1994) Multilevel time series models with applications to repeated measures data. Statistics in Medicine, 13: 1643-55.


D: Other

Measurement error
Missing data
Spatial models

Measurement error

Many measurements are made with error, especially in the social and biological sciences. If the measurement were to be repeated we would not always expect to get an identical result. In some cases, such as the measurement of height and weight, these errors should be small. In other cases, for example educational tests and attitude measures, this will not usually be true, and a failure to take account of errors in the measurement of explanatory variables may lead to incorrect inferences.
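A simple illustration of how ignoring measurement error can mislead is the attenuation of a regression slope. The sketch below assumes the simplest case, a single-level regression with one explanatory variable measured with independent random error; the situation in multilevel models with several explanatory variables is more complicated. All values are hypothetical.

```python
def attenuated_slope(true_slope, true_var, error_var):
    """Expected slope when x is measured with independent random error.

    reliability = variance of true x / variance of observed x.
    """
    reliability = true_var / (true_var + error_var)
    return true_slope * reliability


# A true slope of 1.0, with error variance a quarter of the true variance.
print(attenuated_slope(1.0, 1.0, 0.25))
```

With no measurement error the reliability is one and the slope is unchanged; as the error variance grows, the expected slope shrinks towards zero.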

Methods have been developed to adjust for measurement error in multilevel models. Currently models that allow for error in continuous explanatory variables have been implemented in MLwiN. Methods for handling measurement error in categorical variables, usually referred to as misclassification error, are not yet available in MLwiN but are the subject of a current research project.

Resources

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 12).

Goldstein, H. (2003) Multilevel Statistical Models. 3rd edition. London: Arnold. (Chapter 13).

Woodhouse, G., Yang, M., Goldstein, H. and Rasbash, J. (1996) Adjusting for measurement errors in multilevel analysis. Journal of the Royal Statistical Society, Series A 159: 201-212.

Missing data

In most studies the data are incomplete, for example because some questions were unanswered by a subset of the respondents. Little and Rubin (1987) distinguish three nonresponse mechanisms: missing completely at random, missing at random and missing not at random (also called non-ignorable nonresponse). A variable Y is said to be missing completely at random if missingness does not depend on Y or on any other variable. Missing at random refers to the situation where missingness depends on some other variables X, such that within cells defined by X the data are missing completely at random. Y is missing not at random if, even within cells defined by X, missingness depends on the unobserved values of Y itself.
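The three mechanisms can be contrasted by writing down the probability that Y is missing in each case. In this sketch x and y are both coded 0 or 1 for simplicity, and the probabilities are hypothetical values chosen only to show what missingness is allowed to depend on.

```python
def prob_missing(mechanism, x, y):
    """Probability that y is missing under three hypothetical mechanisms."""
    if mechanism == "MCAR":
        return 0.2                      # depends on neither x nor y
    if mechanism == "MAR":
        return 0.4 if x == 1 else 0.1   # depends on observed x only
    if mechanism == "MNAR":
        return 0.5 if y == 1 else 0.1   # depends on the value of y itself
    raise ValueError("unknown mechanism: " + mechanism)


for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, [prob_missing(mechanism, x, y) for x in (0, 1) for y in (0, 1)])
```

Under the second mechanism, the probability of missingness is constant within each cell defined by x, which is the sense in which the data are missing completely at random within those cells.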

In the case of multivariate or repeated measures data, individuals with missing responses can be incorporated in the analysis without any special procedures, provided they have at least one response and responses can be assumed missing at random. (If MCMC estimation is used, an additional step is required to generate values for the missing data; see Browne (2009) for details.)

Missing values on explanatory variables can be ‘filled in’ using multiple imputation, again under a missing at random assumption. Carpenter and Goldstein (2004) describe a procedure for fitting the multilevel model of interest combined with a multilevel model for imputing missing data. Their procedure has been implemented in MLwiN via macros.

Resources

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 16).

Carpenter, J. and Goldstein, H. (2004) Multiple imputation using MLwiN. Multilevel Modelling Newsletter, 16(2).

Goldstein, H. (2003) Multilevel Statistical Models. 3rd edition. London: Arnold. (Chapter 14).

Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: John Wiley.

Support for researchers with incomplete data and MLwiN macros for carrying out multiple imputation are provided at http://www.missingdata.org.uk.

Spatial models

A general definition for spatial data is information collected at a number of sites together with some measure of the location of those sites. If we have multiple observations for each site, we can view the data as having a two-level hierarchical structure with sites at the higher level and fit a standard hierarchical model, with individuals at level 1 and sites at level 2. If any of these sites are contiguous, we may also wish to allow for the effects of neighbouring sites on an individual’s response, i.e. spatial correlation. One way to account for spatial correlation is to fit a multiple membership model in which an individual receives a weighted combination of residuals from all neighbouring sites. Another approach is to use a conditional autoregressive (CAR) model.

Resources

Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol. (Chapter 15).

Browne, W.J., Goldstein, H. and Rasbash, J. (2001) Multiple membership multiple classification (MMMC) models. Statistical Modelling, 1: 103-124.
