The Avon local group provides an opportunity for statisticians in Bath and Bristol to meet and discuss statistical methods and their applications in areas of local interest, including healthcare, education, and the environment.
Flyer for the course (pdf)
Traditionally, Bayesian inference for general models has been based on computationally expensive Monte Carlo simulation. However, for large classes of models this is unnecessary, as they can be written in a form that allows the use of direct optimisation and numerical integration, which can be substantially and fundamentally faster, as well as more accurate. This tutorial will show how this can be applied to a wide class of generalised linear models, via the R-INLA software package (www.r-inla.org).
From a Bayesian perspective, many generalised linear models (including but not limited to mixed models, survival models and spatio-temporal models) can be formulated as Latent Gaussian Models (LGMs), in which all latent effects (fixed effects, random effects, nonlinear covariate effects, spatial effects, and any others) are joined into a single multivariate Gaussian vector.
In the typical case, the observations are conditionally independent given the latent Gaussian vector, with some Gaussian or non-Gaussian likelihood, e.g. Poisson counts. Such latent models allow the use of Laplace approximations for approximating the posterior distributions of the latent variables, in combination with numerical integration over other unknown parameters, such as observation noise variance. The whole procedure is called Integrated Nested Laplace Approximation (INLA).
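As a toy illustration of the Laplace step at the heart of INLA (this is our own sketch, not the R-INLA implementation), consider a single Gaussian latent variable with a Poisson observation: x ~ N(0, sigma2), y | x ~ Poisson(exp(x)). The posterior mode is found by Newton iteration, and the curvature at the mode gives a Gaussian approximation to the posterior. The model and function names are illustrative assumptions.

```python
import math

def laplace_poisson(y, sigma2, iters=50):
    """Gaussian (Laplace) approximation to p(x | y) for the toy model
    x ~ N(0, sigma2), y | x ~ Poisson(exp(x))."""
    x = math.log(y + 0.5)  # rough starting point
    for _ in range(iters):
        grad = y - math.exp(x) - x / sigma2  # d/dx of the log posterior
        hess = -math.exp(x) - 1.0 / sigma2   # second derivative (always < 0)
        x -= grad / hess                     # Newton step towards the mode
    mean = x
    var = -1.0 / hess                        # inverse curvature at the mode
    return mean, var

m, v = laplace_poisson(y=7, sigma2=1.0)
print(m, v)
```

R-INLA applies this idea nested within numerical integration over the remaining hyperparameters; the one-dimensional sketch above only shows the single Laplace step.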
The tutorial will consist of lectures explaining the fundamentals of Bayesian modelling with LGMs and INLA, and demonstrations of how to estimate models using R-INLA, with examples taken from environmental science, medicine, and epidemiology. Some previous experience with linear models is needed.
About the instructor: Finn Lindgren obtained his PhD degree in mathematical statistics at the Centre for Mathematical Sciences at Lund University. His main research interests are computational methods for Bayesian inference, spatial modelling, Gaussian Markov random fields and stochastic partial differential equations, with applications in geostatistics and climate modelling. He is among the developers of the statistical software INLA which aims to perform fast inference on Bayesian hierarchical models.
The meeting will take place in room 4E 3.38, University of Bath (see http://www.bath.ac.uk/maps/ for a map).
The meeting is open to all and free of charge. For more information and to register please contact: email@example.com
Donald Hedeker is Professor of Biostatistics at the University of Illinois, Chicago. He is an international leader in the development and application of multilevel models for longitudinal and clustered data. He is the primary author of the popular ‘Longitudinal Data Analysis’ graduate textbook and the SuperMix mixed-effects software. He is a Fellow of the American Statistical Association, associate editor for Statistics in Medicine and the Journal of Statistical Software, and the author of numerous peer-reviewed papers.
Abstract: For longitudinal data, mixed models include random subject effects to indicate how subjects influence their responses over the repeated assessments. The error variance and the variance of the random effects are usually considered to be homogeneous. These variance terms characterize the within-subjects (error variance) and between-subjects (random-effects variance) variation in the data. In studies using Ecological Momentary Assessment (EMA), up to thirty or forty observations are often obtained for each subject, and interest frequently centers around changes in the variances, both within- and between-subjects. Also, such EMA studies often include several waves of data collection.
In this presentation, we focus on an adolescent smoking study using EMA, considering both a single measurement wave and data across several waves, where interest is in characterizing changes in mood variation associated with smoking. We describe how covariates can influence the mood variances, and also describe an extension of the standard mixed model in which a subject-level random effect is added to the within-subject variance specification. This permits subjects to have influence on both the mean, or location, and the variability, or (square of the) scale, of their mood responses. Additionally, we allow the location and scale random effects to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on the joint modeling of the mean and variance structure.
Abstract: It is common practice in genome-wide association studies (GWAS) and their meta-analyses to focus on the relationship between disease risk and SNPs one genetic variant at a time. In this way, it is possible to implicate biological pathways and sometimes modifiable environmental exposures that are likely to be involved in disease aetiology. However, single genetic variants typically only explain small amounts of disease risk and hence may be difficult to identify in GWAS. An alternative approach is to construct allelic scores of potentially hundreds of thousands of genetic variants that proxy environmental exposures and/or biological intermediates, and then use these scores to data mine genome-wide association studies and/or perform Mendelian Randomization analysis. In this talk I will discuss the merits of this approach, its potential problems, and will present some data where the method has been applied.
Abstract: The first recorded lung transplant was performed in the UK in 1986 and activity has steadily increased over the years. As the donor pool is drawn from the general population, it would be expected to include donors with a history of smoking. However, the risk that a positive smoking history in lung donors could adversely affect post-transplant survival causes concern. Conversely, reduction of the donor pool by exclusion of such donors could compromise survival of patients on the transplant list. Using risk-adjusted sequentially stratified Cox regression modelling, this study examined the effect of donor smoking on 3-year post-transplant survival and the potential impact of not transplanting lungs from such donors.
For more information please contact: E.Evangelou@maths.bath.ac.uk
Young Statisticians Working in the Social Sciences (PDF, 0.08 mb)
by Donald Rubin
Direct and Indirect Effects: An Unhelpful Distinction? (PDF, 0.3 mb)
The terminology of direct and indirect causal effects is relatively common in causal conversation, as well as in some more formal language. In the context of real statistical problems, however, I do not think that the terminology is helpful for clear thinking; rather, it leads to confused thinking. This presentation will present several real examples where this point arises, as well as one that illustrates that even the great Sir Ronald Fisher was vulnerable to such confusion.
For Objective Causal Inference, Design Trumps Analysis (PDF, 0.7 mb)
For obtaining causal inferences that are objective, and therefore have the best chance of revealing scientific truths, carefully designed and executed randomized experiments are generally considered to be the gold standard. Observational studies, in contrast, are generally fraught with problems that compromise any claim for objectivity of the resulting causal inferences. The thesis here is that observational studies have to be carefully designed to approximate randomized experiments, in particular, without examining any final outcome data. Often a candidate data set will have to be rejected as inadequate because of lack of data on key covariates, or because of lack of overlap in the distributions of key covariates between treatment and control groups, often revealed by careful propensity score analyses. Sometimes the template for the approximating randomized experiment will have to be altered, and the use of principal stratification can be helpful in doing this. These issues are discussed and illustrated using the framework of potential outcomes to define causal effects, which greatly clarifies critical issues.
by Donald Rubin
Are Job-Training Programs Effective? (PDF, 0.4 mb)
In recent years, job-training programmes have become increasingly important in many developed countries with rising unemployment. It is widely accepted that the best way to evaluate such programmes is to conduct randomized experiments. With these, among a group of people who indicate that they want job-training, some are randomly assigned to be offered the training and the others are denied it, at least initially. According to a well-defined protocol, outcomes such as employment statuses or wages for those who are employed are then measured for those who were offered the training and compared to the same outcomes for those who were not.
Despite the high cost of these experiments, their results can be difficult to interpret because of inevitable complications when doing experiments with humans. Three in particular are that some people do not comply with their assigned treatment, others drop out of the experiment before outcomes can be measured, and others who stay in the experiment are not employed, and thus their wages are not cleanly defined.
Statistical analyses of such data can lead to important policy decisions, and yet the analyses typically deal with only one or two of these complications, which may obfuscate subtle effects. An analysis that simultaneously deals with all three complications generally provides more accurate conclusions.
by Nicky Welton
Abstract: We present models for the combined analysis of evidence from randomized controlled trials categorized as being at either low or high risk of bias due to a flaw in their conduct. We formulate a bias model that incorporates between-study and between-meta-analysis heterogeneity in bias, and uncertainty in overall mean bias. We obtain algebraic expressions for the posterior distribution of the bias-adjusted treatment effect, which provide limiting values for the information that can be obtained from studies at high risk of bias. The parameters of the bias model can be estimated from collections of previously published meta-analyses. We explore alternative models for such data, and alternative methods for introducing prior information on the bias parameters into a new meta-analysis. Results from an illustrative example show that the bias-adjusted treatment effect estimates are sensitive to the way in which the meta-epidemiological data are modelled, but that using point estimates for bias parameters provides an adequate approximation to using a full joint prior distribution. A sensitivity analysis shows that the gain in precision from including studies at high risk of bias is likely to be low, however numerous the studies or however large their size, and that little is gained by incorporating such studies unless the information from studies at low risk of bias is limited. We discuss approaches that might increase the value of including studies at high risk of bias, and the acceptability of the methods in the evaluation of health care interventions.
by Sofia Dias
Abstract: There is good empirical evidence that specific flaws in the conduct of randomised controlled trials are associated with exaggeration of treatment effect estimates. Mixed Treatment Comparison (MTC) meta-analysis, which combines data from trials on several treatments that form a network of comparisons, has the potential both to estimate bias parameters within the synthesis and to produce bias-adjusted estimates of treatment effects. We present a hierarchical model for bias with common mean across treatment comparisons of active treatment vs control. It is often unclear, from the information reported, whether a study is at risk of bias or not. We extend our model to estimate the probability that a particular study is biased, where the probabilities for the "unclear" studies are drawn from a common beta distribution. We illustrate these methods with a synthesis of 130 trials on four fluoride treatments and two control interventions for the prevention of dental caries in children. Adequate allocation concealment and blinding are considered as indicators of whether a study is at risk of bias. Bias adjustment reduces the estimated relative efficacy of the treatments and the extent of between-trial heterogeneity.
Metrics, research award grades, and the REF (PowerPoint, 0.6 mb)
Estimating academic performance using research grant award gradings with general relevance to the use of metrics in judging research quality
Abstract: In the UK and elsewhere, there has been considerable debate about the use of quantitative indicators, known as 'metrics', to judge research performance of individual academics and university departments. This talk reports an analysis of the grades awarded to research grant applications made by ESRC, to explore the extent to which such information can be used for this purpose. Results suggest that the usefulness of these data is limited and also that there are similar important limitations associated with other metrics such as those based on journal paper citation indices.
Mark Gittoes, Hannah White and David Mawdsley (Analytical Services Group, HEFCE)
Abstract: HEFCE are currently developing a new system for examining the quality of research in UK HEIs. The primary focus of the system will be to identify excellent research of all kinds and this will be assessed through a process of expert review, informed by citation information in subjects where robust data are available. This talk reports on two technical aspects of citation information which will help inform HEFCE's consultations on the process.
An important part of any approach to bibliometrics is benchmarking. The citation count of a piece of work, by itself, tells us little about it; citations accumulate with time, and rates of citation vary substantially between academic disciplines. In the first aspect discussed, we will look at the approaches to benchmarking (or normalising) citation counts against cohorts of similar outputs. Conventionally an output is benchmarked against the mean citation score of its cohort. We will discuss an alternative approach using percentiles.
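As a minimal sketch of the two benchmarking approaches described above (our own illustration, not HEFCE's actual method or data), the following contrasts normalising a citation count by its cohort mean with locating it as a percentile within the cohort; skewed citation distributions inflate the mean but leave percentiles stable.

```python
def mean_normalised(count, cohort):
    """Citation count divided by the mean count of its cohort."""
    return count / (sum(cohort) / len(cohort))

def percentile(count, cohort):
    """Fraction of the cohort with a citation count at or below `count`."""
    return sum(c <= count for c in cohort) / len(cohort)

# Hypothetical cohort of similar outputs; citation counts are
# typically right-skewed, with a few very highly cited papers.
cohort = [0, 1, 2, 3, 5, 8, 40]
print(mean_normalised(5, cohort))  # dragged down by the single outlier
print(percentile(5, cohort))       # unaffected by the outlier's size
```

The single heavily cited output (40) dominates the cohort mean, so an output cited 5 times scores below 1 on the mean-normalised measure, even though it sits above most of its cohort on the percentile measure.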
In the second aspect, we will discuss an equalities analysis that was carried out using citation information. This considered the question of whether early career researchers (ECRs) were more or less likely than non-ECRs to be 'highly cited' on two citation databases. The analysis also considered other equality areas such as sex, ethnicity and disability and used statistical models to compare staff on a like for like basis.
Abstract: I will describe a general statistical framework for interpreting computer experiments, taking account of the limitations of the simulator (i.e. the computer model), and some particular implementations. I will also describe how 'emulators' can be used in experiments with expensive simulators, as often occur in environmental applications. The key feature of an emulator is that it allows us to augment the information from simulator evaluations with expert judgements about the simulator, such as those concerning smoothness, predictability, monotonicity, and so on.
Vincent Garreta (CEREGE)
Abstract: Inversion is the statistical inference of the inputs of a computer model from data. This is needed for (a) calibrating model parameters whose values are not derivable from theory, and (b) reconstructing the inputs that produced some data through a process partly mimicked by the computer model. Because of its black-box nature, the computer model imposes strong and unusual constraints on modelling and inference.
Process-based modelling appears in case (a) because it can be preferable to use statistical process modelling to fill the gap between model inputs/outputs and data. In case (b), process-based modelling is often made possible by including one or more computer models.
We present the dimensions of the inversion problem and a challenging case study in palaeoclimatology where the inversion of a vegetation model provides climate reconstructions based on pollen data.
Hugo Maruri-Aguilar (Queen Mary)
Abstract: When modelling a computer experiment, the deviation between model and simulation data is due only to the bias (discrepancy) between the model for the computer experiment and the deterministic (albeit complicated) computer simulation. For this reason, replications in computer experiments add no extra information and the experimenter is more interested in efficiently exploring the design region.
I'll present a survey of designs useful for exploring the design region and for modelling computer simulations.
An overview of the ALSPAC data resource (PowerPoint, 0.2 mb)
David Herrick: ALSPAC is a cohort study of parents and their children in the former county of Avon. The study children were born in 1991-92 to mothers resident in Avon. David described the various data sources (including self-completion, hands-on assessments, biological samples, DNA, health records, educational tests and school data, and linked datasets) and sub-studies.
Parental income and child outcomes (PowerPoint, 0.7 mb)
Liz Washbrook used data from the ALSPAC cohort to explore the association between family income and children's cognitive ability (IQ and school performance), socio-emotional outcomes (self esteem, locus of control and behavioural problems) and physical health (risk of obesity). The team (Paul Gregg, Carol Propper and Liz Washbrook) compared and contrasted the degree to which different aspects of low income children's environments can predict the observed deficits in a range of outcomes. Some specific mechanisms that were highlighted were maternal smoking and breastfeeding, child nutrition, parental psychological functioning and the home learning environment.
James Carpenter gave an introduction to missing data. He sketched the issues raised by missing data (item non-response), outlined a principled approach for the analysis of a partially observed data set, reviewed common jargon, and finally described the ideas behind multiple imputation.
Missing data – issues and extensions (PowerPoint, 0.2 mb)
Harvey Goldstein's presentation showed some of the limitations of standard multiple imputation techniques for handling missing data. The standard approach to multiple imputation rests heavily upon normality assumptions which become questionable for binary or multicategorical variables. In addition, for multilevel structures, it is necessary to incorporate the structure at the imputation stage. We showed how categorical (ordered and unordered) variables with missing data can be handled and also multilevel data.
Organiser: Alex Hudson (firstname.lastname@example.org)
Abstract: In England, so-called 'league tables' based upon examination results and test scores are published annually, ostensibly to inform parental choice of secondary schools. A crucial limitation of these tables is that the most recent published information is based on the current performance of a cohort of pupils who entered secondary schools several years earlier, whereas for choosing a school it is the future performance of the current cohort that is of interest. We show that there is substantial uncertainty in predicting such future performance and that incorporating this uncertainty leads to a situation where only a handful of schools' future performances can be separated from both the overall mean and from one another with an acceptable degree of precision. This suggests that school league tables, including value-added ones, have very little to offer as guides to school choice.
by Kate Thomas, PhD student, Department of Social Medicine, University of Bristol
Abstract: Levels of the proinflammatory cytokine interleukin-18 (IL-18) are raised in old age and are associated with reduced physical functioning. Studies have identified a polymorphism in the IL-18 gene that is strongly associated with raised circulating IL-18 levels. This variant has previously been associated with reduced locomotor performance in old age, but the finding requires independent replication. We examined the association between the IL-18 polymorphism rs5744256 and physical functioning in three cohorts with a total of 4,107 subjects aged 60-85 years: the English Longitudinal Study of Ageing, Caerphilly and Boyd Orr. We also meta-analysed our results together with data from the two cohorts in the original paper reporting this association: Iowa-EPESE and InCHIANTI. Physical functioning was assessed by timed walks or the get-up-and-go test. As locomotor performance tests differed between cohorts and the distributions of times to complete the test (in seconds) were positively skewed, we used the reciprocal transformation and computed study-specific z-scores. This presentation will further describe the methods and give the results of the analyses.
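The transformation described above can be sketched as follows (illustrative code under our own assumptions, not the study's actual analysis): positively skewed completion times are reciprocally transformed into speeds, then standardised within each study so that scores are comparable across cohorts using different tests.

```python
import statistics

def reciprocal_z(times):
    """Reciprocal-transform completion times (seconds) and convert to
    within-study z-scores; higher z means faster performance."""
    speeds = [1.0 / t for t in times]          # reciprocal transformation
    mu = statistics.mean(speeds)
    sd = statistics.stdev(speeds)              # sample standard deviation
    return [(s - mu) / sd for s in speeds]     # study-specific z-scores

# Hypothetical timed-walk results from one cohort, right-skewed as
# described in the abstract.
z = reciprocal_z([6.2, 7.5, 9.1, 14.0, 30.0])
print([round(v, 2) for v in z])
```

By construction each study's z-scores have mean zero and unit standard deviation, which is what makes pooling across cohorts with different locomotor tests possible.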
A Matching Algorithm for Paired Living Kidney Donation in the UK (PowerPoint, 0.5 mb)
by Joanne Allen, Senior Statistician, NHS Blood and Transplant, Bristol
Abstract: Blood group or tissue type incompatibility prevents many potential living donor kidney transplants. From September 2006, the Human Tissue Act enabled paired donation in the UK, whereby incompatible donor-recipient pairs can now exchange kidneys so that recipients can receive alternative compatible living donor organs. The British Transplantation Society and UK Transplant worked together to develop the arrangements for paired donation in the UK. Simulation programs were written in SAS, and the simulations were carried out using real data. Several stages were involved in the simulation process - identifying the possible two-way (paired) exchanges, identifying the possible combinations of two-way exchanges and then identifying the optimum combination. The simulations were used to determine effective waiting list sizes, the effectiveness of different matching and prioritisation factors and the likely chance of transplant for different types of patient. This presentation will describe the simulation process used to develop the paired donation scheme in the UK.
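A toy sketch of the first two simulation stages described above (our own Python illustration; the actual programs were written in SAS by NHS Blood and Transplant): enumerate the feasible two-way exchanges, then pick a largest set of disjoint exchanges by brute force. The blood-group-only compatibility rule used here is a deliberately simplistic assumption.

```python
from itertools import combinations

def two_way_exchanges(pairs, compatible):
    """Feasible two-way swaps: the donor of pair i suits the recipient of
    pair j and vice versa. `compatible(donor, recipient)` is user-supplied."""
    swaps = []
    for i, j in combinations(range(len(pairs)), 2):
        di, ri = pairs[i]
        dj, rj = pairs[j]
        if compatible(di, rj) and compatible(dj, ri):
            swaps.append((i, j))
    return swaps

def best_combination(swaps):
    """Largest set of swaps in which no donor-recipient pair appears
    twice (brute force; fine for small registries)."""
    best = []
    for r in range(len(swaps), 0, -1):
        for combo in combinations(swaps, r):
            used = [p for s in combo for p in s]
            if len(used) == len(set(used)) and len(combo) > len(best):
                best = list(combo)
        if best:
            break
    return best

# Hypothetical (donor, recipient) blood groups for four incompatible pairs.
pairs = [("O", "A"), ("A", "O"), ("B", "AB"), ("AB", "B")]
# Simplistic rule for illustration only: O donors suit anyone,
# otherwise groups must match exactly.
ok = lambda donor, recipient: donor == "O" or donor == recipient
swaps = two_way_exchanges(pairs, ok)
print(swaps, best_combination(swaps))
```

The real scheme additionally weighted exchanges by matching and prioritisation factors before choosing the optimum combination; the sketch only maximises the number of transplants.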
Why is Mixture Modelling so Popular? (PDF, 0.4 mb)
by Tony Robinson (University of Bath)
Abstract: Over the last decade or so, interest in using mixtures of statistical models for all types of applications has risen rapidly from both the frequentist and Bayesian points of view. What is so appealing about such models? Are they always good news? Mainly through examples, I shall discuss the reasons for this interest and try to point out what you might gain and what you might lose from adopting a mixture modelling approach.
Estimating redshift for a mixture of types of galaxy (PDF, 1.9 mb)
by Merrilee Hurn (University of Bath)
Abstract: The Sloan Digital Sky Survey (SDSS) is an extremely large astronomical survey conducted with the intention of mapping more than a quarter of the sky (http://www.sdss.org/). Among the data it is generating are spectroscopic and photometric measurements, both containing information about the redshift of galaxies. We consider a Bayesian mixture approach to providing uncertainty bounds associated with the underlying redshifts and the classifications of the galaxies. This is joint work with Peter Green and Fahima Al-Awadhi.
Can Pay regulation kill? The effect of regulated pay on NHS performance (PowerPoint, 1.3 mb)
Abstract: Labour market regulation can have harmful unintended consequences. In many markets, especially for public sector workers, pay is regulated to be the same across heterogeneous local labour markets. We would predict that this will lead to labour supply problems and potential falls in the quality of service provision in areas with stronger outside labour markets. In this paper we exploit panel data from the population of English acute hospitals, where pay for medical staff is almost flat across geographies. We predict that areas with higher outside wages should suffer from problems of recruiting, retaining and motivating workers, and that this should harm hospital performance. We construct hospital-level panel data on both quality, as measured by death rates (within-hospital deaths within thirty days of emergency admission for acute myocardial infarction, AMI), and productivity. We present evidence that stronger local labour markets significantly worsen hospital outcomes in terms of quality and productivity. A 10% increase in the outside wage is associated with a 4% to 8% increase in AMI death rates. We find that an important part of this effect operates through hospitals in high outside wage areas having to rely more on temporary “agency staff”, as they are unable to increase (regulated) wages in order to attract permanent employees. We quantify the magnitudes of these “hidden costs” of labour market regulation, which appear to be substantial.
Evidence-based decision making. How would you like that done, Minister? (PowerPoint, 2.0 mb)
by Tony Ades
Abstract: It is well established that health care decisions should be based on evidence, and that the evidence base should be generated by a pre-defined protocol. Of course, it is not really the evidence that drives the decision, but a model, and the model is informed by the evidence.
What are the principles of evidence-based modelling? It does not seem controversial to suggest that a Minister of Health considering a national policy decision, for example on screening for chlamydia, should expect: (1) that the model is based on all the available evidence; (2) that all the available evidence is consistent; (3) to be told the probability that a decision based on the currently available evidence is wrong; and (4) to be told whether or not to ask for more information before making a decision that would change current policy.
We look at current modelling methods and practices, and some illustrative examples, to see how close they are to this ideal.
Abstract: There is increasing empirical evidence that particular flaws in the conduct of randomized controlled trials (RCTs) lead to bias in intervention effect estimates, as well as increasing between-trial heterogeneity. This literature is based on meta-epidemiological studies, in which collections of meta-analyses are used to examine associations between trial characteristics and intervention effect estimates. I will review this empirical evidence, describe a project to combine it in a single database, and describe statistical methods that might use existing evidence to correct for bias in new meta-analyses.
Genetic meta-analysis and mendelian randomization (PowerPoint, 0.5 mb)
Abstract: Genetic association studies constitute a huge and growing component of observational epidemiological data. They have particular value in being able to contribute to understanding causal associations, but have a poor history of replication. Particular issues with respect to genetic associations are the small effect sizes (requiring large sample sizes) but also a general lack of confounding and bias. This makes meta-analysis of genetic association studies more similar to meta-analyses of randomised controlled trials than to meta-analyses in many observational settings. These issues will be discussed, as will the application of meta-analyses of genetic association studies to understanding the causal nature of environmental exposures.