The Avon local group of the Royal Statistical Society

Mission

The Avon local group provides an opportunity for statisticians in Bath and Bristol to meet and discuss statistical methods and their applications in areas of local interest, including healthcare, education, and the environment.

Committee members: Karim Anaya-Izquierdo, Nicole Augustin, Sofia Dias, Evangelos Evangelou, Tobias Kley, Richard Parker, Chris Rogers, Sandipan Roy, Adrian Sayers, Theresa Smith, and Yi Yu.

To subscribe to our mailing list, please send an email to sympa@lists.bath.ac.uk with the subject line subscribe avon_rss

To view our privacy policy, please see Avon RSS Privacy Notice (pdf)

RSS Avon local group workshop: Challenges in local air pollution modelling

Wednesday, 8th May 2019 13:15, University of Bath 4W 1.2

Please see the workshop webpage and registration form for further details.

Course: Introducing modern Generalized Additive Models

Wednesday 27th June 2018 9:00 - 17:00, Bristol

Please see this link for further information about this course.

Meeting: RSS Environmental Statistics Afternoon

Monday 21st May 2018 13:30 - 17:00, Bath

Please see this link for further information about this meeting.

To register your attendance at this meeting, please enter your details on this online form.

Meeting: Statistical methods for intensively measured health data

Monday 18 January 2016 14:00 - 17:00

The University of Bath is part of a GW4 consortium looking at Analysis of Intensively Collected Health Data.

Dr Nicole Augustin, from the Department of Mathematical Sciences, is the consortium lead at the University of Bath and is hosting the next meeting on Monday 18 January 2016. Afternoon talks are open to everyone, but please register in advance if you would like to attend. Visit Nicole's GW4 meeting page to view the full programme at http://people.bath.ac.uk/nha20/basta/GW4meeting.html

Speakers at the meeting include:

Sonja Greven (Ludwig-Maximilians-University Munich) - Regression with Functional Data with Applications in Bioprocess Monitoring and Pig Sensor Data
Marc Fiecas (University of Warwick) - Discrimination and Classification of Multivariate Time Series, with Applications to Sensor Data
Vanessa Didelez (University of Bristol) - Causal inference with intensively measured data

To attend please register at: http://www.eventbrite.co.uk/e/gw4-consortium-meeting-statistical-methods-for-intensively-measured-health-data-18-january-2016-tickets-19654157113

Please contact the IMI Co-ordinator Dr Catrin Yeomans at imi@bath.ac.uk for further information.

"Registry Studies in Organ Donation and Transplantation"

Dave Collett, NHS Blood and Transplant

University of Bath, 12 May 2015, 14:15-15:15

Flyer for May meeting (PDF, 77kB)

Much of the evidence used to inform the development of organ transplantation in the UK comes from analyses of observational data that make up a centrally maintained registry. In this talk, a brief overview of the nature and uses of registry data will be presented, with particular reference to the work of NHS Blood and Transplant and the UK Transplant Registry. The use of this Registry in exploring the association between donor smoking and lung transplant outcomes, the development of a new liver allocation scheme, and in comparing kidney transplant outcomes from three different countries will be described and illustrated.

About the speaker: Dave obtained his first degree at the University of Leicester, before going on to complete an MSc in statistics at the University of Newcastle and a PhD in statistics at the University of Hull. He spent over 25 years as lecturer and senior lecturer in the Department of Applied Statistics at the University of Reading, including 8 years as Head of that Department. In 2003, he was appointed Associate Director of Statistics and Clinical Studies at NHS Blood and Transplant, and is also now the Director of the NHSBT Clinical Trials Unit. He also has a visiting chair in the Southampton Statistical Sciences Research Institute, University of Southampton. Dave leads a team of close to 40 staff, who work on registry based studies in organ donation and transplantation, clinical trials and other studies in transfusion medicine and stem cell transplantation. He has published many papers in peer-reviewed journals and is the author of text books on modelling binary data and modelling survival data in medical research – a third edition of which has recently been published.

The meeting will take place in room 4W 1.7, University of Bath (see www.bath.ac.uk/maps for a map).

The meeting is open to all and free of charge. For more information and to register please contact: E.Evangelou@maths.bath.ac.uk

"This code is a complete hack, may or may not work, etc.." - The Challenges of Validating R

Aimee Gott, Mango Solutions Ltd.

University of Bath, 17 March 2015, 14:15-15:15

Flyer for meeting (PDF, 74kB)

Whilst R usage has grown hugely in recent years, the use of R in regulated industries, such as the pharmaceutical industry, is still limited. The R core team has provided documentation as guidance for the use of R in such industries, though R still comes with "absolutely no warranty" and there is no formal documentation related to the many additional packages available on CRAN. To comply with FDA guidelines, these add-on packages must be validated along with core and recommended packages. Mango was first asked to validate a version of R in 2009. The growth of R and the number of companies wishing to validate R has led to a steady stream of R validations at Mango in recent months. In this talk we will consider some of the challenges that we have faced in validating R packages and discuss some of the tools that we have developed to aid the process.

The meeting will take place in room 4W 1.17, University of Bath (see http://www.bath.ac.uk/maps/ for a map).

The meeting is open to all and free of charge. For more information and to register please contact: E.Evangelou@maths.bath.ac.uk

Direct Bayesian Inference for Generalised Linear Models: Short Course

Finn Lindgren, University of Bath

University of Bath, 15 May 2013, 13:15-16:30

Flyer for the course (PDF, 126kB)

Traditionally, Bayesian inference for general models has been based on computationally expensive Monte Carlo simulation. However, for large classes of models this is unnecessary, as they can be written in a form that allows the use of direct optimisation and numerical integration, which can be substantially and fundamentally faster, as well as more accurate. This tutorial will show how this can be applied to a wide class of generalised linear models, via the R-INLA software package (www.r-inla.org).

From a Bayesian perspective, many generalised linear models (including but not limited to mixed models, survival models and spatio-temporal models) can be formulated as Latent Gaussian Models (LGMs), where all effects (fixed effects, random effects, nonlinear covariate effects, spatial effects, and any others) are joined into a single multivariate Gaussian vector.

In the typical case, the observations are conditionally independent given the latent Gaussian vector, with some Gaussian or non-Gaussian likelihood, e.g. Poisson counts. Such latent models allow the use of Laplace approximations for approximating the posterior distributions of the latent variables, in combination with numerical integration over other unknown parameters, such as observation noise variance. The whole procedure is called Integrated Nested Laplace Approximation (INLA).
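As a rough illustration of the Laplace step described above (not R-INLA itself, and not the full nested scheme), the following Python sketch approximates the posterior of a single latent Gaussian log-rate given Poisson counts. The one-parameter model, the counts and the function name laplace_poisson are assumptions chosen for illustration only.

```python
import math


def laplace_poisson(counts, prior_precision=1.0, max_iter=50):
    """Gaussian (Laplace) approximation to p(x | y) for the toy model
        x ~ N(0, 1 / prior_precision),   y_i | x ~ Poisson(exp(x)).
    Returns the posterior mode and the approximating variance."""
    n, total = len(counts), sum(counts)
    x = 0.0
    for _ in range(max_iter):
        # Derivatives of the log posterior
        #   sum_i (y_i * x - exp(x)) - 0.5 * prior_precision * x**2
        grad = total - n * math.exp(x) - prior_precision * x
        hess = -n * math.exp(x) - prior_precision
        step = grad / hess
        x -= step  # Newton update towards the mode
        if abs(step) < 1e-12:
            break
    # The approximating Gaussian has variance equal to the inverse
    # of the negative Hessian evaluated at the mode.
    variance = 1.0 / (n * math.exp(x) + prior_precision)
    return x, variance


# Hypothetical counts, for illustration only.
mode, var = laplace_poisson([3, 5, 4, 6, 2])
print(f"posterior mode of log-rate: {mode:.3f}, approximate sd: {math.sqrt(var):.3f}")
```

In INLA proper, approximations of this kind are nested inside a numerical integration over the remaining hyperparameters (here the prior precision is simply held fixed).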

The tutorial will consist of lectures explaining the fundamentals of Bayesian modelling with LGMs and INLA, and demonstrations of how to estimate models using R-INLA, with examples taken from environmental science, medicine, and epidemiology. Some previous experience with linear models is needed.

About the instructor: Finn Lindgren obtained his PhD degree in mathematical statistics at the Centre for Mathematical Sciences at Lund University. His main research interests are computational methods for Bayesian inference, spatial modelling, Gaussian Markov random fields and stochastic partial differential equations, with applications in geostatistics and climate modelling. He is among the developers of the statistical software INLA which aims to perform fast inference on Bayesian hierarchical models.

The meeting will take place in room 4E 3.38, University of Bath (see http://www.bath.ac.uk/maps/ for a map).

The meeting is open to all and free of charge. For more information and to register please contact: e.evangelou@maths.bath.ac.uk

Previous meetings and materials

26 February 2013: Seminar with Professor Don Hedeker

Mood Changes Associated with Smoking in Adolescents: Applications of Mixed-Effects Location Scale Models for Cross-Sectional and Longitudinal Ecological Momentary Assessment (EMA) Data (PDF, 400kB)

Donald Hedeker is Professor of Biostatistics at the University of Illinois, Chicago. He is an international leader in the development and application of multilevel models for longitudinal and clustered data. He is the primary author of the popular ‘Longitudinal Data Analysis’ graduate textbook and the SuperMix mixed-effects software. He is a Fellow of the American Statistical Association, associate editor for Statistics in Medicine and the Journal of Statistical Software, and the author of numerous peer-reviewed papers.

Abstract: For longitudinal data, mixed models include random subject effects to indicate how subjects influence their responses over the repeated assessments. The error variance and the variance of the random effects are usually considered to be homogeneous. These variance terms characterize the within-subjects (error variance) and between-subjects (random-effects variance) variation in the data. In studies using Ecological Momentary Assessment (EMA), up to thirty or forty observations are often obtained for each subject, and interest frequently centers around changes in the variances, both within- and between-subjects. Also, such EMA studies often include several waves of data collection.

In this presentation, we focus on an adolescent smoking study using EMA at both one and several measurement waves, where interest is on characterizing changes in mood variation associated with smoking. We describe how covariates can influence the mood variances, and also describe an extension of the standard mixed model by adding a subject-level random effect to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their mood responses. Additionally, we allow the location and scale random effects to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on the joint modeling of the mean and variance structure.
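The structure of such a model can be sketched by simulation. The Python snippet below is a minimal illustration, not the speaker's model or software: each subject receives a random location effect (shifting the mean) and a correlated random scale effect (shifting the log of the within-subject variance). All parameter names and values are assumptions made for the example.

```python
import math
import random
import statistics


def simulate(n_subjects=200, n_obs=30, beta0=5.0, tau0=0.0,
             sd_location=1.0, sd_scale=1.0, corr=0.3, seed=1):
    """Simulate y_ij = beta0 + location_i + e_ij with
    e_ij ~ N(0, exp(tau0 + scale_i)); (location_i, scale_i) are
    bivariate normal with correlation `corr`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        location = sd_location * z1
        # Build the scale effect from z1 and z2 so it correlates with location.
        scale = sd_scale * (corr * z1 + math.sqrt(1 - corr ** 2) * z2)
        within_sd = math.sqrt(math.exp(tau0 + scale))
        data.append([beta0 + location + rng.gauss(0, within_sd)
                     for _ in range(n_obs)])
    return data


def spread_of_log_variances(data):
    """Standard deviation, across subjects, of the log within-subject variance."""
    return statistics.pstdev([math.log(statistics.variance(ys)) for ys in data])


heterogeneous = simulate(sd_scale=1.0)  # subjects differ in variability
homogeneous = simulate(sd_scale=0.0)    # standard mixed model: common error variance
print(spread_of_log_variances(heterogeneous), spread_of_log_variances(homogeneous))
```

With the random scale effect switched on, the within-subject variances differ between subjects far more than sampling noise alone would produce, which is the feature the location scale model is designed to capture.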

10-Dec-2012: Medical Statistics

Dave Evans - MRC Centre for Causal Analyses in Translational Epidemiology

Using Genome-wide allelic scores in Data Mining and Mendelian Randomization Studies

Abstract: It is common practice in genome-wide association studies (GWAS) and their meta-analyses to focus on the relationship between disease risk and SNPs one genetic variant at a time. In this way, it is possible to implicate biological pathways and sometimes modifiable environmental exposures that are likely to be involved in disease aetiology. However, single genetic variants typically only explain small amounts of disease risk and hence may be difficult to identify in GWAS. An alternative approach is to construct allelic scores of potentially hundreds of thousands of genetic variants that proxy environmental exposures and/or biological intermediates, and then use these scores to data mine genome-wide association studies and/or perform Mendelian Randomization analysis. In this talk I will discuss the merits of this approach, its potential problems, and will present some data where the method has been applied.

Rhiannon Taylor - NHS Blood and Transplant

The smoking donor - hazard or benefit to the potential lung transplant recipient? (ppt)

Abstract: The first recorded lung transplant was performed in the UK in 1986 and activity has steadily increased over the years. As the donor pool is drawn from the general population, it would be expected to include donors with a history of smoking. However, the risk that a positive smoking history in lung donors could adversely affect post-transplant survival causes concern. Conversely, reduction of the donor pool by exclusion of such donors could compromise survival of patients on the transplant list. Using risk-adjusted sequentially stratified Cox regression modelling, this study examined the effect of donor smoking on 3-year post-transplant survival and the potential impact of not transplanting lungs from such donors.

25-May-12

Group Sequential and Adaptive Methods for Clinical Trials: Short Course and Seminar (DOCX, 0.95 mb)

For more information please contact: E.Evangelou@maths.bath.ac.uk

28-Mar-12

Young Statisticians Working in the Social Sciences (PDF, 82kB)

The effect of different incentive strategies in a national pilot study of the over-50s in Ireland, by Mark Hanly, Department of Economics, University of Bristol

Abstract: This experiment explores the area of survey incentivisation, drawing on data from a nationally representative face-to-face survey. Two components of incentivisation are examined: the monetary amount offered and the manner in which it is issued. Moreover, we test whether persons of lower SES are disproportionately attracted by monetary incentives. The experiment focuses on 376 Irish households who received a request to partake in The Irish Longitudinal Study on Ageing (TILDA) pilot study. Households were randomly assigned to one of two incentive levels (€10/€25). Refusers at the €10 level were offered a further €15 for their participation, meaning that the same incentive amount was ultimately available to both groups. Response rates were significantly higher in the €25 group (61.4%) compared to the €10 group before (34.6%) and after (53.1%) the remaining €15 was offered. There was some evidence that individuals who agreed to participate after the offer of the additional €15 had a lower educational profile than those who participated for €10. The results suggest that while issuing the highest available incentive from the outset achieves the highest response rate, offering a lower incentive initially, which can be raised in the case of refusers, may be a cost-effective method for survey organisations with limited funds.

Using linked census and survey data to identify bias in measures of child's home background and the implications for school evaluation, by Robert French, Graduate School of Education, University of Bristol

Abstract: Value-added models are used to evaluate schools, comparing results while controlling for differences in the prior attainment and background characteristics of the school’s intake. The measure used to control for a child’s home background in these models, whether a child receives a free school meal (FSM), is an administrative binary proxy for a complex set of underlying variables. This research investigates to what extent this proxy is valid and identifies the bias arising from its use. The research uses two linked data sets: the National Pupil Database (NPD), which has limited information (including FSM) for the population of students, and the Longitudinal Study of Young People in England (LSYPE), which has rich background variables (including parents’ income, education and occupation) for a sample of the population. This research uses the linkage between the two data sets to identify the effect of using better home background characteristics from LSYPE compared with the proxy available in the NPD. The research finds some justification for the proxy: the FSM threshold coincides with the most significant division in household income for educational attainment, and it has significant explanatory power in the school evaluation model. However, there is also evidence of measurement error and bias arising from the use of FSM; further, we find this proxy captures only a very small proportion of the true variation in scores resulting from the home background of a school’s intake.

Genetics and environment of intelligence sub-tests, by Rebecca Pillinger, Graduate School of Education, University of Bristol

Abstract: Using data from the US National Collaborative Perinatal Project, differences between individuals at age 7 on the different scales within the Wechsler Intelligence Scale for Children IQ test (WISC) are decomposed into genetic, shared (family) environmental and non-shared environmental components. The possibility that the magnitude of these components may change across measures of socio-economic status is then investigated.

24-Nov-11

Bayesian model specification: towards a Theory of Applied Statistics (PDF, 512kB) by David Draper, Department of Applied Mathematics and Statistics, University of California, Santa Cruz, USA
Abstract: The Bayesian approach to statistical inference, prediction and decision-making has a simple structure, with one equation for each of these three fundamental activities, and it can be shown to be based on a straightforward logical progression from principles to axioms to a foundational theorem with corollaries. However, this approach requires the user to specify two ingredients (usually called the prior and sampling distributions) for inference and prediction, and two more ingredients (an action space and a utility function) for decision-making, and as a profession we lack the same kind of logical progression from principles to axioms to theorems that would constitute optimal specification of these four ingredients (by "optimal" here I mean "coming as close as possible to the goal of {conditioning only on true/false propositions that are rendered true by the context of the problem and the design of the data-gathering activity}"). Successfully developing such a logical progression would yield a Theory of Applied Statistics, which we need and do not yet have. In this talk I'll explore the extent to which four principles (Calibration, Modeling-As-Decision, Prediction, and Decision-Versus-Inference) constitute progress toward this goal. Abstract - further details (PDF, 104kB)

10-May-11: Applications of Structural Modelling and Genetics to Forensics and Epidemiology

Sensitivity of inferences in forensic genetics to assumptions about founding genes (PDF, 1,508kB) by Peter Green (Department of Mathematics, University of Bristol)
- Abstract: Many forensic genetics problems can be handled using structured systems of discrete variables, for which Bayesian networks offer an appealing practical modelling framework, and allow inferences to be computed very quickly by probability propagation methods. However, when standard assumptions are violated – for example when allele frequencies are unknown, there is identity by descent or the population is heterogeneous – dependence is generated among founding genes, that makes exact calculation of conditional probabilities by propagation methods less straightforward. Here we illustrate methodologies for assessing sensitivity to assumptions about founders in forensic genetics problems. We illustrate these methods on several forensic genetics examples involving criminal identification, simple and complex disputed paternity and DNA mixtures.
Using Generalised Method of Moments for Estimation and Testing of Structural Mean Models with Multivalued and/or Multiple Instruments (PDF, 118kB) by Frank Windmeijer, Department of Economics, University of Bristol
- Abstract: We show how Structural Mean Models (SMM), used for the estimation of causal effects in the presence of confounding, can be estimated by the Generalised Method of Moments (GMM) estimator. It is then straightforward to apply the GMM estimator to additive, multiplicative and logistic SMMs with multivalued and/or multiple instruments using standard software, like Stata and R. The GMM estimator produces efficient projections of the instruments and correct asymptotic inference. The validity of the SMM model assumptions can further be tested using the standard Hansen J-test. We apply the GMM estimator to determine the causal effect of adiposity on hypertension, using the genetic markers FTO and MC4R as instruments for adiposity.

10-Feb-10 Theme: Young Statisticians working in Medical Research

Statistical Image Analysis in Cone-beam Computed Tomography (PDF, 2,733kB) by Susan Doshi, Chris Jennison (University of Bath, Department of Mathematical Sciences), Cathy Hall (University Hospitals Bristol NHS Foundation Trust, Radiotherapy Physics Unit)
- Abstract: In cone-beam computed tomography (CBCT), a three-dimensional representation of the patient is reconstructed from two-dimensional X-ray projections acquired at a range of angles round the patient. CBCT is becoming increasingly popular in radiotherapy imaging, where it is used to verify that the organ to be treated is in the desired position. Small metallic markers may be implanted in the patient to improve the accuracy with which an organ can be located; however these markers may cause artefacts in the three-dimensional reconstruction. We will present a statistical analysis of images containing markers, resulting in estimates of their locations and reconstructions with fewer artefacts elsewhere in the image.
Hierarchical modeling of performance indicators, with application to MRSA and teenage conception rates (PDF, 320kB) by Hayley Jones
- Abstract: Healthcare providers are now regularly compared with each other based on measures of their "performance". Examples include MRSA bacteraemia rates in NHS Trusts and teenage conception rates in Local Authorities. I will discuss the motivation for fitting hierarchical models to such data, which provide shrinkage estimates of the performance of each provider. Using various measures of forecasting accuracy, the improved short-term predictive ability of these estimates compared with crude observed measures will be demonstrated for the two examples. Such performance data are often collected repeatedly at regular intervals over time, but the data from each time period tend to be analysed independently. I will discuss a hierarchical longitudinal model, in which the underlying standardised performance is assumed to follow an autoregressive process with lag 1 (AR(1)) over time. It will be shown that the resulting smoothing of observations over time periods led to a further improvement in predictive ability for the two examples.
Multilevel spline models for blood pressure changes in pregnancy (PowerPoint, 1.2 mb), by Corrie MacDonald-Wallis
- Abstract: High blood pressure occurring after 20 weeks of gestation is part of the diagnostic criteria for pre-eclampsia, which is associated with increased risk of maternal and perinatal morbidity and mortality. To investigate the pattern of blood pressure change during normal pregnancy we fitted fractional polynomial and multilevel linear spline models to repeated antenatal blood pressure measurements of women in the Avon Longitudinal Study of Parents and Children (ALSPAC). The linear spline models were also used to assess associations of risk factors for pre-eclampsia with patterns of blood pressure change, and the extension of these models to examine relationships between weight and blood pressure changes in pregnancy using a multivariate response will be discussed.
Mendelian randomization: using genetic data in epidemiological studies, by Tom Palmer
- Abstract: Mendelian randomization represents the use of genotypes as instrumental variables in epidemiological studies. It is potentially powerful because instrumental variable approaches can overcome problems such as unmeasured confounding which can bias observational estimates. In this talk I outline the Mendelian randomization approach, its history, and give an example application investigating the effect of body mass index on the risk of Ischaemic Heart Disease in the Copenhagen General Population Study.
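The instrumental-variable idea behind Mendelian randomization can be sketched with simulated data. The Python example below is a hedged illustration only: it uses the simple Wald ratio estimator, which the abstract does not name, and every variable and parameter value (allele frequency, effect sizes, sample size) is an assumption for the example.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


rng = random.Random(2010)
n, true_effect = 20000, 0.5

genotype, exposure, outcome = [], [], []
for _ in range(n):
    g = (rng.random() < 0.3) + (rng.random() < 0.3)  # 0, 1 or 2 risk alleles
    u = rng.gauss(0, 1)                              # unmeasured confounder
    x = 1.0 * g + u + rng.gauss(0, 1)                # exposure depends on genotype and confounder
    y = true_effect * x + u + rng.gauss(0, 1)        # outcome depends on exposure and confounder
    genotype.append(g)
    exposure.append(x)
    outcome.append(y)

# Observational (least-squares) slope: biased by the unmeasured confounder.
ols_estimate = covariance(exposure, outcome) / covariance(exposure, exposure)

# Wald ratio: gene-outcome association divided by gene-exposure association.
wald_estimate = covariance(genotype, outcome) / covariance(genotype, exposure)

print(f"observational: {ols_estimate:.2f}, Wald ratio: {wald_estimate:.2f}, truth: {true_effect}")
```

Because the simulated genotype affects the outcome only through the exposure and is independent of the confounder, the Wald ratio lands near the true effect while the observational slope is inflated.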

03-Nov-10: Statistical aspects of modelling the relationship between obesity and physical activity

ALSPAC Update - physical activity and obesity (PDF, 1,071kB)
Dealing with missing data in physical activity models (PDF, 201kB), by Alex Griffiths, The Avon Longitudinal Study of Parents and Children (ALSPAC)
Longitudinal relationships between fat mass and physical activity (PDF, 476kB), by Kate Tilling, Alex Griffiths, Sam Leary
Nicole Augustin, in collaboration with Callum Mattocks, Chris Riddoch, Andy Ness and Julian Faraway

11-Oct-10 - Causal inference

by Donald Rubin

Direct and Indirect Effects: An Unhelpful Distinction? (PDF, 330kB)

The terminology of direct and indirect causal effects is relatively common in causal conversation as well as in some more formal language. In the context of real statistical problems, however, I do not think that the terminology is helpful for clear thinking, and rather leads to confused thinking. This presentation will present several real examples where this point arises, as well as one that illustrates even the great Sir Ronald Fisher was vulnerable to such confusion.

For Objective Causal Inference, Design Trumps Analysis (PDF, 740kB)

For obtaining causal inferences that are objective, and therefore have the best chance of revealing scientific truths, carefully designed and executed randomized experiments are generally considered to be the gold standard. Observational studies, in contrast, are generally fraught with problems that compromise any claim for objectivity of the resulting causal inferences. The thesis here is that observational studies have to be carefully designed to approximate randomized experiments, in particular, without examining any final outcome data. Often a candidate data set will have to be rejected as inadequate because of lack of data on key covariates, or because of lack of overlap in the distributions of key covariates between treatment and control groups, often revealed by careful propensity score analyses. Sometimes the template for the approximating randomized experiment will have to be altered, and the use of principal stratification can be helpful in doing this. These issues are discussed and illustrated using the framework of potential outcomes to define causal effects, which greatly clarifies critical issues.

London lecture (12-Oct-10): Methodological issues in the evaluation of a job-training programme

by Donald Rubin

Are Job-Training Programs Effective? (PDF, 353kB)

In recent years, job-training programmes have become increasingly important in many developed countries with rising unemployment. It is widely accepted that the best way to evaluate such programmes is to conduct randomized experiments. With these, among a group of people who indicate that they want job-training, some are randomly assigned to be offered the training and the others are denied it, at least initially. According to a well-defined protocol, outcomes such as employment statuses or wages for those who are employed are then measured for those who were offered the training and compared to the same outcomes for those who were not.

Despite the high cost of these experiments, their results can be difficult to interpret because of inevitable complications when doing experiments with humans. Three in particular are that some people do not comply with their assigned treatment, others drop out of the experiment before outcomes can be measured, and others who stay in the experiment are not employed, and thus their wages are not cleanly defined.

Statistical analyses of such data can lead to important policy decisions, and yet the analyses typically deal with only one or two of these complications, which may obfuscate subtle effects. An analysis that simultaneously deals with all three complications generally provides more accurate conclusions.

25-May-10: Topics on evidence synthesis in medicine

Models for potentially biased evidence in meta-analysis using empirically based priors (PDF, 287kB)

by Nicky Welton

Abstract: We present models for the combined analysis of evidence from randomized controlled trials categorized as being at either low or high risk of bias due to a flaw in their conduct. We formulate a bias model that incorporates between-study and between-meta-analysis heterogeneity in bias, and uncertainty in overall mean bias. We obtain algebraic expressions for the posterior distribution of the bias-adjusted treatment effect, which provide limiting values for the information that can be obtained from studies at high risk of bias. The parameters of the bias model can be estimated from collections of previously published meta-analyses. We explore alternative models for such data, and alternative methods for introducing prior information on the bias parameters into a new meta-analysis. Results from an illustrative example show that the bias-adjusted treatment effect estimates are sensitive to the way in which the meta-epidemiological data are modelled, but that using point estimates for bias parameters provides an adequate approximation to using a full joint prior distribution. A sensitivity analysis shows that the gain in precision from including studies at high risk of bias is likely to be low, however numerous or large their size, and that little is gained by incorporating such studies, unless the information from studies at low risk of bias is limited. We discuss approaches that might increase the value of including studies at high risk of bias, and the acceptability of the methods in the evaluation of health care interventions.

Estimation and adjustment of bias in randomised evidence using mixed treatment comparison meta-analysis (PowerPoint, 1.3 mb)

by Sofia Dias

Abstract: There is good empirical evidence that specific flaws in the conduct of randomised controlled trials are associated with exaggeration of treatment effect estimates. Mixed Treatment Comparison (MTC) meta-analysis, which combines data from trials on several treatments that form a network of comparisons, has the potential to both estimate bias parameters within the synthesis, and to produce bias-adjusted estimates of treatment effects. We present a hierarchical model for bias with common mean across treatment comparisons of active treatment vs control. It is often unclear, from the information reported, whether a study is at risk of bias or not. We extend our model to estimate the probability that a particular study is biased, where the probabilities for the "unclear" studies are drawn from a common beta distribution. We illustrate these methods with a synthesis of 130 trials on 4 fluoride treatments and two control interventions for the prevention of dental caries in children. Whether there is adequate allocation concealment and/or blinding are considered as indicators of whether a study is at risk of bias. Bias adjustment reduces the estimated relative efficacy of the treatments and the extent of between-trial heterogeneity.

04-Feb-10: Performance indicators in higher education

Metrics, research award grades, and the REF (PowerPoint, 0.6 mb)

Harvey Goldstein

Estimating academic performance using research grant award gradings with general relevance to the use of metrics in judging research quality

Abstract: In the UK and elsewhere, there has been considerable debate about the use of quantitative indicators, known as 'metrics', to judge research performance of individual academics and university departments. This talk reports an analysis of the grades awarded to research grant applications made by ESRC, to explore the extent to which such information can be used for this purpose. Results suggest that the usefulness of these data is limited and also that there are similar important limitations associated with other metrics such as those based on journal paper citation indices.

"Bibliometrics and the REF" and "REF equalities analysis" (PDF, 415kB)

Mark Gittoes, Hannah White and David Mawdsley (Analytical Services Group, HEFCE)

Abstract: HEFCE are currently developing a new system for examining the quality of research in UK HEIs. The primary focus of the system will be to identify excellent research of all kinds and this will be assessed through a process of expert review, informed by citation information in subjects where robust data are available. This talk reports on two technical aspects of citation information which will help inform HEFCE's consultations on the process.

An important part of any approach to bibliometrics is benchmarking. The citation count of a piece of work, by itself, tells us little about it; citations accumulate with time, and rates of citation vary substantially between academic disciplines. In the first aspect discussed, we will look at the approaches to benchmarking (or normalising) citation counts against cohorts of similar outputs. Conventionally an output is benchmarked against the mean citation score of its cohort. We will discuss an alternative approach using percentiles.

In the second aspect, we will discuss an equalities analysis that was carried out using citation information. This considered the question of whether early career researchers (ECRs) were more or less likely than non-ECRs to be 'highly cited' on two citation databases. The analysis also considered other equality areas such as sex, ethnicity and disability and used statistical models to compare staff on a like for like basis.
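The two benchmarking approaches mentioned in the first aspect above can be contrasted with a small sketch. The cohort counts are invented for illustration, and the percentile definition used here (share of cohort outputs with a citation count at or below the output's own) is an assumption; the talk may have used a different one.

```python
from statistics import mean

def mean_normalised(count, cohort):
    """Citation count divided by the mean citation count of the cohort."""
    return count / mean(cohort)

def percentile_rank(count, cohort):
    """Percentage of cohort outputs cited no more often than `count` (assumed definition)."""
    return 100.0 * sum(c <= count for c in cohort) / len(cohort)

# Hypothetical cohort: same discipline and publication year, one heavily cited outlier.
cohort = [0, 1, 1, 2, 3, 5, 8, 120]

ratio = mean_normalised(8, cohort)   # pulled below 1 by the outlier in the mean
rank = percentile_rank(8, cohort)    # unaffected by how large the outlier is
print(f"mean-normalised score: {ratio:.2f}, percentile rank: {rank:.1f}")
```

In this made-up cohort a single outlier drags the mean up, so an output that is cited more often than most of its peers still scores below 1 on the mean-based measure, while its percentile rank stays high.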

28-Oct-09: Computer experiments

Introduction to computer experiments, and the challenges of expensive models

Abstract: I will describe a general statistical framework for interpreting computer experiments, taking account of the limitations of the simulator (i.e. the computer model), and some particular implementations. I will also describe how 'emulators' can be used in experiments with expensive simulators, as often occur in environmental applications. The key feature of an emulator is that it allows us to augment the information from simulator evaluations with expert judgements about the simulator, such as those concerning smoothness, predictability, monotonicity, and so on.

Vincent Garreta (CEREGE)

Inversion of computer models and statistical process-based modelling

Abstract: Inversion is the statistical inference of the inputs of a computer model based on some data. This is needed for (a) calibrating the model's parameters whose values are not derivable from theory, and (b) reconstructing the inputs that produced some data through a process partly mimicked by the computer model. Due to its black-box nature, the computer model imposes strong and original constraints for modelling and inference.

Process-based modelling appears in case (a) because it can be preferable to use statistical process modelling to fill the gap between the model's inputs/outputs and the data. In case (b), process-based modelling is often made possible by including one or more computer models.

We present the dimensions of the inversion problem and a challenging case study in palaeoclimatology where the inversion of a vegetation model provides climate reconstructions based on pollen data.

Hugo Maruri-Aguilar (Queen Mary)

Experimental designs for computer experiments

Abstract: When modelling a computer experiment, the deviation between model and simulation data is due only to the bias (discrepancy) between the model for the computer experiment and the deterministic (albeit complicated) computer simulation. For this reason, replications in computer experiments add no extra information and the experimenter is more interested in efficiently exploring the design region.

I'll present a survey of designs useful for exploring the design region and for modelling computer simulations.
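As an illustration of a design that spreads runs across the region without replication, here is a minimal sketch of a Latin hypercube sample. This is one common space-filling design, chosen here as an example; the abstract does not say which designs the talk covered, and the function name and parameters are hypothetical.

```python
import random

def latin_hypercube(n, d, seed=0):
    """Return n points in [0, 1)^d with exactly one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        strata = list(range(n))      # n equal-width intervals along this axis
        rng.shuffle(strata)          # random pairing of intervals across axes
        # place each point uniformly at random inside its assigned interval
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]

design = latin_hypercube(n=5, d=2)
for point in design:
    print(point)
```

Each input takes n distinct values, so no simulator run repeats a setting in any single dimension.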

Organiser: David Leslie (david.leslie@bristol.ac.uk)

26-May-09: Avon longitudinal study of parents and children (ALSPAC)

An overview of the ALSPAC data resource (PowerPoint, 0.2 mb)

David Herrick: ALSPAC is a cohort study of parents and their children in the former county of Avon. The study children were born in 1991-92 to mothers resident in Avon. David described the various data sources (including self-completion, hands-on assessments, biological samples, DNA, health records, educational tests and school data, and linked datasets) and sub-studies.

Parental income and child outcomes (PowerPoint, 0.7 mb)

Liz Washbrook used data from the ALSPAC cohort to explore the association between family income and children's cognitive ability (IQ and school performance), socio-emotional outcomes (self-esteem, locus of control and behavioural problems) and physical health (risk of obesity). The team (Paul Gregg, Carol Propper and Liz Washbrook) compared and contrasted the degree to which different aspects of low-income children's environments can predict the observed deficits in a range of outcomes. Some specific mechanisms that were highlighted were maternal smoking and breastfeeding, child nutrition, parental psychological functioning and the home learning environment.

17-Mar-09: Handling missing data

Handling Missing Data (PDF, 389kB)

James Carpenter gave an introduction to missing data. We sketched the issues raised by missing data (item non-response), outlined a principled approach for the analysis of a partially observed data set, reviewed common jargon, and finally outlined the ideas behind multiple imputation.

Missing data – issues and extensions (PowerPoint, 0.2 mb)

Harvey Goldstein's presentation showed some of the limitations of standard multiple imputation techniques for handling missing data. The standard approach to multiple imputation rests heavily upon normality assumptions which become questionable for binary or multicategorical variables. In addition, for multilevel structures, it is necessary to incorporate the structure at the imputation stage. We showed how categorical (ordered and unordered) variables with missing data can be handled and also multilevel data.

14-Oct-08: Meeting the young statisticians in our area

Organiser: Alex Hudson (alex.hudson@uktransplant.nhs.uk)

The limitations of using school league tables to inform school choice (PDF, 309kB)

by George Leckie and Harvey Goldstein, Centre for Multilevel Modelling

Abstract: In England, so-called 'league tables' based upon examination results and test scores are published annually, ostensibly to inform parental choice of secondary schools. A crucial limitation of these tables is that the most recent published information is based on the current performance of a cohort of pupils who entered secondary schools several years earlier, whereas for choosing a school it is the future performance of the current cohort that is of interest. We show that there is substantial uncertainty in predicting such future performance and that incorporating this uncertainty leads to a situation where only a handful of schools' future performances can be separated from both the overall mean and from one another with an acceptable degree of precision. This suggests that school league tables, including value-added ones, have very little to offer as guides to school choice.

Interleukin-18 and Physical Function in Old Age: A Replication Study and Meta-analysis (PDF, 1,901kB)

by Kate Thomas, PhD student, Department of Social Medicine, University of Bristol

Abstract: Levels of the proinflammatory cytokine interleukin-18 (IL-18) are raised in old age and are associated with reduced physical functioning. Studies have identified a polymorphism in the IL-18 gene that is strongly associated with raised circulating IL-18 levels. This variant has previously been associated with reduced locomotor performance in old age, but the finding requires independent replication. We examined the association between the IL-18 polymorphism rs5744256 and physical functioning in three cohorts with a total of 4,107 subjects aged 60-85 years: English Longitudinal Study of Ageing, Caerphilly and Boyd Orr. We also meta-analysed the results with data from the original paper to report on this association: the Iowa-EPESE and InCHIANTI cohorts. Physical functioning was assessed by timed walks or the get up and go test. As locomotor performance tests differed between cohorts and the distributions of times to complete the test (in seconds) were positively skewed, we used the reciprocal transformation and computed study-specific z-scores. This presentation will further describe the methods and give the results of the analyses.
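The transformation step described above can be sketched as follows. The cohort names and times are invented, and using the sample standard deviation is an assumption; the abstract does not give these details.

```python
from statistics import mean, stdev

def reciprocal_z_scores(times):
    """Reciprocal-transform completion times (seconds), then standardise within the study."""
    speeds = [1.0 / t for t in times]        # reciprocal reduces the positive skew
    m, s = mean(speeds), stdev(speeds)       # study-specific mean and sample SD (assumed)
    return [(v - m) / s for v in speeds]

# Hypothetical cohorts using different tests: cohort_b times are 2.5 times cohort_a times.
cohorts = {
    "cohort_a": [4.0, 5.0, 8.0, 10.0],
    "cohort_b": [10.0, 12.5, 20.0, 25.0],
}
z = {name: reciprocal_z_scores(times) for name, times in cohorts.items()}
for name, scores in z.items():
    print(name, [round(v, 3) for v in scores])
```

After the reciprocal, larger values mean faster performance, and standardising within each study puts tests of different length on a common scale. In this toy example the two cohorts give the same z-scores even though their raw times differ.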

A Matching Algorithm for Paired Living Kidney Donation in the UK (PowerPoint, 0.5 mb)

by Joanne Allen, Senior Statistician, NHS Blood and Transplant, Bristol

Abstract: Blood group or tissue type incompatibility prevents many potential living donor kidney transplants. From September 2006, the Human Tissue Act enabled paired donation in the UK, whereby incompatible donor-recipient pairs can now exchange kidneys so that recipients can receive alternative compatible living donor organs. The British Transplantation Society and UK Transplant worked together to develop the arrangements for paired donation in the UK. Simulation programs were written in SAS, and the simulations were carried out using real data. Several stages were involved in the simulation process - identifying the possible two-way (paired) exchanges, identifying the possible combinations of two-way exchanges and then identifying the optimum combination. The simulations were used to determine effective waiting list sizes, the effectiveness of different matching and prioritisation factors and the likely chance of transplant for different types of patient. This presentation will describe the simulation process used to develop the paired donation scheme in the UK.
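The three stages named above (possible two-way exchanges, combinations of exchanges, optimum combination) can be sketched on a toy example. This is an illustration in Python, not the SAS programs used for the scheme. The compatibility data are invented, and "optimum" is taken here to mean the most transplants, ignoring the matching and prioritisation factors the real simulations examined.

```python
from itertools import combinations

def two_way_exchanges(compatible):
    """List pairs (a, b) where a's donor suits b's recipient and b's donor suits a's recipient.

    compatible[i] is the set of pairs whose recipient can receive the kidney of pair i's donor.
    """
    pairs = sorted(compatible)
    return [(a, b) for a, b in combinations(pairs, 2)
            if b in compatible[a] and a in compatible[b]]

def best_combination(exchanges):
    """Largest set of exchanges in which no donor-recipient pair appears twice.

    Exhaustive search from the largest size down; only practical for tiny examples.
    """
    for r in range(len(exchanges), 0, -1):
        for combo in combinations(exchanges, r):
            used = [p for ex in combo for p in ex]
            if len(used) == len(set(used)):
                return list(combo)
    return []

# Hypothetical donor-recipient pairs A to D.
compatible = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}

exchanges = two_way_exchanges(compatible)
chosen = best_combination(exchanges)
print("possible exchanges:", exchanges)
print("chosen combination:", chosen)
```

Here pair A could exchange with either B or C, but choosing A with C would leave B and D without a transplant, so the search selects A with B and C with D.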

15-May-08: Mixture modelling

Why is Mixture Modelling so Popular? (PDF, 393kB)

by Tony Robinson (University of Bath)

Abstract: Over the last decade or so, interest in using mixtures of statistical models for all types of applications has risen rapidly from both the frequentist and Bayesian points of view. What is so appealing about such models? Are they always good news? Mainly through examples, I shall discuss the reasons for this interest and try to point out what you might gain and what you might lose from adopting a mixture modelling approach.

Estimating redshift for a mixture of types of galaxy (PDF, 1,918kB)

by Merrilee Hurn (University of Bath)

Abstract: The Sloan Digital Sky Survey (SDSS) is an extremely large astronomical survey conducted with the intention of mapping more than a quarter of the sky (http://www.sdss.org/). Among the data it is generating are spectroscopic and photometric measurements, both containing information about the redshift of galaxies. We consider a Bayesian mixture approach to providing uncertainty bounds associated with the underlying redshifts and the classifications of the galaxies. This is joint work with Peter Green and Fahima Al-Awadhi.

7-Feb-08: Current themes in health economics research and statistical methods in economic evaluation

Can Pay regulation kill? The effect of regulated pay on NHS performance (PowerPoint, 1.3 mb)

by Carol Propper

Abstract: Labour market regulation can have harmful unintended consequences. In markets, especially for public sector workers, pay is regulated to be the same for workers across heterogeneous labour markets. We would predict that this will mean labour supply problems and potential falls in the quality of service provision in areas with stronger labour markets. In this paper we exploit panel data from the population of English acute hospitals where pay for medical staff is almost flat across geographies. We predict that areas with higher outside wages should suffer from problems of recruiting, retaining and motivating workers and this should harm hospital performance. We construct hospital-level panel data on both quality as measured by death rates (within hospital deaths within thirty days of emergency admission for acute myocardial infarction, AMI) and productivity. We present evidence that stronger local labour markets significantly worsen hospital outcomes in terms of quality and productivity. A 10% increase in the outside wage is associated with a 4% to 8% increase in AMI death rates. We find that an important part of this effect operates through hospitals in high outside wage areas having to rely more on temporary “agency staff” as they are unable to increase (regulated) wages in order to attract permanent employees. We quantify the magnitudes of these “hidden costs” of labour market regulation, which appear to be substantial.

Evidence-based decision making. How would you like that done, Minister? (PowerPoint, 2.0 mb)

by Tony Ades

Abstract: It is well established that health care decisions should be based on evidence, and that the evidence base should be generated by a pre-defined protocol. Of course, it is not really the evidence that drives the decision, but a model, and the model is informed by the evidence.

What are the principles of evidence-based modelling? It does not seem controversial to suggest that a Minister of Health considering a national policy decision, for example on screening for chlamydia, should expect: (1) that the model is based on all the available evidence, and (2) that all the available evidence is consistent. (3) The Minister also needs to know the probability that a decision based on the currently available evidence is wrong, and (4) whether or not to ask for more information before making a decision that would change current policy.

We look at current modelling methods and practices, and some illustrative examples, to see how close they are to this ideal.

16-Oct-07: Meta-analysis in health services research

Investigating and dealing with bias in randomized trials and meta-analyses (PDF, 85kB)

by Jonathan Sterne

Abstract: There is increasing empirical evidence that particular flaws in the conduct of randomized controlled trials (RCTs) lead to bias in intervention effect estimates, as well as increasing between-trial heterogeneity. This literature is based on meta-epidemiological studies, in which collections of meta-analyses are used to examine associations between trial characteristics and intervention effect estimates. I will review this empirical evidence, describe a project to combine it in a single database, and describe statistical methods that might use existing evidence to correct for bias in new meta-analyses.

Genetic meta-analysis and mendelian randomization (PowerPoint, 0.5 mb)

by George Davey-Smith

Abstract: Genetic association studies constitute a huge and growing component of observational epidemiological data. They have particular value in being able to contribute to understanding causal associations, but have a poor history of replication. Particular issues with respect to genetic associations are the small effect sizes (requiring large sample sizes) but also a general lack of confounding and bias. This makes meta-analysis of genetic association studies more similar to meta-analyses of randomised controlled trials than meta-analyses in many observational settings. These issues will be discussed, as will the application of meta-analyses of genetic association studies to understanding the causal nature of environmental exposures.

Note: some of the documents on this page are in PDF format. In order to view a PDF you will need Adobe Acrobat Reader