Dr Luisa Zuccolo (lead), Dr Gemma Sharp, Dr Cheryl McQuire, Dr Matt Suderman
Fetal alcohol spectrum disorders (FASDs) are lifelong disabilities caused by prenatal alcohol exposure, thought to be the leading preventable cause of developmental disability in the world. Understanding which individuals are most at risk of developing FASD is key. Early identification of FASD risk through robust maternal and infant biomarkers will offer an early window for intervention, will focus specialist diagnostic efforts in a resource-limited healthcare system, and will contribute towards mitigating FASD-related disabilities in later life. It may also prevent or reduce fetal exposure to alcohol in subsequent pregnancies.
This project will focus on DNA methylation (DNAm). A DNA methylation score was recently discovered that can robustly distinguish between current heavy drinkers and non-drinkers in a general population. In addition, strong evidence from animal models indicates that alcohol use during pregnancy can alter offspring DNAm.
We have developed an algorithm to screen for FASD in the Avon Longitudinal Study of Parents and Children (ALSPAC). This allows us to identify individuals at high risk of FASD regardless of a formal (and rare!) diagnosis. Using the algorithm as the best available proxy for an FASD diagnosis in the ALSPAC study, we propose to run analyses integrating molecular data (DNAm-based markers) to find out whether a very early (prenatal or early perinatal) risk assessment would be possible.
To identify optimal maternal and infant biomarkers of FASD, by integrating DNA methylation data and rich phenotypic data on FASD risk in the ALSPAC study and replication cohorts.
1. To review current evidence on epigenetic (specifically methylation)-based biomarkers in prediction of alcohol use and/or health consequences of prenatal alcohol use, including FASD
2. To identify novel maternal DNAm-based predictors of FASD and compare their diagnostic performance to previously known predictors
3. To identify novel infant DNAm-based predictors of FASD and compare their diagnostic performance to previously known predictors (maternal and offspring)
4. To replicate the findings of Objectives 2 and 3 in independent cohorts from populations with different rates of FASD (eg Western Europe vs South Africa)
5. To extend the predictor by including genetic and phenotypic risk factors for FASD
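Objectives 2 and 3 compare the diagnostic performance of candidate predictors. One common summary of such performance is the area under the ROC curve (AUC); the sketch below assumes AUC is the chosen metric, which the project description does not specify, and uses made-up scores purely for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: predictor values (higher = more likely to be a case)
    labels: 1 for screen-positive individuals, 0 otherwise
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs in which the case scores higher; ties count half.
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: two candidate predictors scored on the same six people.
labels = [1, 1, 1, 0, 0, 0]
known_score = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
novel_score = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]
print("known predictor AUC:", auc(known_score, labels))
print("novel predictor AUC:", auc(novel_score, labels))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so two predictors evaluated on the same individuals can be compared directly on this scale.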
Univariate and multivariate regressions
Epigenome-wide association studies (EWAS)
Liu, C. et al. A DNA methylation biomarker of alcohol consumption. Mol. Psychiatry 23(2):422-433 (2018)
Mandal, C., Halder, D., Jung, K. H. & Chai, Y. G. In Utero Alcohol Exposure and the Alteration of Histone Marks in the Developing Fetus: An Epigenetic Phenomenon of Maternal Drinking. Int. J. Biol. Sci. 13, 1100–1108 (2017).
McQuire C, Mukherjee R, Hurt L, Higgins A, Greene G, Farewell D, Kemp A, Paranjothy, S. Screening prevalence of fetal alcohol spectrum disorders in a region of the United Kingdom: a population-based birth-cohort study. Preventive Medicine (submitted).
Chudley AE, Conry J, Cook JL, Loock C, Rosales T, LeBlanc N. Fetal alcohol spectrum disorder: Canadian guidelines for diagnosis. Canadian Medical Association Journal. 2005;172(5 suppl):S1-S21.
Dr Gemma Sharp (lead), Dr Luisa Zuccolo, Prof Anita Thapar
There is growing concern that paternal exposures before conception have been greatly neglected. Studying how these impact on future generations’ health could open new avenues for prevention: prospective fathers are not generally advised to change their behaviour. In animal studies, one of the paternal exposures showing the largest and most consistently reported effects relative to the prenatal period is alcohol; however, convincing human evidence is lacking to date. In this project, we propose to investigate the relationship between pre-conception paternal alcohol use and offspring mental health, and in particular attention deficit hyperactivity disorder (ADHD).
1 Is paternal alcohol use before conception associated with offspring ADHD, and/or its brain morphology correlates?
2 Is this due to shared genetic influences?
3 Is this association robust to different causal inference methods of analyses such as Mendelian randomization, and the use of negative controls (eg non-biological fathers)?
4 Are offspring cord blood DNA methylation and ADHD structural brain correlates on the causal pathway from paternal alcohol exposure to childhood ADHD?
To improve the chance of disentangling correlation from causation, we propose to use a paradigm similar to that of (lab studies of) rodent models of paternal effects: a pseudo-experimental study design (employing Mendelian randomization and other analytical approaches to improve causal inference), restricting to families with no intrauterine exposure to alcohol (like in the animal studies), focusing on early manifestation of the outcome occurring before the onset of own drinking to remove confounding by own drinking, and even earlier potential biological mediators of the effects (eg cord-blood DNA methylation, eliminating confounding by paternal drinking in the postnatal period, both set to ‘zero’ in animal studies).
Finegersh A, Alcohol 2015 49:461-70
Rogers JC, JAMA Psychiatry 2016 73:64-72
Chen YC, Mol Psychiatry 2017 Mar 21 -Epub ahead of print
Dr Tom Richardson (lead), Professor George Davey Smith, Professor Tim Frayling
Leveraging high dimensional biomedical and molecular datasets provides an unprecedented opportunity to develop our understanding of both the environmental and genetic components of complex disease. This project will focus on developing expertise in applying state-of-the-art techniques in causal inference and bioinformatics to large-scale and phenotypically rich datasets. In doing so, the successful student will undertake leading research in the field of epidemiology by uncovering risk factors, genes and biological pathways which contribute to disease susceptibility.
Our understanding of how our genes confer a predisposition to disease from birth has made substantial advances in the last decade. Polygenic risk scores (PRS), commonly defined as the weighted sum of risk alleles for a given disease, are now beginning to demonstrate their full potential in terms of clinical utility and disease prediction. Furthermore, these profiles provide an unparalleled opportunity to evaluate the causal relationship between modifiable risk factors and outcomes by applying the principles of Mendelian randomization (MR).
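The definition of a PRS as a weighted sum of risk alleles can be written in a few lines. This is a minimal sketch: the variant identifiers and weights are hypothetical, and it leaves out steps that a real pipeline would need, such as variant selection, allele harmonisation and handling of linkage disequilibrium.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts for one individual.

    dosages: dict mapping variant ID -> number of risk alleles (0, 1 or 2)
    weights: dict mapping variant ID -> per-allele effect size
             (for example a log odds ratio from a GWAS)
    Variants absent from the individual's genotypes are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical weights and genotypes, for illustration only.
weights = {"rs_a": 0.10, "rs_b": 0.25, "rs_c": -0.05}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
score = polygenic_risk_score(person, weights)
print("PRS:", score)
```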
- Build polygenic risk scores for various risk factors and complex traits
- Evaluate causal relationships between modifiable risk factors and disease outcomes using these scores
- Dissect scores for identified associations to develop understanding into how lifestyle factors confer risk of disease
- Evaluate the individual genes driving observed associations to improve biological interpretation of results
This PhD will involve constructing PRS for hundreds of complex traits and using them in MR analyses to identify and characterise novel epidemiological relationships. There will be an emphasis on innovative development and application of these scores, as well as dissecting identified associations, for example to discern which pathway sets of genes appear to be driving the identified effect. Extensive follow-up of these results will involve using high-dimensional molecular datasets to develop our understanding of the underlying biological mechanisms in disease. For instance, integrating molecular datasets can help us unravel causal genes and mechanisms in disease, as well as detect biomarkers which may prove valuable for early disease prognosis and prevention.
We will also harness large-scale sequencing data (e.g. exome and whole genome sequencing data from the UK Biobank study) and build gene risk scores (GRS) to elucidate genes which influence disease risk. Furthermore, these analyses will likely yield insight into proposed therapeutic intervention, for instance evaluating potential adverse effects and possible drug repurposing.
Professor Kate Tilling (lead), Dr Jonathan Bartlett, Dr Rachael Hughes, Rosie Cornish
Missing data can cause bias, and common solutions (MI and IPW) are not always applicable – particularly where data are missing not at random (MNAR). We will develop methods for analysing incomplete MNAR data – including Bayesian methods and Instrumental Variables – and apply them to cohort studies, particularly those with linkage to routine data (e.g. GP records).
The central aim of this PhD is to develop sensitivity analyses which allow for data to be MNAR, and which make optimal use of external data.
Typically such MNAR sensitivity analyses have involved analysts specifying priors for sensitivity parameters which are difficult to interpret and hence also difficult to elicit prior beliefs or knowledge about (1). We will build on recent work (2) to develop approaches which instead rely on estimates of simple population quantities (e.g. from the linked data), such as the average BMI or proportion with depression. This will involve investigating how sensitive results are to these priors/estimates and to the particular non-response model used. Methods explored will include Bayesian and Instrumental Variables models.
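To see how a simple population quantity can pin down an otherwise hard-to-elicit sensitivity parameter, consider the mixture identity for a mean. The sketch below is an illustration only and is not the method of references 1 or 2: it assumes a single continuous variable, no covariates, and an externally known population mean (for example from linked data). The function names are hypothetical.

```python
def implied_nonresponder_mean(observed_mean, response_rate, population_mean):
    """Mean among non-responders implied by a known population mean.

    Rearranges  population_mean = p * observed_mean + (1 - p) * missing_mean,
    where p is the proportion of the cohort with the variable observed.
    """
    if not 0 < response_rate < 1:
        raise ValueError("response_rate must be strictly between 0 and 1")
    return (population_mean - response_rate * observed_mean) / (1 - response_rate)


def delta_shift(observed_mean, response_rate, population_mean):
    """Difference between the non-responder and responder means.

    In this covariate-free setting it is the offset that would make the
    completed data reproduce the external population mean.
    """
    missing_mean = implied_nonresponder_mean(
        observed_mean, response_rate, population_mean
    )
    return missing_mean - observed_mean


# Hypothetical numbers: mean BMI of 25.0 among the 80% who responded,
# and an external population mean of 25.5.
print("implied non-responder mean:", implied_nonresponder_mean(25.0, 0.8, 25.5))
print("delta:", delta_shift(25.0, 0.8, 25.5))
```

A delta of zero corresponds to data that are missing completely at random in this simplified setting; the further the external mean lies from the observed mean, the larger the implied departure.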
The methods developed will be applied to important epidemiological questions using data from cohorts, including the world-leading Avon Longitudinal Study of Parents and Children (ALSPAC) cohort, and UK Biobank.
1. Stat Med. 2018 Jul 10;37(15):2338-2353. On the use of the not-at-random fully conditional specification (NARFCS) procedure in practice. Tompsett et al.
2. Population-calibrated multiple imputation for a binary/categorical covariate in categorical regression models. Statistics in medicine. ISSN 0277-6715 DOI: https://doi.org/10.1002/sim.8004. Pham et al.
Dr Eleanor Sanderson (lead), Dr Rhian Daniel (Cardiff), Professor George Davey Smith, Dr Laura Howe
Many of an individual’s traits are observationally associated with their health outcomes. Understanding the exact relationships between these factors is critical to effective public health intervention. When multiple traits are potentially associated with a disease or health outcome, it is often not clear how much of the observed effect of each single trait is due to the effect of that trait or behaviour on other traits and behaviours, which then affect the outcome, and how much is “direct” in the sense that it is not mediated by the other traits being considered.
Causal mediation analysis is one approach that can be used to determine the proportion of the effect of a trait on an outcome that is via a mediating variable. However, this method relies on many strong assumptions. An alternative approach, relying on different assumptions, is Mendelian Randomisation (MR). MR is a method of instrumental variable analysis which utilises genetic variation between individuals to help understand causal effects.
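As a concrete instance of MR as instrumental variable analysis, the simplest estimator uses a single genetic variant and divides its association with the outcome by its association with the exposure (the Wald ratio). The sketch below assumes the usual instrumental variable conditions hold and uses a first-order standard error that ignores uncertainty in the variant-exposure association; the input values are hypothetical.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant MR estimate of the effect of an exposure on an outcome.

    beta_exposure: variant-exposure association
    beta_outcome:  variant-outcome association
    se_outcome:    standard error of beta_outcome
    Returns (estimate, approximate standard error).
    """
    if beta_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error


# Hypothetical summary statistics for one variant.
estimate, standard_error = wald_ratio(0.5, 0.1, 0.02)
print("causal estimate:", estimate, "SE:", standard_error)
```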
The aim of this project is to conduct research on the strengths and limitations of MR when trying to understand the causal effects of multiple exposures on a health outcome. This will include investigating how novel methods of MR analysis such as Multivariable MR relate to mediation analysis, and developing and extending existing MR methods to deal with multiple mediators. This project will also explore possibilities of combining the two approaches.
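One way the link between Multivariable MR and mediation analysis is sometimes expressed is the difference method: a total effect (for example from univariable MR) minus a direct effect (for example from Multivariable MR adjusting for the mediator) gives an indirect effect. The sketch below shows only that arithmetic, under the assumption that both estimates are on the same linear scale; the numbers are hypothetical and this is not presented as the project's method.

```python
def decompose_effect(total_effect, direct_effect):
    """Split a total effect into direct and indirect parts (difference method).

    Returns (indirect_effect, proportion_mediated).
    """
    if total_effect == 0:
        raise ValueError("proportion mediated is undefined when the total effect is 0")
    indirect_effect = total_effect - direct_effect
    proportion_mediated = indirect_effect / total_effect
    return indirect_effect, proportion_mediated


# Hypothetical estimates on a linear scale.
indirect, proportion = decompose_effect(total_effect=0.30, direct_effect=0.21)
print("indirect effect:", indirect, "proportion mediated:", proportion)
```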
In this PhD you will have the opportunity to work with leading researchers in the fields of population health, statistics, and Mendelian randomisation to further develop statistical methods for causal analysis based around MR mediation analysis.
This project will involve mathematical derivation of the properties of the extended methods and verification using simulation studies. This project will also involve analysis of both individual-level and summary-level data to apply the methods developed.
Although this project will be methodological in focus, the student will have the opportunity to develop a relevant application of these methods based on their personal research interests. Prospective applicants should have a strong quantitative background and an interest in developing methods for causal analysis within a population health setting; however, no particular background knowledge is required.
Professor Jon Tobias (lead), Dr Louise Millard, Dr Celia Gregson
Osteoarthritis (OA) is a major cause of morbidity in older people representing a significant economic burden for the NHS through costs of surgery. The AUGMENT study (funded by a Wellcome collaborative grant) aims to reduce this impact, by underpinning new preventative strategies for OA based on improved understanding of causal pathways, and development of predictive tools for progression to joint replacement. Involving 100,000 individuals from UK Biobank, AUGMENT will focus on the role of joint shape, derived from DXA scans by principal components analysis. However, OA progression is related to several other features that are detectable by DXA scans but independent of joint shape, including thickening of the bone immediately adjacent to the joint (sclerosis), bone mineral density (BMD) (1), and bone texture (2); these features also need evaluation if the potential of using high-resolution DXA to identify those at risk of OA progression is to be fully realised.
I. To develop automated methods for evaluating sclerosis, BMD and texture using knee DXA scans.
II. To use machine learning to predict risk of OA progression from knee DXA scans by combining modalities in (I), with joint shape including osteophytes.
III. To evaluate how well our classifier predicts subsequent total knee replacement.
IV. To identify novel genetic pathways contributing to knee OA progression, based on relationships between genome wide genetic data and our classifier output.
I. Sclerosis will be derived from pixel density profiles adjacent to the knee joint. Positioning of ROIs previously used to measure medial/lateral BMD will be automated (1). Methods for evaluating lumbar spine bone texture (3) will be applied to the tibial subchondral region.
II. Modalities from (I) and knee shape evaluated in AUGMENT including osteophytes, will be obtained in the OAI bone study (n=629), an early knee OA cohort at high risk of subsequent progression. These data will be used to 1) train a classifier to predict radiographic knee OA progression and 2) evaluate classifier performance using cross-validation. Following validation in a manually labelled subsample of UK Biobank, the classifier will be applied to all 100,000 UK Biobank DXA images.
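The cross-validation step in (II) can be sketched as follows. The example assumes plain k-fold cross-validation with accuracy as the metric, and uses a deliberately trivial single-feature threshold classifier as a hypothetical stand-in for whatever model is eventually trained on the DXA-derived features.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Hold out each fold once and return the accuracy on each held-out fold."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def fit_threshold(xs, ys):
    """Stand-in model: threshold midway between the two class means."""
    positives = [x for x, y in zip(xs, ys) if y == 1]
    negatives = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical, well-separated toy data (1 = progression, 0 = no progression).
features = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
accuracies = cross_validate(features, labels, fit_threshold, predict_threshold, k=5)
print("per-fold accuracy:", accuracies)
```

Because every individual is held out exactly once, the per-fold accuracies give an estimate of out-of-sample performance without setting aside a separate test set.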
III. We will test whether our OA progression classifier predicts total knee replacement based on HES-linked incidence data, in UK Biobank.
IV. GWAS will be performed in UK Biobank to identify genetic influences on knee OA progression.
1 Lo GH Periarticular bone predicts knee OA progression. Semin Arthritis Rheum 2018
2 Lespessailles E Bone texture analysis on radiographic images. Calcif Tiss Int 2007;80:97-102
3 Schousboe JT Association of Trabecular Bone Score With Vertebral Fractures J Bone Miner Res 2017;32:1554-8
Prof Tom Gaunt (lead), Prof Jules Hancox
The orderly sequence of electrical excitation of the heart measured at the body surface as the electrocardiogram (ECG) arises due to the combined activity of multiple ion channel proteins and electrogenic transporters. Mutations to the underlying genes lead to arrhythmia disorders, whilst more common single nucleotide polymorphisms (SNPs) produce more subtle effects that underlie some of the observed population differences in ECG parameters. Genome wide association studies (GWAS) have identified SNPs in a number of ion-channel and non-ion channel genes that associate with ECG traits [1, 2], providing the opportunity to explore the causal relationship between ECG traits and other phenotypes. An approach called Mendelian randomization (MR), pioneered in Bristol, offers the potential to exploit SNPs as causal “anchors” to determine whether ECG traits cause other phenotypes, and also whether other phenotypes causally affect ECG traits. MR has been widely applied, eg in demonstrating the likely inefficacy of HDL raising treatment without requiring drug development. This methodology can exploit a wealth of publicly available datasets integrated with cutting-edge statistical methods in the MR-Base platform for systematic MR (http://www.mrbase.org/) developed in Bristol. An additional analytical platform (LD Hub, http://ldsc.broadinstitute.org/) exploits the same data to enable evaluation of genetic correlation between phenotypes.
The aims of this project are (1) to identify causal factors influencing ECG traits and (2) to identify phenotypes causally influenced by ECG traits. The specific objectives include:
1. Collating published data on genetic associations with ECG traits
2. LD score regression (using LD Hub) to analyse genetic correlation between ECG traits and other traits.
3. Mendelian randomization (MR) analyses to identify risk factors, drug targets and lifestyle exposures that alter specific ECG traits.
4. MR analyses to identify phenotypes and health outcomes that are causally influenced by ECG traits
5. Performing in vitro analyses of pharmacologically tractable molecular pathways
Collating published data: data from GWAS of ECG traits (such as QT-interval, QRS-duration, PR-interval) will be collated, processed and uploaded to the MR-Base/LD Hub database.
LD score regression: These analyses will use LD Hub to analyse genetic correlation between ECG traits and other traits in the database. The genetic correlation indicates the extent to which two traits share a common genetic basis, and enables common molecular pathways to be identified. These will be subject to pathway analysis.
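The genetic correlation reported by this kind of analysis is the genetic covariance between two traits scaled by the square root of the product of their SNP heritabilities. The sketch below shows that final scaling step only; it does not perform LD score regression itself, and the input values are hypothetical.

```python
import math


def genetic_correlation(genetic_covariance, h2_trait1, h2_trait2):
    """Genetic correlation from genetic covariance and two SNP heritabilities.

    r_g = genetic_covariance / sqrt(h2_trait1 * h2_trait2)
    """
    if h2_trait1 <= 0 or h2_trait2 <= 0:
        raise ValueError("heritabilities must be positive")
    return genetic_covariance / math.sqrt(h2_trait1 * h2_trait2)


# Hypothetical estimates for an ECG trait and another trait.
r_g = genetic_correlation(genetic_covariance=0.06, h2_trait1=0.2, h2_trait2=0.45)
print("genetic correlation:", r_g)
```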
Mendelian randomization: Two different categories of analysis will be performed. In the first, genetic variants related to specific druggable genes, lifestyle exposures (eg smoking) and other traits (eg blood pressure) will be tested for their influence on ECG traits to identify novel risk factors and potential intervention targets for abnormal ECG traits. In the second category genetic variants related to ECG traits will be tested for their influence on other phenotypes and health outcomes to determine whether ECG traits causally influence other aspects of health.
Performing in vitro analyses: For genetic variants and pathways that are pharmacologically tractable, causality will be tested using electrophysiological recording from appropriate isolated cardiac cell preparations and pharmacological agents to disrupt or enhance gene product function.
1. Arking DE, Pulit SL, Crotti L, van der Harst P, Munroe PB, Koopmann TT, Sotoodehnia N, Rossin EJ, Morley M, Wang X et al: Genetic association study of QT interval highlights role for calcium signaling pathways in myocardial repolarization. Nat Genet 2014, 46(8):826-836.
2. Gaunt TR, Shah S, Nelson CP, Drenos F, Braund PS, Adeniran I, Folkersen L, Lawlor DA, Casas J-P, Amuzu A et al: Integration of Genetics into a Systems Model of Electrocardiographic Traits Using HumanCVD BeadChip. Circulation-Cardiovascular Genetics 2012, 5(6):630-638.
3. Davey Smith G, Ebrahim S: 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 2003, 32(1):1-22.
4. Burgess S, Harshfield E: Mendelian randomization to assess causal effects of blood lipids on coronary heart disease: lessons from the past and applications to the future. Curr Opin Endocrinol Diabetes Obes 2016, 23(2):124-130.
Prof Tom Gaunt (lead), Prof Peter Flach, Dr Ben Elsworth
Population health research is being transformed by the increasing wealth of complex data. New high-dimensional epidemiological datasets provide novel opportunities for systematic approaches to understanding the relationships between risk factors and disease outcomes. Moving beyond individual hypothesis testing to fully exploit the available data requires new approaches to data mining, including the development of machine learning approaches, natural language processing and application of ontologies/knowledge representation.
This studentship will work on data mining within a novel platform (EpiGraphDB) being developed within the MRC IEU. EpiGraphDB integrates genomic and population health data with information mined from the scientific literature and from a range of bioinformatic databases.
The aim of this studentship is to develop and apply data mining methods within a complex graph database representing epidemiological relationships between traits and other relevant biological information.
1. Develop efficient methods for identification of informative sub-graphs utilising a range of methods
2. Develop approaches to knowledge representation within the graph database to support more effective data mining
3. Develop approaches to the triangulation of different types of evidence to prioritise findings
4. Identify novel risk factors for disease
5. Identify potentially spurious established risk factors
The student will be encouraged and supported in developing their own research ideas.
A wide range of research methods may be used, including machine learning, network analysis, natural language processing, causal inference. The exact methods will depend on the background and interests of the successful candidate.
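As one illustration of what identifying a sub-graph can mean in practice, the sketch below extracts the neighbourhood of a trait within a fixed number of hops using breadth-first search. The edge list and node names are hypothetical and do not reflect the EpiGraphDB schema or API; a graph database would normally do this through its own query language.

```python
from collections import deque


def neighbourhood(edges, start, max_hops):
    """Nodes reachable from `start` within `max_hops` edges (undirected).

    Returns a dict mapping each reachable node to its distance from `start`.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical relationships between traits, a drug and a gene.
edges = [
    ("BMI", "CHD"),
    ("BMI", "LDL"),
    ("LDL", "CHD"),
    ("CHD", "statin"),
    ("statin", "HMGCR"),
]
print(neighbourhood(edges, "BMI", 2))
```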
Dr Sarah Lewis (lead), Dr Evie Stergiakouli, Dr Gemma Sharp
Orofacial clefts are the most common types of birth defects worldwide and occur in around 650 live births in the UK. Around 30% of clefts arise as a result of genetic syndromes and follow a monogenic model of inheritance, but the majority of clefts follow a multifactorial model with both genetic and prenatal environmental risk factors that are still largely unknown. A better understanding of the causes of orofacial clefts will be essential to inform better prediction and prevention strategies, and to improve outcomes amongst those born with a cleft. Previous genome wide association studies (GWAS) of non-syndromic cleft lip and/or palate have identified several new genetic loci (Leslie et al, 2016). However, as has been shown in other diseases, further GWAS in different populations and with larger sample sizes are likely to uncover new variants. In addition, existing cleft GWAS have compared genes in people with a cleft to people without. However, maternal genes may also determine whether a child develops a cleft or not since these influence the prenatal environment. This project will use a combination of GWAS analyses and Mendelian randomization to identify genetic and modifiable causes of cleft.
Aim: The overall aim of this project is to identify causal genetic and lifestyle factors for cleft
1) To carry out a GWAS using genetic data from the Cleft Collective to identify genetic variants associated with cleft
2) To carry out a GWAS of mothers whose children were born with cleft compared to control mothers to identify maternal risk variants
3) To perform Mendelian randomization analyses to determine whether maternal BMI, folic acid intake and intake of other nutrients are risk factors for cleft
Generate analysis plan and conduct GWAS of child’s genetic variants and mother’s genetic variants with child’s cleft phenotype as the outcome.
Stratify by cleft sub-types and repeat the above GWAS analyses.
To identify genetic instruments for nutritional factors hypothesized to cause cleft lip and/or cleft palate.
To perform two sample Mendelian randomization to identify causal risk factors for cleft.
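In two-sample MR, variant-exposure associations come from one study and variant-outcome associations from another, and the per-variant ratios are combined across instruments. A widely used combination is the fixed-effect inverse-variance weighted (IVW) estimate, sketched below. This is an assumption about the estimator, which the plan above does not name, and the summary statistics are hypothetical.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    Equivalent to a weighted regression of variant-outcome associations on
    variant-exposure associations through the origin, with weights
    1 / se_outcome**2. Returns (estimate, standard error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have the same length")
    numerator = sum(
        bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    if denominator == 0:
        raise ValueError("no variant is associated with the exposure")
    return numerator / denominator, (1 / denominator) ** 0.5


# Hypothetical summary statistics for three instruments
# (for example, variants associated with a nutritional exposure).
beta_exposure = [0.1, 0.2, 0.3]
beta_outcome = [0.05, 0.10, 0.15]
se_outcome = [0.01, 0.01, 0.01]
ivw, ivw_se = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print("IVW estimate:", ivw, "SE:", ivw_se)
```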
Dr Evie Stergiakouli (lead), Dr Emma Anderson, Dr Laura Howe
Patients with childhood neurodevelopmental disorders, such as attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD), could be at increased risk of developing Alzheimer’s disease but this has not been investigated adequately due to the lack of long-term follow-up data. Utilizing the existing information generated by genetic studies and large population cohort studies we will test if the genetic risk factors causing childhood neurodevelopmental disorders are also implicated in Alzheimer’s disease. We will also use rapid high-throughput analysis methods developed in the MRC IEU to take advantage of publicly available information on the genetics of complex diseases to test if childhood neurodevelopmental disorders can cause Alzheimer’s disease. This project could have implications for identifying patients at increased risk of dementia and supporting a particularly vulnerable group of patients.
We have the expertise to train a student in applying polygenic risk score analysis and Mendelian randomization, using genetic data, to investigate whether there is genetic overlap between childhood neurodevelopmental disorders and Alzheimer’s disease and whether childhood neurodevelopmental disorders can cause Alzheimer’s disease. These methods have not been applied to study the genetic overlap of Alzheimer’s disease with childhood measures, nor has the causal effect of childhood ADHD and ASD on Alzheimer’s disease been investigated in a framework that takes horizontal pleiotropy into account. Research on Alzheimer’s disease in children is important because it avoids issues of selection bias which can occur when studying
The specific hypotheses to be tested are:
1. There is shared genetic susceptibility between childhood neurodevelopmental disorders (including ADHD and ASD) and Alzheimer’s disease.
2. ADHD and/or ASD are causally associated with Alzheimer’s disease.
To test the first hypothesis we will calculate polygenic risk scores for ADHD and ASD in adults from the general population and test whether they are associated with trajectories of cognitive decline from mid- to late-life and brain imaging measures. Polygenic risk scores for Alzheimer’s disease will also be calculated to test whether they are associated with trajectories of ADHD and ASD symptoms in children from the general population.
To test the second hypothesis we will conduct 2-sample MR and perform sensitivity analyses to test and adjust for pleiotropic effects.
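One commonly used sensitivity analysis for pleiotropy is MR-Egger regression, in which the variant-outcome associations are regressed on the variant-exposure associations with an intercept; an intercept away from zero suggests directional pleiotropy. The sketch below assumes this particular method, which the text does not name, and returns point estimates only (no standard errors). The summary statistics are hypothetical.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression with intercept (weights 1 / se_outcome**2).

    Variants are first oriented so that every variant-exposure association
    is positive. Returns (intercept, slope): the intercept reflects average
    directional pleiotropy, the slope is the pleiotropy-adjusted estimate.
    """
    xs, ys, ws = [], [], []
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        sign = -1.0 if bx < 0 else 1.0
        xs.append(sign * bx)
        ys.append(sign * by)
        ws.append(1.0 / se**2)

    total_weight = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_weight
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_weight
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("variant-exposure associations must vary across variants")
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated as outcome = 0.02 + 0.5 * exposure after orientation.
beta_exposure = [0.1, 0.2, 0.3, -0.4]
beta_outcome = [0.07, 0.12, 0.17, -0.22]
se_outcome = [0.01, 0.01, 0.01, 0.01]
egger_intercept, egger_slope = mr_egger(beta_exposure, beta_outcome, se_outcome)
print("intercept:", egger_intercept, "slope:", egger_slope)
```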
Callahan BL, Bierstone D, Stuss DT, Black SE. Adult ADHD: Risk Factor for Dementia or Phenotypic Mimic? Frontiers in Aging Neuroscience. 2017;3(9):260
Golimstok A, Rojas JI, Romano M, Zurru MC, Doctorovich D, Cristiano E. Previous adult attention-deficit and hyperactivity disorder symptoms and risk of dementia with Lewy bodies: a case-control study. European Journal of Neurology. 2011;18(1):78
Lambert JC, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, Bellenguez C, et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nature Genetics. 2013;45(12):1452-8.
Demontis D, Walters RK, Martin J, Mattheisen M, Als TD, Agerbo E et al. Discovery of the first genome-wide significant risk loci for ADHD. BioRxiv. Jun. 3, 2017; doi: http://dx.doi.org/10.1101/145581
Grove J, Ripke S, Als TD, Mattheisen M, Walters R, Won H, et al. Common risk variants identified in autism spectrum disorder. BioRxiv. Nov. 25, 2017; doi: http://dx.doi.org/10.1101/224774
Dr Evie Stergiakouli (lead), Dr Yvonne Wren, Dr Sarah Lewis
Approximately 1000 children are born in the UK each year with a cleft of the palate. Surgical repair takes place within the first year of life, but the impact of the cleft and the surgical repair can have long lasting consequences on speech and language development as well as other outcomes. While some children have cleft lip/palate as part of a syndrome, most are non-syndromic and multifactorial in nature. The Cleft Collective Cohort Studies was set up to provide a data resource for researchers and clinical academics to address questions related to risk factors and outcomes for individuals affected by cleft. The dataset includes DNA and recent funding has facilitated genotyping of the entire sample. Combined with information from parent report and clinical records, there is a rich dataset of genetic, speech, language and environment information.
Speech and language problems are heritable as shown by family and twin studies (Newbury and Monaco 2010). The genetic mechanisms underlying susceptibility to speech and language disorders are multifactorial in nature, involving both genetic factors and environmental influences. Recently genome-wide association studies have started to identify specific genetic variants contributing to speech and language impairments (Reader et al. 2014). This project provides a unique opportunity to contribute to our understanding of why some children born with cleft palate are more likely to have problems with speech and language than others. This information will be of use to clinicians in terms of knowing who to prioritise for treatment and/or provide preventative intervention. Moreover, it will enable parents to be fully informed about their child’s prognosis and options for management. Recent work by the Cleft Collective team using polygenic risk scores and bidirectional Mendelian randomisation found strong evidence of genetic overlap between nonsyndromic cleft lip and palate and philtrum width (Howe et al, 2018). The associations of cleft with education have also been studied using similar methods.
The aim of this project is to use quantitative genetics methods and causally informative designs to investigate genetic influences on speech and language among those born with a cleft palate and to compare this with genetic risk factors for speech and language impairment in the general population. The project will also apply causally informative designs to investigate whether having a cleft palate causes adverse outcomes in speech as opposed to being correlated with them. The candidate will have access to detailed speech and language data from the Avon Longitudinal Study of Parents and Children and the Cleft Collective which are both based in Bristol.
This is a highly interdisciplinary project which involves applying cutting-edge genetic epidemiological methods to clinical phenotypes that have not been at the forefront of research on genetics. It is also highly original, since it will apply causally informative designs based on genetics findings to investigate if a phenotype (in this case cleft lip/palate) is causing adverse outcomes as opposed to being correlated with them. The supervisors involved come from completely different backgrounds (clinical SLT, cleft, genetics, epidemiology, linguistics) and the candidate would have the opportunity of developing into one of the few experts with in-depth understanding of these fields. The data required for this project have already been collected or are publicly available, making setbacks less likely. One of the unique features of this project is that the candidate will be able to gain a good understanding of running a cohort study by attending weekly Cleft Collective meetings on questionnaire planning, public and patient involvement, research governance and funding applications.
Dr Evie Stergiakouli (lead), Dr Gemma Sharp, Dr Sarah Lewis
Approximately 1000 children are born in the UK each year with a cleft lip and/or palate. Surgical repair takes place within the first year of life, but the impact of the cleft and the surgical repair can have long lasting consequences. While some children have cleft lip/palate as part of a syndrome, most are non-syndromic and multifactorial in nature. The Cleft Collective Cohort Studies (http://www.bristol.ac.uk/dental/cleft-collective/) was set up to provide a data resource for researchers and clinical academics to address questions related to risk factors and outcomes for individuals affected by cleft. The dataset includes DNA and recent funding has facilitated genotyping of the entire sample. Combined with information from parent report and clinical records, there is a rich dataset of genetic and behavioural development outcomes.
Recent studies have suggested that children with nonsyndromic clefts have a higher risk of neurodevelopmental disorders, including ADHD, autism and psychotic disorders (Tillman et al. 2018). In addition, nonsyndromic clefts have been found to be associated with poor academic achievement (Richman et al. 2012). Recent work by the Cleft Collective team using polygenic risk scores and bidirectional Mendelian randomisation did not identify evidence of genetic overlap between nonsyndromic cleft lip and/or palate and educational attainment (Dardani et al. BioRxiv 2018). This indicates that the risk is unlikely to be explained by familial influences such as inherited genetic factors.
This project provides a unique opportunity to contribute to our understanding of why some children born with cleft lip and/or palate are more likely to have neurodevelopmental problems than others. This information will be of use to clinicians in terms of knowing who to prioritise for treatment and/or provide preventative intervention. Moreover, it will enable parents to be fully informed about their child’s prognosis and options for management.
The aims of the project are:
To describe neurodevelopmental outcomes in children born with cleft lip and/or palate in the UK and compare them to children of the same age from the general population.
To investigate genetic influences on neurodevelopmental disorders among those born with a cleft lip and/or palate and to compare this with genetic risk factors for neurodevelopmental disorders in the general population.
To investigate whether having a cleft causes adverse neurodevelopmental outcomes as opposed to being correlated with them.
For the first aim longitudinal data on neurodevelopmental disorders from the Cleft Collective will be utilized and epidemiological analysis will be employed.
For the second aim we will use quantitative genetics methods and causally informative designs to investigate genetic influences on neurodevelopmental disorders among those born with a cleft lip and/or palate.
For the third aim we will also apply causally informative designs to investigate whether having a cleft causes adverse neurodevelopmental outcomes as opposed to being correlated with them.
Tillman et al. J Am Acad Child Adolesc Psychiatry 2018;57(11):876–883
Dardani et al. BioRxiv 2018 doi: https://doi.org/10.1101/434126
Dr Evie Stergiakouli (lead), Dr Dheeraj Rai
Neurodevelopmental disorders start in childhood and include Attention Deficit Hyperactivity Disorder (ADHD), autism spectrum disorder (ASD) and learning/intellectual disabilities. These conditions are common, collectively affecting around 5-10% of children, and are often associated with long-term disabilities. Although previously considered childhood-limited, the difficulties associated with these conditions can be life-long and impair many aspects of adult life, including physical health and wellbeing. Although neurodevelopmental disorders are treated as categorical diagnoses, there is evidence to suggest that they behave as continuous traits in the general population.
Neurodevelopmental disorders and traits are highly heritable with a contribution from common genetic variants. These variants can be utilized to investigate associations and causal links between neurodevelopmental disorders and physical health outcomes in adults with a neurodevelopmental disorder or with a high genetic risk for a neurodevelopmental disorder.
This project aims to investigate the link between neurodevelopmental disorders and traits and poor physical health outcomes, and to test for a causal relationship using epidemiological methods.
The student will have the opportunity to receive training in:
· epidemiological analysis within large population cohort studies including ALSPAC and the Stockholm Youth Cohort
· genetic epidemiological analysis including polygenic risk score analysis and LD score regression in large datasets including ALSPAC and the UK Biobank
· 2-sample Mendelian randomization analysis and other causal methods developed in the MRC Integrative Epidemiology Unit to assess and take into account pleiotropic effects.
Dr Sarah Lewis (lead), Dr Dheeraj Rai
Background: There is a body of evidence that heavy metal exposure in the prenatal period and in the first two years of life may lead to impaired neurodevelopment. Previous studies have suggested adverse effects of arsenic and lead exposure (even at low levels) on children's cognitive function, including lower IQ scores, impaired attention and memory, and behavioural problems. In addition, copper and methylmercury have been shown to have negative effects on the developing brain. The causal nature of these associations is still unclear, particularly for a wider range of neurodevelopmental disorders such as autism and attention deficit/hyperactivity disorder (ADHD). It is also unclear whether any effects of heavy metals on the brain are long lasting and whether they could be related to negative behaviours or mental health issues later in adolescence or adulthood.
Aim: The overall aim of this project is to use Mendelian randomization to determine whether early life heavy metal exposure is causally related to neurodevelopmental disorders and to mental health problems and problem behaviours in adolescents and young adults.
1) To identify genetic instruments for lead, arsenic, copper and methylmercury
2) To carry out association analyses of prenatal (and early life) heavy metal exposure with symptoms and diagnoses of behavioural (conduct disorder), neurodevelopmental (autism, ADHD, intellectual disability) and mental health (depression, anxiety and psychotic experiences) problems in a population-based birth cohort (ALSPAC)
3) To perform Mendelian randomization analyses to determine whether these associations are likely to be causal.
Generate analysis plan and draw possible causal diagrams.
Carry out standard multivariate logistic regression analyses based on observational data.
Perform two-sample Mendelian randomization to determine whether heavy metals are causal risk factors for conduct disorder, depression and anxiety in adolescents and young adults.
Smith GD and Ebrahim S. Int J Epidemiol. 2003;32(1):1-22.
Lanphear, B. P. et al. Environmental Health Perspectives 2005; 113, 894-899.
Daniels, J. L., Longnecker, M. P., Rowland, A. S., and Golding, J. Epidemiology 2005; 15, 394-402.
Dr Matthew Ridd (lead), Dr Sarah Sullivan, Dr Ketaki Bhate
Acne is an inflammatory skin disorder comprising papules/pustules, comedones, hyper-pigmentation and scarring. Almost all teenagers are affected to some degree, with 20% being moderately-to-severely affected. There is accompanying psychosocial morbidity, and the physical impairment/disfigurement caused by hyper-pigmentation or scarring can be permanent. Attendance in both primary and secondary care consumes considerable NHS resources. However, little is published on the natural history of acne, and there is conflicting evidence on the relationships between acne and diet, psychological stress and obesity.
Further research is needed to better understand both the risk factors for the development and persistence of acne and the psychological consequences of having acne. This work could provide evidence leading to healthcare improvements and a better understanding of the link between acne and mental health in adolescence, which is a vulnerable period for mental health disorders.
This study has three aims:
1. To investigate how common acne is
2. To investigate risk factors for acne onset and persistence
3. To study the psychosocial consequences of having acne
Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) birth cohort, the prevalence, incidence rate and cumulative incidence of acne will first be estimated using data from study clinics where acne was examined in detail by trained healthcare professionals. The sex, age, ethnic group and socioeconomic distributions of young people with acne will be described according to disease severity and compared with those who do not have acne. Persistence of acne across examinations at ages 9-13 will be described.
Data will be used to test the hypotheses that: dietary factors such as dairy-rich or high glycaemic-index diets, psychological stress and obesity early in childhood are positively associated with early onset of acne and with its progression and severity; and the risks of depression, low self-esteem and time off school are increased in those who have had acne.
Bhate K, Williams H C. Epidemiology of acne vulgaris. BJD 2013; 168: 474-485.
Layton A, Eady EA, Peat M, et al. Identifying acne treatment uncertainties via a James Lind Alliance Priority Setting Partnership. BMJ Open 2015; 5: e008085 DOI: 10.1136/bmjopen-2015-008085
Dr. Josine Min (lead), Prof. Caroline Relton, Prof. Jon Mill, Dr Eilis Hannon
This studentship will provide cross-disciplinary training in state-of-the-art genetic and genomic epidemiological approaches (under the supervision of Dr. Josine Min and Prof Caroline Relton at the Medical Research Council Integrative Epidemiology Unit at the University of Bristol and Prof Jon Mill and Dr. Eilis Hannon at the University of Exeter Medical School) to address questions about the molecular mechanism underlying established disease risk factors. The student will combine epigenetic, genetic and causal inference analyses in large-scale epidemiological datasets.
To date, the majority of studies linking behavioural phenotypes such as cigarette smoking and alcohol use to health outcomes have employed questionnaire data. Multiple DNA methylation (DNAm) sites are strongly associated with (behavioural) traits. DNAm-derived scores have been used to predict (or proxy for) these traits, providing greater precision and biological proximity than self-reported measures. The DNAm-derived smoking score is a widely used biomarker of lifetime exposure to tobacco smoke and may explain the molecular mechanism of the long-term risk of diseases following smoking cessation. There is growing interest in conducting genome-wide association studies (GWAS) and Mendelian randomization (MR) analysis on DNAm scores to identify novel genetic and causal factors influencing behavioural traits.
The overall aim of this PhD is to identify genetic variants and biological pathways associated with disease risk factors using DNAm scores. The specific risk factors/diseases for this project would depend on the candidate's research interests, but could include cell counts, smoking or alcohol use. Training will involve the analysis of large genetic and epigenetic datasets, MR, causal inference and mediation approaches. The Genetics of DNA Methylation Consortium (GoDMC; http://www.godmc.org.uk/) has collected genetic and DNAm data across multiple cohorts offering the student an excellent platform for these analyses.
1) Novel methodology can be used (and potentially developed) to construct DNAm scores on disease risk factors
2) GWAS on DNAm derived phenotype datasets will be conducted followed by meta-analyses. There will be several challenges with this type of analysis including heterogeneity of datasets in age, sex and tissue type.
3) To understand what aspect of the phenotype is captured by the DNAm phenotype, GWAS meta-analysis results will be compared to GWAS results of detailed (self-reported) phenotypes (e.g. in UK Biobank) and methylation quantitative trait loci from blood and brain.
4) MR analysis will be used to investigate causal relationships between DNAm derived measures and self-reported measures and other diseases/risk factors.
5) The heritability component of DNAm derived phenotypes will be estimated.
1. Joehanes R, Just AC, Marioni RE, Pilling LC, Reynolds LM, Mandaviya PR et al. Epigenetic Signatures of Cigarette Smoking. Circ Cardiovasc Genet. 2016 Oct;9(5):436-447.
2. Lu AT, Xue L, Salfati EL, Chen BH, Ferrucci L, Levy D et al. GWAS of epigenetic aging rates in blood reveals a critical role for TERT. Nat Commun. 2018 Jan 26;9(1):387.
Dr Carol Joinson (lead), Dr Abigail Fraser, Dr Jon Heron
Depression is a leading contributor to disease burden in young people, with devastating effects on social functioning and academic achievement and increased risks of substance abuse, self-harm and suicidal behaviour. Women are around twice as likely as men to experience depression, and this unequal sex ratio emerges during adolescence.
There is strong evidence that the rise in depression in girls during adolescence is more strongly related to pubertal maturation than increasing age, but the contribution of puberty to the emergence of depression in boys is poorly understood.
The profound biological and psychosocial changes of puberty are believed to increase the risk of depression, but few studies have sought to identify underlying mechanisms. In particular, there is little understanding of the mechanisms that generate the unequal sex ratio in depression during puberty. It is unlikely that the rise in depression during puberty is explained by biological mechanisms alone. Psychosocial theories propose that individual and social factors explain more of the variance in adolescent depression than hormones.
This PhD project has the potential to shed light on the complex causal pathways that explain the relationships between pubertal development and the rise in depression during adolescence. The aim is to examine potential psychosocial mediators of the relationship between pubertal development and depression in girls and boys. Possible mediators include risky behaviours, relationships with family and friends, peer victimisation, body image, cognitive functioning, and educational attainment. The project will take advantage of existing data from the Avon Longitudinal Study of Parents and Children (ALSPAC) to answer research questions including the following:
1. What psychosocial factors explain the link between pubertal development and depression?
2. Are particular psychosocial factors more important for girls than boys?
3. Can these factors explain the greater rise in depression during puberty in girls compared with boys?
The project will use a causal mediation approach to examine the extent to which psychosocial factors mediate the link between pubertal development and depression. Causal mediation methods provide a powerful set of techniques for understanding causal pathways between an exposure and outcome. These methods permit the examination of continuous and binary/categorical mediators and outcomes; multiple mediators; interactions between exposures and mediators or between mediators, and intermediate confounding of the mediator-outcome relationship.
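The idea of splitting a total effect into a direct and a mediated (indirect) part can be sketched in the simplest possible setting. The example below is illustrative only: it assumes linear models, a single continuous mediator, no exposure-mediator interaction and no unmeasured or intermediate confounding, which are exactly the restrictions that fuller causal mediation methods relax. The variable names and the simulated effect sizes are hypothetical and are not taken from ALSPAC.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x."""
    b, mx, my = slope(x, y), mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def linear_mediation(exposure, mediator, outcome):
    """Product-of-coefficients decomposition under linear models."""
    a = slope(exposure, mediator)        # exposure -> mediator
    total = slope(exposure, outcome)     # total effect of exposure
    # Mediator -> outcome adjusted for exposure: regress the outcome on the
    # part of the mediator not explained by the exposure (Frisch-Waugh).
    b = slope(residuals(exposure, mediator), outcome)
    indirect = a * b
    direct = total - indirect
    return {
        "total": total,
        "direct": direct,
        "indirect": indirect,
        "proportion_mediated": indirect / total,
    }


# Simulated (hypothetical) data: true direct effect 0.2, indirect 0.5 * 0.4.
random.seed(1)
n = 5000
puberty = [random.gauss(0, 1) for _ in range(n)]
body_image = [0.5 * x + random.gauss(0, 1) for x in puberty]
depression = [0.2 * x + 0.4 * m + random.gauss(0, 1)
              for x, m in zip(puberty, body_image)]

result = linear_mediation(puberty, body_image, depression)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With binary outcomes, multiple mediators or interactions, this simple product no longer equals the natural indirect effect, which is why the counterfactual-based methods described above are needed.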
Hyde JS, Mezulis AH, Abramson LY. Psychol Rev. 2008;115(2):291-313.
Sequeira ME, Lewis SJ, Bonilla C, Smith GD, Joinson C. Br J Psychiatry. 2017;210(1):39-46.
VanderWeele TJ. Annual Review of Public Health. 2015;37:17-32.
Wang H, Lin SL, Leung GM, Schooling CM. Pediatrics. 2016;137(6). pii: e20153231.
Jie Zheng (lead), Professor Tom Gaunt, Dr Robert Scott
Drug target identification and validation is one of the key research questions in the drug development process. The randomised controlled trial (RCT) is the gold standard method to infer causal relationships between drug targets and indications, but conventional RCTs are costly and time-consuming. Mendelian randomization (MR) has been described as a naturally occurring RCT, in which the assignment of genetic variants in an MR study is analogous to randomization in an RCT. Recently, large-scale omics data have provided a timely opportunity to systematically validate drug targets in a hypothesis-free manner. Previously, we showed that drug targets (proteins) with MR evidence are considerably more likely to succeed in drug trials, which opens up the application of MR to omics data (e.g. proteome and transcriptome). Now is a good time to design MR studies that validate the right drug target in the right tissue to prevent progression of disease.
1) Identify tissue-specific causal relationships between proteins and human diseases
2) Build a causal network between gene expression and proteins, and link the network with disease incidence and treatment (i.e. slowing or preventing the progression of disease) in the right tissue.
3) Evaluate adverse effects and repurposing opportunities for approved drugs and drug trials under development using MR.
4) Use MR and machine learning approaches to predict drug trial effect size.
This PhD will involve developing a causal network between multiple layers of molecular phenotypes in different tissues and further uncovering their biological mechanisms in disease. For example, integrating tissue-specific proteomics datasets with data on disease progression will inform the choice of targets for disease treatment as well as detect biomarkers for early disease prognosis and prevention.
We will then integrate the MR and drug trial evidence to evaluate adverse effects and repositioning opportunities for drugs and targets. Moreover, we will use MR to predict drug trial effect size, applying natural language processing approaches to systematically search for drug trials in the literature and public databases.
This project will be predominantly statistical and computational in nature. Analyses will be conducted using R, Python and some bespoke software, operated in a Linux system. Therefore, some coding experience will be essential.
Furthermore, this project will be directly linked with therapeutic intervention. For example, there is potential for engaging with industry, e.g. GSK, to validate and translate findings.
J Zheng, V Haberland, D Baird, et al. Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. BioRxiv, 627398
G Hemani, J Zheng, B Elsworth, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife 7, e34408
Dr Abigail Fraser (lead), Prof Deborah Lawlor
The placenta is the organ that provides all essential life supporting functions for a growing fetus, including nutrient and gas exchange, excretion product removal, hormone exchange and protection from maternal host infections and immune reactions. Pre-eclampsia, preterm delivery and fetal growth restriction are all characterised by placental abnormalities and are also associated with increased risk of cardiovascular disease in both mothers and their offspring. Yet clinical and epidemiologic placental research to date is scant and has been hindered by modest sample sizes, the lack of pathology data from ‘healthy’ pregnancies, potential assessor bias due to unblinding of pathologists to clinical diagnoses and unavailability of follow-up beyond delivery.
To investigate the relationship(s) of placental weight, dimensions and placental cotyledons with maternal and offspring short and long term cardiometabolic health.
The student will be supported and encouraged to tailor this broad aim to their own interests and to refine their research questions, e.g. focus on pregnancy outcomes, long term outcomes, maternal health, offspring health and/or expand to other health outcomes such as neurocognitive development.
The project will use existing data from the Avon Longitudinal Study of Parents and Children (ALSPAC) in the first instance. These include data on a subset of placentas collected in the early 1990s that are linked to detailed and repeatedly assessed measures of cardiometabolic health in both mothers and their offspring, who are now nearly 30 years of age.(1, 2) Existing data from the MOMI database will also be available for analyses,(3) as will histologic data on the ALSPAC placentas that are currently being generated.
1. Barker D, Osmond C, Grant S, Thornburg KL, Cooper C, Ring S, et al. Maternal cotyledons at birth predict blood pressure in childhood. Placenta. 2013;34(8):672-5.
2. Holroyd CR, Osmond C, Barker DJ, Ring SM, Lawlor DA, Tobias JH, et al. Placental Size Is Associated Differentially With Postnatal Bone Size and Density. Journal of bone and mineral research : the official journal of the American Society for Bone and Mineral Research. 2016;31(10):1855-64.
3. Catov JM, Scifres CM, Caritis SN, Bertolet M, Larkin J, Parks WT. Neonatal outcomes following preterm birth classified according to placental features. American journal of obstetrics and gynecology. 2017;216(4):411.e1-.e14.
Dr Matthew Suderman (lead), Professor Caroline Relton, Professor George Davey Smith, Professor Jonathan Mill
Medicine has benefited from the use of molecular biomarkers for estimating health risks, diagnosing disease, selecting treatments and predicting treatment response. Omic technologies make possible the comprehensive exploration of available biomarkers by quantifying a large proportion of the molecules that make up human cells. Unfortunately, omic profiles tend to be expensive and are typically generated from peripheral tissues such as blood or saliva that may not be directly relevant to many health-related questions.
There is evidence, however, that methylome profiles of peripheral tissues do capture much useful biological variation, including variation found in other -omes (e.g. proteomes, transcriptomes, metabolomes) in a wide range of tissue types. We will use the small but growing collection of datasets that include multi-tissue, multi-omic profiles from the same individuals to derive models of this variation in methylomes of peripheral tissues that can then be used to discover novel biomarkers of disease risks and outcomes.
To use multi-omic, multi-tissue datasets to generate DNAm models of biological variation that can be used to generate improved biomarkers in peripheral tissues of mental illness, cardiometabolic health and cancer risk.
1. Assemble datasets containing multiple matched omic profiles and matched methylomic profiles in multiple tissues. The supervisory team has access to a variety of such datasets, e.g. from ALSPAC, BiB and HUNT cohorts.
2. Use machine learning methodologies to construct methylome models of biological variation in the proteomes, metabolomes and transcriptomes of peripheral tissues and in methylomes of non-peripheral tissues.
3. Apply these models to methylomes of peripheral tissues and apply machine learning methodologies to the output to construct biomarkers of mental illness, cardiometabolic health and cancer risk.
4. Evaluate the performances of the resulting biomarkers to determine where they may provide independent contributions to existing clinical biomarkers.
Braun (2019). Transl Psychiatry 9(1): 47.
Hannon (2015). Epigenetics 10(11): 1024-1032.
Huang (2016). Epigenetics 11(3): 227-236.
Lu (2019). Aging (Albany NY) 11(2): 303-327.
Slieker (2013). Epigenetics Chromatin 6(1): 26.
Smith (2015). Am J Med Genet B Neuropsychiatr Genet 168B(1): 36-44.
Dr Emma Vincent (lead), Dr Borko Amulic, Dr Caroline Bull, Dr Ruth Mitchell
Neutrophils are the most abundant white blood cells and they are essential for the inflammatory response to pathogens. Dysregulated neutrophil responses, however, can lead to pathological inflammation, and this has been proposed to drive the pathogenesis of a large variety of both infectious and non-infectious diseases. For instance, in mouse models of cancer, neutrophils can both suppress anti-tumour immunity and directly re-awaken dormant cancer cells, leading to metastasis. Altered circulating neutrophil counts have been suggested as prognostic markers in cancer, cardiovascular disease, diabetes, malaria and viral infections. Neutrophils have also been proposed as therapeutic targets in these diseases. It remains unclear, however, whether this is a causal relationship and which specific neutrophil responses contribute to the underlying pathogenic mechanism in these disorders.
The overall aim of this PhD will be to understand how changes to neutrophil biology influence disease risk.
The project has 3 main aims:
1) To use a hypothesis free approach to identify diseases causally associated with alterations in neutrophil numbers.
2) Determine which neutrophil inflammatory mediators (cytokines and other secreted factors) influence disease risk.
3) Identify any neutrophil specific drivers of disease.
We will use methods in genetic epidemiology, in particular Mendelian randomization, to determine which diseases are driven by changes in circulating neutrophil numbers. We will also use Mendelian randomization to evaluate the causal associations between the levels of cytokines (produced by neutrophils) and the diseases observationally associated with neutrophil biology. We will then undertake multiple-trait colocalization (moloc) analyses in order to identify neutrophil-specific drivers of disease. Laboratory-based methods will be used to investigate the mechanisms underlying the associations. The project is flexible: there will be opportunity for the applicant to conduct laboratory work if they wish, but a laboratory component is not compulsory.
The work will be carried out at the MRC Integrative Epidemiology Unit and in the School of Cellular and Molecular Medicine at the University of Bristol.
Amulic B et al. Annu Rev Immunol. 2012;30:459-89.
Coffelt SB et al. Nature Reviews Cancer 16, 431 (2016).
McGowan et al. Human Molecular Genetics. 2019. https://doi.org/10.1093/hmg/ddz155
Smith GD and Ebrahim S. Human Genetics. 2008. https://doi.org/10.1007/s00439-007-0448-6
Dr Santi Rodriguez (lead), Dr Daniel Lawson, Professor Tom Gaunt, Dr Laurence Howe
Mendelian Randomization (MR) has proven to be a powerful tool to determine whether there is a causal relationship between an exposure (e.g. a risk factor) and an outcome (e.g. a disease) using genetic variants as causal “anchors” (1). MR-Base (www.mrbase.org) is an existing, well-established and widely used platform for causal analyses developed in the IEU (2). It enables MR analyses under a 2-sample MR framework (3). This approach uses two different population samples: one to estimate the genotype-exposure association and the other to estimate the genotype-outcome association. So far, most 2-sample MR analyses have been derived from GWAS conducted in Europeans. This assumes population homogeneity, i.e. that the effect of population variation is negligible. On the other hand, ethnic differences in disease risk and prevalence, including cardiovascular disease, have been reported (4). We have recently presented evidence showing how population stratification is important in the design and interpretation of post-GWAS genetic epidemiological analyses, including MR studies (5). This evidence suggests that consideration of population genetic variation can improve the design and interpretation of MR studies, potentially leading to improved causal analysis of human disease traits.
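As a minimal sketch of the 2-sample framework described above, the example below combines genotype-exposure associations from one sample with genotype-outcome associations from another. It computes per-variant Wald ratios and a fixed-effect inverse-variance weighted (IVW) estimate, one common 2-sample MR estimator. The summary statistics are hypothetical, the variants are assumed to be independent and valid instruments, and uncertainty in the genotype-exposure estimates is ignored. This is not the MR-Base implementation.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from a single variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate and its standard error.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three independent variants.
beta_exposure = [0.10, 0.20, 0.15]   # sample 1: genotype-exposure associations
beta_outcome = [0.05, 0.10, 0.075]   # sample 2: genotype-outcome associations
se_outcome = [0.01, 0.02, 0.015]     # sample 2: standard errors

ratios = [wald_ratio(bx, by) for bx, by in zip(beta_exposure, beta_outcome)]
estimate, standard_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print("Wald ratios:", [round(r, 3) for r in ratios])
print(f"IVW estimate: {estimate:.3f} (SE {standard_error:.3f})")
```

If the two samples come from populations with different allele frequencies or linkage disequilibrium patterns, the two sets of associations may not be comparable, which is the population homogeneity assumption this project sets out to examine.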
We aim to test whether combining Mendelian Randomization and Population Genetics can improve causal analyses of human disease traits
Objective 1.- To investigate the impact of population structure on the design of 2-sample MR studies using experimental data from large epidemiological cohorts
Objective 2.- To develop and trial methods for accounting for these impacts
Objective 3.- To implement MR-Base with a new module (MR-Base Global)
Objective 4.- To apply MR-Base Global in the causal inference of human disease traits
The following study designs and methodologies will be used in order to accomplish each objective:
Obj 1.- A number of approaches will be used to characterise population genetic variation, including homogeneity χ² tests, phylogenetic analyses, Principal Component Analyses (PCAs), and diversity indices such as FST, GST, and Nei's distance index (6). We will also explore the potential use of other genetic markers (in addition to single SNPs) as indicators of population structure, covariates or instrumental variables in MR studies.
Obj 2.- Based on results obtained from Obj 1, we will develop methods to quantify, correct for, and exploit the effect of both within- and between-population genetic variation, as identified by analyses of primary data, on the interpretation of 2-sample MR using summary statistics derived from those same data. We will analyse what additional information and/or methodology may be required to work with summary-level data to capture the information included in primary datasets about population genetic features of MR instruments.
Obj 3.- Bioinformatic approaches will be used to implement the MR-Base database, its web interface and the R package in order to integrate population genetics and MR information to offer an approach to interpret causality while taking account of population structure.
Obj 4.- The performance of MR-Base Global will be tested by analysing the causal effects of risk factors on disease outcomes separately in different populations. We will also use trans-population 2-sample MR (including ancestries known to differ in risk of diseases such as cardiovascular disease). We will prioritise two scenarios: (1) traits showing strong evidence for causation and (2) variants with known phenotypic effects that are used in MR but show considerable population stratification (such as ADH1B, ALDH2 and lactase persistence, as test cases).
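One of the diversity indices named under Obj 1, FST, can be illustrated for a single biallelic variant. The sketch below uses Wright's definition, FST = (HT - HS) / HT, for two populations that are assumed to be equally sized and in Hardy-Weinberg equilibrium. The allele frequencies are hypothetical, and estimators used in practice also correct for sample size.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Wright's FST for one biallelic variant in two equally sized populations.

    HS is the mean within-population heterozygosity and HT is the
    heterozygosity of the pooled population.
    """
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    h_t = heterozygosity((p1 + p2) / 2.0)
    if h_t == 0.0:
        return 0.0  # variant is monomorphic in both populations
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies of one variant in two populations.
for p1, p2 in [(0.5, 0.5), (0.4, 0.6), (0.2, 0.8), (0.0, 1.0)]:
    print(f"p1={p1}, p2={p2}: FST = {fst_two_populations(p1, p2):.3f}")
```

A variant with a high FST between the exposure and outcome samples is a candidate for the kind of population stratification that Obj 2 aims to quantify and correct for.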
1. Davey-Smith, G. and S. Ebrahim, 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Int.J Epidemiol, 2003. 32(1): p. 1-22.
2. Hemani, G., et al., The MR-Base platform supports systematic causal inference across the human phenome. Elife, 2018. 7.
3. Pierce, B.L. and S. Burgess, Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. Am J Epidemiol, 2013. 178(7): p. 1177-84.
4. Gazzola, K., L. Reeskamp, and B.J. van den Born, Ethnicity, lipids and cardiovascular disease. Curr Opin Lipidol, 2017. 28(3): p. 225-230.
5. Lawson, D.J., et al., Is population structure in the genetic biobank era irrelevant, a challenge, or an opportunity? Hum Genet, 2019.
6. Meirmans, P.G. and P.W. Hedrick, Assessing population structure: F(ST) and related measures. Mol Ecol Resour, 2011. 11(1): p. 5-18.
Dr Tom Richardson (lead), Professor George Davey Smith, Dr Eleanor Sanderson
An individual’s risk of complex disease is typically determined by a combination of many risk factors and exposures, along with their inherited genetic predisposition. However, the majority of health research investigates the impact of individual risk factors in isolation. This project will involve applying techniques in causal inference, such as multivariable Mendelian randomization, to evaluate whether multiple factors confer disease risk independently or along the same causal pathway.
We recently applied this approach to demonstrate that HDL cholesterol is not protective against coronary heart disease risk when accounting for other lipid-related traits such as apolipoprotein B (1). Furthermore, we have shown that childhood obesity increases later life risk of coronary heart disease and type 2 diabetes, but only via a causal pathway that includes adulthood body mass index (2). Findings such as these, along with those which will be investigated during this project, have important implications for social policy in terms of how we can improve disease prevention.
- With the help of your supervisors, develop a catalogue of risk factors which influence the same disease outcomes (e.g. cardiovascular diseases, psychiatric disorders and any others that the student is interested in).
- Evaluate how multiple factors associated with the same outcome influence disease risk using multivariable and network Mendelian randomization.
- Where appropriate, assess the proportion of one risk factor's effect on disease that is mediated by another along the same causal pathway.
- Undertake extension analyses to evaluate which genes indirectly confer disease risk due to their influence on risk factors assessed.
The PhD student will develop skills in how to appropriately handle and analyse large-scale datasets using cutting-edge techniques in causal inference. This will include datasets from the UK Biobank study (a cohort of ~half a million adults) and the Avon Longitudinal Study of Parents and Children (ALSPAC), along with summary-level data from large-scale consortia. Along with the help of the supervisory team, the student will devise specific hypotheses that they are interested in pursuing.
We will use various approaches which use germline genetic variants to infer causal relationships between multiple risk factors and disease outcomes. These will include (but are not limited to) multivariable Mendelian randomization (MR) [3], network MR [4] and factorial MR [5]. Analyses will also be extended to investigate the influence of multiple molecular biomarkers on disease risk. For example, this will allow us to infer whether a gene’s expression influences disease risk due to its direct impact on a risk factor. Furthermore, it will allow us to investigate whether networks of genes influence disease risk via the same biological pathway. This will have meaningful translatable benefit for drug validation purposes.
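As a rough illustration of the multivariable MR idea described above, the Python sketch below regresses variant-outcome effects on variant-exposure effects for two exposures at once, weighting each variant by the inverse variance of its outcome estimate. The function name `mvmr_ivw`, the restriction to two exposures and all of the numbers are assumptions made for this example; they are not taken from the project or from any published analysis.

```python
def mvmr_ivw(beta_x1, beta_x2, beta_y, se_y):
    """Two-exposure multivariable IVW estimate.

    Fits beta_y = b1 * beta_x1 + b2 * beta_x2 (no intercept) by weighted
    least squares, with weights 1 / se_y**2, solving the 2x2 normal equations.
    """
    weights = [1.0 / s ** 2 for s in se_y]
    s11 = sum(w * a * a for w, a in zip(weights, beta_x1))
    s22 = sum(w * b * b for w, b in zip(weights, beta_x2))
    s12 = sum(w * a * b for w, a, b in zip(weights, beta_x1, beta_x2))
    s1y = sum(w * a * y for w, a, y in zip(weights, beta_x1, beta_y))
    s2y = sum(w * b * y for w, b, y in zip(weights, beta_x2, beta_y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("exposure effects are collinear; estimates are not identified")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Made-up per-variant effects on two exposures (illustrative values only).
beta_exposure_1 = [0.10, 0.20, 0.05, 0.15]
beta_exposure_2 = [0.08, 0.02, 0.12, 0.05]
# Toy outcome effects constructed so that only exposure 1 acts on the outcome.
beta_outcome = [0.5 * b for b in beta_exposure_1]
se_outcome = [0.02, 0.03, 0.02, 0.025]

direct_1, direct_2 = mvmr_ivw(beta_exposure_1, beta_exposure_2, beta_outcome, se_outcome)
print(f"direct effect of exposure 1: {direct_1:.3f}")
print(f"direct effect of exposure 2: {direct_2:.3f}")
```

In this toy setting the second exposure has no direct effect once the first is accounted for, which is the kind of pattern multivariable MR is used to detect. A real analysis would also need standard errors and instrument-strength checks, which are omitted here.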
Professor Marcus Munafo (lead), Dr Hannah Sallis,
Patients with psychosis are around 2.7 times more likely to have been exposed to childhood trauma [1]. Given that exposure to childhood trauma does not always lead to psychotic symptoms, this relationship could be mediated by other factors. Identifying modifiable mediating factors of the trauma-psychosis pathway could inform treatments and prevention strategies [2]. Patients with psychosis and a history of exposure to trauma show less resilience to daily life stresses [3]; this could be a potential mediating psychological mechanism [4].
Previous work has identified an association between genetic liability to schizophrenia and the likelihood of experiencing childhood trauma [5]. This project will expand on this to investigate whether genetic liability is also associated with mediating factors (such as resilience) in the trauma-psychosis pathway, and whether these mediating pathways are specific to certain types of childhood trauma.
This project will:
1. Investigate the relationship between childhood resilience to daily life stresses and severity of psychotic symptoms in early adulthood.
2. Investigate the effect of childhood resilience on the relationship between traumatic experiences and likelihood of developing psychosis and severity of symptoms.
3. Explore the effect of genetic confounding by estimating the association between genetic liability for schizophrenia and childhood resilience.
• Literature review of the current research into the relationship between resilience, trauma and psychosis.
• Use data from cohort studies such as the Avon Longitudinal Study of Parents and Children to analyse the effect of prospective measures of psychological resilience to daily life stresses in early childhood on likelihood and severity of psychotic symptoms in early adulthood.
• Use prospective measures of psychological resilience, prospective and retrospective measures of exposure to a range of types of trauma and contemporaneous measures of psychotic experiences to estimate the impact of early life psychological resilience on the relationship between childhood trauma and psychotic symptoms in early adulthood.
• Derive polygenic scores to investigate the association between genetic liability for schizophrenia and childhood resilience.
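To make the polygenic score step in the final bullet concrete, here is a minimal Python sketch of the usual construction: a weighted sum of risk-allele counts, with weights taken from GWAS effect sizes. The function name, the variant weights and the two example genotypes are all hypothetical; a real score would use many more variants and would deal with linkage disequilibrium, strand alignment and missing genotypes, none of which is shown here.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1 or 2) across variants."""
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("need one effect size per variant")
    return sum(count * beta for count, beta in zip(allele_counts, effect_sizes))


# Hypothetical per-variant effect sizes (e.g. log odds ratios from a GWAS).
gwas_effects = [0.10, -0.05, 0.20]

# Hypothetical genotypes: risk-allele counts at the same three variants.
genotypes = {
    "child_a": [0, 1, 2],
    "child_b": [2, 2, 0],
}

scores = {person: polygenic_score(counts, gwas_effects)
          for person, counts in genotypes.items()}
for person, score in scores.items():
    print(person, round(score, 3))
```

The resulting scores would then be used as the exposure in a regression on the childhood resilience measure.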
1. Varese, F., et al. (2012). Schizophr Bull, 38(4), 661-671
2. Croft, J., et al. (2019). JAMA Psychiatry. 76(1):79-86
3. Ered, A., et al. (2017). Eur Psychiatry, 43, 9-13
4. Krabbendam, L., et al. (2014). Schizophr Bull, 40(2), 248-251
5. Sallis, H., et al. (2019). medRxiv
Prof Tom Gaunt (lead), Dr Colin Campbell,
Genomics England have sequenced over 100,000 genomes of patients in the NHS, with plans to expand this to millions of patients over the next few years. This ambitious strategy is creating a world-leading genomics dataset that offers excellent opportunities for the next generation of data scientists. Analysing the data to create new insights into human disease, develop new diagnostic tests and identify intervention targets has the potential to have a very real impact on human health.
We have previously developed a range of predictors that enable us to predict the functional effect of both germline genetic mutations (FATHMM) and somatic mutations (CScape). These machine learning tools use a range of genomic features to estimate the likelihood that any particular genetic variant is functional, whether it is in gene coding sequence or elsewhere in the genome.
This project will focus on the development and application of these algorithms for clinical genomics in the context of the Genomics England sequencing initiative.
1. Improve predictive performance of the algorithms
2. Incorporate new predictive features
3. Develop approaches to prioritising functional variants across the whole
4. Develop approaches to dealing with haploinsufficiency and compound
5. Engage with the clinical genetic community to inform development of algorithms and seek routes to implementation
6. Engage with the pharmaceutical industry to identify potential applications in drug target prioritization
Algorithm development: The project will involve the development and modification of machine learning algorithms to predict the functional effect of genetic variants. You will work with modern machine learning libraries (e.g. scikit-learn, TensorFlow, PyTorch) in conjunction with the Python programming language using the high-performance computing facilities at the University of Bristol.
Data acquisition: You will work with genome sequencing data from the Genomics England 100,000 Genomes Project. Machine learning algorithms will be trained using data from ClinVar, COSMIC and other databases of known functional genetic variants. New predictive features will be acquired from a range of genomics data sources, including the ENCODE and Roadmap Epigenomics projects.
Engagement: you will meet with others working with Genomics England data and with clinical geneticists to discuss the methods you are developing to ensure that they meet the requirements for diagnostics. You will also meet with pharmaceutical companies to discuss your methods and how they might be used to prioritise drug targets.
Prof Tom Gaunt (lead), Dr Colin Campbell,
Over the last decade genome-wide association studies (GWAS) have completely transformed our understanding of genetic epidemiology. Thousands of published research studies have identified hundreds of thousands of associations between genetic variants and human traits. However, the vast majority of research has taken a univariate approach to considering these relationships. We have built a database of thousands of GWAS from published studies and our own GWAS of UK Biobank, enabling us to take a more systematic approach to exploring the interplay between genetic variants and human traits.
Canonical correlation analysis (CCA) is a multivariate method that allows us to find linear combinations of genetic variants and human traits which have maximum correlation with each other. In previous work ([Seoane et al, 2014](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4199483/)) we demonstrated the potential of this approach to identify pleiotropy and explore the molecular aetiology of cardiovascular disease. A newer approach (metaCCA; [Cichonska et al, 2016](https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/27153689/)) allows CCA to be performed using only summary-level GWAS results (instead of individual-level data), opening up the opportunity to apply this approach within our extensive database of GWAS.
This project will focus on the application of metaCCA to a wealth of GWAS data to explore pleiotropic relationships and investigate the molecular aetiology of complex traits and diseases. Students interested in methods development will also have the opportunity to explore alternative approaches or improvements to the method.
1. Apply canonical correlation analysis to summary-level GWAS data
2. Develop approaches to visualizing and interpreting CCA results
3. Validate metaCCA against CCA using individual-level data
4. Develop improved approaches to integrative analysis of GWAS data
The project will utilize metaCCA and CCA to investigate associations between genetic variants and traits. The student will develop programming skills in both R and Python, and will use the high-performance computing facilities at the University of Bristol to carry out computationally-intensive analyses. There will be a core focus on developing new understanding of the molecular aetiology of complex traits, but a student interested in developing further skills in data science will have the opportunity to work on new/improved methods, approaches to data visualization and potentially development of web resources to make results interactive.
Prof Tom Gaunt (lead), Dr Ben Elsworth, Dr Yi Liu
Over 30 million publications are indexed in the PubMed database, representing a wealth of scientific knowledge on human health and disease. However, this largely unstructured data resource is difficult to systematically exploit, leading to potential duplication of effort and missed opportunities. Recent developments in natural language processing (NLP) are transforming our ability to extract knowledge from text using deep neural networks.
In previous work we have implemented a platform for identification of mechanistic intermediates between risk factors and diseases (MELODI; [Elsworth et al, 2018](https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/29342271/)). Our approach attempts to find semantic triples (knowledge about the relationships between two entities, for example a risk factor and disease), and connect these into mechanistic pathways. However, there is substantial scope to expand this work using new methods and data.
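The idea of joining semantic triples into a mechanistic pathway can be sketched in a few lines of Python. The example below is only an illustration of the general principle and is not the MELODI implementation: the function name `find_intermediates` and the placeholder entities are assumptions, and the triples are invented rather than extracted from any publication.

```python
from collections import defaultdict


def find_intermediates(triples, source, target):
    """Return entities M for which triples source -> M and M -> target both exist."""
    outgoing = defaultdict(set)
    for subject, _predicate, obj in triples:
        outgoing[subject].add(obj)
    candidates = outgoing.get(source, set())
    return sorted(m for m in candidates if target in outgoing.get(m, set()))


# Invented (subject, predicate, object) triples using placeholder names.
toy_triples = [
    ("risk_factor_A", "increases", "intermediate_X"),
    ("intermediate_X", "causes", "disease_B"),
    ("risk_factor_A", "increases", "intermediate_Y"),  # no link onward to the disease
    ("intermediate_Z", "causes", "disease_B"),         # no link back to the risk factor
]

intermediates = find_intermediates(toy_triples, "risk_factor_A", "disease_B")
print(intermediates)
```

Only `intermediate_X` is linked to both the risk factor and the disease, so it is the only candidate mechanism returned. A fuller version would also keep the predicates and the supporting publications for each link.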
This project will focus on implementing new approaches to extracting knowledge from the scientific literature, and integrating that knowledge with evidence derived from Mendelian randomization (MR) analysis of large cohorts such as UK Biobank and pathway data from public pathway databases.
1. Develop and apply new approaches to identifying risk factors, diseases and the relationship between them from biomedical text
2. Integrate newly derived knowledge with information from existing methods and other evidence to validate the approach
3. Develop methods to triangulate evidence from the literature with information from pathway databases and causal analysis approaches
Computational work: You will have the opportunity to develop your data science skills, learning to use Python and NLP toolkits such as NLTK and Google BERT in the high-performance computing environment at the University of Bristol. You will integrate knowledge extracted from the literature with other evidence in our [EpiGraphDB](http://www.epigraphdb.org) platform, using Neo4j and developing our custom API to enable you to integrate data and perform analyses.
Data: Full-text scientific literature data will be extracted from EuropePMC for the subset represented there. PubMed will be used to extract data from all other publications indexed (data to include full abstract, title, MeSH keywords and meta-data). Other data from [EpiGraphDB](http://www.epigraphdb.org) will also be used.
Professor Nicholas Timpson (lead), Dr Laura Corbin,
In non-smokers, being overweight is associated with a 51% increase in mortality compared with people who have always been a normal weight (PMID: 26421898). This increase is due to an increased risk of diseases such as cancer, cardiovascular disease and type 2 diabetes, but there is little understanding of the mechanisms driving these effects.
Determining how increased weight leads to an increased risk of disease is difficult. Body mass index is associated with a wide range of confounding factors, making isolating its effect difficult in observational analyses. Getting people to achieve and maintain significant weight change experimentally is difficult. In addition, until recently, we have only been able to look at the effect of weight on a small number of metabolic pathways.
Despite being a potentially valuable source of information, randomised controlled trial (RCT) data is not routinely incorporated into epidemiological investigations of risk factors relevant to population health. Here, we will utilise metabolomics data from blood samples taken before and after two weight loss interventions within RCTs to give greater insight into how increased weight causes disease. Bariatric surgery produces a weight loss of ~25% of body weight, sustained for up to 15 years (PMID: 23235396). Very low calorie (VLC) diets combined with continued contact have also been shown to produce maintained weight losses of around 10% of body weight (PMID: 30852132). The large-scale analysis of metabolites (‘metabolomics’) using techniques such as mass spectrometry is a developing field with commercial platforms able to deliver semi-quantitative data on >1,000 metabolites from a single sample.
This study provides a unique opportunity for a student to use data from two clinical trials to determine how weight loss affects metabolic pathways in an intervention-specific manner. Students will investigate how metabolic signatures of weight change relate to downstream disease and the role of body mass index as a causal risk factor for disease.
This study aims to:
1. Examine the changes in metabolomic signatures seen with weight loss.
2. Compare the metabolomic signatures of each intervention.
3. Look for metabolomic signatures present at baseline that predict treatment efficacy.
We have access to blood samples from two RCTs. The Diabetes Remission Clinical Trial was designed to determine whether a structured weight management programme can produce type 2 diabetes remission (PMID: 26879684). Metabolite data derived from baseline and 1-year samples is available now. The By-Band-Sleeve study (http://www.by-band-sleeve.bristol.ac.uk/) was designed to assess the efficacy of three bariatric surgery approaches and is the largest trial of its kind to date. Recruitment for this trial is due to finish by 09/2019 with the aim of having a first freeze of data available for analysis shortly after. Population-based metabolomics datasets are available for complementary analyses.
Statistical approaches will fall into two broad categories: (1) Supervised and unsupervised machine learning algorithms applied to the data available within the project (e.g. clustering, self-organising maps); (2) Integration of publicly available data, linking metabolites to relevant disease outcomes and biological pathways (e.g. enrichment and pathway analysis).
Ann Surg. 2013 Jan;257(1):87-94. doi: 10.1097/SLA.0b013e31827b6c02.
Long-term outcomes after bariatric surgery: fifteen-year follow-up of adjustable gastric banding and a systematic review of the bariatric surgical literature.
Lancet Diabetes Endocrinol. 2019 May;7(5):344-355. doi: 10.1016/S2213-8587(19)30068-3. Epub 2019 Mar 6.
Durability of a primary care-led weight-management intervention for remission of type 2 diabetes: 2-year results of the DiRECT open-label, cluster-randomised trial.
BMC Fam Pract. 2016 Feb 16;17:20. doi: 10.1186/s12875-016-0406-2.
The Diabetes Remission Clinical Trial (DiRECT): protocol for a cluster randomised trial.
Rogers CA et al (2014) The By-Band study: gastric bypass or adjustable gastric band surgery to treat morbid obesity: study protocol for a multi-centre randomised controlled trial with an internal pilot phase. Trials, 15(53).
Rogers CA et al (2017) Adaptation of the By-Band randomized clinical trial to By-Band-Sleeve to include a new intervention and maintain relevance of the study to practice. Br J Surg, 104(9): 1207–1214. doi: 10.1002/bjs.10562
Dr Gemma Sharp (lead), Dr Laura Johnson, Dr Clare Llewellyn (Associate Professor of Obesity, University College London)
Vegetarian, vegan and plant-based diets are increasingly popular and promoted as being healthier and more environmentally-friendly. A healthy diet during pregnancy has been associated with better health outcomes for the offspring. However, without careful planning, which might involve supplementation, low/zero animal product diets can be low in nutrients like iron, vitamin B12, vitamin D and calcium. In pregnant women, nutrient intake, circulating levels and metabolism have been associated with various offspring health outcomes, but it is unclear whether these associations are causal or are confounded by other factors (for example, we know that vegetarianism/veganism is strongly socially and culturally patterned). The biological mechanism mediating any causal effect is also unknown.
This project will apply causal inference techniques and 'triangulate' evidence to study associations between maternal vegetarian/vegan/plant-based diets and related nutrient levels during pregnancy and a range of offspring health outcomes throughout childhood. It will also explore the role of DNA methylation and offspring metabolomics as potential mediating mechanisms.
Triangulation of evidence
Causal inference techniques that will be employed include multivariable regression, Mendelian randomization (using genetics as a proxy for maternal diet and/or nutrient levels), paternal negative controls (comparing maternal and paternal estimated effects), sibling studies (which control for familial confounding), children-of-twins studies (which control for genetic confounding between parents and children, amongst other things) and cross-cohort comparisons (comparing estimated effects in populations with different confounding structures).
The project will involve conducting genome-wide association studies (GWAS) to identify genetic instruments of diet and nutrients for use in Mendelian randomization. It will also involve conducting epigenome-wide association studies (EWAS) and metabolome-wide association studies (MWAS) to identify CpG sites and metabolites that might be causal mediators of any associations.
Combining data sources through meta-analysis
The data used for these analyses will come from international cohort studies (e.g. UK Biobank, ALSPAC, Born in Bradford, Generation R, MoBa), capitalising on and maintaining strong existing cross-cohort collaborations in multiple consortia. Summary statistics from cohort-specific analyses will be meta-analysed to increase statistical power.
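As a simple illustration of how cohort-specific summary statistics can be pooled, the Python sketch below performs a fixed-effect inverse-variance weighted meta-analysis. This is one common choice and is assumed here for the example; the project text does not specify the meta-analysis model, and the three cohort estimates and standard errors are invented.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    if len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Invented effect estimates and standard errors from three hypothetical cohorts.
cohort_estimates = [0.12, 0.08, 0.15]
cohort_std_errors = [0.05, 0.04, 0.10]

pooled_estimate, pooled_std_error = fixed_effect_meta(cohort_estimates, cohort_std_errors)
print(f"pooled estimate: {pooled_estimate:.4f} (SE {pooled_std_error:.4f})")
```

The pooled standard error is smaller than that of any single cohort, which is the gain in statistical power referred to above. If effects are expected to differ between cohorts, a random-effects model may be more appropriate.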
Piccoli et al. BJOG. 2015. Vegan-vegetarian diets in pregnancy: danger or panacea? A systematic narrative review. https://www.ncbi.nlm.nih.gov/pubmed/25600902
Sebastian et al. Nutrients. 2019. The Effects of Vegetarian and Vegan Diet during Pregnancy on the Health of Mothers and Offspring. https://www.ncbi.nlm.nih.gov/pubmed/30845641
North et al. BJU Int. 2000. A maternal vegetarian diet in pregnancy is associated with hypospadias. The ALSPAC Study Team. Avon Longitudinal Study of Pregnancy and Childhood. https://www.ncbi.nlm.nih.gov/pubmed/10619956
Prof Russ Jago (lead), Prof Kate Tilling, Dr Mark Kelson (Exeter) - M.J.Kelson@exeter.ac.uk
Everybody ages. With our aging population, it is important that we understand how to remain healthy as we get older. One lifestyle component that likely has an important role is physical activity. Little is known about how physical activity is affected by the transition into retirement. This mixed method PhD will use novel approaches to provide evidence about the impact of retirement on physical activity trajectories.
The project consists of three linked studies. The first study will be a systematic review of physical activity and how it changes with retirement to provide a synthesis of the current evidence base. The second study will model the trajectory of physical activity in several cohorts to see how patterns of physical activity change with time after retirement, using novel causal methodology (instrumental variables, regression discontinuity). The final study will use qualitative methods to understand the key factors underlying the different trajectories.
This project has three linked scientific aims.
1) To systematically review the evidence to identify how patterns of physical activity change during the transition from work to retirement.
2) To use a mix of sophisticated analysis techniques to identify how patterns of physical activity change and the key variables that may impact on the change in physical activity.
3) To use qualitative methods to identify strategies that could be implemented to help people to remain active during the transition to retirement.
The project is also designed to facilitate the development of the student via training in a variety of advanced research methods, including evidence synthesis, data science (in terms of processing accelerometer data) and qualitative methods.
The project will consist of 3 studies.
Study 1 will be a systematic review of studies that have examined physical activity during the transition to retirement. The review will examine changes in physical activity level and differences in the impact of retirement on physical activity by age of retirement, socio-economic position, sex, pre-retirement occupation and country.
Study 2 will be a secondary analysis of data from several cohorts to examine how physical activity changes over time with retirement using sophisticated statistical modelling approaches such as multilevel modelling, instrumental variables analyses and regression discontinuity analyses.
Study 3 will use qualitative methods to examine the key personal, work, environment and psycho-social factors that impact on physical activity during the transition to retirement and how strategies could be developed to address any identified barriers to physical activity.
Dr Evie Stergiakouli (lead), Prof Anita Thapar, Prof Kate Tilling
Psychiatric disorders are common, and contribute more to the global burden of morbidity than any other disorder group. Most are thought to have their origins at least in part in childhood, when they are preceded by "neurodevelopmental" impairments such as communication, motor, social and attentional difficulties. Genetic factors are known to contribute strongly to psychiatric disorders such as ADHD, autism spectrum disorder (ASD) and schizophrenia. To date, as far as we know, no-one has looked at how early genetic impacts begin in the general population (i.e. pre-school age children) and the impacts on lifecourse trajectories. Environmental factors also play a part in the aetiology of complex disorders but these are rarely looked at together with genes. For example, exposures to toxins such as lead are associated with neurodevelopmental impairments and psychiatric disorder, but the contribution of any single risk on its own is usually small and it is difficult to prove whether they have a causal role.
The aim of the PhD is to examine the impact of composite measures of psychiatric genetic risk, intra-uterine exposures and their interplay on early neurodevelopmental impairments and later mental health trajectories.
This will be achieved by applying quantitative genetics methods and Mendelian randomization using existing data available in ALSPAC, the Norwegian Mother, Father and Child Cohort Study (MoBa) and publicly available data. In particular it will involve developing methods to aggregate polygenic risk scores for different disorders and combining the genetic information with available information on non-genetic risk factors. These will then be used to test their impact on early neurodevelopmental traits and mental health trajectories.
Leppert et al. JAMA Psychiatry. 2019;76(8):834-842. doi:10.1001/jamapsychiatry.2019.0774
Riglin et al. JAMA Psychiatry. 2016 Dec 1;73(12):1285-1292. doi: 10.1001/jamapsychiatry.2016.2817.
Thapar Am J Psychiatry. 2018 Oct 1;175(10):943-950. doi: 10.1176/appi.ajp.2018.18040383. Epub 2018 Aug 16.
Rice et al. Dev Psychopathol. 2018 Aug;30(3):1107-1128. doi: 10.1017/S0954579418000421.
Prof Andrew Dowsey (lead), Prof Alastair Hay, Dr Ashley Hammond
Antimicrobial resistance is one of the greatest threats to the future of modern medicine. We know antibiotic use is a key contributor to rising resistance levels; what remains unclear is the impact antibiotic use and subsequent resistance rates are having on secondary care infections. A recent NHS report revealed that hospital admissions for sepsis, a life-threatening bacterial infection, rose by one third from 2017 to 2019. Around 50% of sepsis is caused by Escherichia coli, which is also responsible for most community-acquired urinary tract infections (UTIs). Due to rising antibiotic resistance rates in primary care, UTIs have become increasingly difficult to treat. UTI treatment failures could lead to development of sepsis. We must understand the complex relationship between community antibiotic use and admission rates for secondary care infections and antimicrobial resistance in order to develop more effective antibiotic stewardship policies in both primary and secondary care.
This project will use individual-level patient data from primary care practices regarding previous antibiotic exposure, publicly available data on primary care antibiotic prescribing and hospital admissions for infections, and local laboratory data on antibiotic susceptibilities for secondary care infections to investigate the complex relationships between primary care antibiotics, secondary care infection admissions and antimicrobial resistance.
The first step in this project will be to conduct a systematic review exploring what we already know about the relationship between community antibiotic prescribing, secondary care admissions for infections and antimicrobial resistance. We are already aware of at least three eligible studies, and we expect to identify more.
Associations between primary care antibiotic prescribing and secondary care infection admissions will then be investigated using both individual antibiotic prescription data and publicly available NHS Digital data on antibiotic prescribing in primary care and hospital admissions for infections in England. Data on antibiotic susceptibilities of secondary care infections will be collected directly from local laboratories. Total antibiotic prescribing will be investigated, as well as more specific drug-bug combinations, in order to explore whether certain patterns of prescribing might increase the risk of a secondary care infection.
Dr Tom Richardson (lead), Prof George Davey Smith, Prof Caroline Relton, Dr Rebecca Richmond
Approximately 1 in 6 deaths worldwide are attributed to cancer (World Health Organization, 2018). It is therefore imperative that we develop our understanding of what causes different types of cancer to develop in patients, as well as identify biomarkers which can help us improve our capability to prevent and treat it. This project will apply techniques in causal inference and bioinformatics to large-scale, phenotypically rich datasets to better characterize the determinants of cancer. In doing so, the student will undertake leading research in the field of epidemiology, which will result in findings we envisage will have a meaningful clinical impact and help improve healthcare in cancer patients.
- Undertake a systematic review of available datasets concerning known and potentially novel risk factors for cancer outcomes, as well as those regarding progression (e.g. recurrence, metastases, mortality).
- Undertake genome-wide association studies (GWAS) to identify novel genetic variants related to identified risk factors.
- Evaluate causal relationships between modifiable risk factors and cancer risk and progression using an approach known as Mendelian randomization (MR).
- Dissect the causal pathways using mediation analysis for identified risk factors to discern whether they contribute independently or collectively towards cancer risk and progression.
- Harness molecular datasets (e.g. DNA methylation, gene expression) to highlight biomarkers which may be valuable in prioritising treatments and preventing disease progression in cancer patients.
This PhD project will involve undertaking applied analyses in causal inference to examine the relationship between risk factors and cancer risk and progression. Specifically, we will use an approach known as Mendelian randomization (MR), which leverages genetic information to evaluate evidence of causality. Furthermore, there will be an emphasis on understanding how multiple risk factors contribute to cancer risk, i.e. independently of one another or collectively along the same ‘causal pathway’. For this endeavour we will use various techniques in the field of MR, such as multivariable and network MR.
The other arm of the project will involve identifying novel molecular processes which may be important as diagnostic markers to help improve cancer prevention. The student will develop a skillset to harness large-scale molecular datasets (concerning gene expression and epigenetic traits such as DNA methylation) and undertake analyses to discern whether markers throughout the transcriptome and epigenome play a mechanistic role in cancer risk and progression. We will use datasets from The Cancer Genome Atlas program, as well as tissue- and cell-type specific datasets for this task, such as those available from the GTEx project. These analyses will also likely yield insight into promising therapeutic targets, as well as evaluating potential adverse side-effects of drugs. We envisage that this exciting project will allow the student to develop an exceptional research skillset which will provide a platform for a successful future career.
Dr Tom Richardson (lead), Prof Tom Gaunt,
Over 90% of drug compounds that enter clinical trials fail due to a lack of safety or efficacy (Plenge et al, 2013). The support of evidence from human genetic studies has been shown to drastically improve the success rates of drug targets (Nelson et al, 2015), which has led to huge interest and investment from pharmaceutical companies in this research area. This project will involve developing a suite of innovative biomedical approaches to elucidate and validate novel therapeutic targets using high dimensional genetic and molecular datasets. We envisage that this will lead to groundbreaking research and allow the student to build a strong skillset in bioinformatics and statistics.
- Use the EpiGraphDB platform being developed in the department (http://www.epigraphdb.org/) to develop hypotheses of target genes and disease outcomes to be studied.
- Collate various genetic and molecular datasets which can be used in this project. In particular, we will focus on tissue- and cell-type relevant datasets for a given outcome e.g. brain-derived gene expression datasets when evaluating psychiatric or cognitive disorders.
- Undertake systematic analysis to evaluate the causal relationship between many potential drug targets and disease outcomes.
- Dissect the causal pathways for target genes using mediation analysis to discern whether they contribute independently or collectively towards disease risk.
- Validate promising targets to discern whether they may result in adverse side effects.
This project will involve developing various computational pipelines to help validate drug targets for therapeutic intervention. During the PhD you will learn how to undertake large-scale analyses to validate millions of gene-disease combinations and build web resources to disseminate your findings. For example, we recently developed an application which allows users to investigate gene-based associations across different tissue types (found at http://mrcieu.mrsoftware.org/Tissue_MR_atlas/) (Richardson et al (2019)).
There will also be an emphasis on undertaking network and mediation analyses to discern how potential drug targets may interact, as well as examining whether novel targets are likely to yield a further reduction in disease risk on top of current pharmaceutical options (e.g. statins and coronary heart disease). Analysis pipelines will also be constructed to assess whether targeting a given gene may result in adverse side effects, by evaluating its association with hundreds of diverse traits (often referred to as a ‘phenome-wide analysis’). This project will also offer the opportunity to collaborate with colleagues from the pharmaceutical industry to gain valuable insight into drug validation as well as experience in an alternative research setting. Furthermore, we believe that it will provide an exceptional platform for a successful future career in biomedical research.
Dr Tom Richardson (lead), Prof Tom Gaunt,
Cardiovascular disease (CVD) poses one of the greatest threats to public health worldwide, accounting for more deaths than any other cause (World Health Organization, 2012). It is therefore imperative that we develop our understanding of what causes CVD, as well as identify intervention targets which can help us improve our capability to prevent and treat it. This project will apply techniques in causal inference and bioinformatics to large-scale, phenotypically rich datasets to better characterize the determinants of CVD. In doing so, the student will undertake leading research in the field of epidemiology, which will result in findings we envisage will have a meaningful clinical impact and help improve healthcare in CVD patients.
- Undertake a systematic review of available datasets concerning known and potentially novel risk factors for CVD. To assist with this task we will use the EpiGraphDB platform being developed in the department (http://www.epigraphdb.org/).
- Evaluate causal relationships between modifiable risk factors and CVD outcomes using an approach known as Mendelian randomization (MR).
- Dissect the causal pathways using mediation analysis for identified risk factors to discern whether they contribute independently or collectively towards CVD risk.
- Harness molecular datasets (e.g. epigenetic markers such as DNA methylation, gene expression & circulating proteins) to highlight biomarkers and intervention targets which may be valuable in prioritising treatments and preventing disease progression in CVD patients.
This PhD project will involve undertaking applied analyses in causal inference to examine the relationship between risk factors and CVD outcomes. Specifically, we will use an approach known as Mendelian randomization (MR), which leverages genetic information to evaluate evidence of causality. Furthermore, there will be an emphasis on understanding how multiple risk factors contribute to CVD risk, i.e. independently of one another or collectively along the same ‘causal pathway’. For this endeavour we will use various techniques in the field of MR, such as multivariable and network MR.
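As a rough illustration of how multivariable MR separates the direct effects of correlated risk factors, the sketch below regresses variant-outcome associations on the variant-exposure associations for two exposures, weighting by the inverse variance of the outcome estimates and fitting no intercept. The function name, the two-exposure restriction and the toy summary statistics are assumptions made for this example, not part of the project's actual pipeline.

```python
def mvmr_two_exposures(rows):
    """Weighted least squares without intercept for two exposures.

    `rows` holds one tuple per genetic variant:
    (beta_exposure1, beta_exposure2, beta_outcome, se_outcome).
    Returns the direct-effect estimates (theta1, theta2).
    """
    s11 = s12 = s22 = s1y = s2y = 0.0
    for b1, b2, by, se in rows:
        w = 1.0 / se ** 2  # inverse-variance weight
        s11 += w * b1 * b1
        s12 += w * b1 * b2
        s22 += w * b2 * b2
        s1y += w * b1 * by
        s2y += w * b2 * by
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("exposure effects are collinear; cannot separate them")
    # Solve the 2x2 normal equations by Cramer's rule.
    theta1 = (s22 * s1y - s12 * s2y) / det
    theta2 = (s11 * s2y - s12 * s1y) / det
    return theta1, theta2


# Hypothetical data in which only exposure 1 affects the outcome (effect 0.5).
toy_rows = [
    (0.1, 0.0, 0.05, 0.01),
    (0.0, 0.2, 0.00, 0.01),
    (0.1, 0.1, 0.05, 0.01),
    (0.2, 0.1, 0.10, 0.02),
]
theta1, theta2 = mvmr_two_exposures(toy_rows)
```

In this toy setting the second exposure receives an estimate close to zero once the first is accounted for, which is the pattern expected when a risk factor has no effect independent of another.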
The other arm of the project will involve identifying novel molecular processes which may be important as diagnostic markers to help improve CVD prevention. The student will develop a skillset to harness large-scale molecular datasets (e.g. gene expression and epigenetic traits such as DNA methylation) and undertake analyses to discern whether markers throughout the transcriptome and epigenome play a mechanistic role in CVD risk. We will use various tissue- and cell-type specific datasets for this task, such as those available from the GTEx project. This will allow us to investigate the mechanisms which result in conferred disease risk, for example examining the role of a gene’s expression in relevant tissue types (e.g. adipose, artery and heart tissues). These analyses will also likely yield insight into promising therapeutic targets, as well as helping to evaluate potential adverse side-effects of drugs. We envisage that this exciting project will allow the student to develop an exceptional research skillset which will provide a platform for a successful future career.
Dr Luisa Zuccolo (lead), Dr Laura Johnson,
The World Health Organisation recommends exclusively breastfeeding for 6 months, but in the UK, only 1% of infants meet the guidelines (1). Supporting mothers to maintain breastfeeding is a key priority for promoting child health. Breastfeeding for longer is associated with a reduced childhood obesity risk in observational studies (2), but breastfeeding is socially patterned, meaning associations may be confounded. Rapid infant growth is associated with a higher risk of later childhood obesity (3) and could drive associations of breastfeeding with later obesity via reverse causation. The maintenance of breastfeeding beyond 4 months is associated with a delayed peak in the rate of weight gain and an overall slower rate of growth (4). Additionally, the most common reason mothers report for stopping breastfeeding early is perceived milk insufficiency (1). Thus a shorter breastfeeding duration may cause rapid growth or a faster growth rate may cause breastfeeding to stop.
This project aims to disentangle the relationship between infant feeding and infant growth and childhood obesity. This evidence will contribute to developing decision support systems to help mothers with infant feeding choices. Specific aims are to:
1. Establish whether and to what extent rapid infant growth affects infant feeding patterns, in particular shorter breastfeeding duration
2. Establish whether and to what extent longer breastfeeding duration protects against excessive weight gain in infancy
3. Investigate the persistence of these relationships with respect to childhood obesity
This project will use several methods developed and/or expertly applied within the MRC IEU, including Mendelian randomization, longitudinal modelling of childhood growth, and triangulation of epidemiological evidence.
Mendelian randomization is a study design that mimics a randomised experiment, by comparing groups of individuals randomly allocated a genetic predisposition for a trait or behaviour. Randomisation minimises confounding and reverse causation, both of which could explain the observed associations between breastfeeding and infant growth.
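To make the logic concrete, here is a minimal sketch of one common summary-data estimator: each variant gives a Wald ratio (variant-outcome effect divided by variant-exposure effect), and the ratios are combined by fixed-effect inverse-variance weighting (IVW). The function names and the summary statistics are hypothetical, and the standard errors use a first-order approximation that ignores uncertainty in the variant-exposure effects.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimate and its approximate standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted combination of Wald ratios.

    `variants` is a list of (beta_exposure, beta_outcome, se_outcome)
    tuples, one per independent genetic variant.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for beta_x, beta_y, se_y in variants:
        ratio, se = wald_ratio(beta_x, beta_y, se_y)
        weight = 1.0 / se ** 2
        weighted_sum += weight * ratio
        total_weight += weight
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)


# Hypothetical summary statistics for three variants.
toy_variants = [(0.10, 0.020, 0.010), (0.20, 0.040, 0.010), (0.05, 0.010, 0.010)]
estimate, se = ivw_estimate(toy_variants)
```

Variants with stronger effects on the exposure yield more precise ratios and therefore receive more weight in the pooled estimate.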
This project will harmonise phenotypic data on infant feeding across multiple cohorts and combine with data on maternal/offspring genetics and infant growth outcomes. The Early Growth Genetics (EGG) consortium, a collaboration of birth cohorts with genotype data combined with phenotypes including infant feeding, growth and later childhood obesity (5), will contribute data to this project, and so will several large biobanks internationally including the Norwegian HUNT study, the China Kadoorie Biobank and the UK Biobank.
Analyses will include:
1) Comparing the infant feeding patterns of children with different genetic predispositions to growing faster or having overweight/obesity;
2) Comparing infant growth or obesity in children genetically predisposed to being breastfed for longer, to those whose genetic makeup predicts shorter breastfeeding duration;
3) Cross-cohort comparisons utilising cohorts in diverse settings e.g. where the confounding of infant feeding by social class differs.
4) Formal triangulation of the evidence produced through 1-3 with previous evidence from Randomised Controlled Trials of breastfeeding promotion interventions.
1. DoH. Infant Feeding Survey 2010; http://www.ic.nhs.uk/catalogue/PUB00648/infaseed-serv-2010-earl-resu-rep.pdf
2. Yan J et al. BMC Public Health. 2014;14(1):1267
3. Zheng M et al. Obesity Reviews. 2018;19(3):321-32
4. Johnson L et al. Int J Obesity (2005). 2014;38(7):980-7
5. Middeldorp CM et al. Eur J Epidemiol. 2019;34(3):279-300
Neil Davies (lead), Laurence Howe, George Davey Smith
Family-based studies are becoming widely used to evaluate early-life environmental influences on complex traits (e.g. height, adiposity). For example, a recent study using parent-offspring data found genetic evidence of parental nurture effects on their offspring for traits including educational attainment and age at first birth (1).
Half-siblings provide a highly tractable tool for investigating the effects of the family environment (2). Although maternal and paternal half-sibling pairs are expected to share the same degree of autosomal genetic relatedness (coefficient of relatedness = 0.25), systematic group-level differences in the family environment are expected because maternal half-siblings are usually raised together but paternal half-siblings are usually raised apart.
In this PhD project, you will work with data on half-siblings from cohorts and family registries from the UK, Sweden, Norway and Finland (e.g. UK Biobank, HUNT). In some datasets the half-siblings are known, in others they will have to be identified using genetic data. Half-siblings will be used to evaluate effects of the family environment, such as genetic nurture effects, on complex traits. A wide range of complex traits could be potentially studied, including mental and physical health, cognitive ability, educational attainment and other social outcomes.
1) Use genetic data (autosomes, sex chromosomes and mitochondrial DNA) to identify half-siblings (paternal / maternal) in large biobanks.
2) Evaluate phenotypic similarities between paternal and maternal half-siblings to estimate effects of the family-environment on complex traits. This could be extended to comparisons with full siblings.
3) Extend family-based Mendelian randomization methods (see 3, 4) to half-siblings to estimate genetic nurture effects.
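The comparison in aim 2 can be sketched under a deliberately simplified model, which is an assumption of this example rather than something the project specifies: if both kinds of half-sibling pair share the same genetic relatedness, a higher phenotypic correlation among maternal pairs (usually raised together) than paternal pairs (usually raised apart) is suggestive of a shared family environment. Maternal effects, assortative mating and selection into family type are all ignored here, and the data are invented.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shared_environment_contrast(maternal_pairs, paternal_pairs):
    """Difference in sibling-pair correlations (maternal minus paternal).

    Each argument is a list of (sibling1_value, sibling2_value) tuples.
    """
    r_maternal = pearson(*zip(*maternal_pairs))
    r_paternal = pearson(*zip(*paternal_pairs))
    return r_maternal - r_paternal


# Hypothetical standardised trait values for four pairs of each type.
maternal = [(1.0, 1.2), (2.0, 1.8), (3.0, 3.1), (4.0, 3.9)]
paternal = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
contrast = shared_environment_contrast(maternal, paternal)
```

A real analysis would need confidence intervals for the contrast and adjustment for covariates such as age and sex.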
The project will use a wide range of epidemiological and genetic techniques. The student will gain expertise in the following areas:
Genetic Epidemiology (e.g. genome-wide association studies)
Quantitative genetics (e.g. heritability, environmental variance)
Within-family epidemiological analyses
Additionally, the student will be trained to use data from large studies such as the UK Biobank and HUNT.
1) Kong et al “The nature of nurture: Effects of parental genotypes” 2018 Science 10.1126/science.aan6877
2) Kendler et al 2015 “Family environment and the malleability of cognitive ability: A Swedish national home-reared and adopted-away cosibling control study” PNAS 10.1073/pnas.1417106112
3) Davies et al 2019 “Within family Mendelian randomization studies” Human Molecular Genetics https://doi.org/10.1093/hmg/ddz204
4) Brumpton et al 2019 “Within-family studies for Mendelian randomization: avoiding dynastic, assortative mating, and population stratification biases” bioRxiv https://doi.org/10.1101/602516
Dr Rebecca Pearson (lead), Dr Hannah Sallis, Prof. Marcus Munafo
There are known links between parenting and mental health outcomes in children. However, it is unclear to what extent these links reflect shared genetics (genetic variance that is passed directly from mother to child), genetic nurture effects (the influence of mothers’ non-transmitted genetics on manifested parenting) and child evocative genetic effects. This is key to understanding whether parenting is an effective intervention target.
This PhD proposes to utilise unique data and methods to address these questions:
1. Explore the association between parenting and mental health/education outcomes across cohorts (including cross-cultural comparisons)
2. Separate out shared genetic, mother genetic nurture and child genetic effects on parenting factors
3. Investigate the influence of mother genetic liability for mental health problems, education and personality on parenting factors
4. Explore the influence of child genetic effects on parenting factors and maternal mental health
1. Observational associations between parenting factors and mental health/educational outcomes will be estimated in cross-cultural cohorts (ALSPAC, MCS, South African datasets). We will also investigate whether parenting mediates the association between maternal characteristics (mental health, personality and education) and child outcomes. In ALSPAC-G2, we can explore rich behavioural micro-coding of parental behaviour to look at specific behaviours associated with maternal mental health.
2. M-GCTA will be applied to the parenting phenotypes in order to estimate variance explained by mother and child effects.
3. The association between transmitted and untransmitted polygenic scores for a range of phenotypes (MDD, neuroticism, anxiety, education and IQ) with parenting and mental health will be investigated.
4. Child polygenic scores will be tested for association with parenting outcomes and maternal mental health, using genetic similarity scores derived from mother-child and father-child genetic data, alongside family designs.
Deborah A Lawlor (lead), Tom Richardson, Simon Satchell, Victoria Bills
The glycocalyx is a gel-like layer covering the inside surface of all blood vessels. It is essential for vascular integrity and cardiovascular homeostasis.(1, 2) Laboratory studies suggest that damage to the glycocalyx increases risk of pregnancy and cardiovascular disorders and that the glycocalyx could be a valuable target for disease prevention and treatment.(1, 2) Large-scale epidemiological studies using causal methods are lacking.(1, 2) There are two ways of measuring the integrity of the glycocalyx: (i) plasma/serum measures of its constituents (e.g. heparan sulphate proteoglycans, hyaluronic acid and syndecan 1 (SDC1)) and (ii) microscopic measurement of the Perfused Boundary Region (PBR) of capillaries under the tongue. The PBR is a measure of the extent to which red blood cells have been able to infiltrate the glycocalyx.(3-5) Higher levels of glycocalyx products and of PBR indicate higher levels of damage to the glycocalyx.(4) These have been used in epidemiological studies (3, 5-7) but there is a need for prospective studies and estimates of causal effects.
The aim of this project is to increase understanding of the causes of damage to the glycocalyx and its role in causing adverse pregnancy and cardiovascular outcomes. There is the potential for two or more PhDs in this area. Students could focus on pregnancy-related outcomes and the role of the glycocalyx in placental blood vessels, or study the role of the glycocalyx in early life development of cardiovascular risk factors, its role in adult cardiovascular disease onset and progression, or the determinants of glycocalyx damage.
A multidisciplinary approach, integrating (triangulating(8)) evidence from population health and laboratory approaches, will be used. Methods will include (i) systematically identifying and reviewing literature (including on genetic determinants of variation in glycocalyx damage); (ii) undertaking new genome-wide association studies; (iii) exploring bioinformatic resources, including using tools such as EpiGraphDB; and (iv) multivariable regression, negative control and Mendelian randomization analyses of cohort studies, including ALSPAC(9-11), BiB(12) and large collaborations of cohorts. Students will have the opportunity to collect new data on the PBR in cohorts and also undertake complementary laboratory analyses of placental tissue, including labelling, imaging and image analysis of tissue sections and extraction of proteins and RNA for proteomics and transcriptomics.
1. Salmon AH, . Endothelial glycocalyx dysfunction in disease: albuminuria and increased microvascular permeability. J Pathol. 2012;226:562-74.
2. Yilmaz O, . The role of endothelial glycocalyx in health and disease. Clin Kidney J. 2019;12:611-9.
3. Valerio L, et al. Sublingual endothelial glycocalyx and atherosclerosis. PLoS One. 2019;14:e0213097.
4. Dane MJ, . A microscopic view on the renal endothelial glycocalyx. Am J Physiol Renal Physiol. 2015;308:F956-
5. Weissgerber TL . Early Onset Preeclampsia Is Associated With Glycocalyx Degradation and Reduced Microvascular Perfusion. J Am Heart Assoc. 2019;8:e010647.
6. Dogne S . Endothelial Glycocalyx as a Shield Against Diabetic Vascular Complications: Involvement of Hyaluronan and Hyaluronidases. Arterioscler Thromb Vasc Biol. 2018;38:1427-39.
7. Long DS, . Serum levels of endothelial glycocalyx constituents in women at 20 weeks' gestation who later develop gestational diabetes mellitus compared to matched controls: a pilot study. BMJ open. 2016;6:e011244.
8. Lawlor DA, . Triangulation in aetiological epidemiology. Int J Epidemiol. 2017.
9. Boyd A, et al. Cohort Profile: ALSPAC. IJE. 2013;42:111-27.
10. Fraser A, Cohort Profile: ALSPAC mothers. IJE. 2013;42:97-110.
11. Lawlor DA, . Cohort Profile ALSPAC-G2. Wellcome Open Research. 2019;4:36 (https://doi.org/10.12688/wellcomeopenres.15087.1)
12. Wright J, . Cohort profile: BiB. IJE 2013;42:978-91.
Dr Jie Zheng (lead), Professor Tom Gaunt, Professor Hong Zhang, Dr Yuemiao Zhang
Chronic kidney disease (CKD) is a long-term condition that affects 10-15% of the population, particularly those in older age. The lack of CKD intervention creates a huge socio-economic burden. It is timely to promote a shift from "treatment in later stage" to "prevention in early stage".
The overall aim of our project is to improve the prevention and treatment of chronic kidney disease (CKD) by integrating clinical, molecular and genetic evidence across multiple populations. The project will build on an existing collaboration between the University of Bristol, Peking University First Hospital (the best hospital in China specialising in kidney disease treatment) and HUNT (the Nord-Trøndelag Health Study).
Previous genetic studies have reported associations between genetic variants and CKD development based on case-control designs, for which we have collected samples from more than 1 million individuals from different populations, including Europeans, East Asians, Africans and Hispanics. However, only a few studies have focused on understanding the genetic variants influencing the progression of CKD (as measured by change in estimated glomerular filtration rate (eGFR) over time). In this project, the student will have an opportunity to understand the causal risk factors influencing both CKD development and CKD progression, the latter of which links directly to lifestyle and drug interventions for CKD treatment.
1. Identify the causal effects of risk factors on the development and progression of CKD in multiple populations using genetic data.
CKD is influenced by both environmental and genetic factors. We will use MR to estimate the causal effects of social risk factors (e.g. education, intelligence, well-being), lifestyle factors (e.g. smoking, drinking, sleeping and physical activity) and clinical risk factors (e.g. hypertension, diabetes, obesity and mental disorder) in multiple populations. The data used will be from UK Biobank, Biobank Japan and the HUNT study.
2. Identify the effect of progression of diabetes and hypertension on CKD progression
Since some risk factors co-occur over long periods of time, we would like to know the effect of changes in glucose and blood pressure levels over time on the progression of CKD.
3. Integration of causal determinants of CKD with molecular phenotypes to improve CKD prevention and treatment.
We will identify the potential molecular factors (e.g. plasma proteins and gene expression) that mediate the causal relationship between candidate risk factors (e.g. diabetes) and CKD development and progression.
Genome wide association study for disease incidence and progression
A genome-wide association study will be used to identify genetic variants associated with the exposure (e.g. diabetes) and outcome (e.g. kidney disease) of interest.
We will mainly use an approach called Mendelian randomization (MR), which uses genetic variation as a natural experiment to estimate the causal effects of modifiable risk factors on later outcomes.
Trans-ethnic genetic correlation
Genetic correlation for the same phenotype across different populations will be estimated using POPCORN.
Colocalization will be applied to top Mendelian randomization findings to further confirm the causality.
Siddhartha Kar (lead), Bethan Lloyd-Lewis, Richard Martin, Kate Lawrenson (collaborator at the Cedars-Sinai Medical Center in Los Angeles)
Previously published genome-wide association studies (GWAS) and analyses of the UK Biobank have identified hundreds of genomic loci that are associated with (i) site-specific cancers and/or (ii) cancer-related traits, including putative cancer risk factors. Genomic loci that harbour a shared association with one or more cancer-related traits and with one or more cancer types offer compelling starting points for better understanding cancer biology. Examples of such loci include genomic regions with overlapping associations for (i) gastro-oesophageal reflux disease, Barrett's oesophagus (a pre-cancerous condition), and oesophageal cancer and (ii) the hormone-related traits of pregnancy duration, endometriosis, and ovarian cancer.
Unravelling the relationship between traits and cancers at such loci provides an unprecedented opportunity to uncover novel oncogenic mechanisms and genes that can be specifically targeted by new preventive and therapeutic interventions.
1. To identify site-specific cancers and cancer-related traits/risk factors that will be evaluated within the framework of this project (for example, breast cancer and hormone-related traits such as ages at menarche and menopause)
2. To develop a comprehensive, genome-wide atlas of genetic loci containing association signals shared between the cancer and its related traits using colocalisation
3. To fine-map causal genetic variation driving the shared association signals
4. To identify candidate target genes underlying the shared associations between the cancer and its related traits using gene-level and gene expression-based analyses
5. To experimentally manipulate the prioritised candidate target genes in relevant normal cell lines and examine the impact on malignant transformation
The student will receive exceptional training in the handling and statistical analysis of large-scale, high-dimensional genetic association data sets and in the interpretation of findings based on these data sets. The student will apply a range of state-of-the-art computational techniques including (i) multi-trait colocalisation analysis using HyPrColoc, (ii) multi-trait fine-mapping using Probabilistic Annotation INtegraTOR (PAINTOR), which leverages epigenomic annotations, (iii) gene expression quantitative trait locus (eQTL) analysis in the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) project data sets using Summary Mendelian Randomisation (SMR), PrediXCan and Transcriptome-wide Association Study (TWAS) approaches, and (iv) gene- and pathway-level analysis using MAGMA.
An exciting feature of this project will be the close integration of laboratory work to follow-up the most promising findings, i.e., genomic loci where the same genetic variants underlie a risk factor trait association signal and a cancer susceptibility association signal and where both signals point to the same target gene. For a select handful of such loci, the target gene will be genetically perturbed in the most relevant normal cell lines and the cell lines subjected to assays to study changes in their invasiveness, migration, and proliferation potential as markers of oncogenic transformation. The student will receive training in these experimental methods.
It is envisioned that the work will lead to multiple high-profile and highly interdisciplinary academic publications that will be led by the student, providing an excellent foundation for future scientific leadership.
Professor Jonathan Sterne (lead), Professor Caroline Sabin (UCL), Dr Fiona Burns (UCL), Dr Ruth Mitchell
Genome-wide association studies (GWAS) of HIV have mainly addressed correlates of acquisition and disease progression: little work has been done in the context of combination antiretroviral therapy (ART). Suppression of viral replication by ART may be insufficient to prevent premature cardiovascular events or hepato-renal pathology, or non-AIDS cancers.
The NIHR BioResource is a federated database of healthy individuals and patients, consented for recall for research. High density genotyping is performed on the Affymetrix UK Biobank array. The HIV BioResource has collected host genome sequences from ~5200 participants with ongoing recruitment, storing blood samples and derivatives together with patient information (clinical history, lifestyle factors and potential predictors of CVD). It links with the UK Collaborative HIV Cohort Study (UK CHIC) and the UK HIV Drug Resistance Database (UK HDRD), two of the world’s largest and most productive HIV cohort studies. The combined dataset is a unique resource that will enable investigation of genetic contributions to switches in antiretroviral therapy, renal disease progression and other effects of ART.
This project will focus on identification of genetic predictors of the linked outcomes CD4:CD8 ratio, dyslipidaemia, diabetes and major cardiovascular events. Genetic prediction of these outcomes could facilitate early identification of individuals who might benefit from individualised treatments, increased monitoring or intensified prevention measures. The potential mediating role of nadir CD4 count will be investigated, as will moderating effects of specific ART drugs. Analyses will address potential biases introduced through selection of patients into the HIV BioResource.
This project involves cleaning and analysing a novel dataset to study genetic predictors of outcomes of treatment with antiretroviral therapy: these are questions of global health importance with important implications for early identification of individuals who might benefit from individualised treatments, increased monitoring or intensified prevention measures. The data are derived from linking the NIHR HIV BioResource (~6000 individuals) with the UK Collaborative HIV Cohort Study (UK CHIC) and the UK HIV Drug Resistance Database (UK HDRD), two of the world’s largest and most productive HIV cohort studies. The student will learn state-of-the-art techniques for analysis of genome-wide association studies (e.g. LD score regression), supported by colleagues in the world-leading MRC Integrative Epidemiology Unit (IEU). They will have full access, free of charge, to courses in research methods run by the MRC IEU and the University of Bristol Department of Population Health Sciences, as well as to specialist expertise and guidance in all areas required for successful completion of the project, including genetic epidemiology, Mendelian randomisation, epidemiology, statistical methodology and bioinformatics. The student will receive training in HIV clinical epidemiology at the UCL HIV Epidemiology and Biostatistics Group (co-led by Sabin), and in HIV medicine at the UCL Centre for Sexual Health & HIV Research, which shares accommodation with the largest clinic for sexual health and HIV disease in Europe.
Dr Evie Stergiakouli (lead), Dr Emma Louise Anderson,
There is evidence that dysfunction of the immune system could be implicated in neurodegenerative diseases including Alzheimer’s. Recent GWAS have identified genetic variants implicated in immunity as being associated with Alzheimer’s.
Utilizing the existing information generated by genetic studies and large population cohort studies, we will test if the genetic risk factors causing Alzheimer’s disease are also implicated in immunity. We will also use high-throughput analysis methods developed in the MRC IEU to take advantage of publicly available information on immune-related diseases and Alzheimer’s to test if immune-related disease can cause Alzheimer’s disease. This project could have implications for identifying patients at increased risk of dementia and supporting a particularly vulnerable group of patients.
We have the expertise to train a student in applying polygenic risk score analysis and Mendelian randomization, with the aim of investigating whether there is genetic overlap between Alzheimer’s disease and immune-related factors, including antibodies for several infectious diseases, in ALSPAC, UK Biobank and immune-related disease cohorts.
The specific hypotheses to be tested are:
1. There is shared genetic susceptibility to Alzheimer’s disease and to infectious diseases.
2. Autoimmune diseases are causally associated with Alzheimer’s disease.
To test the first hypothesis we will calculate polygenic risk scores for Alzheimer’s disease in participants from the general population and test whether they are associated with antibodies for several infectious diseases and other immune-related factors.
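For readers unfamiliar with the term, a polygenic risk score is typically a weighted sum of risk-allele counts, with weights taken from GWAS effect sizes. The sketch below shows that calculation and a simple standardisation step; the function names, weights and genotype dosages are hypothetical, and practical steps such as variant selection, clumping and quality control are omitted.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) for one person."""
    if len(dosages) != len(weights):
        raise ValueError("need one weight per variant")
    return sum(d * w for d, w in zip(dosages, weights))


def standardise(scores):
    """Rescale scores to mean 0 and sample standard deviation 1."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    sd = variance ** 0.5
    return [(s - mean) / sd for s in scores]


# Hypothetical GWAS effect sizes (log odds ratios) for three variants.
weights = [0.10, 0.20, 0.30]
# Hypothetical genotype dosages for four participants.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0]]
raw_scores = [polygenic_score(g, weights) for g in genotypes]
z_scores = standardise(raw_scores)
```

The standardised scores could then be entered as the predictor in a regression on antibody levels or other immune-related measures.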
To test the second hypothesis we will conduct multivariable 2-sample MR and perform sensitivity analyses to test and adjust for pleiotropic effects.
Kunkle, B.W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat Genet 51, 414–430 (2019) doi:10.1038/s41588-019-0358-2
Jevtic et al. The role of the immune system in Alzheimer disease: Etiology and treatment. Ageing Research Reviews (2017); 40, 84-94
Dr Maria Carolina Borges (lead), Prof Deborah Lawlor, Prof Ge Zhang (University of Cincinnati), Dr Nicole Warrington (University of Queensland)
The metabolome is the complete set of small-molecule intermediates and products of metabolism in biological tissues/fluids (e.g. lipids, amino acids, glycolysis-related metabolites, carbohydrates, ketone bodies). Disruptions in the maternal metabolome during pregnancy (e.g. maternal hyperglycaemia) potentially affect several aspects of maternal health and fetal development. Understanding which metabolic pathways are implicated in adverse child and maternal health is key to informing effective interventions to improve maternal-child health; however, producing reliable evidence on this is challenging due to issues of confounding and reverse causation in observational studies and due to the scarcity of evidence from high-quality randomised controlled trials. Mendelian randomisation is a method that uses genetic variants robustly associated with modifiable exposures to generate more reliable evidence regarding which risk factors to intervene on to produce health benefits. Mendelian randomisation can be used to discriminate causal from non-causal metabolites, which may facilitate targeted development of interventions and inform evidence-based recommendations for pregnant women.
To use Mendelian randomization to systematically assess the effect of changes in maternal metabolome during pregnancy on a wide range of perinatal health outcomes (e.g. gestational hypertension, gestational diabetes, perinatal depression, need of induction of labour, caesarean delivery, early-membrane rupture, preterm delivery, birth weight, miscarriage, stillbirth).
1. To develop genetic instruments for multiple metabolites and test whether they predict metabolic changes during pregnancy
2. To use novel statistical methods for partitioning genetic effects at single loci into maternal and offspring genetic components
3. To use (one-sample and two-sample) Mendelian randomisation to probe the causal role of metabolic changes during pregnancy on perinatal health
4. To triangulate Mendelian randomisation findings with findings from other study designs (e.g. negative paternal control, randomised controlled trials)
Dr Laurence Howe (lead), Dr Neil Davies, Professor George Davey Smith
Every spring the clocks are moved forward one hour and every autumn the clocks go back an hour, a practice known as daylight saving time (DST) designed to increase daylight in the evening. However, this practice may be disruptive to population health, with previous studies reporting that clock shifts lead to increased incidence of depressive episodes, myocardial infarction and strokes in the weeks immediately following the clock change.
In March 2019, the European Parliament voted in favour of removing DST by 2021, although it is unclear whether this policy change will be enacted in the UK. The extent to which DST negatively affects health outcomes in the UK population is therefore highly relevant to the ongoing debate. The UK has large databases of clinically linked data such as the Hospital Episode Statistics (HES), the Clinical Practice Research Datalink (CPRD) and the UK Biobank. These datasets can be used to estimate the effects of DST on health outcomes by comparing the trends in incidence before and after clock shifts, a method known as regression discontinuity.
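A minimal sketch of the regression discontinuity idea follows, assuming daily event counts indexed by days relative to the clock change (day 0 being the first day after it). A straight line is fitted separately on each side of the cutoff within a chosen bandwidth, and the jump is the difference between the two fitted values at day 0. The function names, the linear specification and the simulated counts are assumptions for illustration only.

```python
def ols_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(points) < 2:
        raise ValueError("need at least two points to fit a line")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_jump(days, counts, bandwidth):
    """Estimated discontinuity in counts at day 0 (after minus before)."""
    before = [(d, c) for d, c in zip(days, counts) if -bandwidth <= d < 0]
    after = [(d, c) for d, c in zip(days, counts) if 0 <= d < bandwidth]
    intercept_before, _ = ols_line(before)
    intercept_after, _ = ols_line(after)
    return intercept_after - intercept_before


# Simulated data: a smooth trend plus a jump of 5 events per day at day 0.
days = list(range(-14, 14))
counts = [100 + 0.5 * d + (5 if d >= 0 else 0) for d in days]
jump = rd_jump(days, counts, bandwidth=14)
```

In practice the choice of bandwidth and functional form matters, and count outcomes are often modelled with Poisson-type regression rather than straight lines.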
Purported effects of DST on cardiovascular outcomes may relate to sleep phenotypes (e.g. circadian rhythms, chronotype). Follow-up analyses could involve using observational and genetic data (Mendelian randomization) to evaluate how sleep-related phenotypes influence cardiovascular outcomes.
1) Use regression-discontinuity methods to estimate the effects of DST on a range of cardiovascular health outcomes (e.g. myocardial infarction, stroke) in large clinically linked datasets (e.g. CPRD, HES, UK Biobank).
2) Use observational and genetic data from UK Biobank to determine if certain sleep chronotypes are more vulnerable to effects of DST.
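As a rough illustration of the regression-discontinuity idea in aim 1, the sketch below fits separate linear trends to daily event counts before and after a clock change and takes the gap between the two fitted lines at the cutoff as the estimated effect. The data are simulated, and the function names, the 28-day window and the size of the simulated jump are assumptions made for illustration only; they do not describe the project's actual analysis plan.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(days, counts, window=28):
    """Sharp discontinuity at day 0: right-hand intercept minus left-hand intercept."""
    left = [(d, c) for d, c in zip(days, counts) if -window <= d < 0]
    right = [(d, c) for d, c in zip(days, counts) if 0 <= d <= window]
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_right - intercept_left


# Simulated daily admissions (hypothetical): a smooth trend plus a jump of 5
# events per day from the day of the clock change (day 0) onwards.
rng = random.Random(1)
days = list(range(-28, 29))
counts = [100 + 0.1 * d + (5 if d >= 0 else 0) + rng.gauss(0, 0.5) for d in days]

effect = rd_estimate(days, counts)
print(f"Estimated jump at the clock change: {effect:.2f} events per day")
```

A real analysis would also need to account for day-of-week and seasonal patterns, and to check how sensitive the estimate is to the choice of window.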
Dr Helen Bould (lead), Dr Carol Joinson (Developmental Psychologist), Dr Jon Heron (Statistical expertise)
Puberty is a key period for the onset of eating disorders and girls are at greater risk than boys (Klump, 2013; Harden et al. 2014). There is evidence that menarche is a risk factor for eating disorders, possibly due to associated increases in body weight (Abraham et al. 2009). There is also some evidence that girls who experience menarche earlier than their peers are at greater risk of eating disorders, but this finding is not consistent (see Klump 2013 for a review of these studies). Limitations of earlier research include cross-sectional study design, small sample sizes and inadequate adjustment for confounders. Earlier studies have also focussed on onset of menarche as a marker of puberty, whereas this project would use a range of measures to establish pubertal timing, including age at Peak Height Velocity (i.e. adolescent growth spurt). Previous work has also focussed on diagnosed eating disorders, whereas developing a better understanding of disordered eating behaviours across the spectrum may help us to prevent eating disorders more effectively. The project will go on to assess whether age at menarche is causally associated with disordered eating, using Mendelian Randomisation.
The aim of this project is to use Avon Longitudinal Study of Parents and Children (ALSPAC) data to increase understanding of the association between puberty and eating disorders. The project will examine the following research questions:
(i) Does the risk of eating disorders and disordered eating behaviour increase following puberty?
(ii) Do girls who go through puberty earlier than their peers have an increased risk of disordered eating behaviours?
(iii) Is there a causal relationship between age at menarche and increases in disordered eating behaviour?
1. Review the literature on eating disorders and the relationship with pubertal development and identify potential confounders.
2. Obtain relevant data from ALSPAC dataset.
3. Conduct multivariable regression analysis examining the research questions. Exposure variables: time since menarche, timing of menarche (both categorical: early, normative, late; and continuous: age at menarche), age at Peak Height Velocity.
Outcome variables: Self-reported disordered eating and body dissatisfaction at ages 14, 16, 18 and 24 years.
4. Use Mendelian Randomisation analysis to improve causal inference by using genetic variants that are robustly associated with timing of menarche to explore causal effects on disordered eating.
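To make step 4 more concrete, the sketch below shows one common way of turning summary statistics into a Mendelian Randomisation estimate: a Wald ratio per genetic variant, combined across variants by inverse-variance weighting. The summary statistics are invented for illustration, the function names are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the variant-exposure effects. It is a sketch under those assumptions, not the project's prescribed method.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimate with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(variants):
    """Fixed-effect inverse-variance-weighted average of per-variant Wald ratios.

    variants: iterable of (beta_exposure, beta_outcome, se_outcome).
    """
    ratios = [wald_ratio(bx, by, se) for bx, by, se in variants]
    weights = [1 / se ** 2 for _, se in ratios]
    estimate = sum(w * r for w, (r, _) in zip(weights, ratios)) / sum(weights)
    return estimate, math.sqrt(1 / sum(weights))


# Hypothetical summary statistics for three variants:
# (effect on age at menarche, effect on disordered eating, SE of the latter).
variants = [(0.10, 0.020, 0.010), (0.20, 0.050, 0.010), (0.05, 0.008, 0.010)]
estimate, se = ivw(variants)
print(f"IVW estimate: {estimate:.3f} (SE {se:.3f})")
```

In practice the estimate would be compared with methods that relax the instrumental variable assumptions, since a single pooled value can hide pleiotropic variants.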
Klump KL. Puberty as a Critical Risk Period for Eating Disorders: A Review of Human and Animal Studies. Horm Behav. 2013; 64(2): 399–410.
Harden KP, Kretsch N, Moore SR, Mendle J. Descriptive review: hormonal influences on risk for eating disorder symptoms during puberty and adolescence. Int J Eat Disord. 2014;47(7):718-26.
Abraham S, Boyd C, Lal M, Luscombe G, Taylor A. Time since menarche, weight gain and body image awareness among adolescent girls: onset of eating disorders? J Psychosom Obstet Gynaecol. 2009;30(2):89-94.
Siddhartha Kar (lead), Tim Robinson, Tom Gaunt
Cancer is a disease of the genome. Certain changes that are acquired over the course of life in the genomes of healthy cells in the human body (somatic genomic changes) dysregulate the fine balance between cell death and proliferation. These somatic genomic aberrations are the cornerstone of malignant cellular transformation. Targeting somatic genomic changes is fundamental to the practice of precision cancer medicine. We understand that common exposures and cancer risk factors such as ultraviolet light and smoking accelerate the acquisition of these changes. However, little is actually known about how everyday exogenous and endogenous factors such as diet, obesity, and insulin resistance relate to, and likely drive, carcinogenic changes in the somatic genome. This is because it is difficult to measure lifelong trajectories of the factors retrospectively at cancer diagnosis and expensive to measure them prospectively in large numbers of individuals until some of them develop cancer. Such one-time "snapshot" measures, even where feasible, are prone to bias and confounding. Specific inherited or germline genetic variants have been found to be robustly associated with these exposures or factors. Since genetic variants are allocated at random at conception and fixed thereafter, they are less affected by bias and confounding. The factor-associated variants provide remarkable proxies for the lifetime levels of these factors even in patients in whom the factor itself has not been measured. These variants collected into polygenic scores can serve as instruments to evaluate association between the germline genetically inferred levels of the factor and somatic/tumour molecular features and mechanisms that operate within the cancer.
1. To identify tumour molecular features associated with common exposures or putative cancer risk factors
Genome-wide association studies involving hundreds of thousands of individuals have identified germline variants that are robustly associated with different factors, ranging from body mass index to blood-levels of protein markers. This variation will be leveraged to generate personalised life-course profiles of these factors in cancer patients using germline genotype data. The association of these profiles with tumour gene expression, methylation, copy number, and mutations will then be evaluated at the level of single genes and multi-gene biological pathways in >11,000 tumours that have been subjected to deep germline-somatic molecular and clinical phenotyping in The Cancer Genome Atlas (TCGA) project.
2. To investigate the association between common exposures or putative cancer risk factors and cancer drug sensitivity
Over 1,000 cancer cell lines from the Genomics of Drug Sensitivity in Cancer project have been screened for their response to >450 cancer drugs either approved for use in patients or in development. Germline genotypes from the cell lines will be used to index the factors and the association of each index with therapeutic response assessed.
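The polygenic scores used to index each factor in both objectives are, at their simplest, weighted sums of effect-allele counts. The sketch below shows that calculation for a single sample. The variant identifiers, weights and dosages are made up, and the function name is hypothetical; tools such as LD-Pred additionally adjust the weights for linkage disequilibrium, which this sketch does not attempt.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one sample.

    dosages: variant id -> effect-allele count (0, 1 or 2, or an imputed value).
    weights: variant id -> per-allele effect size from a GWAS of the factor.
    """
    missing = [variant for variant in weights if variant not in dosages]
    if missing:
        raise KeyError(f"No genotype for variants: {missing}")
    return sum(weights[variant] * dosages[variant] for variant in weights)


# Hypothetical GWAS weights for a factor such as body mass index.
weights = {"variant_A": 0.12, "variant_B": -0.05, "variant_C": 0.30}
# Hypothetical germline dosages for one tumour sample or cell line.
sample = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

score = polygenic_score(sample, weights)
print(f"Polygenic score: {score:.2f}")
```

Scores computed this way for every sample could then be tested for association with tumour features or drug response, for example by regression.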
The student will receive exceptional training in the handling and statistical analysis of large-scale, high-dimensional cancer genetic, genomic, transcriptomic, and epigenomic data sets and in the interpretation of findings based on these data sets. The student will apply a range of computational techniques including state-of-the-art Mendelian randomisation methods implemented in MR-Base, polygenic scoring approaches such as LD-Pred, and expression quantitative trait locus analysis using the R package Matrix eQTL. It is envisioned that the work will lead to multiple high-profile and highly interdisciplinary publications that will be led by the student, providing an excellent foundation for future scientific leadership.
The Cancer Genome Atlas project: Ding, L. et al. Cell 173, 305-320.e10 (2018).
The Genomics of Drug Sensitivity in Cancer project: Iorio, F. et al. Cell 166, 740–754 (2016).
Dr Caroline Taylor (lead), TBC
Many neural tube defects (NTD) can be prevented by increasing folic acid intake before and during pregnancy. However, only about one-fifth of women report that they have taken a folic acid supplement before pregnancy, and only three-fifths in early pregnancy. The Government has recently held a consultation on fortification of flour with folic acid in the UK to reduce the prevalence of neural tube defects in fetuses. Modelling exercises from the Food Standards Agency in Scotland have explored the effectiveness and safety of fortification, indicating a substantial reduction in the prevalence of NTD. However, it is estimated that 8.5 million people in the UK have gone ‘gluten-free’ and/or ‘low-carb’: preconceptual and pregnant women following these diets may therefore not receive the benefit of fortification. In addition, it is likely that the promotion of folic acid supplementation before and during pregnancy will be substantially reduced, potentially further increasing the risk of NTD for these women.
1. To quantify the prevalence of following a gluten-free and/or low-carb diet in pre-conceptual and pregnant women
2. To quantify the effect of these diets on folic acid intakes from dietary assessments and data on supplement intakes, including modelling the likely effects after flour fortification
3. To quantify the associations with biomarkers of folate status
4. To use qualitative methods to investigate the motivations for following these diets and awareness and attitudes to taking supplements
A mixed methods study comprising:
Quantitative study with pregnant women including collection and analysis of biosamples, and electronic dietary assessment.
Qualitative study with pregnant women following these diets including focus groups and individual interviews.
Gov.uk (2019) Proposal to add folic acid to flour: consultation document. https://www.gov.uk/government/consultations/adding-folic-acid-to-flour/proposal-to-add-folic-acid-to-flour-consultation-document
Walker D. Fortification of flour with folic acid is an overdue public health measure in the UK. Arch Dis Child 2016; 101(7): 593.
Barrett G et al. Why do women invest in pre-pregnancy health and care? A Qualitative investigation with women attending maternity services. BMC Pregnancy and Childbirth 2015; 15: 236
Dr Luisa Zuccolo (lead), Dr Carolina Borges
Breastfeeding is the biological norm and sustainable. It is also potentially life-saving, particularly for premature babies and those without access to clean water. Strategies to support breastfeeding have been successful, but inequalities persist and rates remain low in high-income countries such as the UK.
Although the short-term effects of breastfeeding are well documented, several questions about the epidemiology of breastfeeding remain unresolved: (a) which factors cause failure to establish and sustain breastfeeding, (b) which maternal and infant long-term outcomes are affected by different breastfeeding practices, and (c) what the mechanisms behind these are.
This project will benefit mothers and babies by: (a) identifying targets to improve breastfeeding and lactation outcomes; (b) improving our understanding of (non-breastfeeding specific) parenting practices and behaviours for optimal child development and long-term maternal health; and (c) increasing our knowledge of human milk components which could be added to breastmilk substitutes.
This project will improve our understanding of both the determinants and the effects (to the mother and the infant) of successfully establishing and sustaining breastfeeding.
1. To identify modifiable determinants of breastfeeding traits
2. To identify genetic predictors of breastfeeding traits
3. To estimate causal effects of breastfeeding on maternal and offspring health
4. To explore mechanisms for the long-term effects of breastfeeding
Breastfeeding traits include initiation, successful establishment, duration, exclusivity, breastfeeding problems.
This project aims to answer the above questions by combining cutting-edge methods that improve causal inference in observational studies, e.g. Mendelian Randomization and causal mediation, with classic epidemiological designs, e.g. cross-context comparisons. Results from each of these methods are likely to suffer from different biases, sometimes in opposite directions. We will exploit this using a triangulation approach: consistent results across methods will provide stronger evidence for causality.
A key and novel component of the project will also be the identification of genetic predictors of breastfeeding traits through well-powered genome-wide association studies, to inform the Mendelian randomization analyses.
In order to fulfill the study’s objectives, an international network of collaborating cohorts has been established (N>200,000) to analyse existing data on breastfeeding traits, and putative determinants and consequences.
Victora CG et al. Lancet 2016; 387(10033): 474-90.
Professor Paul Moran (lead), Dr Becky Mars, Dr Lindsey Hines
Self-harm and cannabis use have their peak incidence during adolescence and both are associated with poor health in later life (1) (2) (3). Whilst the majority of self-harming behaviour appears to resolve in adolescence (4), patterns of cannabis use established in adolescence tend to continue into adulthood (5). It is possible that adolescent reliance on self-harm is replaced with a subsequent reliance on cannabis or indeed other unhealthy behaviours (e.g. excessive consumption of alcohol). Yet evidence supporting the occurrence of functional ‘substitution’ of these behaviours is lacking. This is an important gap in knowledge with implications for the management of self-harm.
This PhD project will shed light on the interrelationship between self-harm and cannabis use in two population cohorts, by addressing the following research questions:
1) Is frequency of cannabis use in early adolescence associated with the risk of self-harm at age 16 years?
2) Is cessation of self-harm during adolescence associated with a) increased risk of cannabis use; b) frequency of use in young adulthood?
3) Does adolescent self-harm predict the trajectory of cannabis use frequency in young adulthood?
4) Are there other plausible candidates for behavioural substitution in self-harm?
The project will take advantage of data from two population cohorts: the Avon Longitudinal Study of Parents and Children (ALSPAC) and the Victorian Adolescent Health Cohort Study (VAHCS). Both datasets have data on contemporaneous measures of mental health, substance use, behaviour and social factors. The student will learn core data analytic skills before developing an appropriate statistical analysis plan under supervision. Longitudinal latent class analysis will be used to identify trajectories of self-harm and cannabis use. Key prospective associations with these trajectories will be explored, along with an investigation of the links between self-harm and cannabis use class membership over time.
1. Mars B, Heron J, Crane C, Hawton K, Lewis G, Macleod J, et al. Clinical and social outcomes of adolescent self harm: population based birth cohort study. BMJ. 2014;349:g5954.
2. Borschmann R, Becker D, Coffey C, Spry E, Moreno-Betancur M, Moran P, et al. 20-year outcomes in adolescents who self-harm: a population-based cohort study. Lancet Child Adolesc Health. 2017;1(3):195-202.
3. Gage SH, Hickman M, Heron J, Munafo MR, Lewis G, Macleod J, et al. Associations of cannabis and cigarette use with depression and anxiety at age 18: findings from the Avon Longitudinal Study of Parents and Children. PLoS One. 2015;10(4):e0122896.
4. Moran P, Coffey C, Romaniuk H, Olsson C, Borschmann R, Carlin JB, et al. The natural history of self-harm during adolescence and young adulthood: population-based cohort study. The Lancet. 2012;379(9812):236-43.
5. Patton GC, Coffey C, Carlin JB, Degenhardt L, Lynskey M, Hall W. Cannabis use and mental health in young people: cohort study. BMJ. 2002;325(7374):1195-8.
Prof Julian Higgins (lead), Dr Alexandra McAleenan, Prof Kate Tilling
Triangulation, in which multiple methods are strategically used to answer a single question, is a currently developing area. Lawlor, Tilling and Davey Smith (2016) explained how causal inferences can be strengthened by integrating results from several approaches with different key sources of potential bias. The statistical methods for combining the results from multiple sources of evidence within a triangulation framework are, however, underdeveloped. This PhD seeks to develop, illustrate and evaluate such methods.
At its simplest, triangulation involves comparison and combination of studies of the same exposure-outcome effect that use different designs or analytic methods. For example, randomized trials, Mendelian randomization studies and traditional multivariable regression analyses of observational evidence might all tackle a question relating to the same exposure-outcome effect. The studies may produce different effect estimates because they are (i) asking subtly different questions (e.g. in relation to the period or patterns of exposure), (ii) compromised by different biases and/or (iii) subject to chance. Triangulation combines these issues in a statistical model and assesses the extent to which the observed data fit together – an approach known as multiparameter evidence synthesis. Another form of triangulation arises when some (or all) studies address only a component of the underlying question. For example, if the exposure-outcome effect occurs through an intermediate, then studies of the exposure-outcome effect might be triangulated with a combination of studies (i) of the effect of exposure on the intermediate and (ii) of the effect of the intermediate on the outcome.
1) Develop statistical models to facilitate triangulation for the purposes of learning about causal effects of an exposure on an outcome based on diverse approaches to generating the evidence;
2) Develop methods for assessing consistency of findings across evidence sources;
3) Develop methods to synthesise results across diverse studies, accounting for differences in the research questions, biases and chance; and
4) Illustrate the methods through application to important causal questions in epidemiology.
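Two of the building blocks mentioned above can be sketched in a few lines: checking whether estimates of the same effect from different designs are consistent, and chaining an exposure-intermediate effect with an intermediate-outcome effect. The sketch below uses a simple fixed-effect pooled estimate with Cochran's Q and a delta-method standard error for the product of two effects. These are deliberately basic frequentist stand-ins, assumed here for illustration; they are not the Bayesian multiparameter evidence synthesis models the project sets out to develop, and all numbers and function names are hypothetical.

```python
import math


def pool_and_check(estimates):
    """Fixed-effect pooled estimate, its SE, and Cochran's Q for consistency.

    estimates: list of (effect, standard_error), one per study design.
    Under consistency, Q is roughly chi-squared with len(estimates) - 1
    degrees of freedom.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return pooled, math.sqrt(1 / sum(weights)), q


def chain_effects(exposure_to_intermediate, intermediate_to_outcome):
    """Product of two linear effects with a first-order (delta-method) SE."""
    (a, se_a), (b, se_b) = exposure_to_intermediate, intermediate_to_outcome
    return a * b, math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)


# Hypothetical estimates of one exposure-outcome effect from three designs:
# randomised trial, Mendelian randomization, multivariable regression.
designs = [(0.20, 0.10), (0.35, 0.15), (0.50, 0.05)]
pooled, pooled_se, q = pool_and_check(designs)
print(f"Pooled: {pooled:.3f} (SE {pooled_se:.3f}), Q = {q:.2f} on {len(designs) - 1} df")

# Hypothetical indirect evidence through an intermediate.
indirect, indirect_se = chain_effects((0.5, 0.1), (0.4, 0.2))
print(f"Indirect estimate: {indirect:.3f} (SE {indirect_se:.3f})")
```

A large Q relative to its degrees of freedom signals that the designs disagree by more than chance, which is the point at which differences in question and bias need to be modelled explicitly rather than averaged over.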
The project will primarily explore Bayesian methods, because they are flexible and allow incorporation of external information through prior distributions. The work is inherently interdisciplinary, bringing together detailed understanding of clinical aspects of studies, strong epidemiological thinking, state-of-the-art statistical methods and statistical computation. In addition to working on novel statistical methods, the student may explore other methodological questions, such as what sources of information are available about biases, to inform prior distributions. The project will develop quantitative skills, specifically in statistical methods, evidence synthesis and causal inference. Applications are likely to involve data from two major general population cohorts: UK Biobank and the Avon Longitudinal Study of Parents and Children (ALSPAC). These will be used as inputs into triangulation exercises, or for confirmation of findings arising from triangulation exercises based on published results.
Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45(6):1866-86.
Dr Gemma Sharp (lead), Prof Kate Tilling, Prof Debbie Lawlor, Internal and external experts on climate change
Climate change will affect human health through factors such as poor sanitation, a lack of food and safe drinking water, changing disease patterns, migration, frequent extreme weather events and rising temperatures.
Pregnant women, the developing fetus, and young children are considered amongst the most vulnerable and marginalised members of society, and they could therefore be uniquely sensitive to the effects of climate change.
Research is required to characterise the potential effects of climate change on these groups, and to identify and coordinate attempts to reduce adverse health outcomes.
This exciting project aims to explore how epidemiological data and approaches can be best applied to study the effects of climate change. The project focuses on how exposure to multiple aspects of climate during pregnancy relate to maternal and offspring health across the lifecourse.
There are several options, depending on the student's skills and interests, as well as the availability of data and the feasibility of data linkage.
One option is to link health and demographic data from sources including UK Biobank, ALSPAC and Born in Bradford to detailed HadUK-Grid data from the UK Met Office, which contains climate data produced on a 1km x 1km grid resolution dating back to the 1880s (https://www.metoffice.gov.uk/research/climate/maps-and-data/data/haduk-grid/datasets). This will allow the student to conduct observational PheWAS (phenome-wide association studies) to explore associations between early-life exposures, including ambient temperature, precipitation, barometric pressure and hours of sunlight, and a wide range of health outcomes across the lifecourse.
More focused investigations can be carried out for key outcomes, including longitudinal modelling of climate exposures in relation to health outcomes and exploring critical periods when these outcomes appear to be especially sensitive to fluctuations in climate exposures.
Molecular mediation by DNA methylation and/or metabolomics could also be explored.
In addition to working with a team of highly experienced experts in epidemiology and population health, there will be multiple exciting opportunities for collaboration with climate experts, both within the University of Bristol and externally (including internationally). There will also be opportunities for training and gaining experience in public and policy engagement to help translate findings and create impact.
Climate change and the potential effects on maternal and pregnancy outcomes: an assessment of the most vulnerable – the mother, fetus, and newborn child. Rylander et al. 2013 Glob Health Action https://www.ncbi.nlm.nih.gov/pubmed/23481091
Exploring associations of maternal exposure to ambient temperature with duration of gestation and birth weight: a prospective study. Li et al 2018 BMC Pregnancy Childbirth https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311008/
Systematic review on adverse birth outcomes of climate change. Poursafa et al 2015 J Res Med Sci https://www.ncbi.nlm.nih.gov/pubmed/26109998
Dr Harriet Forbes (lead), Dr Dheeraj Rai
Rates of Caesarean section (CS) are increasing in the UK and worldwide, with many countries going beyond the level of 10–15% recommended as optimal by the World Health Organization. In the UK around 25% of births are by CS, with 9% being planned (also called elective) CS and rates varying between 6% and 17% by region (1). It has been suggested that children born by CS, rather than vaginal delivery, may be more likely to have a later diagnosis of neurodevelopmental conditions such as autism spectrum disorder (ASD) and attention-deficit/hyperactivity disorder (ADHD) (2). Hypothesised mechanisms for this association include: CS deliveries often occurring before term, therefore shortening in-utero brain development; greater chance of neurotoxicity related to neonatal exposure to general anaesthesia; and less exposure to the maternal microbiome (2). However, it is also possible that the observed associations are not causal, because women being offered CS may have different characteristics, such as medical complications, from those having a vaginal delivery. Making causal inference using observational studies in the presence of confounding by indication is therefore challenging.
This PhD project will use UK electronic healthcare data to investigate whether there is a causal association between mode of delivery for childbirth and neurodevelopmental and psychiatric conditions in the child.
1. To investigate whether children born by CS deliveries are more likely to have a later diagnosis of neurodevelopmental conditions such as ASD and ADHD as compared to children born by a vaginal delivery.
2. To investigate whether the association varies according to whether the CS delivery was elective or emergency.
3. To investigate whether any association is likely to be causal and varies by specific maternal characteristics, such as age and body mass index.
The student will use the mother and baby link in the Clinical Practice Research Datalink, a large primary care database in the UK, along with linked hospital records. Training will be provided in how to handle large-scale routinely collected health data as well as appropriate use of traditional (regression methods) and more advanced causal inference methods, such as propensity score, instrumental variable and discordant sibling methods.
1. NHS Digital, Maternity Services Monthly Statistics England, March 2018, Experimental Statistics. 2018.
2. Zhang T, Sidorchuk A, Sevilla-Cermeno L, et al. Association of Cesarean Delivery With Risk of Neurodevelopmental and Psychiatric Disorders in the Offspring: A Systematic Review and Meta-analysis. JAMA Netw Open. 2019;2:e1910236.
Dr Lavinia Paternoster (lead), Dr Ashley Budu-Aggrey, other members of the European BIOMAP consortium, as appropriate
Atopic dermatitis (or eczema) is highly heritable and genome-wide association studies have successfully identified genetic risk loci associated with incidence of the disease. We are part of a large European collaboration (BIOMAP) to investigate the mechanisms of disease endotypes, such as progression and remission, differing severities and comorbidities. In this project you will work with BIOMAP datasets, which include ALSPAC, UK Biobank and clinical datasets, to investigate the genetic influences on specific endophenotypes of eczema. This will involve both genome-wide association analysis and analyses focusing on established eczema loci and genetic risk scores.
By analysing select groups of individuals (i.e. only those with disease), there is the potential issue of introducing bias through selection/stratifying on a collider. Novel methods are being developed to overcome this challenge and you will have the opportunity to test and apply these methods in the context of eczema progression genetics, in order to produce robust results.
The identification of such genetic loci will be informative for future drug development pipelines, aiming to better treat those suffering with eczema.
To identify genetic loci robustly associated with endotypes of atopic dermatitis and, where possible, to follow up these loci to identify the genes of interest.
- Data cleaning and generation of phenotypes of interest
- Genome-wide association analysis
- Construction and analysis of genetic risk scores (GRS)
- Testing and evaluation of novel methods for limiting bias introduced by selection
- Triangulation of omics resources to identify the underlying causal genes
1. GWAS of eczema: Paternoster et al. 2015 PMC4753676
2. BIOMAP consortium webpage: https://www.biomap-imi.eu/
3. Paper describing selection bias: Paternoster et al. 2017 PMC5628782
Dr. Josine Min (lead); Dr. Haeran Cho; Dr. Claire Gormley, School of Mathematics and Statistics, University College Dublin; Prof. Jonathan Rougier, Rougier Consulting Ltd; Prof. Kate Tilling, MRC Integrative Epidemiology Unit
This studentship will provide cross-disciplinary training in state-of-the-art statistical and genomic epidemiological approaches (under the supervision of Dr. Josine Min at the Medical Research Council Integrative Epidemiology Unit and Dr. Haeran Cho at the School of Mathematics) to develop novel statistical methods for the analysis of high-dimensional epigenetic data in large-scale epidemiological datasets.
Epigenome-wide association studies (EWAS) aim to identify DNA methylation (DNAm) sites associated with phenotypes of interest (e.g., disease status, cholesterol levels, smoking history, body mass index and age, to name a few), where the challenge lies in the robust identification of DNAm differences and the interpretation of the results. In current EWAS approaches, each DNAm site is tested separately for association with a trait, exposure, or biological condition of interest, which raises the problem of controlling for genome-wide multiplicity. Clustering of the DNAm sites can significantly reduce the dimensionality of the subsequent EWAS. It is commonly assumed that the distributional behaviour at DNAm sites can be characterised by a small number of behaviours, such as either 'hypo- or hyper-methylated'(1). For example, a DNAm site that has a hypo-methylated distribution in a smoking cohort, but a hyper-methylated distribution in a non-smoking cohort, would be deemed a potential candidate for differential DNAm in association with smoking. In existing approaches for EWAS, DNAm sites are often deemed to be differentially methylated only if their mean DNAm levels differ. In this project, we aim to uncover differentially methylated sites by exploring the distributional behaviour of DNAm at each DNAm site, in a similar vein to Lock and Dunson(2).
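The contrast between comparing means and comparing whole distributions can be seen in a small simulation. In the sketch below, two groups have almost identical mean methylation at a site, but one group is a mixture of hypo- and hyper-methylated individuals. A difference in means is close to zero, while the two-sample Kolmogorov-Smirnov statistic (the largest gap between the empirical distribution functions) is clearly not. The Beta parameters, sample size and choice of statistic are assumptions for illustration only; this is not the shared kernel approach of Lock and Dunson or the method the project will develop.

```python
import bisect
import random
from statistics import mean


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)

    def gap(x):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        return abs(cdf_a - cdf_b)

    return max(gap(x) for x in a + b)


rng = random.Random(0)
n = 2000

# Group A (hypothetical): unimodal methylation proportions centred on 0.5.
group_a = [rng.betavariate(2, 2) for _ in range(n)]

# Group B (hypothetical): an even mixture of hypo-methylated (mean 0.2) and
# hyper-methylated (mean 0.8) individuals, so the overall mean is also about 0.5.
group_b = [
    rng.betavariate(2, 8) if rng.random() < 0.5 else rng.betavariate(8, 2)
    for _ in range(n)
]

mean_difference = mean(group_b) - mean(group_a)
ks = ks_statistic(group_a, group_b)
print(f"Difference in means: {mean_difference:.3f}")
print(f"KS statistic: {ks:.3f}")
```

A site like this would be missed by a test of mean differences but flagged by any method sensitive to distributional shape.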
The overall aim of this PhD is to develop a novel statistical model, similar to the successful application of principal component analysis as an estimation technique under high-dimensional factor models(3). Additional method development would depend on the candidate's research interests but could include development of novel methods to account for cell heterogeneity or novel methods for imputation of epigenetic features.
This project will equip the student with skills in the analysis of high-dimensional data and the development of scalable clustering techniques and algorithms, and with training in the analysis and interpretation of epigenetic data. The student will have the opportunity to spend a period of circa 6 months, most likely at the start of Year 3, as a visiting student working in the School of Mathematics and Statistics in University College Dublin with Dr. Gormley. They will benefit from being housed with and exposed to the cohort and activities of PhD students there as part of the Science Foundation Ireland Centre for Research Training in Foundations of Data Science (www.data-science.ie, @data_science_ie), and also to the PhD students in the national Insight Centre for Data Analytics.
We propose to find a factorisation of DNAm profiles and to develop a novel generative statistical model, similar to the successful application of principal component analysis as an estimation technique under high-dimensional factor models(3). We will adopt a non-parametric approach, whereby structural assumptions that are natural to DNAm data can be imposed. At the University of Bristol, we have access to a large dataset, ARIES(4), with DNAm measured in 1000 children at 3 time points and in their mothers at two time points, offering the student an excellent platform for developing these methods.
1 Houseman, E. A. et al. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 9, 365, doi:10.1186/1471-2105-9-365 (2008).
2 Lock, E. F. & Dunson, D. B. Shared kernel Bayesian screening. Biometrika 102, 829-842, doi:10.1093/biomet/asv032 (2015).
3 Fan, J., Liao, Y. & Mincheva, M. Large Covariance Estimation by Thresholding Principal Orthogonal Complements. J R Stat Soc Series B Stat Methodol 75, doi:10.1111/rssb.12016 (2013).
4 Relton, C. L. et al. Data Resource Profile: Accessible Resource for Integrated Epigenomic Studies (ARIES). Int J Epidemiol 44, 1181-1190, doi:10.1093/ije/dyv072 (2015).