Data mining epidemiological relationships: integration of causal analysis with published evidence

Programme Overview

Population health research is being transformed by the increasing wealth of complex data. New high-dimensional epidemiological datasets provide novel opportunities for systematic approaches to understanding the relationships between risk factors and disease outcomes. This programme will build on our successes in collating data and implementing software to automate causal inference using Mendelian randomization (MR-Base), and in literature mining (MELODI). We will implement a new graph database (EpiGraphDB) that integrates causal estimates with comprehensive data on relationships between traits, risk factors, biomarkers, intervention targets and diseases. Using EpiGraphDB, we will develop new methods to explore the relationships between risk factors and disease, enabling new causal hypotheses to be generated and tested.
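To show what a Mendelian randomization "causal estimate" looks like when computed from summary statistics, here is a minimal sketch of two standard estimators: the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) combination across variants. This is not MR-Base's own implementation. It assumes independent genetic variants, first-order weights, and that the exposure and outcome effects are already harmonised to the same effect allele. The function names and example values are hypothetical and chosen for illustration only.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Per-variant causal estimate: variant-outcome effect / variant-exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW estimate and its standard error (first-order weights).

    Assumes independent, harmonised variants; pleiotropy is not modelled here.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    # Weighted regression of beta_outcome on beta_exposure through the origin,
    # with weights 1 / se_outcome^2.
    numerator = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return beta, se


if __name__ == "__main__":
    # Hypothetical summary statistics for three variants.
    bx = [0.1, 0.2, 0.3]
    by = [0.05, 0.1, 0.15]
    se_y = [0.01, 0.01, 0.01]
    print([wald_ratio(x, y) for x, y in zip(bx, by)])
    print(ivw_estimate(bx, by, se_y))
```

In this toy input every variant implies the same ratio, so the IVW estimate equals that common value. In real data the per-variant ratios disagree, and pleiotropic variants are one reason the simple fixed-effect IVW model can be biased.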

Aims and Objectives

We aim to: (a) systematically integrate biological contextual information with causal estimates generated using Mendelian randomization; (b) develop novel approaches to identifying, validating and prioritising potential causal estimates in the context of a wide array of other information; (c) utilise our database to inform the development of new Mendelian randomization methods that address pleiotropy; and (d) apply the data and approaches from (a) to (c) to investigate causal risk factors for cardiovascular disease and cancer.

Research highlights

Dr Tom Gaunt

