Bioinformatics cross-cutting strand
Lead: Dr Tom Gaunt
The programme makes significant use of ’omics technologies to generate high-density molecular data on a large number of samples from different studies. The bioinformatics strand provides the core expertise to process, quality-control, harmonise and integrate these data, including integration with public datasets to enhance their value and to provide annotations and metadata. We are building a web-based data exploration utility that will enable internal and external researchers to explore the data and view aggregate and summary statistics.
The strand capitalises on our established expertise (e.g. ARIES and other methylation projects; ALSPAC metabolomics; ProtecT genetics) to systematically process, clean and integrate data. We have developed bespoke laboratory information management systems to record laboratory processes from sample tube to database, and have created a comprehensive data pipeline to generate analysis-ready, normalised epigenetic and metabolomic datasets. The strand will also develop the automated literature-mining tools (MELODI and TeMMPO) for the systematic reviews in Work package 3.
A significant amount of molecular data already exists in the public domain, offering added value to the data generated in this Programme. Data to be exploited include ENCODE, ArrayExpress, the GWAS Catalog, NIH Roadmap Epigenomics and others. This information will enable: (a) informed selection of variants for MR and recall-by-genotype; and (b) generation of new insights into mechanisms by considering associations in the context of broader functional data.
Data exploration interface
As part of the ARIES project, we have developed the ARIES-Explorer website, which enables exploration of aggregate-level methylation results. This site now receives around 50 unique visitors per day, greatly increasing the accessibility of the data. We propose expanding this platform to provide an exploration resource for data generated in this Programme.