Bioinformatics cross-cutting strand
Lead: Prof Tom Gaunt
The programme makes significant use of ’omics technologies to generate high-density molecular data on a large number of samples from different studies. The bioinformatics strand provides the core expertise to process, quality-control, harmonise and integrate these data, including integration with public datasets to enhance their value and provide annotations and metadata.
We are also developing machine learning approaches to predict cancer outcomes.
The strand capitalises on our established expertise (e.g. ARIES and other methylation projects; ALSPAC metabolomics; ProtecT genetics) to systematically process, clean and integrate data. With Work package 2 we have developed bespoke laboratory information management systems to record laboratory processes from sample tube to database, created a comprehensive data pipeline to generate analysis-ready normalised epigenetic and metabolomic datasets, and developed a database of methylation QTL results (mQTLdb). We have also developed automated literature-mining software (MELODI and TeMMPO) for the systematic reviews in Work package 3, and played a major role in developing the MR-Base platform for Mendelian randomization, which is used by Work package 1.
A significant amount of molecular data already exists in the public domain, offering added value to the data generated in this Programme. Data to be exploited include ENCODE, ArrayExpress, the GWAS Catalog, NIH Roadmap Epigenomics and others. This information will enable: (a) informed selection of variants for Mendelian randomization (MR) and recall-by-genotype; and (b) generation of new insights into mechanisms by considering associations in the context of broader functional data.
- MR-Base - Mendelian randomization analytical platform and database of GWAS results
- MELODI - mechanism-discovery tool based on literature mining
- TeMMPO - literature-based mechanism prioritisation, developed in partnership with WCRF
- mQTLdb - methylation QTL database
- CScape - classification of driver/neutral somatic mutations