Machine Learning with Omics Data
Health research is increasingly turning to high-throughput molecular datasets (also known as ‘omic’ datasets) to discover novel biomarkers of disease risk and outcome. Unfortunately, the size and complexity of these datasets make them difficult to manage and their analysis prone to many pitfalls. In this course, we introduce you to the latest approaches from data science for interpreting these challenging datasets and extracting useful and reliable biomarkers from them.
| Dates | 10 - 12 June 2026 | 
|---|---|
| Fee | £750 | 
| Format | Online | 
| Audience | Open to all applicants (prerequisites apply) | 
Course profile
This course aims to provide an overview of the principles and methods of epidemiology and data science that are relevant to high-throughput omic studies. It also aims to give students the knowledge and skills necessary to design and utilize population-based omic studies to gain insight and to derive robust biomarkers of exposures and health outcomes.
This 3-day online course consists of live lectures followed by practical sessions using R via Posit Cloud, so attendees do not need to install R on their computers.
By the end of the course participants should be able to:
- discuss the specific contributions of different omic data types for understanding and improving human health;
- choose and apply appropriate statistical and machine learning methods for interrogating omic data;
- derive reliable omic biomarkers for indexing exposure and predicting health outcomes;
- evaluate biomarker performance in terms of metrics appropriate to the context in which the biomarker will be used; and
- mitigate the ethical challenges of developing, interpreting and applying molecular biomarkers.
This course is intended for individuals engaged in population-based studies who wish to use omic datasets (e.g. epigenomic, transcriptomic, proteomic, metabolomic or genomic) to gain biological insights and to derive biomarkers of exposure and/or health outcomes. Attendees may have a background in epidemiology, genetics, statistics, public health or a clinical speciality. A basic knowledge of epidemiology is required and some understanding of molecular epidemiology terminology and machine learning would be advantageous. Practical knowledge of R is required as students will be processing large omic datasets in practical sessions.
Please note that this course attracts a highly multi-disciplinary audience. We do our utmost to accommodate this and ask that prospective participants who are in any doubt enquire before booking to check that the course is targeted at the right level for their needs.
The course will cover:
- examples of published omic analyses and models for epidemiological and medical applications;
- statistical methods for preprocessing, discovering patterns and testing associations in omic datasets;
- interpreting the biological relevance of omic patterns and associations;
- estimating the heritability and proportion of variation explained by omic data;
- approaches from machine learning for deriving reliable omic biomarkers for indexing exposures and predicting health outcomes;
- application and interpretation of appropriate metrics for evaluating biomarker performance; and
- ethical challenges of developing, interpreting and applying molecular biomarkers.
Dr Paul Yousefi is a data scientist who applies emerging methods in machine learning and statistical prediction to develop multi-dimensional genomic biomarkers of health risk factors, patterns of exposure, and emerging disease phenotypes.
Dr Matthew Suderman is a bioinformatician who specialises in the handling and integrated analysis of large molecular datasets for the discovery of biomarkers of disease risk and outcomes.
Dr Anza Shakeel is an expert in deep learning whose research focuses on using approaches from artificial intelligence to leverage large molecular and imaging datasets to develop predictors of health outcomes.
Dr Sarah Watkins is an epigenetic epidemiologist interested in how our environment shapes our health. Her research currently focuses on how environmental exposures like smoking and adversity influence DNA methylation and health outcomes and how structural racism can lead to health inequities.
To make sure the course is suitable for you and you will benefit from attending, please ensure you meet the following prerequisites before booking:
| Knowledge | You should be very familiar with the topics presented in our Molecular Epidemiology short course. This includes practical knowledge of using R to analyse high-throughput molecular data. We recommend that you have either completed the Molecular Epidemiology short course in this programme or have previous experience of performing omic-wide association studies, e.g. GWAS, EWAS, PWAS. | 
|---|---|
| Recommendation | Access to two screens will be useful for practical sessions where one screen can be used to view instructions and the other to carry out instructions and view outputs. | 
Before booking this course, please make sure you read the information provided above about the target audience and prerequisites. It is important that you have access to the relevant IT resources needed for the course and meet the knowledge prerequisites to ensure you can get the most from the course.
Bookings are taken via our online booking system, for which you must register an account. To check if you are eligible for free or discounted courses please see our fees and voucher packs page. All bookings are subject to our terms & conditions, which can be read in full here.
For help and support with booking a course refer to our booking information page, FAQs or feel free to contact us directly. For available payment options please see: How to pay your short course fees.
Participants are granted access to our virtual learning platform (Blackboard) 1 to 2 weeks in advance of the course. This allows time for participants to complete any pre-course work and to familiarise themselves with the platform.
To gain the most from the course, we recommend that you attend in full and participate in all interactive components. We endeavour to record all live lecture sessions and upload these to the online learning environment within 24 hours. This allows course participants to review these sessions at their leisure and revisit them multiple times. Please note that we do not record breakout sessions.
All course participants retain access to the online learning materials and recordings for 3 months after the course.
University of Bristol staff and postgraduate students who do not wish to attend the full course may instead register for access to the 'Materials & Recordings' version of this course: Further information and bookings.
86% of attendees recommend this course*.
*Attendee feedback from 2024.
Here is a sample of feedback from the last run of this course:
"It was a great overview of the topic." - Course feedback, June 2024
"I found the lecture-practical-lecture sandwich really useful for consolidating the theory and then beginning to put it into practice." - Course feedback, June 2024
"The course material was very informative and useful, shared in an enthusiastic manner and of high quality by the teachers. They were very happy to answer questions and did so in a very helpful and knowledgable way." - Course feedback, June 2024
"Well designed practical sessions and good course content." - Course feedback, June 2024
"Very useful for a broad overview of ML applications to omics data" - Course feedback, June 2024