Turing Network Data Study Group Bristol

Bringing together some of the country’s top talent from data science, artificial intelligence, and wider fields, to analyse real-world data science challenges.

The Turing’s first Network Data Study Group took place in Bristol from 5 to 9 August 2019. Building on the popular Turing Data Study Groups held three times a year at The Alan Turing Institute, the Network Data Study Group in Bristol offered the opportunity for collaborative working and networking on a local level with the Turing’s partner universities.

Researchers were given an opportunity to put knowledge into practice and go beyond individual fields of research to solve real-world problems. The event also offered participants the chance to forge new networks for future research projects, and build links with The Alan Turing Institute – the UK’s national institute for data science and artificial intelligence.

What are Data Study Groups?

  • Intensive five-day 'collaborative hackathons', which bring together organisations from industry, government, and the third sector, with talented multi-disciplinary researchers from academia
  • Organisations act as Data Study Group 'Challenge Owners', providing real-world problems and data sets to be tackled by small groups of highly talented, carefully selected researchers
  • Researchers brainstorm and engineer data science solutions, presenting their work at the end of the week

Challenges

Our challenges and datasets were provided by partner organisations - known as Challenge Owners - for researchers to work on over the week. They were:

  • Bristol City Council - Get Bristol moving: tackling air pollution in Bristol city centre
  • Rothamsted Research - Tackling hidden hunger through soils
  • University of Bristol - Machine learning for protein folding
  • University of Surrey/Royal College of General Practitioners (RCGP) - Improving our ability to use routine data to inform the management of key disease areas
  • University of Bristol - Applying AI and machine learning to reveal the molecular basis of heart disease
  • University of Bristol Theatre Collection - The language of love: mining the correspondence of Oliver Messel 

Read more about the event from the perspective of one of the Challenge Owners, Danielle Paul, on the JGI Blog.

Please see below for further details on each challenge. 

Challenge descriptions

Bristol City Council 

Get Bristol moving: tackling air pollution in Bristol city centre

We are interested in data scientists mining our datasets to see if there are any interesting (or unexpected) patterns in the data that the Council could use to help reduce congestion and improve air quality. Historic datasets will be provided on air quality, traffic congestion, traffic count, journey time, average speed, and traffic flow. For example, we know that school commuting traffic is a significant influence on air quality. Could the datasets available in Bristol be used to derive a relationship between school car traffic and NOx emissions that could be used to calibrate modelled NOx emissions from cars?

Rothamsted Research

Tackling hidden hunger through soils

Throughout the world many soils, and hence crops, are deficient in micronutrients. This translates into micronutrient deficiencies (‘hidden hunger’) in humans. ‘Mid-infrared spectroscopy’ is a cost-effective technique for functional analysis of soils, but it produces complex, high-dimensional data that are difficult to interpret. In this project the aim is to harness data science and AI methods to predict soil and plant nutrient content from large numbers of mid-infrared spectra and supporting metadata of African soils, thereby informing crop management for increased food quality and human health benefit.

University of Bristol

Machine learning for protein folding

Proteins are linear chains of amino acids, which, in computational terms, can be written as strings of letters representing the 20 different amino acids. These strings encode how the protein chains fold up into their functional 3D structures. Although predicting the 3D structure of a protein from the sequence alone is extremely difficult, thanks to abundant sequence and structural data (>4000 structures), this problem is becoming more tractable for one important class of protein, the coiled coil. The challenge is to interrogate this sequence dataset to predict structure, enabling the design of new coiled coils with potential applications.
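As an illustration of how a sequence written as a string of letters can be turned into numerical input for a machine learning model, the following is a minimal sketch of one-hot encoding. It is not the method used at the event; the function name and the example fragment are hypothetical, chosen only for illustration.

```python
# The 20 standard amino acids, written as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-element 0/1 rows.

    Raises KeyError if the sequence contains a letter outside the
    20 standard one-letter codes.
    """
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[residue]] = 1  # mark the position of this amino acid
        rows.append(row)
    return rows


# Hypothetical seven-residue fragment, invented for illustration only.
encoded = one_hot("LKELEDK")
```

Each residue becomes one row with a single 1, so a sequence of length n becomes an n-by-20 table that standard models can consume.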

University of Surrey/Royal College of General Practitioners (RCGP)

Improving our ability to use routine data to inform the management of key disease areas

It is essential to monitor blood pressure in various chronic diseases (e.g. heart disease, diabetes). However, GPs tend to show certain biases when recording measurements, for example a preference for round numbers. We have 47 million blood pressure readings and 7 million glycated haemoglobin (HbA1c) readings (a measure of diabetes control) and we are interested in finding the true blood pressure and HbA1c trends from the inaccurate data, comparing trends for different groups of patients (e.g. on various medications). Participants will attempt to develop a predictive algorithm using machine learning that corrects suboptimal data, allowing for better disease monitoring.
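A preference for round numbers can be made visible by counting how often each final digit occurs in the recorded values. The sketch below is a minimal illustration under the assumption that, without recording bias, final digits would be spread roughly evenly; it is not the challenge's method, and the function name and sample readings are hypothetical.

```python
from collections import Counter


def terminal_digit_shares(readings):
    """Return the share of readings ending in each digit 0-9."""
    if not readings:
        raise ValueError("readings must not be empty")
    counts = Counter(int(r) % 10 for r in readings)
    total = len(readings)
    return {digit: counts.get(digit, 0) / total for digit in range(10)}


# Hypothetical systolic readings (mmHg), invented for illustration only.
sample = [120, 130, 140, 120, 135, 128, 110, 150, 122, 140]
shares = terminal_digit_shares(sample)

# Excess of readings ending in 0 over the 10% expected if digits were even.
zero_excess = shares[0] - 0.1
```

A large positive excess for the digit 0 would suggest rounding to the nearest ten, which is the kind of bias a correcting model would need to account for.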

University of Bristol

Applying AI and machine learning to reveal the molecular basis of heart disease

This is an image processing challenge with potential outcomes that could directly benefit our basic understanding of cardiac muscle proteins. The proteins that we are looking at are susceptible to mutations that cause hypertrophic cardiomyopathy, which affects 1 in 500 people and is the leading cause of adult sudden death. To obtain high-resolution molecular models of these proteins we need to collect hundreds of thousands of images of our protein from noisy cryo-EM data. Automation of the protein identification step will overcome a significant bottleneck in the image processing workflow. Manual annotation of the dataset to be used in the Data Study Group took 6 months, which highlights the critical need for a robust machine-based approach.
 
University of Bristol Theatre Collection
  
The language of love: mining the correspondence of Oliver Messel 
 
Historical archives contain a wealth of information for which text-mining techniques could be used to reveal new insights. The University of Bristol’s Theatre Collection archive contains the personal handwritten correspondence of Oliver Messel (1904-1978): a designer, architect, “bright young thing”, campaigner and gay man living and working in England and Barbados. In this challenge, we invite participants to apply optical character recognition and text-mining techniques to identify the linguistic patterns in his most personal correspondence – his letters to his lover, Vagn Riis-Hansen. The dataset contains high-quality images of 30 pages of Oliver’s handwritten letters, some of which have been (manually) transcribed, as well as a larger database of letters with notes by archivists.

