National Compound Collection

18 June 2014

A small idea to search for and collate chemical compounds in the University of Bristol's School of Chemistry could lead to the creation of new medicines, materials and agrichemicals, and is now growing into a national project. Laura Broad and Tim Gallagher explain how the project would not have happened without support from the Elizabeth Blackwell Institute.

In 2013 Laura Broad spent four months collecting and collating interesting chemical compounds on the shelves of the University of Bristol's School of Chemistry. Her task was to catalogue any compounds that were in a usable state and create a database of real compounds that could have a range of applications in the future.

Today, this project has evolved into the National Compound Collection, and Dr Broad is coordinating a team of eleven data collectors who search for compounds from 15 partner universities. Rather than looking in dusty boxes under researchers' desks or iced-up freezer compartments, these data collectors extract data (i.e. compound structures) from PhD theses, so the collection also covers molecules that were made but may no longer exist.

“We found that up to 80% of compounds described in PhD theses are not in any databases and that many samples had decomposed or simply did not exist anymore,” explains Dr Broad. “Our aim now is to catalogue these compounds, whether a sample exists or not, and create a powerful tangible database, where tangible means that the molecules can be (and once were) made. This will be a legacy for taxpayer-funded chemistry research from the past 30 years and could also help advance the development of new compounds in a range of molecule-based industry sectors.”

Dr Broad's early work, led by Professor Tim Gallagher from the School of Chemistry at the University of Bristol, was funded by the Elizabeth Blackwell Institute. “This project started out as a small Bristol-based idea and finding funding for it from any other source would have been very difficult,” said Dr Broad. “But the EBI funding enabled us to get this project off the ground.”

The National Compound Collection is now in a pilot phase funded by the Royal Society of Chemistry. It involves 15 UK university chemistry departments, the British Library and 12 data collectors who, during the first half of 2014, will be manually extracting information on around 60,000 compounds contained in several hundred academic theses. Working closely with the RSC's e-Science team, the data collectors will enter this information into ChemSpider, the RSC's chemical structures database, and the compounds will then be made available for in silico screening by groups across industry and academia (e.g. BUDE at Bristol University). These user groups will assess (in silico) activity against a range of biological targets and will also assess the diversity and uniqueness of the collection relative to existing collections. In selected cases, the Pilot Project will also facilitate the synthesis of ‘real’ physical samples of the most compelling in silico hits so that their biological activity can be assessed in relevant assays.

The collection will comprise structures that span the diversity of synthetic research pursued over decades, and as such will provide access to real, testable samples in previously untapped regions of chemical space (e.g. sp³-rich and chiral substructures). Not only will this help to accelerate research in molecule-dependent sectors such as medicines, materials and agrichemicals, but it will also ensure that UK-funded academic research has a clearly defined route to delivering socio-economic impact. Because PhD theses are published documents, many of the IP issues associated with disclosing the structures are avoided, and the intention is that the collection and its associated structures will be openly accessible and viewable.

“We are excited that this project, which wouldn't have happened without the support of the Elizabeth Blackwell Institute, has been progressing fast, and the next months will be vital to deliver the pilot study with maximum return and to plan for the next, much bigger national phase,” said Professor Tim Gallagher. “Clearly, if we are successful, there is no reason why a similar collection activity could not be carried out in other European Union countries, and those collections could then be combined for mutual gain. This project must address a range of concerns and challenges, especially around downstream IP, and develop a sustainable funding model that is compatible with building a real (i.e. comprehensive) collection of physical samples while remaining an open access resource.”

Further information

To learn more about the funding available from the Elizabeth Blackwell Institute, including the Catalyst Fund, visit http://www.bristol.ac.uk/blackwell/funding/