The PURE Data competition

Using data science to identify and analyse interdisciplinary research at Bristol

The University Research Institutes (URIs) have a role in the university research strategy to promote and encourage cross-disciplinary research. The methods that we use (and explore) for doing this often involve running events such as interdisciplinary seminars/meetings, sandpits, seed-corn funds for grants in interdisciplinary areas etc.

Here one might consider what other resources we have at our fingertips to identify cross-disciplinary opportunities. We could, of course, use some data to this end and one obvious source might be the PURE (PUblications and REsearch) database that is self-generated by UoB.

The question is what would we use it for?

The Jean Golding Institute invited teams to analyse this dataset and to try to solve challenges such as:

  • How would we best visualise the data in PURE and how might we best use it to answer questions like which disciplines/schools are internally best connected in terms of research?
  • Which individuals/groups bridge the divides between schools and faculties?
  • If we were to take the PURE data and attempt to cluster the individuals into groups would these groups resemble the existing school / faculty structures? If not, what would they represent and would they be useful representations?
  • What is the best way to measure interdisciplinary research?

PURE data competition guidelines (PDF, 386kB)

The Results

The Jean Golding Institute launched the PURE data competition on 27th March 2017. The competition closed on 19th May, and we received 8 entries. The standard was incredibly high and it was clear that teams had put a lot of effort into the competition. The entries were judged on the following criteria: originality, usability of entry, usefulness of findings, presentation and volume of work. The participants are currently working on a publication to share the methods and tools developed.

Winners - Ben Elsworth and Tom Gaunt (School of Social and Community Medicine)

Ben’s team created a piece of software called AXON (sample: http://axon-jgi.biocompute.org.uk/), which allows the user to interrogate the PURE database and pull out links between people, organisations and concepts harvested from the abstracts of the research outputs. The panel particularly liked the usability of the system and its ability to suggest links, both new and potentially existing. It was also interesting that concepts could be used to link individuals. Ben is currently developing the tool so that it can be used with current PURE data.

Runner up - Ella Gale (School of Experimental Psychology)

organisational graph

Ella provided an extensive report along with the Python code developed for this project. She focused on the different organisational units that exist in the current University structure and incorporated additional data from Reuters on the disciplines of individual publications. This allowed her to look at the multidisciplinarity of the different schools at Bristol and how they link together.

Runner up - Natalie Thurlby (Merchant Venturers' School of Engineering) and Emily Pole (School of Physics)

Natalie and Emily looked particularly at cross-faculty and cross-school collaborations and used these to examine the structure of the University. The report was easy to follow and threw up interesting findings about how close the schools and faculties are in terms of where they publish and how interdisciplinary they are.
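One simple way to quantify cross-school collaboration is to treat each paper's author affiliations as a set of schools and count every pair of distinct schools that appears together on a paper. The sketch below assumes each paper is given as a list of its authors' schools; the function name and the example papers are hypothetical and are not taken from the PURE data or from Natalie and Emily's own code.

```python
from collections import Counter
from itertools import combinations


def school_pair_counts(papers):
    """Count co-authored papers linking each unordered pair of distinct schools.

    papers: iterable of lists, each giving the school of every author on a paper.
    A paper contributes at most once to each pair, however many authors it has.
    """
    counts = Counter()
    for schools in papers:
        # Deduplicate and sort so ("A", "B") and ("B", "A") map to the same key.
        for pair in combinations(sorted(set(schools)), 2):
            counts[pair] += 1
    return counts


# Hypothetical author affiliations (illustrative only, not PURE data).
papers = [
    ["Physics", "Physics", "Mathematics"],
    ["Mathematics", "Engineering"],
    ["Physics", "Mathematics", "Engineering"],
    ["Psychology"],
]
counts = school_pair_counts(papers)
# Share of papers whose authors span more than one school.
cross_share = sum(len(set(p)) > 1 for p in papers) / len(papers)
print(counts.most_common())
print(cross_share)
```

The pair counts can be used directly as edge weights in a school-level collaboration network.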

Runner up - Yi Yu, Haeron Cho and Jonty Rougier (School of Mathematics)

The team wrote an interesting report on their take on the interdisciplinarity of the University. They developed a brand new ‘flying saucer’ visualisation, which was a really nice way to show the links between the various schools at Bristol. They also used triangle completion to identify pairs of schools that might work together, and innovative hive plots to display potential clusterings of schools. Finally, they modelled co-authorship networks and identified key individuals who link faculties together.

Flying saucer visualisation
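Triangle completion is a simple link-prediction heuristic: if school A collaborates with school B, and B with C, but A and C have not yet worked together, the open triangle suggests A and C as a candidate pair. The sketch below ranks unconnected pairs by how many shared collaborators they have; the function name and the edges are hypothetical, and this is a generic illustration of the heuristic rather than the team's own method.

```python
from collections import Counter
from itertools import combinations


def suggest_pairs(edges):
    """Rank unconnected node pairs by the number of open triangles they would close.

    edges: iterable of (u, v) pairs describing an undirected collaboration graph.
    Returns [((a, c), count), ...] sorted by descending count, where count is the
    number of shared neighbours b such that a-b and b-c exist but a-c does not.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    scores = Counter()
    for b, nbrs in neighbours.items():
        # Every pair of b's neighbours forms a path a-b-c through b.
        for a, c in combinations(sorted(nbrs), 2):
            if c not in neighbours[a]:
                scores[(a, c)] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical school-level collaboration edges (illustrative only, not PURE data).
edges = [
    ("Physics", "Mathematics"),
    ("Mathematics", "Engineering"),
    ("Physics", "Chemistry"),
    ("Chemistry", "Engineering"),
    ("Mathematics", "Psychology"),
]
print(suggest_pairs(edges))
```

Pairs that close several open triangles at once share more collaborators, which is why they are ranked first.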

The other four entries were all commended for their efforts:

Commendation - Chris McWilliams (Merchant Venturers' School of Engineering)

Chris produced a web-based presentation that (optionally) showed the source code for the visualisations displayed. The presentation was well done, had a good narrative, and looked at both the network of collaborators and the network of publications. He used the Louvain algorithm to identify communities in the collaboration network and Shannon entropy to measure diversity. The panel thought the work was a great proof of concept that could be expanded on to identify specific results.
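Shannon entropy turns the idea of diversity into a single number, H = -Σ p_i log2(p_i), where p_i is the share of a unit's publications in discipline i. The sketch below assumes each publication has already been assigned a single discipline label; the labels are hypothetical, and the choice of base-2 logarithms (giving a result in bits) is an assumption rather than necessarily what Chris used.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the distribution of category labels.

    Higher values mean output is spread more evenly over more categories.
    Returns 0.0 for an empty input or when every label is the same.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical discipline labels for two organisational units.
focused_unit = ["Neuroscience"] * 4
diverse_unit = ["Physics", "Chemistry", "Biology", "Mathematics"]
print(shannon_entropy(focused_unit))
print(shannon_entropy(diverse_unit))
```

For k disciplines the maximum possible value is log2(k), reached when publications are split evenly, so scores can also be normalised by that maximum to compare units with different numbers of disciplines.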

Commendation – Christos Ellinas and Naoki Masuda (Merchant Venturers' School of Engineering)

Christos and Naoki produced a nice report looking at publishing culture and whether individuals are ‘lone wolves’ or ‘social bees’. They constructed networks of co-authors and coded the nodes by job title. They looked at the distribution of an index, Ik, which measures how collaborative people are, and observed that this index is bimodal. They also produced nice network diagrams at the organisation level showing how collaborative organisations are with each other, and looked at triangles to identify possible new collaborations in a way somewhat similar to Yu et al.

Commendation – Kacper Sokol (Merchant Venturers' School of Engineering)

Kacper produced a web-based presentation that includes interesting visualisations of the connections between the different organisational units in the University, based on the data provided. He also looked at making predictions for the future using tf-idf and cosine similarity on the papers in the data provided, and produced a new chord diagram. The visualisations are very nice and interactive, and the source code is provided.
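tf-idf weights each word in a document by how often it appears there, discounted by how many documents contain it; cosine similarity then compares two weighted vectors by the angle between them. The minimal sketch below assumes abstracts have already been split into lowercase word tokens and uses the plain idf = log(N / df); the example abstracts and function names are hypothetical, and this is not Kacper's code. Libraries such as scikit-learn's TfidfVectorizer implement the same idea but use a smoothed idf by default, so their scores will differ slightly.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Build a sparse tf-idf vector (a dict of term -> weight) per tokenised document.

    tf is the raw term count; idf is log(N / df), where df is the number of
    documents containing the term. Terms found in every document get weight 0.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({term: count * math.log(n_docs / doc_freq[term])
                        for term, count in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, pre-tokenised abstracts (illustrative only, not PURE data).
abstracts = [
    "network analysis of research collaboration".split(),
    "collaboration network of university schools".split(),
    "protein folding simulation".split(),
]
vecs = tfidf_vectors(abstracts)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))
print(round(cosine_similarity(vecs[0], vecs[2]), 3))
```

Averaging the vectors of each unit's papers and comparing units pairwise would give one possible measure of how close two schools are in the topics they write about.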

Commendation – Yu Chen, Tom Diethe, and Miquel Perello-Nieto (Merchant Venturers' School of Engineering)

Yu, Tom and Miquel also produced a web-based presentation and accompanying source code. They cleaned the data and produced a nice chord diagram of collaborations. They ordered organisational units by how collaborative they are and then applied latent Dirichlet allocation (LDA) topic models to the data. This produced a really interesting plot relating topics to schools, which looked fascinating; however, more narrative on what it actually shows would have helped.

pure collage

