The study by academics at the University of Bristol’s Intelligent Systems Laboratory is published online in ACM Transactions on Intelligent Systems and Technology.
Social networks such as Facebook, and microblogging services like Twitter, have only been around for a short time, but in that time they have provided snapshots of real life by capturing, electronically, public expression and interaction.
The research, by Professor Nello Cristianini and Vasileios Lampos in the University’s Intelligent Systems Laboratory, used geo-tagged user posts on the microblogging service Twitter as input data to investigate two case studies.
The first case study used the content of tweets to estimate levels of rainfall at a given location and time. The second case study inferred regional flu-like illness rates from tweets to find out if an epidemic was emerging.
The study builds on previous research that reported a methodology for using tweets to track flu-like illness rates in several UK regions. The research also demonstrated a tool, the Flu Detector, which uses the content of Twitter to map current flu rates in several UK regions.
Professor Nello Cristianini, speaking about the research, said: “Twitter, in particular, encouraged their 200 million users worldwide to make their posts, commonly known as tweets, publicly available as well as tagged with the user’s location. This has led to a new wave of experimentation and research using an independent stream of information.
“Our research has demonstrated a method, by using the content of Twitter, to track an event, when it occurs and the scale of it. We were able to turn geo-tagged user posts on the microblogging service of Twitter to topic-specific geolocated signals by selecting textual features that showed the content and understanding of the text.”
Over several months, the researchers were able to gather a database of over 50 million geo-located tweets, which could then be compared to official data from the UK’s National Health Service on flu incidence by region.
The researchers deployed state-of-the-art machine learning algorithms that automatically identified which keywords in the database of tweets were associated with elevated levels of flu. In this way they were able to create a predictive model that transformed keyword incidence in tweets into an estimate of the severity of flu in that area.
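As a rough illustration of how keyword selection and prediction can happen in one step, the sketch below fits a sparse linear model to keyword rates. The article does not name the algorithm the researchers used, so L1-regularised (lasso) regression solved by coordinate descent stands in here as an assumption. The keywords, the weekly figures, the `penalty` value and the function names are all invented for the example and are not taken from the study.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within +/- threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_sparse_model(rows, targets, penalty, sweeps=200):
    """Lasso regression by coordinate descent (illustrative stand-in, not the paper's code).

    rows: one list of keyword rates per week; targets: official flu rate per week.
    Returns (weights, intercept). Keywords with weight 0.0 were not selected.
    """
    n, p = len(rows), len(rows[0])
    # Centre the data so the intercept can be recovered after fitting.
    col_means = [sum(row[j] for row in rows) / n for j in range(p)]
    target_mean = sum(targets) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in rows]
    yc = [t - target_mean for t in targets]

    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            scale = sum(xc[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                continue  # a constant column carries no signal
            # Correlation of keyword j with the residual left by the other keywords.
            rho = 0.0
            for i in range(n):
                others = sum(weights[k] * xc[i][k] for k in range(p) if k != j)
                rho += xc[i][j] * (yc[i] - others)
            weights[j] = soft_threshold(rho / n, penalty) / scale

    intercept = target_mean - sum(w * m for w, m in zip(weights, col_means))
    return weights, intercept


# Toy data, invented for illustration: weekly rates of three keywords in one region.
keywords = ["flu", "fever", "football"]
weekly_keyword_rates = [
    [0, 1, 4],
    [1, 3, 1],
    [2, 1, 4],
    [3, 3, 4],
    [4, 1, 1],
    [5, 3, 4],
]
official_flu_rates = [1, 5, 5, 9, 9, 13]  # hypothetical official figures

weights, intercept = fit_sparse_model(weekly_keyword_rates, official_flu_rates, penalty=0.1)
selected = {kw: w for kw, w in zip(keywords, weights) if w != 0.0}


def estimate_flu_rate(keyword_rates):
    """Turn one week's keyword rates into a flu-rate estimate."""
    return intercept + sum(w * x for w, x in zip(weights, keyword_rates))


print(selected)
print(estimate_flu_rate([6, 3, 2]))
```

In this toy data the "football" column is unrelated to the flu figures, so the L1 penalty drives its weight to exactly zero and only "flu" and "fever" remain in the model. A real system would need many more keywords and weeks, and held-out data to choose the penalty.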
While it is true that Twitter users do not represent the general population, this study indicates that Twitter can be used to track an event.
Future work could be focused on improving various subtasks in the methodology, enabling researchers to become ever more expert at pinpointing situations, such as a flu outbreak or electoral voting intentions.
Paper: Nowcasting Events from the Social Web with Statistical Learning, Vasileios Lampos, Nello Cristianini. ACM Transactions on Intelligent Systems and Technology (ACM TIST), accepted for publication September 2011.