Unit name | Applied Data Science |
---|---|
Unit code | COMSM0017 |
Credit points | 10 |
Level of study | M/7 |
Teaching block(s) | Teaching Block 2 (weeks 13 - 24) |
Unit director | Professor Peter Flach |
Open unit status | Not open |
Pre-requisites | The unit assumes a good working knowledge of key machine learning and data mining techniques, for instance as acquired in COMS30007 Machine Learning, and programming skills in a major language. |
Co-requisites | None |
School/department | Department of Computer Science |
Faculty | Faculty of Engineering |
This unit introduces key data science concepts and their application to support data-driven approaches to problem solving.
The aim of this unit is to allow students to acquire fundamental skills covering the full data science pipeline, including data pre-processing, manipulation, integration, storage, exploration, visualisation and privacy. Students will study techniques to transform raw data into advanced representations that will enable a deeper understanding of the original data.
Students will also gain practical skills in handling structured and unstructured data, and hands-on experience of software tools widely used in real-world settings.
On completion of the unit, students will:
This unit involves lectures covering recent advances in applied data science. Topics are addressed from a practical, hands-on point of view, so that students from different backgrounds can understand the fundamentals of the data science techniques that they will implement in the coursework.
In addition, there will be weekly Q&A sessions in which students can get help, advice and feedback on their current progress with the coursework.
100% coursework.
Assessment will be through a significant data science project, which will be carried out in groups of 4-5 students. The projects will be based on real-life data provided by a number of domain experts. Each group will need to pitch for two projects, after which the allocation is made. The groups do their software development on a platform such as GitHub, and can request formative feedback on their progress up to three times before the final submission, at a time of their choosing. One to two weeks before the final submission there will be a workshop where all groups present their proposed solution to the entire cohort and the domain experts, so that they can incorporate any further formative feedback into their final submission. The final submission is due at the end of the teaching block and will be summatively assessed on all intended learning outcomes, as they correspond to different stages of the data science pipeline.
Mining of Massive Datasets, Anand Rajaraman and Jeffrey David Ullman, Cambridge University Press, 2011.
Principles of Data Mining, David J. Hand, Heikki Mannila and Padhraic Smyth, MIT Press, 2001.
Information Visualization, Colin Ware, Morgan Kaufmann, 2012.
Additional reading material in the form of research papers, online resources, etc.