Unit information: Advanced Data Analytics in 2021/22

Unit name Advanced Data Analytics
Unit code COMSM0088
Credit points 20
Level of study M/7
Teaching block(s) Teaching Block 2 (weeks 13 - 24)
Unit director Professor Nabney
Open unit status Not open
Pre-requisites

EMATM0048 (SDPA) or EMATM0061 (SCEM)

Co-requisites

EMATM0044 (INAI) and COMSM0089 (INDA)

School/department Department of Computer Science
Faculty Faculty of Engineering

Description including Unit Aims

Visual analytics couples the visual representation of data with analytical processes to support complex decision making and understanding. A picture may be worth a thousand words, but only if it is well designed to represent data faithfully and meaningfully. This unit will enable students to create powerful analyses of data and communicate them effectively to non-specialists.

This unit extends the material taught in the co-requisite unit Introduction to Data Analytics by giving students a solid grounding in contemporary advanced machine learning. In visual analytics, such methods serve as useful tools, either to change the data representation (e.g. through dimensionality reduction) or to analyse visual data, within a framework of statistical pattern recognition; in text analytics, such methods serve to produce powerful analyses that traditional methods fail to deliver.

Machine learning topics covered by this unit include: principles of Statistical Pattern Recognition (probabilistic models for data, curse of dimensionality, generalisation error, bias-variance dilemma); linear models (Probabilistic Principal Component Analysis; Discriminant Analysis); generalised dissimilarity mappings and neighbour embedding techniques; Gaussian Processes; latent variable models (Gaussian Mixture Models, Generative Topographic Mapping and Gaussian Process Latent Variable Model); Bayesian model regularisation and combination; feature selection; challenges of large datasets and potential solutions. The text analytics methods taught include rule-based approaches, traditional machine learning techniques, and current leading techniques such as those based on deep-learning neural networks.

Throughout the unit there is a focus on understanding theory and modelling principles in order to apply them effectively to represent and analyse data.

Intended Learning Outcomes

Students will be able to

  1. Apply established text analysis methods on large-scale text-data sources.
  2. Define the types and semantics of data.
  3. Build machine learning models for data and explain their operation in terms of a statistical pattern recognition framework.
  4. Use Bayesian regularisation and variational methods to fit models.
  5. Create user-focused visualisations of numerical, categorical, time series, and network data using visualisation tools such as those available in the public domain via Python and Tableau.

Teaching Information

Problem-based learning combining lecture elements with practical individual work.

Assessment Information

Mid-term coursework (30%): design and implement a system for automated analysis of a substantial text corpus and write a report on the findings from deploying it (ILO 1).

Final coursework (60%): create a visualisation of key features of a medium-sized real-world dataset, analyse and evaluate the representation through a user trial, and report on the conclusions, relating them to the theory of information visualisation (ILOs 2, 3, 4 and 5).

Lab tests (10%): tests on the work, completed in the class.

Resources

If this unit has a Resource List, you will normally find a link to it in the Blackboard area for the unit. Sometimes there will be a separate link for each weekly topic.

If you are unable to access a list through Blackboard, you can also find it via the Resource Lists homepage. Search for the list by the unit name or code (e.g. COMSM0088).

How much time the unit requires
Each credit equates to 10 hours of total student input. For example, a 20-credit unit will take you 200 hours of study to complete. Your total learning time is made up of contact time, directed learning tasks, independent learning and assessment activity.

See the Faculty workload statement relating to this unit for more information.

Assessment
The Board of Examiners will consider all cases where students have failed or not completed the assessments required for credit. The Board considers each student's outcomes across all the units which contribute to each year's programme of study. If you have self-certificated your absence from an assessment, you will normally be required to complete it the next time it runs (this is usually in the next assessment period).
The Board of Examiners will take into account any extenuating circumstances and operates within the Regulations and Code of Practice for Taught Programmes.

Feedback