Unit information: Advanced Data Science for Scientific Computing in 2022/23

Please note: you are viewing unit and programme information for a past academic year. Please see the current academic year for up to date information.

Unit name Advanced Data Science for Scientific Computing
Unit code SCIF30003
Credit points 20
Level of study H/6
Teaching block(s) Teaching Block 4 (weeks 1-24)
Unit director Dr. Fey
Open unit status Not open
Units you must take before you take this one (pre-requisite units)

SCIF20001 - Intermediate Scientific Computing

Units you must take alongside this one (co-requisite units)

None

Units you may not take alongside this one

.

School/department Science Faculty Office
Faculty Faculty of Science

Unit Information

This unit is intended for students in the third year of the new “X with Scientific Computing” and Data Science degrees. It introduces students to data-intensive science, big data and machine learning, and reviews the statistical methods that underpin data science. It will cover the challenges associated with handling large datasets, including:

  • preparation, management and storage of data
  • processing data across distributed file systems (e.g. Hadoop and MapReduce) and cloud computing systems
  • SQL and NoSQL databases and statistical inference.

It will also give an introduction to experimental design and machine learning techniques, and review basic statistics, covering topics including:

  • choice of models (regression and prediction)
  • tuning parameters
  • model evaluation, including underfitting and overfitting of data
  • common machine learning algorithms.

In addition, some advanced data visualisation and dimensionality reduction techniques for multi-dimensional data will be explored in the context of data analysis.

Your learning on this unit

After completing this unit, students should be able to:

  1. Explain the basic steps involved in preparing and curating data, and assess data using standard statistical descriptors.
  2. Process and analyse data on a distributed computing cluster and on cloud-based services.
  3. Set up and use simple SQL and NoSQL databases.
  4. Explain different techniques for extracting information from data and select suitable regression models.
  5. Describe the basic principles of machine learning, including choice of models and tuning of parameters.
  6. Apply some of the more common learning and clustering algorithms used in machine learning.
  7. Describe and implement advanced data visualisation techniques for multi-dimensional data sets.

How you will learn

The unit is taught through a flipped approach. Asynchronous online material introduces the more mathematical or theoretical concepts, and structured, self-paced asynchronous activities allow students to develop their understanding and put into practice what they have learnt. These are supported by group workshops and office hours, held synchronously online and subsequently, if possible, face-to-face, as well as seminars and some lectures. The lectures will cover the more mathematical or theoretical concepts. We will make use of online forums and collaboration tools such as wikis to foster a collaborative and creative mindset. Feedback will be provided for coursework and formal assessments.

How you will be assessed

Formative assessment will be through a set of online tutorials and exercises. Summative assessment will be through four online tests (20%, ILOs 2, 3, 6), a set of three programming/data analysis exercises (45%, ILOs 1-3, 6, 7) and a research project (35%, ILOs 4-7).

Resources

If this unit has a Resource List, you will normally find a link to it in the Blackboard area for the unit. Sometimes there will be a separate link for each weekly topic.

If you are unable to access a list through Blackboard, you can also find it via the Resource Lists homepage. Search for the list by the unit name or code (e.g. SCIF30003).

How much time the unit requires
Each credit equates to 10 hours of total student input. For example, a 20-credit unit will take you 200 hours of study to complete. Your total learning time is made up of contact time, directed learning tasks, independent learning and assessment activity.

See the Faculty workload statement relating to this unit for more information.

Assessment
The Board of Examiners will consider all cases where students have failed or not completed the assessments required for credit. The Board considers each student's outcomes across all the units which contribute to each year's programme of study. If you have self-certificated your absence from an assessment, you will normally be required to complete it the next time it runs (this is usually in the next assessment period).
The Board of Examiners will take into account any extenuating circumstances and operates within the Regulations and Code of Practice for Taught Programmes.

Feedback