Multivariate Analysis 3
To present various aspects of multivariate analysis, covering data exploration, modeling and inference.
Multivariate analysis is the branch of statistics concerned with data in which the values of several variables are observed on each object. A wide range of methods is used to analyse multivariate data, both unstructured and structured. This course surveys the variety of methods available and treats some of them in detail.
Interpretation of results will be emphasized as well as the underlying theory.
Multivariate techniques are used across the whole range of fields of statistical application: in medicine, physical and biological sciences, economics and social science, and of course in many industrial and commercial applications.
Relation to other units
As with units MATH30013 (Linear and Generalised Linear Models), MATH 35120 (Experimental Design), and MATH 33800 (Time Series Analysis), this course is concerned with developing statistical methodology for a particular class of problems.
Applications will be implemented and presented using the statistical computing environment R (used in Probability 1 and Statistics 1).
To gain an understanding of:
- Scope of multivariate analysis;
- Multivariate normal distribution and Wishart distribution;
- Statistical inference for multivariate normal data;
- Principal components analysis;
- Scaling, classification and clustering;
- Implementation in the statistical computing environment R.
Self-assessment by working through examples sheets and checking against the solutions provided.
- General introduction to multivariate data; revision of relevant matrix and linear algebra; linear transformations.
- Properties and decompositions of non-negative definite symmetric matrices.
- Principal components analysis; derivation of principal components as eigenvectors of covariance matrix; selection of a good low-dimensional representation; interpreting principal components; scaling problems.
- The multivariate Normal distribution: definition and properties. The standardised multivariate Normal distribution. Statistical inference for the mean of a multivariate Normal distribution, both when the variance matrix is known and when it is estimated. Hotelling's T-squared statistic.
- Linear discriminant analysis; maximum likelihood and Bayesian allocation rules; probability of misclassification and its relation to Mahalanobis distance.
- Classification using cluster analysis; similarity, dissimilarity and distance measures; agglomerative algorithms for clustering; single linkage and the minimum spanning tree; complete linkage; dendrograms.
- Multidimensional scaling. Classical scaling; recovering a data matrix from Euclidean distances. Relationship with principal components. Ordinal scaling; defining and minimising the stress function. Least squares monotone regression and the pool-adjacent-violators algorithm.
Reading and References
There is no one set text. Any one of the following will be useful, particularly the first one (from which the notation for the course is taken):
- K V Mardia, J T Kent and J Bibby, Multivariate Analysis, Academic Press, 1979.
- W J Krzanowski, Principles of Multivariate Analysis: A User's Perspective. Clarendon Press, 1988.
- C Chatfield and A J Collins, Introduction to Multivariate Analysis. Chapman and Hall, 1986.
- W J Krzanowski and F H C Marriott, Multivariate Analysis, Parts I and II. Edward Arnold, 1994.
MATH11300 Probability 1, MATH 11400 Statistics 1, and MATH 11005 Linear Algebra & Geometry
Methods of teaching
- Lectures (including both theory and illustrative applications), exercises to be done by students.
Methods of Assessment
The pass mark for this unit is 40.
The final mark is calculated as follows:
- 100% from a 1 hour 30 minute exam in May/June
NOTE: Calculators of an approved type (non-programmable, no text facility) are allowed.
For information on resit arrangements, please see the re-sit page on the intranet.
Further exam information can be found on the Maths Intranet.