# Stochastic Optimisation

## Unit aims

The underlying aim is to use a combination of models, techniques and theory from stochastic control and equilibrium selection to determine behaviour that is optimal with regard to some given reward structure.

## Unit description

Stochastic optimisation covers a broad framework of problems at the interface of applied probability and optimisation. The main focus of this unit is on Markov decision processes and game theory. Markov decision processes describe a class of single decision-maker optimisation problems that arise when applied probability models (eg Markov chains) are extended to allow for action-dependent transition distributions and associated rewards. Game theory problems are more complex in that they involve two or more decision makers (players), so the optimal action for each player will depend on the actions of other players. Here, we focus on Nash equilibria - strategies that are conditionally optimal in the sense that a player can not do better by changing their own strategy while other players stay with their current strategy

## Relation to other units

This unit is a first course  on stochastic optimisation.

## Learning objectives

Students who successfully complete this unit should be able to:

• recognise and construct appropriate formal Markov decision process (MDP) models and game theoretic models from informal problem descriptions;
• construct appropriate optimality equations for optimisation problems;
• understand and use appropriate computational techniques (including dynamic programming and policy and value iteration) to solve finite horizon, and infinite horizon discounted and average cost MDPs;
• understand the concept of a Nash equilibrium and an evolutionarily stable stategy;
• compute equilibrium policies for standard and simple non-standard games.

## Transferable skills

In addition to the general skills associated with other mathematical units, students will also have the opportunity to gain practice in the following: report writing, oral presentations, use of information resources, use of initiative in learning material in other than that provided by the lectures themselves, time management, general IT skills and word-processing.

## Syllabus

1. Markov decision problems (Markov controlled processes, reward/cost structure, optimal strategies).
2. Methods for Markov decision problems (linear programming, policy iteration, value iteration).
3. Static Games (Nash equilibrium, dominance, classification).
4. Population Games (Evolutionary game theory, evolutionary stable strategies, Nash equilibrium).

1. M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, 2005.
2. D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 1 and 2, 2nd edition, Athena Scientific, 2005.
3. P. Whittle, Optimal Control: Basics and Beyond, Wiley, 1996.
4. R. Gibbons, A Primer in Game Theory, Prentice-Hall, 1992.
5. A. I. Houston and J. M. McNamara, Models of Adaptive Behaviour, Cambridge University Press, 1999.
6. J. Maynard Smith, Evolution and the Theory of Games, Cambridge University Press, 1982.

Unit code: MATHM6005
Level of study: M/7
Credit points: 10
Teaching block (weeks): 2 (19-24)

## Pre-requisites

MATH11300 Probability 1 and MATH11400 Statistics 1, while MATH20008 Probability 2 is desirable, but not essential.

None

## Methods of teaching

Lectures, supported by problem and solution sheets.

## Methods of Assessment

The pass mark for this unit is 50.

The final mark is calculated as follows:

• 100% from a 1 hour 30 minute exam in May/June

NOTE: Calculators of an approved type (non-programmable, no text facility) are allowed.

For information resit arrangements, please see the re-sit page on the intranet.

Please use these links for further information on relative weighting and marking criteria.

Further exam information can be found on the Maths Intranet.