Working PhD Project Title:
Imitating Careful Experts to Avoid Catastrophic Events
MMath, University of Exeter, 2015-2019
Research Project Summary:
My research lies at the intersection of inverse reinforcement learning and safe reinforcement learning.
Reinforcement learning is a powerful framework for enabling an intelligent agent to learn to act optimally in a given environment. The agent does this by learning a policy that maps states to actions so as to maximize the reward it accumulates. Reinforcement learning is heavily dependent on the reward function, which must be specified in advance by the researcher. In some domains there is an obvious choice of reward function. One such domain is board games, where the agent typically receives a reward of +1 for winning and -1 for losing. In settings where the reward function is not obvious, researchers may attempt to learn it using a process called inverse reinforcement learning. When provided with a set of expert demonstrations (or an expert policy), an inverse reinforcement learner attempts to recover the reward function that the expert is optimizing.
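To make the forward problem concrete, here is a minimal, hypothetical sketch of tabular Q-learning on a toy chain environment. The environment, hyperparameters, and the +1 reward at the goal (loosely analogous to the board-game convention above) are invented for illustration and are not part of my research:

```python
import numpy as np

# Toy chain MDP: states 0..4, start at state 0, reaching state 4 gives +1.
# Actions: 0 = move left, 1 = move right. The environment is deterministic.
N_STATES, GOAL = 5, 4

def step(state, action):
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=1000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, 2))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:                       # explore
                a = int(rng.integers(2))
            else:                                        # exploit, random tie-break
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
            s2, r, done = step(s, a)
            # Temporal-difference update toward the Bellman target.
            Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s2].max()) - Q[s, a])
            s = s2
    return Q

Q = q_learning()
policy = Q.argmax(axis=1)   # learned greedy policy: move right toward the goal
```

Inverse reinforcement learning runs this pipeline in reverse: rather than being given the +1-at-the-goal reward, the learner would observe trajectories of an expert following a good policy and try to recover a reward function under which that behaviour is optimal.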
Safe reinforcement learning is reinforcement learning in scenarios where the agent must adhere to safety constraints during training, deployment, or both. One way these constraints may be specified is in the form of catastrophic events: events that must be avoided at all costs. Such events will often not be observed at all in expert trajectories, which means supervised learning is unsuitable in these situations. In nature, many animals (including humans) learn to avoid catastrophic events by observing an expert. The expert will usually behave more carefully when they perceive themselves to be at risk of a catastrophic event occurring, and so the student learns to avoid that catastrophe too.
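One common way to formalize such constraints is to learn, alongside the reward value function, a separate cost critic estimating the risk of catastrophe, and to mask actions whose estimated risk is too high. The sketch below is a hypothetical illustration of that idea on a toy chain with a "cliff" state; the environment, slip probability, risk threshold, and all hyperparameters are invented for illustration:

```python
import numpy as np

# Toy chain: state 0 is a catastrophic "cliff", state 5 is the goal (+1 reward).
# The agent starts at state 1. With probability SLIP the chosen move is
# reversed, so acting carelessly near the cliff is risky.
CLIFF, GOAL, SLIP = 0, 5, 0.1
RISK_LIMIT = 0.5   # mask actions whose estimated catastrophe risk exceeds this

def step(state, action, rng):
    if rng.random() < SLIP:
        action = 1 - action
    nxt = state + 1 if action == 1 else state - 1
    cost = 1.0 if nxt == CLIFF else 0.0   # catastrophe indicator
    return nxt, (1.0 if nxt == GOAL else 0.0), cost, nxt in (CLIFF, GOAL)

def safe_q_learning(episodes=3000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((GOAL + 1, 2))   # reward critic
    C = np.zeros((GOAL + 1, 2))   # cost critic: estimated risk of catastrophe
    for _ in range(episodes):
        s, done = 1, False
        while not done:
            allowed = np.flatnonzero(C[s] <= RISK_LIMIT)
            if allowed.size == 0:                 # fall back to least risky
                allowed = np.array([int(C[s].argmin())])
            if rng.random() < eps:
                a = int(rng.choice(allowed))
            else:                                 # greedy among allowed actions
                qa = Q[s, allowed]
                a = int(rng.choice(allowed[qa == qa.max()]))
            s2, r, c, done = step(s, a, rng)
            Q[s, a] += alpha * (r + (0.0 if done else gamma * Q[s2].max()) - Q[s, a])
            # Bootstrap risk with the least-risky continuation (an optimistic
            # estimate of future risk, sufficient for this sketch).
            C[s, a] += alpha * (c + (0.0 if done else gamma * C[s2].min()) - C[s, a])
            s = s2
    return Q, C

Q, C = safe_q_learning()

# Extract the final risk-constrained greedy policy for the nonterminal states.
safe_policy = []
for s in range(1, GOAL):
    allowed = np.flatnonzero(C[s] <= RISK_LIMIT)
    if allowed.size == 0:
        allowed = np.array([int(C[s].argmin())])
    safe_policy.append(int(allowed[Q[s, allowed].argmax()]))
```

The cost critic plays the role of the "perceived risk" described above: stepping left from state 1 is quickly estimated as too risky and masked, while the remaining actions are ranked by reward as usual.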
One of the key difficulties a reinforcement learning agent must overcome is distinguishing which unseen scenarios are safe to explore and which are unsafe. My research will explore whether an inverse reinforcement learning agent can better distinguish safe from unsafe unseen states by observing an expert that displays varying degrees of carefulness.
The applications of such research are numerous; one is the domain of robotics. Suppose we want to train an inverse reinforcement learning agent to operate a robot by observing human demonstrations. To avoid causing harm, the human operator will behave more carefully whenever another person enters the robot's operational area. It would be beneficial for the reward function learned by inverse reinforcement learning to capture the fact that harming a person would be a catastrophic event.
Supervisory Team:
- Dr Laurence Aitchison, Department of Computer Science
- Prof Nathan Lepora, Department of Computer Science
- Dr Dandan Zhang, Department of Engineering Mathematics