Turing Seminar Series: Can large language models reason about qualitative spatial and temporal information?
Robert Blackwell, Senior Research Associate, Alan Turing Institute
43 Woodland Road, G.01 LT
Many claims have been made about the apparent intelligent behaviour of Large Language Models (LLMs), with some recent models being explicitly designed for reasoning tasks (e.g. OpenAI o1 and o3-mini). However, questions remain about the extent to which LLMs really demonstrate a capacity for common sense reasoning.
In natural language, spatial and temporal information is often represented qualitatively (using prepositions such as on, in, left of, north of, part of, under, touching, before, after, ...) and many qualitative spatial and temporal calculi have been developed to formalise such information (e.g. Allen’s Interval Algebra, Cardinal Direction Calculus, and Region Connection Calculus). These calculi have been extensively studied by AI researchers for decades, and we now use them to generate benchmarks to test state-of-the-art language models. By systematically examining the characteristics and failure modes of models, we aim to identify limitations and provide insights for researchers and policymakers concerning the reliability, safety, and utility of LLMs.
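To make the idea of a qualitative temporal calculus concrete, here is a minimal sketch of Allen's Interval Algebra, whose thirteen relations (before, meets, overlaps, starts, during, finishes, equals, and their inverses) classify how two time intervals relate. The function name and representation are illustrative assumptions, not part of the benchmark code discussed in the talk; questions of the form "interval A overlaps interval B: which relations can hold between B and A?" are the kind of common sense reasoning such calculi let one generate and check automatically.

```python
def allen_relation(a, b):
    """Return the Allen interval relation of interval a with respect to b.
    Intervals are (start, end) pairs with start < end.
    Illustrative sketch only; not the speaker's benchmark code."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start < b_start and b_start < a_end < b_end:
        return "overlaps"
    if a_start == b_start and a_end < b_end:
        return "starts"
    if b_start < a_start and a_end < b_end:
        return "during"
    if b_start < a_start and a_end == b_end:
        return "finishes"
    if a_start == b_start and a_end == b_end:
        return "equals"
    # None of the seven "forward" relations holds, so the inverse
    # relation must hold: classify (b, a) and invert the result.
    inverse = {
        "before": "after",
        "meets": "met-by",
        "overlaps": "overlapped-by",
        "starts": "started-by",
        "during": "contains",
        "finishes": "finished-by",
    }
    return inverse[allen_relation(b, a)]
```

For example, `allen_relation((1, 6), (5, 7))` yields `"overlaps"`, while swapping the arguments yields `"overlapped-by"`; enumerating such cases systematically is one way a calculus can be turned into benchmark questions.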
I hope that the talk will be of interest to researchers using LLMs for spatial applications, as well as those undertaking LLM evaluation.
Robert Blackwell is a Senior Research Associate at the Alan Turing Institute in London, where he works on the Fundamental AI Programme.
For more details, please see the Turing Seminar Series webpage.