Seminar series
Welcome to the Trustworthy Systems Lab Seminar Series!
Here you can find all the upcoming and previous seminars in the series. The focus of the seminars is on promoting discussion and debate around the topic of Trustworthiness.
The format of the talks is 20-30 minutes of presentation followed by 30 minutes of discussion and questions. We usually hold these weekly on Wednesday lunchtimes at midday (12:00). This is an open group, so please share widely and get in contact if you wish to observe, join the debate, or give a presentation.
Please contact us to be added to the mailing list; you will then receive a weekly invitation to the talks along with an update on upcoming speakers.
Details of upcoming talks and speakers can be found below.
13th May 2026 - Less Talk, More Code: Energy–Accuracy Trade-offs and Babbling Suppression in Local LLMs
Developers increasingly rely on generative AI-based coding assistants such as GitHub Copilot and Claude Code in their workflows. Since many such tools are accessible via remote APIs, concerns about data privacy, security, and cost drive client organizations towards locally-deployed language models. This talk will present a study examining the accuracy-energy trade-off in local LLM deployment. We evaluated 26 LLM families (including both Mixture of Experts and dense architectures) across common software development tasks on two hardware configurations (a commodity GPU and a high-performance AI-specific GPU), considering both full-precision and quantized variants. Our results demonstrate that larger models with higher energy requirements do not consistently yield proportional accuracy gains. Moreover, quantized models often outperform full-precision medium-sized models in both efficiency and accuracy. We also find that no single model excels across all software development task types. Finally, the number of active parameters of a model, output length, and quantization level jointly explain over 73% of the variance in inference energy consumption for Code Generation and almost 90% for Docstring Generation. Prompt size, however, has a negligible impact on energy usage.
In the aforementioned study, we noticed that models often babble, i.e., produce many more tokens than required. For example, when asked to generate a solution to a programming problem, models would often produce, besides the code of the solution itself, informal explanations, tests, comments, and more. These extra tokens require additional resources to produce and, in the case of large LLMs made available as services, cost real money. Since solutions often appear early in the generation process, most of these extra tokens can be avoided by checking whether an acceptable solution has already appeared and stopping early. We call this approach babbling suppression, and we have conducted a study applying it to the generation of Python and Java code, using two benchmarks per language, across ten locally-executable language models. Babbling suppression achieved reductions of up to 65% for Python and 62% for Java in the amount of energy consumed by the generation process, with no negative impact on accuracy. This technique can be applied as a plug-in to an existing workflow, i.e., without requiring expensive model retraining.
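The abstract does not describe the exact stopping criterion used in the study, but the idea of checking the stream for an acceptable solution and cutting generation short can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the model emits tokens as strings, that solutions arrive in fenced code blocks, and it uses syntactic validity (via Python's `ast` module) as a stand-in acceptance check.

```python
import ast


def is_acceptable(text: str) -> bool:
    """Heuristic acceptance check (an assumption for this sketch):
    the output so far contains a *closed* fenced code block that
    parses as valid Python."""
    parts = text.split("```python", 1)
    if len(parts) < 2 or "```" not in parts[1]:
        return False  # no code block yet, or block not closed
    block = parts[1].split("```", 1)[0]
    try:
        ast.parse(block)
        return True
    except SyntaxError:
        return False


def generate_with_suppression(token_stream, check_every=4):
    """Consume tokens from a model stream, stopping as soon as an
    acceptable solution has appeared (babbling suppression)."""
    out = []
    for i, tok in enumerate(token_stream, 1):
        out.append(tok)
        # Periodic checks keep the overhead of re-parsing low.
        if i % check_every == 0 and is_acceptable("".join(out)):
            break  # remaining tokens would be babble
    return "".join(out)


# Simulated model output: a valid solution followed by babble.
stream = [
    "Here is a solution:\n",
    "```python\n",
    "def add(a, b):\n",
    "    return a + b\n",
    "```\n",
    "Explanation: this function ",
    "adds two numbers. Here are some tests...",
]
result = generate_with_suppression(iter(stream), check_every=1)
```

In this toy run, generation halts once the closed code block parses, so the trailing explanation and tests are never produced; in a real deployment the check would run against a streaming inference API, and the acceptance test (compilation, a quick unit test, etc.) is a design choice.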
Dr Fernando Castor
Fernando Castor is an Associate Professor in the Formal Methods and Tools group, University of Twente. His broad research goal is to help developers build more efficient software systems more efficiently. More specifically, he conducts research in Software Engineering, with emphasis on Software Maintenance, Energy Efficiency, and Code Understandability.