Research themes & foundations
The RAI Lab works at the intersection of machine learning capability and accountability. Our research is anchored by four high-level themes, each grounded in the three foundational pillars of Responsible AI.
Responsible AI
Our central research direction. Responsible AI is the practice of designing, evaluating, and deploying AI systems with explicit attention to their broader impact — on individuals, institutions, and society. This theme integrates work across all three of our foundational pillars.
Explainability, Interpretability & Transparency
Methods that make AI behaviour legible: data-centric explanations, post-hoc interpretation, and transparency tooling for deployed systems.
Privacy & Safety
Frameworks that protect sensitive data, prevent training-data leakage, and ensure safe operation in user-facing settings.
Security & Robustness
Defences against adversarial inputs, robustness under distribution shift, and reliability across deployment conditions.
Trustworthy AI
Trust is what an AI system earns when its behaviour matches its claims. Our work investigates the conditions under which models can be reliably trusted — including fairness across subpopulations, accountability for outcomes, and verifiable behaviour across edge cases.
We develop evaluation frameworks, auditing methodologies, and intervention techniques that surface failure modes before deployment, with current applications in healthcare and other high-stakes decision systems.
LLM-powered Systems
Large language models have transformed what AI systems can do — and introduced new questions about how they should be built. Our work on LLM-powered systems focuses on making them controllable, correctable, and grounded.
Current projects include retrieval-augmented generation (combining LLMs with structured retrieval to ground outputs in verified sources), model editing (correcting deployed models precisely without retraining), and task-driven control (steering LLM behaviour toward specified objectives).
Agentic AI
Agentic AI extends language models from passive responders into autonomous actors — systems that plan over multiple steps, invoke tools, and pursue objectives in complex environments.
Our research focuses on the safety and reliability of agent systems: how to make agent behaviour predictable, how to ensure agents remain aligned with user intent across long-running tasks, and how to design interfaces that let humans meaningfully oversee agent decisions.