Themis: XAI Framework for RL with Human Feedback

Andreas Chouliaras, Luke Connolly, Dimitris Chatzpoulos · June 24, 2026

Summary

This paper introduces Themis, an explainable AI (XAI)-enabled framework for Reinforcement Learning from Human Feedback (RLHF) that combines transparency and alignment. It supports over 200 environments, trains reward models from human preferences that match or surpass an environment's true reward signal, and offers a scalable cloud platform for collecting human feedback.

Training safe Reinforcement Learning (RL) systems is inherently challenging, because nothing guarantees that an agent will avoid undesirable behaviors. Two of the most effective defenses are transparency through explainability (XAI) and alignment through human feedback. Each is promising on its own, but a comprehensive, publicly available framework that integrates both has been lacking. To bridge this gap, the authors introduce Themis, an XAI-enabled testing and evaluation framework designed for Reinforcement Learning from Human Feedback (RLHF). Themis supports over 200 widely used environments and can be configured for experiments in RL, transparency, and alignment. The authors demonstrate it by training reward models from human preferences that match or surpass the true reward signal of an environment. Themis also includes an auto-scaling, cloud-based platform for collecting human feedback and managing experiments, which can support large participant groups across multiple experiments without extra development effort.
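To make "training reward models from human preferences" concrete, here is a minimal sketch of one common approach: a linear reward model fitted to pairwise segment comparisons with a Bradley-Terry (logistic) loss. This summary does not say which model or loss Themis uses, so everything here is an assumption for illustration. The function names, the linear reward, and the synthetic "hidden" reward that stands in for human labels are all hypothetical and are not part of the Themis API.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def segment_features(segment):
    # Sum the per-step feature vectors of one trajectory segment.
    return [sum(col) for col in zip(*segment)]


def segment_return(weights, segment):
    # Predicted return of a segment under a linear reward model (assumption).
    return sum(w * f for w, f in zip(weights, segment_features(segment)))


def train_reward_model(preferences, dim, lr=0.05, epochs=200):
    # preferences: list of (segment_a, segment_b, label),
    # where label is 1.0 if segment_a was preferred and 0.0 otherwise.
    weights = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, label in preferences:
            fa, fb = segment_features(seg_a), segment_features(seg_b)
            # Bradley-Terry: P(a preferred) = sigmoid(return_a - return_b).
            p_a = sigmoid(segment_return(weights, seg_a) - segment_return(weights, seg_b))
            # Gradient step on the cross-entropy between p_a and the label.
            for i in range(dim):
                weights[i] -= lr * (p_a - label) * (fa[i] - fb[i])
    return weights


def make_synthetic_preferences(n_pairs, true_weights, steps=5, seed=0):
    # Stand-in for human feedback: labels come from a hidden linear reward.
    rng = random.Random(seed)
    dim = len(true_weights)

    def random_segment():
        return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(steps)]

    prefs = []
    for _ in range(n_pairs):
        a, b = random_segment(), random_segment()
        a_wins = segment_return(true_weights, a) > segment_return(true_weights, b)
        prefs.append((a, b, 1.0 if a_wins else 0.0))
    return prefs


def preference_accuracy(weights, preferences):
    # Fraction of comparisons where the model prefers the labeled winner.
    correct = 0
    for a, b, label in preferences:
        predicted = 1.0 if segment_return(weights, a) > segment_return(weights, b) else 0.0
        correct += predicted == label
    return correct / len(preferences)


if __name__ == "__main__":
    hidden = [1.0, -0.5]  # hypothetical "true" reward weights
    prefs = make_synthetic_preferences(100, hidden)
    learned = train_reward_model(prefs, dim=2)
    print("learned weights:", learned)
    print("training accuracy:", preference_accuracy(learned, prefs))
```

In a real experiment the labels would come from human participants rather than a hidden reward, and the linear model would typically be replaced by a neural network. The pairwise loss would keep the same structure.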

Why it matters

Themis provides a crucial tool for developing safer, more transparent, and human-aligned AI systems, addressing key concerns in responsible AI deployment and accelerating the research and application of RLHF.

How to implement this in your domain

1. Evaluate current RL system development for safety, transparency, and alignment gaps.
2. Explore integrating Themis to incorporate explainable AI and human feedback into RL workflows.
3. Utilize Themis's framework to train reward models that accurately reflect human preferences.
4. Leverage the cloud-based platform for scalable and efficient collection of human feedback for RLHF.
5. Apply Themis to enhance the trustworthiness and ethical deployment of autonomous agents in sensitive applications.
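When working through the steps above, one simple way to check whether a learned reward model reflects the intended signal is to compare how it ranks trajectory segments against a reference reward. The sketch below computes pairwise ranking agreement. It is an illustrative proxy only: this summary does not state which evaluation metric the Themis authors use, and the function name `pairwise_agreement` is hypothetical.

```python
from itertools import combinations


def pairwise_agreement(reference_returns, learned_returns):
    # Fraction of segment pairs that both reward signals rank in the same order.
    # Pairs tied under either signal are skipped.
    agree = 0
    total = 0
    for i, j in combinations(range(len(reference_returns)), 2):
        ref_diff = reference_returns[i] - reference_returns[j]
        learned_diff = learned_returns[i] - learned_returns[j]
        if ref_diff == 0 or learned_diff == 0:
            continue
        total += 1
        if (ref_diff > 0) == (learned_diff > 0):
            agree += 1
    return agree / total if total else float("nan")


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 4.0]  # e.g. environment returns per segment
    learned = [0.1, 0.4, 0.3, 0.9]    # e.g. reward-model returns per segment
    print(pairwise_agreement(reference, learned))  # 5 of 6 pairs agree
```

Agreement of 1.0 means the learned model orders every comparable pair the same way as the reference. Because the score is capped at 1.0, it can show that a model matches a reference ranking but not that it surpasses it; that would need a downstream measure such as the performance of a policy trained on the learned reward.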

Who benefits

AI Ethics & Governance, Autonomous Systems, Healthcare, Education, Robotics

Key takeaways

Safe RL systems require both transparency (XAI) and human feedback (RLHF).
Themis is an XAI-enabled framework combining these for RLHF.
It supports diverse environments and trains high-performing reward models from human preferences.
A scalable cloud platform facilitates efficient human feedback collection and experiment management.

Original post by Andreas Chouliaras, Luke Connolly, Dimitris Chatzpoulos

"arXiv:2606.24622v1 Announce Type: new Abstract: Training safe Reinforcement Learning (RL) systems is inherently challenging, with no guarantee of avoiding unwanted behaviors. The most effective defenses against this are (i) transparency through explainability and (ii) alignment v…"

