Diagnosing RL Challenges for Clinical AI Agents in FHIR Environments

Ananya Mantravadi, Harshit Rajgarhia, Prasanna Desikan, Abhishek Mukherji · July 3, 2026

Summary

This research audits MedAgentBench and identifies significant barriers to applying Reinforcement Learning (RL) to clinical protocol execution in FHIR environments, including capability ceilings and format-knowledge gaps. It proposes a taxonomy for predicting RL learnability and suggests combining Supervised Fine-Tuning (SFT) with RL to overcome these barriers.

Clinical protocol execution, such as checking lab values or placing FHIR orders, looks like a natural fit for Reinforcement Learning (RL) because clinical subject-matter experts (SMEs) can supply verifiable rewards. This paper examines what goes wrong in practice, starting with an audit of MedAgentBench v1/v2. The audit revealed a substantial "silent-finish ceiling", where inaction becomes the dominant RL strategy, which led to the creation of an improved MedAgentBench-v3 (MAB-v3) with a much lower ceiling.

Training a Qwen3-8B model on MAB-v3 exposed two structural barriers. The first is a "capability ceiling": the base model scores zero on many task types, so there are no successful attempts for gradient-based learning to reinforce. The second is a "format-knowledge barrier": tasks require exact clinical codes that cannot be discovered through exploration. Pure RL reached only an 18.2% pass rate, against 34.1% for rule-based Supervised Fine-Tuning (SFT), and this 15.9-point gap is attributed entirely to these two barriers.

The research introduces a taxonomy of decision, format-knowledge, and lookup tasks that helps predict RL learnability. It concludes that a hybrid approach is necessary: SFT injects the required codes and foundational knowledge, and RL is then applied to learn the complex conditional logic.
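To see why zero base performance blocks gradient-based learning, here is a minimal sketch. It assumes a group-normalized policy-gradient estimator (GRPO-style); the summary does not say which RL algorithm the authors used, so the function name `group_advantages` and the toy reward lists are illustrative only.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task.

    Assumed estimator (not confirmed by the source):
    advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Task type the base model never solves: every rollout gets reward 0.
all_fail = group_advantages([0.0, 0.0, 0.0, 0.0])

# Task type the base model solves occasionally: one rollout succeeds.
mixed = group_advantages([0.0, 1.0, 0.0, 0.0])

print(all_fail)  # every advantage is zero, so the policy gradient vanishes
print(mixed)     # the successful rollout gets a positive advantage
```

Under this assumption, a task type with a zero pass rate contributes no learning signal regardless of how long training runs, which is consistent with the capability ceiling described above.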

Why it matters

For healthcare professionals and AI developers, understanding the specific limitations of RL in complex clinical environments is crucial for designing effective and safe AI agents. The proposed hybrid approach offers a practical pathway to overcome current barriers and accelerate AI adoption in clinical settings.

How to implement this in your domain

  1. Adopt a hybrid training strategy, combining Supervised Fine-Tuning (SFT) for foundational knowledge with Reinforcement Learning (RL) for conditional logic in clinical AI agents.
  2. Prioritize pre-training or fine-tuning LLMs with domain-specific clinical codes and FHIR format knowledge before applying RL.
  3. Utilize the proposed decision/format-knowledge/lookup taxonomy to assess the learnability of new clinical tasks for RL.
  4. Develop robust verification systems for RL agents in clinical settings to prevent "silent failures" and ensure patient safety.
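One way to check an environment for a silent-finish ceiling before training is to grade a policy that finishes without taking any action. The sketch below is a toy under assumed names (`make_grader`, `silent_finish_ceiling`) and an assumed grading rule (a task passes when an order is placed if and only if one was required); it is not MedAgentBench's actual grader.

```python
def make_grader(order_required):
    """Build a toy grader for one task instance.

    Hypothetical rule: the episode passes when the agent placed an order
    exactly when the protocol required one.
    """
    def grade(actions):
        placed = any(a.get("type") == "order" for a in actions)
        return placed == order_required
    return grade


def silent_finish_ceiling(graders):
    """Fraction of tasks passed by a policy that finishes with no actions."""
    passed = sum(1 for grade in graders if grade([]))
    return passed / len(graders)


# Toy task set: three instances need no order, one instance needs one.
graders = [
    make_grader(False),
    make_grader(False),
    make_grader(True),
    make_grader(False),
]

ceiling = silent_finish_ceiling(graders)
print(f"silent-finish ceiling: {ceiling:.0%}")
```

In this toy set, doing nothing passes three of four tasks. An RL agent can collect that reward without learning the protocol, so a high value is a sign that the task mix or the grader needs rebalancing before training.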

Who benefits

Healthcare, Medical Technology, AI/ML Engineering, Pharmaceuticals

Key takeaways

  • Pure Reinforcement Learning faces significant barriers in clinical FHIR environments due to capability and format-knowledge gaps.
  • Inaction can become a dominant strategy for RL agents if environments are not carefully designed.
  • A hybrid approach combining SFT for foundational knowledge and RL for conditional logic is more effective.
  • A new taxonomy helps predict which clinical tasks are suitable for RL and guides development strategies.
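As a rough illustration of how the taxonomy could guide a training plan, the sketch below routes a task type to SFT or RL. The category names come from the summary; the routing rule, the `training_plan` function, and the treatment of lookup tasks are assumptions made for illustration, since the summary does not spell out the paper's exact criteria.

```python
from enum import Enum


class TaskKind(Enum):
    DECISION = "decision"
    FORMAT_KNOWLEDGE = "format-knowledge"
    LOOKUP = "lookup"


def training_plan(kind, base_pass_rate):
    """Pick a first training stage for a task type (hypothetical rule).

    - Format-knowledge tasks need exact codes that exploration cannot
      discover, so they start with SFT.
    - Any task type with a zero base pass rate sits under the capability
      ceiling, so it also starts with SFT.
    - Everything else goes straight to RL. Applying this to lookup tasks
      is an assumption, not a claim from the paper.
    """
    if kind is TaskKind.FORMAT_KNOWLEDGE:
        return "sft_first"
    if base_pass_rate == 0.0:
        return "sft_first"
    return "rl"


print(training_plan(TaskKind.FORMAT_KNOWLEDGE, 0.4))
print(training_plan(TaskKind.DECISION, 0.0))
print(training_plan(TaskKind.DECISION, 0.2))
```

The base pass rate would have to be measured on the untrained model per task type before training starts; the thresholds and labels here are placeholders.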

Original post by Ananya Mantravadi, Harshit Rajgarhia, Prasanna Desikan, Abhishek Mukherji

"arXiv:2607.01470v1 Announce Type: new Abstract: Clinical protocol-execution tasks -- checking a lab value, applying a threshold, placing a correctly structured FHIR order -- are natural candidates for RL from world feedback: once clinical SMEs encode decision logic into a verifie…"
