RL-Induced Tool Use Localized to Single LLM Feature

Andrii Shportko, Shubham Bhokare, Ahmed Zeyad A Alzahrani, Bowen Cheng, Gustavo Mercier, Jessica Hullman · June 26, 2026

Summary

Researchers found that when a language model is fine-tuned with Reinforcement Learning (RL) for tool use, the capability concentrates in a minimal, steerable set of features that can be isolated with Dedicated Feature Crosscoders (DFC). This allows runtime behavioral control and even transfers some tool-calling ability to the base model.

Fine-tuning large language models (LLMs) with Reinforcement Learning (RL) is known to enhance agentic behaviors such as the use of external tools. However, the internal mechanisms and representational changes behind these capabilities have remained largely unexplored. This research shows that RL-induced tool-calling capability can be localized to a compact, steerable set of features isolated with Dedicated Feature Crosscoders (DFC). Isolating these RL-specific features in a Qwen2.5-3B model raised tool correctness by more than 31 percentage points. The method also produced a "capability spillover": tool-calling ability in the frozen base model rose by nearly 7 percentage points with no further retraining. The finding suggests that DFC partitioning concentrates RL-introduced capabilities into a minimal feature set, which opens the way to runtime behavioral control of agentic LLMs.
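To make "partitioning" features between a base model and its RL-tuned counterpart concrete, here is a minimal sketch. The summary does not describe how DFC assigns features, so this is not the paper's method: it swaps in a post hoc heuristic from the crosscoder model-diffing literature, the relative decoder norm, and every name, value, and threshold in it (`partition_features`, `threshold=0.9`, the toy decoder vectors) is a hypothetical chosen for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a plain Python list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def relative_decoder_norm(dec_base, dec_rl):
    """Share of a feature's total decoder norm that lies in the RL model.

    Values near 1.0 mean the feature mostly writes into the RL model,
    values near 0.0 mean it mostly writes into the base model.
    """
    norm_base, norm_rl = l2_norm(dec_base), l2_norm(dec_rl)
    total = norm_base + norm_rl
    if total == 0.0:
        return 0.5  # dead feature: treat as neither model's
    return norm_rl / total


def partition_features(decoders_base, decoders_rl, threshold=0.9):
    """Group feature indices into base-only, shared, and RL-only sets.

    `threshold` is an assumed cutoff, not a value taken from the paper.
    """
    groups = {"base_only": [], "shared": [], "rl_only": []}
    for idx, (dec_base, dec_rl) in enumerate(zip(decoders_base, decoders_rl)):
        ratio = relative_decoder_norm(dec_base, dec_rl)
        if ratio >= threshold:
            groups["rl_only"].append(idx)
        elif ratio <= 1.0 - threshold:
            groups["base_only"].append(idx)
        else:
            groups["shared"].append(idx)
    return groups


# Toy decoder vectors for three features (made-up numbers).
decoders_base = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.0]]
decoders_rl = [[0.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
groups = partition_features(decoders_base, decoders_rl)
print(groups)
```

In this toy setup, feature 2 writes only into the RL model's activations, so it falls in the `rl_only` group. That group is the kind of compact, RL-specific feature set the article describes.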

Why it matters

This work clarifies how RL shapes LLM capabilities and offers a method for fine-grained, retraining-free control over agentic behaviors such as tool use. Practitioners could use it to adjust an agent's tool-calling behavior at runtime instead of fine-tuning again.

How to implement this in your domain

  1. Explore DFC techniques to analyze and understand the specific features enabling desired agentic behaviors in your LLMs.
  2. Implement runtime behavioral control mechanisms for LLMs by manipulating the features that DFC identifies.
  3. Investigate transferring specific capabilities from fine-tuned models to base models using DFC-based "capability spillover."
  4. Develop more efficient RL fine-tuning strategies by focusing on how these RL-specific features emerge and strengthen.
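The runtime control in step 2 can be illustrated with a generic activation-steering sketch. It assumes you already have a direction vector for an RL-specific feature, for example a crosscoder decoder row. The functions `steer` and `ablate`, the scale `alpha`, and the toy vectors are hypothetical and not taken from the paper, which may intervene differently.

```python
def steer(hidden, direction, alpha):
    """Add `alpha` times the unit-normalized feature direction to a hidden state."""
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]


def ablate(hidden, direction):
    """Remove the component of the hidden state that lies along the feature direction."""
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0.0:
        raise ValueError("direction must be non-zero")
    coeff = sum(h * d for h, d in zip(hidden, direction)) / norm_sq
    return [h - coeff * d for h, d in zip(hidden, direction)]


# Toy residual-stream vector and a made-up feature direction.
hidden = [1.0, 2.0, 3.0]
direction = [0.0, 0.0, 2.0]

boosted = steer(hidden, direction, alpha=4.0)  # amplify the feature
suppressed = ablate(hidden, direction)         # switch the feature off
print(boosted, suppressed)
```

In a real model, an edit like this would typically run inside a forward hook on the layer the features were extracted from. A positive `alpha` pushes the model toward the behavior, and ablation suppresses it.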

Who benefits

AI/ML Development, Robotics, Software Engineering, Autonomous Systems

Key takeaways

  • RL-induced tool use in LLMs can be localized to specific features.
  • Dedicated Feature Crosscoders (DFC) isolate these capabilities.
  • DFC manipulation improves tool correctness and enables runtime control.
  • Tool-calling ability can be transferred to base models without retraining.

Original post by Andrii Shportko, Shubham Bhokare, Ahmed Zeyad A Alzahrani, Bowen Cheng, Gustavo Mercier, Jessica Hullman

"arXiv:2606.26474v1 Announce Type: new Abstract: Fine-tuning through RL reshapes the internal representations of language models to enable agentic behaviors such as tool use, yet the mechanistic basis of these changes remains poorly understood. While RL substantially improves stru…"