LaGO Improves Online Reinforcement Learning with LLM Guidance

Kuan-Yen Liu, Ren-Jyun Huang, Ti-Rong Wu · June 24, 2026

Summary

This paper introduces LaGO, a framework that uses a pretrained Large Language Model (LLM) as a latent action prior to softly guide online policy optimization in Reinforcement Learning (RL). Rather than using the LLM as a direct controller, LaGO provides soft guidance, and it improves both reward and success rate over standard PPO on discrete- and continuous-control benchmarks.

Large Language Models (LLMs) have shown significant potential in planning and sequential decision-making. However, using an LLM directly as the controller in Reinforcement Learning (RL) is often unreliable, because the LLM must generate precise actions at every step; errors there can make training unstable and performance suboptimal. To address this, the authors propose Latent Action Guidance for Online Reinforcement Learning (LaGO). Instead of having the LLM plan or act explicitly, LaGO uses a pretrained LLM to provide a "latent action prior" that softly guides online policy optimization. This soft guidance helps the RL agent learn more effectively while avoiding the rigidity of direct LLM control.

Experiments on a discrete-control benchmark (CLEVR-Robot) and a continuous-control benchmark (Meta-World) show that LaGO consistently improves both reward and success rate compared with standard PPO, with especially large gains in success rate. The results suggest that incorporating LLM knowledge as a guiding prior, rather than as a controller, can substantially improve planning and decision-making in online RL.
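The summary does not say how LaGO turns the LLM's output into a training signal. One common way to express a "soft" prior in policy optimization is a KL penalty added to the PPO loss, which pulls the policy toward the prior without forcing it to copy the prior. The sketch below assumes that form for a discrete action space such as CLEVR-Robot; `prior_probs` stands in for whatever distribution the LLM provides, and `smooth_prior`, `guided_loss`, and `beta` are hypothetical names, not LaGO's API.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_prior(prior: Sequence[float], alpha: float = 0.05) -> list[float]:
    """Mix the (hypothetical) LLM prior with a uniform distribution so every
    action keeps nonzero probability and the KL term below stays finite."""
    n = len(prior)
    return [(1.0 - alpha) * p + alpha / n for p in prior]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two categorical distributions over the same action set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def ppo_clip_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate objective, negated so it can be minimized."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return -min(ratio * advantage, clipped * advantage)


def guided_loss(
    policy_logits: Sequence[float],
    old_logprob: float,
    action: int,
    advantage: float,
    prior_probs: Sequence[float],
    beta: float = 0.1,
    eps: float = 0.2,
) -> float:
    """Single-sample PPO loss plus beta * KL(policy || prior).

    `prior_probs` stands in for an LLM-derived action prior (assumption);
    a small beta nudges the policy toward the prior without forcing it to copy it.
    """
    probs = softmax(policy_logits)
    ratio = math.exp(math.log(probs[action]) - old_logprob)
    return ppo_clip_term(ratio, advantage, eps) + beta * kl_divergence(probs, prior_probs)


# Example: a 3-action task where the (hypothetical) LLM prior favors action 0.
prior = smooth_prior([0.9, 0.1, 0.0])
loss = guided_loss([0.2, 0.1, -0.3], old_logprob=math.log(0.4), action=0,
                   advantage=1.0, prior_probs=prior)
```

With `beta = 0` this reduces to plain PPO; raising `beta` strengthens the pull toward the prior, and the uniform mixing in `smooth_prior` keeps the penalty finite even when the prior assigns zero probability to an action.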

Why it matters

This framework offers a more robust and effective way to bring the planning capabilities of LLMs into reinforcement learning, which can lead to more successful and efficient autonomous agents in complex environments.

How to implement this in your domain

  1. Evaluate existing reinforcement learning agents for performance bottlenecks in planning or decision-making.
  2. Explore integrating pretrained LLMs as latent action priors to softly guide RL policy optimization.
  3. Experiment with LaGO's approach to improve success rates and rewards in discrete and continuous control tasks.
  4. Consider fine-tuning or selecting stronger LLMs to provide more effective guidance for RL agents.
  5. Apply this guidance mechanism to develop more robust and efficient autonomous systems in robotics or industrial automation.
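For continuous control, as in Meta-World, a policy usually outputs a Gaussian over actions. If the LLM prior were also expressed as a Gaussian (an assumption; the summary does not describe the prior's form), a KL-style guidance term would use the closed-form KL divergence between diagonal Gaussians. The sketch also anneals the guidance weight to zero, one plausible way to keep the guidance soft so that the agent eventually relies on its own reward signal. `prior_penalty`, `linear_decay`, and `beta_start` are hypothetical and not taken from LaGO.

```python
import math
from typing import Sequence


def diag_gaussian_kl(
    mu_p: Sequence[float],
    std_p: Sequence[float],
    mu_q: Sequence[float],
    std_q: Sequence[float],
) -> float:
    """KL(P || Q) for diagonal Gaussians P = N(mu_p, std_p^2), Q = N(mu_q, std_q^2).

    Per dimension: log(sq/sp) + (sp^2 + (mp - mq)^2) / (2 sq^2) - 1/2, summed over dims.
    """
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


def linear_decay(step: int, total_steps: int, beta_start: float) -> float:
    """Hypothetical schedule: guidance weight falls linearly to 0, then stays there."""
    frac = min(step / total_steps, 1.0)
    return beta_start * (1.0 - frac)


def prior_penalty(
    policy_mean: Sequence[float],
    policy_std: Sequence[float],
    prior_mean: Sequence[float],
    prior_std: Sequence[float],
    step: int,
    total_steps: int,
    beta_start: float = 0.1,
) -> float:
    """Hypothetical term to add to the PPO loss: beta(step) * KL(policy || prior)."""
    beta = linear_decay(step, total_steps, beta_start)
    return beta * diag_gaussian_kl(policy_mean, policy_std, prior_mean, prior_std)


# Example: 2-D action space; the (assumed) LLM prior suggests moving toward +1 on each axis.
penalty = prior_penalty([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0],
                        step=0, total_steps=100_000)
```

Decaying `beta` is only one design choice; a fixed small weight is equally plausible, and the paper's actual choice is not stated in this summary.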

Who benefits

Robotics, Autonomous Vehicles, Gaming, Industrial Automation, Logistics

Key takeaways

  • Direct LLM control in RL can be unreliable due to precise action generation requirements.
  • LaGO uses LLMs as a latent action prior to softly guide online policy optimization.
  • This approach significantly improves reward and success rate over standard PPO on both discrete (CLEVR-Robot) and continuous (Meta-World) control benchmarks.
  • Stronger LLMs provide more effective guidance, enhancing planning and decision-making.

Original post by Kuan-Yen Liu, Ren-Jyun Huang, Ti-Rong Wu on X

"arXiv:2606.24669v1. Abstract: Large language models (LLMs) have shown strong potential for planning and sequential decision-making, but prior work often relies on using them as direct controllers, which requires precise action generation and can be unreliable in…"
