Real-Time RL Agents Learn Optimal Planning Budgets

Aneesh Muppidi, Firas Darwish, Dylan Cope, João F. Henriques, Jakob Nicolaus Foerster · June 26, 2026

Summary

This paper addresses the challenge of deliberation time in real-time Reinforcement Learning (RL) environments, where the environment progresses while the agent plans. It introduces variable-delay real-time RL and proposes training a lightweight gating policy on top of a planner to select state-dependent planning budgets, outperforming fixed-budget baselines across various real-time games.

In most Reinforcement Learning (RL) settings, the environment pauses and waits indefinitely for the agent's decision. Real-time environments do not: deliberation takes time, and the environment keeps evolving while the agent plans.

Building on existing real-time formalizations, the authors introduce variable-delay real-time RL. In this setting, the agent itself decides how long to deliberate at each decision point, and that planning time has consequences because the environment advances in the meantime. For planning agents, the best deliberation time depends heavily on the current state, and trying to plan *how long to plan* can lead to an infinite regress, or "paralysis."

To avoid this, the paper trains a lightweight gating policy that operates on top of a planner. The gating policy learns to select state-dependent planning budgets, that is, how much "thinking time" the agent spends in a given situation.

Across a suite of real-time games (Pac-Man, Tetris, Snake, Speed Hex, and Speed Go), the gating policy consistently outperformed both fixed-budget and heuristic baselines. The approach also transferred to a real-time setup in which the environment and the agent run on two different GPUs, a sign of its practical applicability.
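To make the "environment keeps moving while the agent plans" idea concrete, here is a minimal Python sketch of a single variable-delay decision. It is not the paper's formalization or code: `CountdownEnv`, `real_time_decision`, `survives`, and the no-op default action are all hypothetical names and assumptions chosen for illustration.

```python
class CountdownEnv:
    """Hypothetical toy world: an obstacle arrives in `time_to_impact` ticks."""

    NOOP, DODGE = 0, 1

    def __init__(self, time_to_impact):
        self.time_to_impact = time_to_impact
        self.dodged = False
        self.crashed = False

    def step(self, action):
        """Advance one tick, applying `action`; the world never pauses."""
        if self.crashed:
            return
        if action == self.DODGE:
            self.dodged = True
        self.time_to_impact -= 1
        if self.time_to_impact <= 0 and not self.dodged:
            self.crashed = True


def real_time_decision(env, planner, budget):
    """Plan for `budget` ticks on a snapshot while the env runs on no-ops."""
    snapshot = env.time_to_impact        # observation taken when planning starts
    for _ in range(budget):
        env.step(env.NOOP)               # assumption: a no-op is applied while planning
    action = planner(snapshot, budget)   # decision is based on the stale snapshot
    env.step(action)
    return action


def survives(time_to_impact, budget):
    """Run one decision with a trivial planner that always dodges."""
    env = CountdownEnv(time_to_impact)
    real_time_decision(env, lambda snapshot, b: CountdownEnv.DODGE, budget)
    return not env.crashed


print(survives(5, 2), survives(5, 4), survives(5, 5))
```

In this toy, even a planner that always picks the right action fails once its budget reaches the time to impact, which is the cost of deliberation that a fixed budget cannot adapt to.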

Why it matters

For practitioners building AI for time-critical applications such as autonomous systems, robotics, or high-frequency trading, this research offers a method for tuning decision-making under real-time constraints. Agents learn when extra computation is worth the delay and when they must act quickly, instead of spending the same planning time in every situation.

How to implement this in your domain

1. Analyze existing real-time AI systems to identify scenarios where planning time significantly impacts performance.
2. Explore implementing a lightweight gating policy to dynamically adjust planning budgets for RL agents.
3. Experiment with variable-delay real-time RL in simulations to understand its impact on agent behavior.
4. Evaluate the performance gains of state-dependent planning budgets compared to fixed-time or heuristic approaches in real-world applications.
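The steps above can be prototyped on a toy problem before touching a real system. The sketch below is an assumption-heavy stand-in, not the paper's method: the gate is a tabular epsilon-greedy bandit instead of a learned network, the two states (`urgent`, `calm`), the budgets, and the payoff in `toy_return` are all made up, and no real planner is involved. It only illustrates learning a state-dependent budget and comparing it with fixed budgets.

```python
import math
import random

BUDGETS = (1, 4, 16)                          # hypothetical budgets, in ticks
DELAY_COST = {"urgent": 1.0, "calm": 0.05}    # hypothetical cost per tick of delay


def toy_return(state, budget):
    """Made-up payoff: plan quality grows as log2(budget), delay costs per tick."""
    return math.log2(budget) - DELAY_COST[state] * budget


class GatingPolicy:
    """Tabular epsilon-greedy gate mapping a state to a planning budget."""

    def __init__(self, epsilon=0.2, lr=0.1, seed=0):
        self.epsilon, self.lr = epsilon, lr
        self.rng = random.Random(seed)
        self.q = {s: [0.0] * len(BUDGETS) for s in DELAY_COST}

    def select(self, state, greedy=False):
        """Return a budget index: random with prob. epsilon, else the best so far."""
        if not greedy and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(BUDGETS))
        values = self.q[state]
        return max(range(len(BUDGETS)), key=values.__getitem__)

    def update(self, state, idx, ret):
        """Move the value estimate for (state, budget) toward the observed return."""
        self.q[state][idx] += self.lr * (ret - self.q[state][idx])

    def budget(self, state):
        return BUDGETS[self.select(state, greedy=True)]


def train(steps=5000, seed=0):
    gate = GatingPolicy(seed=seed)
    states = sorted(DELAY_COST)
    for _ in range(steps):
        state = gate.rng.choice(states)
        idx = gate.select(state)
        gate.update(state, idx, toy_return(state, BUDGETS[idx]))
    return gate


def mean_return(choose_budget):
    """Average toy return over both states for a budget-choosing rule."""
    return sum(toy_return(s, choose_budget(s)) for s in DELAY_COST) / len(DELAY_COST)


gate = train()
learned = mean_return(gate.budget)
best_fixed = max(mean_return(lambda s, b=b: b) for b in BUDGETS)
print(gate.budget("urgent"), gate.budget("calm"), learned > best_fixed)
```

In this toy setup the gate settles on the shortest budget when delay is expensive and the longest when it is cheap, so its average return beats every single fixed budget. Whether a similar gap appears in a real application depends on how strongly the value of planning varies across states.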

Who benefits

Robotics, Autonomous Vehicles, Gaming AI, High-Frequency Trading, Industrial Automation

Key takeaways

Real-time RL requires agents to manage deliberation time as the environment progresses.
Variable-delay real-time RL allows agents to choose their planning budget.
A lightweight gating policy can learn optimal state-dependent planning budgets.
This approach improves performance over fixed-budget methods in time-critical tasks.

Original post by Aneesh Muppidi, Firas Darwish, Dylan Cope, João F. Henriques, Jakob Nicolaus Foerster

"arXiv:2606.26463v1. Abstract: Deliberating takes time. In real-time settings, that time is not free. Standard reinforcement learning (RL) sidesteps this as the environment waits indefinitely for the agent's decision. Instead, we study real-time RL environments w…"
