COMET Enhances RL Planning with Causal Object-Centric Models

Rodion Vakhitov, Leonid Ugadiarov, Alexey Skrynnik, Aleksandr Panov · June 15, 2026

Summary

COMET (Causal Object-centric Model for Efficient Tree search) is a new model-based reinforcement learning algorithm that performs Monte Carlo Tree Search in a slot-structured latent space. It improves planning efficiency and performance by binding actions to objects and using object-causal attention for decision-making.

Model-based reinforcement learning (RL) algorithms built on Monte Carlo Tree Search (MCTS) are powerful, but they can be inefficient in complex environments. This research introduces COMET (Causal Object-centric Model for Efficient Tree search), an algorithm that runs MCTS in a slot-structured latent space, where each slot is an object-centric representation. COMET pairs a frozen unsupervised object-centric encoder with a transformer-based world model. Its action-slot fusion mechanism binds actions directly to objects, which is intended to make slot transition prediction more accurate. The policy and value heads use object-causal attention, which modulates token interactions according to learned per-slot relevance scores so that decision-making focuses on the most task-relevant entities.

Together, these components add an explicit object-level inductive bias to MuZero-style latent planning. Across eight visually and dynamically diverse tasks from benchmarks including Object-Centric Visual RL, ManiSkill, Robosuite, and VizDoom, COMET achieved higher mean normalized scores during the early stages of training than both object-centric and monolithic baselines.
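To make the idea of relevance-modulated attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `relevance_gated_attention`, the choice to inject relevance as an additive log-bias on the attention logits, and the toy numbers are all assumptions made for illustration. The summary only says that learned per-slot relevance scores modulate token interactions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relevance_gated_attention(query, slot_keys, slot_values, relevance, eps=1e-9):
    """Attend from one query token over K slots, biased by per-slot relevance.

    Assumption (hypothetical design): relevance scores in (0, 1] are added to
    the logits as log(relevance). Up to eps, this equals multiplying the plain
    softmax weights by relevance and renormalizing.
    """
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        dot(query, key) * scale + math.log(r + eps)
        for key, r in zip(slot_keys, relevance)
    ]
    weights = softmax(logits)
    dim = len(slot_values[0])
    pooled = [
        sum(w * value[d] for w, value in zip(weights, slot_values))
        for d in range(dim)
    ]
    return pooled, weights


# Toy example: slots 0 and 1 have identical keys but different relevance.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]
relevance = [0.9, 0.01, 0.5]  # hypothetical "learned" scores
pooled, weights = relevance_gated_attention(query, keys, values, relevance)
print(weights)
```

In this sketch, two slots that match the query equally well receive different attention weights purely because of their relevance scores, which is the behavior the summary attributes to the policy and value heads. In a trained model the scores would come from a learned module rather than a hand-written list.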

Why it matters

For practitioners building agents for robotics, games, or simulation, COMET points to a more sample-efficient way to do model-based planning: its reported gains appear in the early stages of training. Its object-centric representation and object-causal attention may lead to faster learning and better performance in complex, multi-object scenarios.

How to implement this in your domain

  1. Explore object-centric representations for improving the efficiency of model-based reinforcement learning agents.
  2. Investigate the integration of unsupervised object-centric encoders with transformer-based world models in planning systems.
  3. Implement action-slot fusion mechanisms to bind actions directly to relevant objects for more precise state transitions.
  4. Apply object-causal attention in policy and value heads to focus decision-making on task-relevant entities.
  5. Benchmark COMET-style approaches against existing MuZero-style latent planning algorithms in complex visual and dynamic tasks.
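As a starting point for step 3, the following sketch shows one possible way to bind an action to slots before a transition model sees them. Everything here is an assumption for illustration: the name `fuse_action_into_slots`, the dot-product affinity, and the additive fusion are hypothetical and are not taken from the paper, which the summary describes only as binding actions directly to objects.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_action_into_slots(slots, action_emb):
    """Return action-conditioned slots and the per-slot affinity weights.

    Hypothetical scheme: each slot's affinity to the action is a scaled dot
    product, normalized over slots with softmax. The action embedding is then
    added to every slot, scaled by that slot's affinity, so the slot most
    related to the action receives most of the action signal.
    """
    scale = 1.0 / math.sqrt(len(action_emb))
    logits = [
        scale * sum(s * a for s, a in zip(slot, action_emb)) for slot in slots
    ]
    affinity = softmax(logits)
    fused = [
        [s + w * a for s, a in zip(slot, action_emb)]
        for slot, w in zip(slots, affinity)
    ]
    return fused, affinity


# Toy example with three 2-D slots and one action embedding.
slots = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
push_right = [1.0, 0.0]  # hypothetical embedding of a "push right" action
fused, affinity = fuse_action_into_slots(slots, push_right)
print(affinity)
```

In a COMET-style system, the fused slots would then be passed to the transformer-based world model to predict the next slots. A learned projection would normally replace the raw dot product used here.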

Who benefits

Robotics, Gaming, Autonomous Systems, Virtual Reality, Simulation

Key takeaways

  • COMET enhances model-based RL planning using causal object-centric models.
  • It performs Monte Carlo Tree Search in a slot-structured latent space.
  • Action-slot fusion and object-causal attention improve planning efficiency.
  • COMET achieved higher mean normalized scores than object-centric and monolithic baselines in the early stages of training across eight diverse tasks.
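For readers unfamiliar with MuZero-style tree search, the sketch below shows the pUCT child-selection rule from the MuZero paper, with its published constants c1 = 1.25 and c2 = 19652. It is a simplified illustration, not COMET's code: the `Node` class is hypothetical, MuZero's min-max normalization of values is omitted, and whether COMET uses exactly this rule is an assumption. In COMET, each node would hold a slot-structured latent state rather than a single vector.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node; `prior` comes from the policy head."""
    prior: float
    visit_count: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def value(self):
        # Mean backed-up value; 0.0 for unvisited nodes.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent, child, c1=1.25, c2=19652.0):
    """MuZero-style pUCT score (value normalization omitted for brevity)."""
    exploration = c1 + math.log((parent.visit_count + c2 + 1) / c2)
    exploration *= math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + child.prior * exploration


def select_child(parent):
    """Return the (action, child) pair with the highest pUCT score."""
    return max(parent.children.items(), key=lambda kv: puct_score(parent, kv[1]))


# Toy example: a well-explored child with zero value versus an unvisited one.
root = Node(prior=1.0, visit_count=10)
root.children["left"] = Node(prior=0.7, visit_count=9, value_sum=0.0)
root.children["right"] = Node(prior=0.3)
action, _ = select_child(root)
print(action)
```

The rule trades off the child's estimated value against a prior-weighted exploration bonus that shrinks as the child accumulates visits, which is why the unvisited child can win despite its lower prior.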

Original post by Rodion Vakhitov, Leonid Ugadiarov, Alexey Skrynnik, Aleksandr Panov

"arXiv:2606.14418v1 Announce Type: new Abstract: We introduce COMET (Causal Object-centric Model for Efficient Tree search), a model-based reinforcement learning algorithm that performs Monte Carlo Tree Search in a slot-structured latent space. COMET pairs a frozen unsupervised ob…"

