SceneBot Enables Humanoids to Track Complex Scene Interactions

Sirui Chen, Shibo Zhao, Zhen Wu, Jiaman Li, Guanya Shi, C. Karen Liu · June 29, 2026

Summary

SceneBot is a motion-tracking framework that lets humanoid robots perform contact-rich tasks by conditioning a single policy on both reference motions and per-link contact labels. It uses hindsight scene reconstruction to infer scene-interaction graphs from human motion, which supplies the training data the policy needs to generalize to unseen environments.

Current humanoid reinforcement-learning policies handle free-space movements well but struggle with tasks that involve physical contact with objects or uneven terrain. The limitation arises because purely kinematic tracking cannot resolve the physical ambiguities of such interactions: for example, a reference pose alone does not say whether a hand should bear load on an object or merely pass near it. The authors introduce SceneBot, a unified motion-tracking framework designed to address this. SceneBot trains a single policy conditioned on both reference motions and explicit per-link contact labels, which specify the expected interactions with the environment.

Annotated interaction data is scarce, so SceneBot uses a hindsight scene reconstruction method that infers scene-interaction graphs directly from retargeted human motion. Trained on 7.5 hours of this reconstructed, contact-rich data, SceneBot generalizes to new motions and environments and performs tasks such as carrying a box upstairs, a step toward general humanoid control.
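To make the idea of contact-conditioned tracking concrete, here is a minimal sketch of how a policy input might combine a reference frame with per-link contact labels. The link names, the `ReferenceFrame` layout, and the `build_observation` helper are all assumptions made for illustration; the summary does not describe SceneBot's actual observation format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical link names; a real humanoid has many more links.
LINKS = ["left_foot", "right_foot", "left_hand", "right_hand"]


@dataclass
class ReferenceFrame:
    """One frame of a reference motion plus its contact prompt (illustrative layout)."""
    joint_targets: List[float]   # target joint angles in radians
    contact_labels: List[int]    # one 0/1 flag per entry in LINKS


def build_observation(proprio: List[float], ref: ReferenceFrame) -> List[float]:
    """Concatenate robot state, reference targets, and per-link contact flags."""
    if len(ref.contact_labels) != len(LINKS):
        raise ValueError("expected one contact label per link")
    if any(c not in (0, 1) for c in ref.contact_labels):
        raise ValueError("contact labels must be 0 or 1")
    flags = [float(c) for c in ref.contact_labels]
    return list(proprio) + list(ref.joint_targets) + flags


# Same joint targets, different contact prompts: the policy input differs,
# so a single policy can tell "hands on a box" from "hands in free space".
free_space = ReferenceFrame([0.1, -0.2, 0.3], [1, 1, 0, 0])
holding_box = ReferenceFrame([0.1, -0.2, 0.3], [1, 1, 1, 1])
obs_a = build_observation([0.0, 0.0], free_space)
obs_b = build_observation([0.0, 0.0], holding_box)
```

The two observations share identical kinematic targets and differ only in the contact flags, which is the kind of ambiguity the text says pure kinematic tracking cannot resolve.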

Why it matters

For robotics engineers and researchers, SceneBot extends humanoid control beyond free-space movements to contact-rich tasks in unstructured environments, such as carrying a box upstairs, using a single policy.

How to implement this in your domain

  1. Explore integrating SceneBot's contact-prompted control into existing humanoid robot platforms.
  2. Utilize the open-sourced code and data to replicate and adapt the framework for specific robotic applications.
  3. Develop custom scene-interaction graphs for novel tasks and environments.
  4. Test SceneBot's generalization capabilities in diverse physical simulation and real-world scenarios.
  5. Collaborate with researchers to extend SceneBot's capabilities to more nuanced human-robot interaction.
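For step 3, one possible way to represent a scene-interaction graph is as a list of edges, each linking a robot link to a scene object over a frame interval. This is a sketch under assumptions: the `ContactEdge` structure, the object names, and the `contact_labels_at` helper are hypothetical and are not taken from the SceneBot code or data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ContactEdge:
    """Hypothetical graph edge: a link touches a scene object for a frame interval."""
    link: str
    scene_object: str
    start_frame: int  # inclusive
    end_frame: int    # exclusive


def contact_labels_at(
    edges: List[ContactEdge], links: List[str], frame: int
) -> Dict[str, int]:
    """Return a 0/1 contact label per link for the given frame."""
    labels = {link: 0 for link in links}
    for edge in edges:
        if edge.link not in labels:
            raise KeyError(f"unknown link: {edge.link}")
        if edge.start_frame <= frame < edge.end_frame:
            labels[edge.link] = 1
    return labels


links = ["left_foot", "right_foot", "left_hand", "right_hand"]

# Illustrative "carry a box up one stair" clip, 100 frames long.
graph = [
    ContactEdge("left_foot", "floor", 0, 100),
    ContactEdge("right_foot", "floor", 0, 40),
    ContactEdge("right_foot", "stair_1", 55, 100),  # foot is in swing for frames 40-54
    ContactEdge("left_hand", "box", 20, 100),
    ContactEdge("right_hand", "box", 20, 100),
]

labels_start = contact_labels_at(graph, links, frame=10)
labels_swing = contact_labels_at(graph, links, frame=45)
```

Querying the graph per frame yields the per-link labels a contact-conditioned policy would consume, so a new task can be described by editing edges rather than retraining a separate policy.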

Who benefits

Robotics, Manufacturing, Logistics, Healthcare, Entertainment

Key takeaways

  • Humanoid robots can now better handle contact-rich tasks using SceneBot's unified framework.
  • SceneBot conditions policies on both reference motions and explicit contact labels.
  • A hindsight scene reconstruction method generates necessary contact-rich training data.
  • The framework generalizes to unseen motions and environments, enabling complex, long-horizon tasks.
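As a rough illustration of the hindsight idea in the third takeaway, the sketch below guesses contact labels and support-surface heights from a foot's height trajectory alone. It is a simple velocity-threshold heuristic chosen for illustration, not the paper's reconstruction method; the function name, thresholds, and sample trajectory are all assumptions.

```python
from typing import List, Tuple


def infer_support_heights(
    foot_heights: List[float],
    dt: float,
    max_speed: float = 0.05,   # m/s; below this the foot counts as stationary (assumed)
    merge_tol: float = 0.02,   # m; heights closer than this are the same surface (assumed)
) -> Tuple[List[int], List[float]]:
    """Label near-stationary frames as contact and collect distinct surface heights."""
    labels = [0] * len(foot_heights)  # frame 0 stays 0: no velocity estimate exists yet
    surfaces: List[float] = []
    for t in range(1, len(foot_heights)):
        speed = abs(foot_heights[t] - foot_heights[t - 1]) / dt
        if speed <= max_speed:
            labels[t] = 1
            height = foot_heights[t]
            # Record a new surface only if no known surface is within tolerance.
            if all(abs(height - s) > merge_tol for s in surfaces):
                surfaces.append(height)
    return labels, surfaces


# Illustrative trajectory: foot rests on the floor, swings, then lands on a 0.15 m step.
heights = [0.0, 0.0, 0.0, 0.08, 0.20, 0.17, 0.15, 0.15, 0.15]
labels, surfaces = infer_support_heights(heights, dt=0.1)
```

The recovered heights stand in for reconstructed scene geometry (a floor and one step), and the labels stand in for contact annotations. A heuristic based only on vertical speed would mislabel a slow swing apex as contact, so a practical pipeline would need richer cues such as full 3D link velocity.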

Original post by Sirui Chen, Shibo Zhao, Zhen Wu, Jiaman Li, Guanya Shi, C. Karen Liu

arXiv:2606.27581v1. Abstract: "Current humanoid reinforcement-learning policies excel at free-space motions but struggle with contact-rich tasks, as pure kinematic tracking cannot resolve the physical ambiguities of interacting with objects and uneven terrain.…"

