Coachable AI Agents Learn Styles for Interactive Gameplay.

Roberto Capobianco (Sony AI, Zurich, Switzerland), Harm van Seijen (Sony AI, North America, various locations), Nolan D. Bard (Sony AI, North America, various locations), Neil Burch (Sony AI, North America, various locations), Fatima Davelouis (Sony AI, North America, various locations), Josh Davidson (Sony AI, North America, various locations), Alisa Devlic (Sony AI, Zurich, Switzerland), Yunshu Du (Sony AI, North America, various locations), Ishan Durugkar (Sony AI, North America, various locations), Siddhant Gangapurwala (Sony AI, North America, various locations), Daniel Hernandez (Sony AI, North America, various locations), G. Zacharias Holland (Sony AI, North America, various locations), Sahil Jain (Sony AI, North America, various locations), Kenta Kawamoto (Sony AI, Tokyo, Japan), Raksha Kumaraswamy (Sony AI, North America, various locations), Patrick MacAlpine (Sony AI, North America, various locations), Dustin R. Morrill (Sony AI, North America, various locations), Declan Oller (Sony AI, North America, various locations), Francesco Riccio (Sony AI, Zurich, Switzerland), Akanksha Saran (Sony AI, North America, various locations), Craig Sherstan (Sony AI, Tokyo, Japan), Kaushik Subramanian (Sony AI, Zurich, Switzerland), Thomas J. Walsh (Sony AI, North America, various locations), Samuel Barrett (Sony AI, North America, various locations), Kizza N. Frisbee (Sony AI, North America, various locations), Mady Govil (Sony AI, North America, various locations), Johannes Günther (Sony AI, North America, various locations), Varun R. Kompella (Sony AI, North America, various locations), James A. MacGlashan (Sony AI, North America, various locations), Maxwell Svetlik (Sony AI, North America, various locations), Michael D. Thomure (Sony AI, North America, various locations), Jaden B. Travnik (Sony AI, North America, various locations), Kevin Waugh (Sony AI, North America, various locations), Elahe Aghapour (Sony AI, North America, various locations), Florian Fuchs (Sony AI, Zurich, Switzerland), Andreanne Lemay (Sony AI, North America, various locations), Shruti Mishra (Sony AI, Zurich, Switzerland), Takuma Seno (Sony AI, Tokyo, Japan), Peter Stone (Sony AI, North America, various locations), Michael Spranger (Sony AI, Tokyo, Japan), Peter R. Wurman (Sony AI, North America, various locations) · July 2, 2026

Summary

This paper introduces a framework for creating coachable AI agents that can learn and exhibit specific "styles" while still completing their primary tasks in complex domains. By combining universal value function approximators (UVFAs) with tailored training, the framework allows real-time control over agent behavior, demonstrated in AAA video games and a humanoid test domain.

Reinforcement learning (RL) has been instrumental in developing advanced AI systems, from game playing to robotics, and it typically produces agents that learn a single, near-optimal behavior for a given task. Many applications, however, require more flexible control: users want to influence *how* a task is performed, not just *whether* it is completed. These modifications to how the core task is executed are referred to as "styles." A new framework addresses this need by enabling the creation of coachable agents that can exhibit diverse styles in complex environments. It integrates universal value function approximators (UVFAs) with carefully designed training scenarios, learning algorithms, and data augmentation techniques, so that an end user can select the agent's final behavior at runtime. The framework has been demonstrated in several domains, including the AAA video games Horizon Forbidden West and Gran Turismo, as well as an open-source humanoid walking simulation. In each case, the agents adhered to the requested style modifications while still fulfilling their primary objectives.
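
To make the idea of a style-conditioned value function concrete, here is a minimal Python sketch. It is not the paper's implementation: the class name StyleConditionedQ, the linear model, the Q-learning update, and every parameter are assumptions chosen for illustration. The only idea taken from the summary above is that a UVFA receives the requested style as an extra input, so one set of learned weights can serve many styles.

```python
import random
from typing import List, Sequence


class StyleConditionedQ:
    """Hypothetical linear stand-in for a UVFA: Q(state, style, action)."""

    def __init__(self, state_dim: int, style_dim: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        in_dim = state_dim + style_dim + 1  # +1 for a bias feature
        # One weight vector per discrete action.
        self.weights: List[List[float]] = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(n_actions)
        ]

    def _features(self, state: Sequence[float], style: Sequence[float]) -> List[float]:
        # The style vector is simply concatenated with the state.
        return list(state) + list(style) + [1.0]

    def q_values(self, state: Sequence[float], style: Sequence[float]) -> List[float]:
        x = self._features(state, style)
        return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.weights]

    def act(self, state: Sequence[float], style: Sequence[float]) -> int:
        # Greedy action for the style requested at this moment.
        q = self.q_values(state, style)
        return max(range(self.n_actions), key=lambda a: q[a])

    def td_update(self, state, style, action, reward, next_state, done,
                  lr: float = 0.05, gamma: float = 0.99) -> float:
        """One Q-learning step (an assumed learner, not the paper's algorithm)."""
        x = self._features(state, style)
        target = reward
        if not done:
            target += gamma * max(self.q_values(next_state, style))
        error = target - self.q_values(state, style)[action]
        self.weights[action] = [
            w + lr * error * x_i for w, x_i in zip(self.weights[action], x)
        ]
        return error
```

In this sketch, switching style at runtime means passing a different style vector to act(); the weights stay fixed and no retraining is involved.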

Why it matters

Game developers, robotics engineers, and interactive AI designers can use this framework to create more engaging, customizable, and user-friendly AI systems that adapt to specific user preferences or dynamic operational needs.

How to implement this in your domain

1. Explore UVFAs for developing flexible AI agent behaviors in your products.
2. Design training scenarios that incorporate diverse "style" variations for agents.
3. Implement data augmentation techniques to enhance style learning in RL.
4. Develop user interfaces for real-time style selection and control of AI agents.
5. Apply the framework to create more personalized experiences in games or simulations.
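
Steps 2 and 3 can be sketched as follows. The summary does not say which training scenarios, reward shaping, or augmentation the authors use, so everything here is an assumption: the Transition record, the speed-matching style reward, the style_weight mixing factor, and the relabeling trick (reusing one recorded transition under several styles) are hypothetical illustrations of the general idea, not the paper's method.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

Style = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]
    style: Style
    action: int
    task_reward: float
    style_reward: float
    next_state: Tuple[float, ...]
    done: bool


def sample_style(rng: random.Random, low: float = 0.0, high: float = 1.0,
                 dim: int = 1) -> Style:
    """Draw a style per episode so training covers the whole style range."""
    return tuple(rng.uniform(low, high) for _ in range(dim))


def speed_style_reward(t: Transition, style: Style) -> float:
    # Hypothetical style: next_state[0] is a normalized speed and
    # style[0] is the speed the user asked for.
    return -abs(t.next_state[0] - style[0])


def combined_reward(t: Transition, style_weight: float = 0.5) -> float:
    """Task reward plus a weighted style term (assumed additive form)."""
    return t.task_reward + style_weight * t.style_reward


def relabel_with_styles(t: Transition, new_styles: Sequence[Style],
                        style_reward_fn: Callable[[Transition, Style], float]
                        ) -> List[Transition]:
    """Augmentation by relabeling: copy one transition under other styles
    and recompute the style reward for each copy."""
    return [
        replace(t, style=z, style_reward=style_reward_fn(t, z))
        for z in new_styles
    ]
```

Relabeling only makes sense when the style reward can be recomputed from recorded data, as in this speed example; a style judged by a human rater could not be relabeled this way.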

Who benefits

Gaming, Robotics, Entertainment, Virtual Reality, EdTech

Key takeaways

RL agents can be made "coachable" to exhibit diverse styles beyond optimal task completion.
The framework combines UVFAs with tailored training and data augmentation.
Users gain real-time control over agent behavior and style.
Demonstrated in AAA video games and an open-source humanoid walking simulation.

"arXiv:2607.00642v1 Announce Type: new Abstract: Reinforcement learning has proven to be a valuable tool in the creation of advanced AI and robotic systems, contributing to everything from game playing to robotics to foundation models. Through trial-and-error, these AI systems typ…"
