AI Coaches Struggle with Explanations and Visual Grounding in Software Training

Meng Chen, Anya Ji, Tsung-Han Wu, Tobias Maringgele, David M. Chan, Alane Suhr, Amy Pavel · July 2, 2026

Summary

A new multimodal dataset, DigitalCoach, shows that state-of-the-art AI models coaching humans on computer use give more direct instructions than human experts but fewer explanations, error diagnoses, and knowledge checks. The models also ground their utterances poorly in what is on screen, and the learners they coach tend to follow instructions passively.

As AI agents get better at automating software tasks, researchers are asking whether they can also teach people to use software. The authors introduce DigitalCoach, a multimodal dataset of 72 human expert-novice coaching sessions, totaling over 28 hours of screen and input recordings across five software applications, and use it to evaluate current AI models as computer use coaches. The coaching styles differ: AI models tend to give direct instructions, while human coaches more often explain, diagnose errors, and check for understanding. When the coaching method is fixed, the models can generate human-like utterances, but those utterances are poorly grounded in the visual context of the screen. Interactive evaluations confirmed that learners coached by AI agents often follow instructions passively without deep engagement, which points to gaps in both visual understanding and pedagogical approach.

Why it matters

Professionals developing AI-powered educational tools or internal training systems need to understand these limitations to design more effective and engaging learning experiences that go beyond mere instruction.

How to implement this in your domain

  1. Integrate multimodal input (screen recordings, user actions) into AI coaching systems to improve visual grounding.
  2. Develop AI models with explicit objectives for explanation generation, error diagnosis, and knowledge assessment.
  3. Design interactive learning modules that encourage active engagement rather than passive instruction following.
  4. Conduct user studies to compare AI-led coaching effectiveness against human-led coaching for specific software tasks.
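
One way to start on steps 2 and 4 is to label each coach utterance with a coaching act and compare how often AI and human coaches use each act. The sketch below is illustrative only: the label set, the `Utterance` type, the speaker names, and the `margin` threshold are assumptions made for this example, not the paper's annotation scheme or evaluation code.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label set, loosely based on the behaviors named in the summary.
COACHING_ACTS = ("instruction", "explanation", "error_diagnosis", "knowledge_check")


@dataclass
class Utterance:
    speaker: str  # e.g. "human_coach" or "ai_coach" (illustrative names)
    act: str      # one of COACHING_ACTS, assumed to be labeled beforehand
    text: str

    def __post_init__(self):
        # Reject labels outside the assumed scheme so shares stay meaningful.
        if self.act not in COACHING_ACTS:
            raise ValueError(f"unknown coaching act: {self.act!r}")


def act_distribution(utterances, speaker):
    """Return the share of each coaching act among one speaker's utterances."""
    acts = [u.act for u in utterances if u.speaker == speaker]
    if not acts:
        return {act: 0.0 for act in COACHING_ACTS}
    counts = Counter(acts)
    return {act: counts[act] / len(acts) for act in COACHING_ACTS}


def underused_acts(ai_dist, human_dist, margin=0.1):
    """List acts whose share for the AI coach trails the human coach by at least `margin`."""
    return [act for act in COACHING_ACTS if human_dist[act] - ai_dist[act] >= margin]


if __name__ == "__main__":
    # Made-up utterances for demonstration; not drawn from DigitalCoach.
    log = [
        Utterance("human_coach", "instruction", "Open the Insert menu."),
        Utterance("human_coach", "explanation", "Styles keep headings consistent."),
        Utterance("human_coach", "error_diagnosis", "You selected the wrong layer."),
        Utterance("human_coach", "knowledge_check", "Which tool would you use next?"),
        Utterance("ai_coach", "instruction", "Click File."),
        Utterance("ai_coach", "instruction", "Click Export."),
        Utterance("ai_coach", "instruction", "Choose PDF."),
        Utterance("ai_coach", "explanation", "PDF preserves the layout."),
    ]
    human = act_distribution(log, "human_coach")
    ai = act_distribution(log, "ai_coach")
    print(underused_acts(ai, human))
```

A gap report like this only measures how often each act occurs. It says nothing about whether an utterance is correctly grounded in the screen, which would need a separate check against the recording.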

Who benefits

EdTech, Corporate Training, Software Development, Customer Support

Key takeaways

  • Current AI coaches prioritize direct instructions over explanations and error diagnosis.
  • AI models struggle with visual grounding in real-time computer use coaching.
  • Learners tend to be passive when coached by current AI systems.
  • Future AI coaching agents need improved pedagogical strategies and multimodal understanding.

Original post by Meng Chen, Anya Ji, Tsung-Han Wu, Tobias Maringgele, David M. Chan, Alane Suhr, Amy Pavel

"arXiv:2606.31980v1 Announce Type: cross Abstract: Agents are increasingly capable of automating software tasks, but can they teach humans how to use software themselves? We introduce DigitalCoach, a multimodal dataset of 72 human expert-novice computer use coaching sessions consi…"

