Xiaomi-GUI-0: A Real-World Mobile GUI Agent for Enhanced Stability

Wanxia Cao, Chengzhen Duan, Pei Fu, Pengzhi Gao, Niu Lian, Fazhan Liu, Hui Liu, Heng Qu, Qinzhuo Wu, Zhehao Yu, Tongbo Chen, Shiqi Cui, Anan Du, Shukai Jia, Yuanfa Li, Yike Liu, Wenchao Lu, Haoyuan Sun, Jiatong Sun, Cheng Tan, Yajie Wang, Changqiao Wu, Tao Xiong, Jiahui Yang, Yuxuan Yuan, Ruoceng Zhang, Shaojie Zhang, Jian Zhu, Jian Luan, Cong Zou · July 1, 2026

Summary

Xiaomi-GUI-0 is a new native multimodal GUI agent designed for real mobile environments, trained and evaluated in a closed-loop system to improve execution stability and abnormal-state recognition in real-world applications. It aims to bridge the gap between benchmark performance and actual usability by using physical devices as the primary execution environment.
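
The routing idea behind a real-device-dominant setup can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's infrastructure: `Episode`, `DevicePool`, and the policy "use a free physical phone, otherwise overflow to a sandbox and tag the run" are hypothetical names and rules chosen for clarity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One task run; `env` records where it actually executed (hypothetical schema)."""
    task: str
    env: str = ""
    device_id: str = ""


@dataclass
class DevicePool:
    """Hypothetical scheduler: physical devices first, sandbox only as overflow."""
    free_devices: deque = field(default_factory=deque)
    sandbox_allowed: bool = True

    def acquire(self, episode: Episode) -> Episode:
        if self.free_devices:
            # Preferred path: run on a physical phone so the collected data
            # reflects real account states, permission dialogs, and payment prompts.
            episode.env = "device"
            episode.device_id = self.free_devices.popleft()
        elif self.sandbox_allowed:
            # Overflow path: keep throughput up, but tag the episode so
            # sandbox data can be weighted or filtered later.
            episode.env = "sandbox"
            episode.device_id = "sandbox-0"
        else:
            raise RuntimeError("no physical device available")
        return episode

    def release(self, episode: Episode) -> None:
        # Only physical devices return to the pool; the sandbox is unbounded here.
        if episode.env == "device":
            self.free_devices.append(episode.device_id)


pool = DevicePool(free_devices=deque(["phone-A", "phone-B"]))
runs = [pool.acquire(Episode(task=t)) for t in ["order food", "book taxi", "pay bill"]]
print([(r.task, r.env, r.device_id) for r in runs])
```

In this sketch, tagging each episode with where it ran is the key design choice: later stages can weight or filter sandbox data instead of silently mixing it with real-device data.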

Traditional GUI agents often struggle in real-world mobile applications because their training and evaluation environments, such as offline trajectories or simulators, do not accurately reflect the complexity of live usage. Account states, permission dialogs, and payment authentication constantly introduce variability that benchmarks fail to capture, which opens a significant gap between reported performance and actual usability.

To address this, the researchers developed Xiaomi-GUI-0, a native multimodal GUI agent designed specifically for real mobile environments. Its core innovation is a hybrid infrastructure in which physical devices are the primary execution environment, supported by sandboxes. This setup is meant to ensure that data collection, training, deployment, and evaluation all take place under a distribution that closely mirrors real-world conditions.

The agent is trained on multi-source data: high-frequency tasks, generalization data covering diverse intents, and capability-enhancement data for reflection and memory. An error-driven data flywheel converts failure trajectories into corrective actions and recovery demonstrations. After a three-stage progressive training pipeline, the authors report strong success rates on both public benchmarks and their in-house RealMobile evaluation, along with significant gains in execution stability and abnormal-state recognition in practical tasks.
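
The error-driven data flywheel is easiest to see as a concrete data transformation. The sketch below assumes each failed trajectory has been annotated with the index of its first wrong step, the correct action at that step, and a short recovery sequence; the `Step` and `FailedTrajectory` schema, the action strings, and both output formats are hypothetical, not the paper's data format. It turns one failure into a corrective-action sample (valid prefix plus the fix) and recovery demonstrations (the mistake kept in the history, followed by the steps that undo it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    screen: str   # observation id, e.g. a screenshot reference
    action: str   # action taken on that screen


@dataclass
class FailedTrajectory:
    instruction: str
    steps: list[Step]       # what the agent actually did
    error_index: int        # annotated index of the first wrong step (assumption)
    corrected_action: str   # annotated correct action at that step (assumption)
    recovery: list[Step]    # annotated steps that undo the mistake (assumption)


def corrective_sample(t: FailedTrajectory) -> dict:
    """Keep the valid prefix and replace the wrong action with the fix."""
    history = t.steps[: t.error_index]
    return {
        "instruction": t.instruction,
        "history": [s.action for s in history],
        "screen": t.steps[t.error_index].screen,
        "target": t.corrected_action,
    }


def recovery_samples(t: FailedTrajectory) -> list[dict]:
    """Keep the mistake in the history so the model learns to back out of it."""
    history = [s.action for s in t.steps[: t.error_index + 1]]
    samples = []
    for step in t.recovery:
        samples.append({
            "instruction": t.instruction,
            "history": list(history),  # copy: each sample sees its own prefix
            "screen": step.screen,
            "target": step.action,
        })
        history.append(step.action)
    return samples


fail = FailedTrajectory(
    instruction="Turn on dark mode",
    steps=[Step("home", "tap(Settings)"), Step("settings", "tap(Sound)"), Step("sound", "back()")],
    error_index=1,
    corrected_action="tap(Display)",
    recovery=[Step("sound", "back()"), Step("settings", "tap(Display)")],
)
print(corrective_sample(fail)["target"])  # tap(Display)
print(len(recovery_samples(fail)))        # 2
```

Keeping the wrong action in the recovery history is deliberate in this sketch: the model sees what a mistake looks like from the inside and learns to back out of it, which a corrective sample alone cannot teach.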

Why it matters

Professionals developing or deploying AI agents for mobile automation will find this relevant, as it tackles the critical challenge of making agents robust and reliable in complex, real-world scenarios rather than only on benchmarks. It offers a blueprint for building more practical and stable AI-driven mobile solutions.

How to implement this in your domain

  1. Adopt a real-device-dominant testing strategy for mobile AI agents to better simulate production environments.
  2. Implement an error-driven data feedback loop to continuously improve agent performance by learning from failures.
  3. Design training data to include high-frequency tasks, long-tail intents, and specific capability enhancements like reflection.
  4. Explore multi-stage training pipelines, combining supervised fine-tuning with reinforcement learning for robust agent development.
  5. Evaluate agent performance using in-house, real-world benchmarks in addition to public datasets.
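
Steps 3 and 4 can be combined into one training schedule. The summary states only that the pipeline has three progressive stages and combines supervised fine-tuning with reinforcement learning, so the stage names, mixing weights, and per-stage objectives below are illustrative assumptions. The sketch shows how each stage might draw batches from the high-frequency, generalization, and capability-enhancement sources with different weights; the actual SFT or RL update is left as a comment.

```python
import random
from dataclasses import dataclass

# Hypothetical data pools; in practice these would be trajectory datasets.
SOURCES = {
    "high_frequency": [f"hf-{i}" for i in range(100)],   # common daily tasks
    "generalization": [f"gen-{i}" for i in range(100)],  # diverse / long-tail intents
    "capability": [f"cap-{i}" for i in range(100)],      # reflection, memory, recovery
}


@dataclass
class Stage:
    name: str
    objective: str   # "sft" or "rl" (placeholder labels)
    weights: dict    # source name -> sampling weight
    steps: int


# Illustrative schedule: the mix shifts from core skills toward harder cases.
SCHEDULE = [
    Stage("stage1_foundation", "sft",
          {"high_frequency": 0.6, "generalization": 0.4, "capability": 0.0}, 3),
    Stage("stage2_capability", "sft",
          {"high_frequency": 0.3, "generalization": 0.3, "capability": 0.4}, 3),
    Stage("stage3_rl", "rl",
          {"high_frequency": 0.2, "generalization": 0.4, "capability": 0.4}, 3),
]


def sample_batch(weights: dict, batch_size: int, rng: random.Random) -> list:
    """Pick a source per example by weight, then a uniform example from that source."""
    names = [n for n, w in weights.items() if w > 0]
    probs = [weights[n] for n in names]
    picks = rng.choices(names, weights=probs, k=batch_size)
    return [rng.choice(SOURCES[n]) for n in picks]


def run_schedule(schedule, batch_size: int = 8, seed: int = 0) -> list:
    rng = random.Random(seed)  # seeded so runs are reproducible
    log = []
    for stage in schedule:
        for _ in range(stage.steps):
            batch = sample_batch(stage.weights, batch_size, rng)
            # A real trainer would run an SFT or RL update on `batch` here.
            log.append((stage.name, stage.objective, batch))
    return log


log = run_schedule(SCHEDULE)
print(len(log))  # 9
```

Keeping the mixture weights in a per-stage config, rather than in the trainer, makes it easy to adjust the curriculum without touching the update code.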

Who benefits

Mobile Development, E-commerce, Customer Service, Automotive, IoT

Key takeaways

  • Real-world mobile GUI agents require training and evaluation on physical devices to ensure practical usability.
  • Xiaomi-GUI-0 uses a hybrid infrastructure and an error-driven data flywheel to improve agent stability.
  • The agent's multi-stage training pipeline enhances performance across diverse tasks.
  • Bridging the gap between benchmarks and real-world performance is crucial for practical AI deployment.
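
In the spirit of step 5 above, one way to keep the benchmark-to-reality gap visible is to report identical metrics for a public benchmark and an in-house real-device suite side by side. The record fields (`suite`, `success`, `abnormal`, `flagged`), the toy data, and the abnormal-state recognition rate defined here are hypothetical; this is not the RealMobile protocol, whose details are not given in this summary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    suite: str       # e.g. "public" or "real_device"
    success: bool    # task completed end-to-end
    abnormal: bool   # episode hit an abnormal state (login wall, permission dialog, ...)
    flagged: bool    # agent recognized and reported that abnormal state


def summarize(results):
    """Per-suite success rate and abnormal-state recognition rate."""
    by_suite = defaultdict(list)
    for r in results:
        by_suite[r.suite].append(r)
    report = {}
    for suite, rs in by_suite.items():
        abnormal = [r for r in rs if r.abnormal]
        report[suite] = {
            "success_rate": sum(r.success for r in rs) / len(rs),
            # None when the suite never exercised an abnormal state.
            "abnormal_recognition": (sum(r.flagged for r in abnormal) / len(abnormal))
            if abnormal else None,
        }
    return report


# Toy records for illustration only; not results from the paper.
results = [
    Result("public", True, False, False),
    Result("public", True, False, False),
    Result("public", False, False, False),
    Result("real_device", True, False, False),
    Result("real_device", False, True, False),
    Result("real_device", False, True, True),
    Result("real_device", True, True, True),
]
report = summarize(results)
gap = report["public"]["success_rate"] - report["real_device"]["success_rate"]
print(report)
print(f"benchmark-to-reality gap: {gap:.2f}")
```

Note how a suite without abnormal states cannot measure recognition at all in this sketch, which is one reason a real-device suite adds information a public benchmark may not.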

Original post by Wanxia Cao, Chengzhen Duan, Pei Fu, Pengzhi Gao, Niu Lian, Fazhan Liu, Hui Liu, Heng Qu, Qinzhuo Wu, Zhehao Yu, Tongbo Chen, Shiqi Cui, Anan Du, Shukai Jia, Yuanfa Li, Yike Liu, Wenchao Lu, Haoyuan Sun, Jiatong Sun, Cheng Tan, Yajie Wang, Changqiao Wu, Tao Xiong, Jiahui Yang, Yuxuan Yuan, Ruoceng Zhang, Shaojie Zhang, Jian Zhu, Jian Luan, Cong Zou

"arXiv:2606.31410v1 Announce Type: new Abstract: Graphical user interface (GUI) agents build on vision-language models to complete user tasks end-to-end in real applications through interface actions such as tapping, swiping, text entry, and navigation. However, existing GUI agent…"
