GUI Agents Outperform CLI, Skill Augmentation Boosts CLI

Xiao Zhou, Siyue Zhang, Yilun Zhao, Jinbiao Wei, Tingyu Song, Arman Cohan, Chen Zhao · June 24, 2026

Summary

A new benchmark comparing GUI and CLI computer-use agents finds that GUI agents achieve higher success rates than CLI agents using their original skills, but skill augmentation lifts CLI agents above the best GUI agent. The study suggests that GUI agents struggle with long-horizon workflows, while CLI agents are limited by skill coverage.

Computer-use agents can operate software through graphical user interfaces (GUI) or command-line interfaces (CLI). Existing evaluations often confound interaction modality with differences in tasks, initial states, and verifiers, which makes direct comparison difficult. This work introduces a benchmark of 440 desktop tasks spanning multiple applications and workflows. Each task gives GUI and CLI agents the same goal and the same initial state, and each agent may act only through its native modality.

Under these controlled conditions, the strongest screen-only GUI agent reached a 59.1% success rate, ahead of the strongest CLI agent using its original skills at 48.2%. When the CLI agent's skills were augmented under verifier guidance, its success rate rose to 69.3%, 10.2 percentage points above the best GUI agent. This indicates that the CLI agent's initial deficit came mainly from incomplete skill coverage rather than from limitations of the underlying model.

The findings point to different bottlenecks for each paradigm. GUI agents struggle to keep grounded interaction reliable across long, multi-step workflows, while CLI agents are constrained by the breadth and scalability of their skill interfaces. Knowing which bottleneck applies helps direct effort when building more capable computer-use agents.
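The controlled comparison described above can be sketched as a small evaluation loop. This is a minimal sketch, not the benchmark's actual harness: the `Task` type, the dictionary standing in for desktop state, and the two toy agents are all hypothetical. It illustrates the key design point (every agent starts from a copy of the same initial state and is judged by the same verifier), then recomputes the headline gaps in percentage points from the reported success rates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration; not the benchmark's actual API.
State = Dict[str, str]                  # e.g. file name -> contents
Agent = Callable[[str, State], State]   # (goal, initial state) -> final state


@dataclass(frozen=True)
class Task:
    goal: str
    initial_state: State
    verifier: Callable[[State], bool]   # True if the final state meets the goal


def success_rate(agent: Agent, tasks: List[Task]) -> float:
    """Percentage of tasks whose verifier accepts the agent's final state.

    Each agent gets a fresh copy of the same initial state, so only the agent
    (and therefore its interaction modality) varies between runs.
    """
    passed = 0
    for task in tasks:
        final_state = agent(task.goal, dict(task.initial_state))
        if task.verifier(final_state):
            passed += 1
    return 100.0 * passed / len(tasks)


def gap_pp(a: float, b: float) -> float:
    """Difference between two success rates, in percentage points."""
    return round(a - b, 1)


# Toy agents: one completes the goal, one does nothing.
def writes_report(goal: str, state: State) -> State:
    state["report.txt"] = "done"
    return state


def idle(goal: str, state: State) -> State:
    return state


tasks = [Task("create report.txt", {}, lambda s: "report.txt" in s)]
print(success_rate(writes_report, tasks))  # 100.0
print(success_rate(idle, tasks))           # 0.0

# Headline success rates reported in the summary above.
gui_best, cli_original, cli_augmented = 59.1, 48.2, 69.3
print(gap_pp(gui_best, cli_original))       # 10.9
print(gap_pp(cli_augmented, gui_best))      # 10.2
print(gap_pp(cli_augmented, cli_original))  # 21.1
```

Because only the agent changes between runs, a difference in `success_rate` can be attributed to the agent and its modality rather than to the task setup.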

Why it matters

For developers and researchers building autonomous agents, knowing where each interaction modality breaks down shows where to invest: reliable grounding over long workflows for GUI agents, and skill coverage for CLI agents.

How to implement this in your domain

1. Prioritize skill coverage for CLI agents: expand the breadth and depth of programmatic skills, since the study attributes most of the CLI agents' initial deficit to incomplete skill coverage.
2. Strengthen grounded interaction for GUI agents: develop more robust ways for GUI agents to perceive and act on visual elements reliably across long, multi-step workflows.
3. Combine modalities strategically: explore hybrid architectures that use GUI interaction for visually grounded tasks and CLI skills for structured, programmatic operations.
4. Use the benchmark for evaluation: evaluate agents on standardized benchmarks like this one, where goals and initial states are held fixed across modalities.
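Steps 1 and 3 can be made concrete with a small sketch. It assumes a hypothetical registry that maps operation names to CLI skill functions; the operation names, the coverage metric, and the routing policy (use a CLI skill when one exists, otherwise hand the step to a GUI agent) are illustrative assumptions, not the paper's design.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical skill registry: operation name -> CLI implementation.
# Names are illustrative; they are not taken from the benchmark.
SkillFn = Callable[[dict], str]


def export_pdf(args: dict) -> str:
    return f"exported {args['doc']} to PDF"


def rename_sheet(args: dict) -> str:
    return f"renamed sheet to {args['name']}"


CLI_SKILLS: Dict[str, SkillFn] = {
    "export_pdf": export_pdf,
    "rename_sheet": rename_sheet,
}


def skill_coverage(required_ops: List[str], skills: Dict[str, SkillFn]) -> float:
    """Fraction of required operations that have a CLI skill."""
    if not required_ops:
        return 1.0
    covered = sum(1 for op in required_ops if op in skills)
    return covered / len(required_ops)


def route(plan: List[Tuple[str, dict]], skills: Dict[str, SkillFn]) -> List[str]:
    """Run covered steps through CLI skills; hand the rest to a GUI agent.

    The GUI fallback is only recorded here; a real system would invoke a
    screen-based agent for those steps.
    """
    log = []
    for op, args in plan:
        if op in skills:
            log.append("CLI: " + skills[op](args))
        else:
            log.append(f"GUI: {op}")
    return log


plan = [
    ("rename_sheet", {"name": "Q3"}),
    ("insert_chart", {"type": "bar"}),
    ("export_pdf", {"doc": "budget.xlsx"}),
]
print(skill_coverage([op for op, _ in plan], CLI_SKILLS))  # 2 of 3 steps covered
for line in route(plan, CLI_SKILLS):
    print(line)
# CLI: renamed sheet to Q3
# GUI: insert_chart
# CLI: exported budget.xlsx to PDF
```

Preferring a covered CLI skill and falling back to GUI only for uncovered steps is one plausible way to act on the finding that CLI agents are bottlenecked by skill coverage: as coverage grows, fewer steps depend on long-horizon GUI grounding.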

Who benefits

Software Development
IT Automation
Robotics
Business Process Automation

Key takeaways

GUI agents currently outperform original-skill CLI agents in desktop task execution.
Verifier-guided skill augmentation raises CLI agent success from 48.2% to 69.3%, surpassing the best GUI agent (59.1%).
GUI agents are limited by the difficulty of sustaining reliable grounded interaction over long workflows.
CLI agents' primary bottleneck is the coverage and scalability of their skill interfaces.

Original post on X by Xiao Zhou, Siyue Zhang, Yilun Zhao, Jinbiao Wei, Tingyu Song, Arman Cohan, Chen Zhao

"arXiv:2606.24551v1. Abstract: Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and…"

