OSGuard Benchmark Evaluates Safety of Computer-Use AI Agents

OSGuard Benchmark Evaluates Safety of Computer-Use AI Agents.

Mina Mohammadmirzaei, Jeffrey Flanigan· June 16, 2026 View original

Summary

OSGuard is a new dual-granularity benchmark suite designed to evaluate the safety of computer-use AI agents, focusing on identifying unsafe shortcuts and actions even when agents achieve nominal task goals. It includes both action-level guardrail decisions and end-to-end risk-augmented execution scenarios.

As AI agents increasingly interact with computer environments, merely achieving a task goal is insufficient; ensuring safety is paramount. This research introduces OSGuard, a comprehensive dual-granularity benchmark suite specifically designed to assess the safety of computer-use agents. Unlike existing evaluations that focus solely on task success, OSGuard identifies instances where an agent might complete a task through an unsafe or undesirable shortcut, even under benign user instructions. The benchmark comprises two main components: an action-level evaluation and a risk-augmented execution suite. The action-level benchmark presents contextualized proposed actions, classifying them as allowed, unrelated, or unsafe relative to the instruction and interface state. The execution suite features modified OSWorld-derived tasks where latent hazards, such as destructive overwrites, are introduced. These tasks are paired with augmented evaluators that not only check for task completion but also enforce explicit state-based safety invariants, distinguishing safe completions from unsafe ones. Experimental results using OSGuard reveal that while current multimodal guardrails can perform well on isolated action judgments, significant gaps remain in ensuring reliable end-to-end safety during full task execution. This dual-granularity design provides a more precise diagnostic tool for understanding whether models can both recognize unsafe actions and improve overall task safety when deployed as guardrails.

Why it matters

For professionals developing and deploying AI agents that interact with operating systems and web environments, OSGuard provides a critical tool for rigorously testing and improving agent safety. It helps prevent unintended harmful actions, ensuring more reliable and trustworthy AI deployments.

How to implement this in your domain

1Utilize OSGuard to benchmark the safety performance of your computer-use AI agents.
2Integrate the action-level safety evaluation into your agent development lifecycle for proactive risk identification.
3Design and test guardrail mechanisms specifically to address the end-to-end safety gaps identified by OSGuard.
4Adopt the dual-granularity approach to diagnose and mitigate potential unsafe behaviors in agent deployments.

Who benefits

AI DevelopmentCybersecuritySoftware TestingRoboticsIT Operations

Key takeaways

OSGuard is a new benchmark for evaluating the safety of computer-use AI agents.
It identifies unsafe shortcuts and actions, even when agents achieve nominal goals.
The benchmark includes both action-level and end-to-end risk-augmented evaluations.
Current guardrails show gaps in ensuring reliable end-to-end safety, highlighting the need for better solutions.

Original post by Mina Mohammadmirzaei, Jeffrey Flanigan

"arXiv:2606.15034v1 Announce Type: new Abstract: Computer-use agents are increasingly evaluated by whether they complete realistic desktop and web tasks. However, task success alone can miss failures in which an agent reaches the nominal goal through an unsafe shortcut. We introdu…"

View on X

Originally posted by Mina Mohammadmirzaei, Jeffrey Flanigan on X · view source

Want to go deeper?

Turn these trends into skills with Learnijoy's hands-on AI & tech courses.

Explore courses

OSGuard Benchmark Evaluates Safety of Computer-Use AI Agents.

Why it matters

How to implement this in your domain

Who benefits

Key takeaways

Want to go deeper?

More in AI Engineering & DevTools

AI-Powered Development Workflow Integrates Multiple Models

Proposing AI Usage Transparency for Credible Commentary

MCP and A2A Protocols Standardize Agentic Internet Development