Vera Framework Automates LLM Agent Safety Testing at Scale

Yunhao Feng, Ruixiao Lin, Ming Wen, Qinqin He, Yanming Guo, Yifan Ding, Yutao Wu, Jialuo Chen, Yunhao Chen, Xiaohu Du, Jianan Ma, Zixing Chen, Zhuoer Xu, Xingjun Ma, Xinhao Deng· July 3, 2026 View original

▶ The 2-minute explainer

Summary

Vera is an end-to-end automated safety testing framework for LLM agents that perform autonomous actions, addressing complex and evolving risks. It uses a three-stage pipeline for continuous risk discovery, combinatorial safety case generation, and evidence-grounded verification in isolated sandboxes, revealing significant weaknesses in production agent frameworks.

Large language model (LLM) agents are increasingly capable of performing autonomous actions using external tools, which introduces complex and rapidly evolving safety risks. Current safety testing methods often target expert-designed violations and rely on hard-coded rules, making them difficult to scale and adapt as agents evolve. To overcome these limitations, researchers have developed Vera, an automated, end-to-end safety testing framework that applies software engineering principles to non-deterministic agents. Vera operates through a self-reinforcing, three-stage pipeline. First, it continuously discovers and structures emerging risks into taxonomies of safety risks, attack methods, and tool execution environments, drawing from existing literature. Second, it generates executable safety cases by combinatorially composing elements from these taxonomies, each specifying a safety goal, an initial state, and a deterministic verification predicate based on observable artifacts. Third, Vera adaptively executes heterogeneous agents in isolated sandboxes, where a control agent manages multi-turn interactions, and evidence-grounded verifiers judge outcomes based on environment state and tool-call evidence, rather than the model's self-reported behavior. Evaluations on four production agent frameworks (OpenClaw, Hermes, Codex, Claude Code) exposed substantial safety weaknesses, with average attack success rates reaching 93.9% under multi-channel attacks. The release of Vera-Bench, comprising 1600 executable safety cases, underscores the necessity of modular, executable testing infrastructure for rigorous and maintainable safety evaluation of rapidly evolving agentic systems.

Why it matters

For professionals developing and deploying LLM agents, Vera provides a critical framework for systematically identifying and mitigating safety risks at scale, ensuring more robust and trustworthy AI systems in production.

How to implement this in your domain

  1. 1Adopt an automated, end-to-end safety testing framework like Vera for LLM agents in development.
  2. 2Implement continuous risk discovery and taxonomy structuring to keep pace with evolving agent capabilities.
  3. 3Develop combinatorial safety cases that cover a wide range of potential attack methods and execution environments.
  4. 4Utilize isolated sandboxes and evidence-grounded verifiers for objective assessment of agent behavior.
  5. 5Integrate safety testing into the CI/CD pipeline for LLM agents to ensure ongoing security and reliability.

Who benefits

AI DevelopmentCybersecuritySoftware EngineeringRoboticsAutonomous Systems

Key takeaways

  • LLM agents performing autonomous actions introduce complex and evolving safety risks.
  • Vera is an automated framework for scalable, evidence-grounded safety testing of LLM agents.
  • The framework uses continuous risk discovery, combinatorial safety cases, and sandbox execution.
  • Evaluations revealed significant safety weaknesses in current production agent frameworks.

Original post by Yunhao Feng, Ruixiao Lin, Ming Wen, Qinqin He, Yanming Guo, Yifan Ding, Yutao Wu, Jialuo Chen, Yunhao Chen, Xiaohu Du, Jianan Ma, Zixing Chen, Zhuoer Xu, Xingjun Ma, Xinhao Deng

"arXiv:2607.01793v1 Announce Type: new Abstract: LLM agents increasingly perform autonomous actions through external tools, leading to complex and evolving safety risks. However, existing safety testing targets expert-designed safety violations, and the corresponding outcomes are…"

View on X

Originally posted by Yunhao Feng, Ruixiao Lin, Ming Wen, Qinqin He, Yanming Guo, Yifan Ding, Yutao Wu, Jialuo Chen, Yunhao Chen, Xiaohu Du, Jianan Ma, Zixing Chen, Zhuoer Xu, Xingjun Ma, Xinhao Deng on X · view source

Want to go deeper?

Turn these trends into skills with Learnijoy's hands-on AI & tech courses.

Explore courses