SkillAudit Evolves Agent Skills Without Ground-Truth Feedback

Haowen Gao, Haoran Chen, Can Wang, Shasha Guo, Liang Pang, Zhaoyang Liu, Huawei Shen, Xueqi Cheng · June 15, 2026

Summary

SkillAudit is a framework for evolving AI agent skills after deployment without ground-truth feedback. It audits paired trajectories (the same task run with and without a candidate skill) and uses process-aligned contrastive evaluation to guide skill edits.

Agent skills are structured procedures that guide Large Language Models (LLMs) through specialized workflows, and they often need refinement after deployment. Real-world use exposes edge cases, API changes, and operational constraints that were not visible during initial development. Existing skill-evolution methods typically rely on privileged feedback, such as hidden test scores or environment rewards, which is usually unavailable in deployment, where only a task description and workspace data exist.

SkillAudit is designed to evolve agent skills without that feedback. Its core mechanism is paired trajectory auditing: at each iteration, the same task is executed both with and without the candidate skill, which isolates how the skill changes the agent's behavior.

To turn these behavioral differences into editing guidance, SkillAudit uses Process-Aligned Contrastive Evaluation (PACE). PACE evaluators map observed trajectory divergences to diagnostic signals, which are then linked to specific sections of the skill document. A fixed structural verifier, derived from the task specification, checks that proposed updates respect task constraints and blocks harmful modifications. Edits are routed through two pipelines: Refine, for broadly useful skills, and Repair, for passages that conflict with the task. The authors report significant performance gains across a range of professional domains.
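The paired-run idea can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a trajectory can be modeled as an ordered list of action strings, and the names `run_pair`, `diff_trajectories`, `Divergence`, and `toy_agent` are hypothetical. The diff uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever comparison the paper actually performs.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple

# Assumption: a trajectory is an ordered list of action strings.
Trajectory = List[str]
Agent = Callable[[str, Optional[str]], Trajectory]


@dataclass
class Divergence:
    kind: str                  # "insert", "delete", or "replace", relative to the baseline run
    baseline_steps: List[str]  # steps seen only in the run without the skill
    skill_steps: List[str]     # steps seen only in the run with the skill


def run_pair(task: str, skill: str, agent: Agent) -> Tuple[Trajectory, Trajectory]:
    """Run the same task twice: once without and once with the candidate skill."""
    return agent(task, None), agent(task, skill)


def diff_trajectories(baseline: Trajectory, with_skill: Trajectory) -> List[Divergence]:
    """Collect the spans where the two runs differ."""
    matcher = SequenceMatcher(a=baseline, b=with_skill, autojunk=False)
    return [
        Divergence(tag, baseline[i1:i2], with_skill[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def toy_agent(task: str, skill: Optional[str]) -> Trajectory:
    """Hypothetical stand-in for a real agent: the skill adds one validation step."""
    steps = ["read_task", "load_data"]
    if skill is not None:
        steps.append("validate_schema")
    steps.append("write_report")
    return steps


baseline, with_skill = run_pair("summarize sales data", "Always validate the schema.", toy_agent)
divergences = diff_trajectories(baseline, with_skill)
for d in divergences:
    print(d.kind, d.baseline_steps, d.skill_steps)
```

In this toy run the only divergence is the extra `validate_schema` step, which is the kind of skill-induced difference an evaluator would then have to judge and link back to a section of the skill document.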

Why it matters

Professionals developing and deploying AI agents can use SkillAudit to continuously improve agent performance and adaptability in real-world settings, even when traditional feedback mechanisms are absent. This leads to more robust and effective AI solutions that can evolve with changing operational demands.

How to implement this in your domain

1. Integrate SkillAudit into existing AI agent development and deployment pipelines.
2. Define clear task specifications to compile the structural verifier for agent skills.
3. Implement paired trajectory auditing to systematically compare agent behaviors with and without candidate skill modifications.
4. Utilize Process-Aligned Contrastive Evaluation (PACE) to generate diagnostic signals for skill improvement.
5. Apply the Refine and Repair pipelines to iteratively enhance agent skills based on identified behavioral divergences.
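As a rough illustration of how a fixed verifier and the Refine/Repair routing might fit together, here is a small Python sketch. It is an assumption-laden toy, not the method from the paper: the verifier is reduced to required and forbidden phrases, and `StructuralVerifier`, `Signal`, `route`, and `apply_update` are hypothetical names. The summary does not say how the real verifier is compiled from the task specification or how routing decisions are made.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StructuralVerifier:
    """Toy verifier, fixed once per task (assumption: phrase checks only)."""
    required_terms: Tuple[str, ...]   # hypothetical: phrases the skill must keep
    forbidden_terms: Tuple[str, ...]  # hypothetical: phrases that violate the task

    def accepts(self, skill_text: str) -> bool:
        text = skill_text.lower()
        has_required = all(t.lower() in text for t in self.required_terms)
        has_forbidden = any(t.lower() in text for t in self.forbidden_terms)
        return has_required and not has_forbidden


@dataclass
class Signal:
    """Toy diagnostic signal linked to one section of the skill document."""
    section: str
    conflicts_with_task: bool


def route(signals: List[Signal]) -> str:
    """Assumed rule: any task conflict sends the skill to Repair, otherwise Refine."""
    return "repair" if any(s.conflicts_with_task for s in signals) else "refine"


def apply_update(current: str, candidate: str, verifier: StructuralVerifier) -> str:
    """Keep the candidate only if it passes the verifier; otherwise keep the current skill."""
    return candidate if verifier.accepts(candidate) else current


verifier = StructuralVerifier(
    required_terms=("output.csv",),
    forbidden_terms=("delete the workspace",),
)
current = "Write results to output.csv."
good = "Validate the schema, then write results to output.csv."
bad = "Delete the workspace, then write results to output.csv."

print(route([Signal("Step 2", conflicts_with_task=False)]))
print(apply_update(current, good, verifier) == good)
print(apply_update(current, bad, verifier) == current)
```

Because the verifier is built once from the task and never edited by the loop, a candidate update cannot pass by weakening the check itself. This matches the "fixed" property the summary describes.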

Who benefits

Software Development, AI/ML Engineering, Business Process Automation, Robotics, Customer Service

Key takeaways

SkillAudit enables AI agent skill evolution without relying on ground-truth feedback.
It uses paired trajectory auditing to isolate and analyze skill-induced behavioral changes.
The framework is reported to significantly improve agent performance across diverse professional tasks.
A structural verifier ensures updates maintain task constraints and prevent harmful modifications.

Original post by Haowen Gao, Haoran Chen, Can Wang, Shasha Guo, Liang Pang, Zhaoyang Liu, Huawei Shen, Xueqi Cheng

"arXiv:2606.14239v1 Announce Type: new Abstract: Agent skills are structured procedural packages that guide frozen LLM agents in specialized workflows. Skills rarely remain sufficient after deployment: edge cases, API changes, and deployment constraints become visible only through…"
