COMFYCLAW Agent Evolves Skills for Image Generation Workflows

Zongxia Li, Dawei Liu, Fuxiao Liu, Yuhang Zhou, Xiyang Wu, Jingxi Chen, Jing Xie, Xiaomin Wu, Lichao Sun · July 3, 2026

Summary

COMFYCLAW is an agentic framework that improves image generation workflows by evolving a skill library based on past trajectories, errors, and visual feedback. It formulates workflow construction as typed graph editing and uses a VLM verifier to suggest repairs.

This research introduces COMFYCLAW, an agentic system for building image generation workflows in environments such as ComfyUI. The system treats workflow creation as typed graph editing: it gives the agent tools for each stage of construction and automatically corrects invalid modifications. It also integrates a region-level vision-language model (VLM) that analyzes visual failures and translates them into actionable repair suggestions.

The central idea is an evolving skill library. COMFYCLAW distills previous workflow executions, including successful trajectories, encountered errors, and verifier feedback, into reusable agent skills, so the agent becomes more reliable as it accumulates experience. In evaluations across several benchmarks, agent models, and image backbones, COMFYCLAW consistently achieves higher image generation scores than baselines without skill evolution. Human annotators also prefer its outputs, which supports the value of skill evolution for complex visual workflow construction.
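To make "typed graph editing" concrete, here is a minimal sketch in Python. It is not COMFYCLAW's implementation: the `Node` and `WorkflowGraph` classes, the `connect` method, and the port type names (`MODEL`, `LATENT`, `IMAGE`) are all assumptions chosen for illustration. The sketch only detects an invalid edit and returns a message an agent could act on; how COMFYCLAW actually corrects invalid modifications is not described in this summary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A workflow node with typed input and output ports (hypothetical schema)."""
    name: str
    inputs: Dict[str, str]   # port name -> expected type
    outputs: Dict[str, str]  # port name -> produced type


@dataclass
class WorkflowGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Each edge is (source node, output port, destination node, input port).
    edges: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def connect(self, src: str, out_port: str, dst: str, in_port: str) -> Optional[str]:
        """Apply the edit if port types match; otherwise return an error message."""
        if src not in self.nodes or dst not in self.nodes:
            return f"unknown node in edit: {src} -> {dst}"
        produced = self.nodes[src].outputs.get(out_port)
        expected = self.nodes[dst].inputs.get(in_port)
        if produced is None or expected is None:
            return f"unknown port: {src}.{out_port} -> {dst}.{in_port}"
        if produced != expected:
            return (f"type mismatch: {src}.{out_port} is {produced}, "
                    f"{dst}.{in_port} expects {expected}")
        self.edges.append((src, out_port, dst, in_port))
        return None


# Illustrative three-node workflow; node and type names are assumptions.
graph = WorkflowGraph()
graph.add_node(Node("loader", inputs={}, outputs={"model": "MODEL"}))
graph.add_node(Node("sampler", inputs={"model": "MODEL"}, outputs={"latent": "LATENT"}))
graph.add_node(Node("decoder", inputs={"samples": "LATENT"}, outputs={"image": "IMAGE"}))

ok = graph.connect("sampler", "latent", "decoder", "samples")  # valid edit
bad = graph.connect("loader", "model", "decoder", "samples")   # rejected edit
```

Returning a message instead of raising keeps the rejected edit out of the graph while giving the agent something specific to repair, which is one plausible way to pair validation with an agent loop.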

Why it matters

Professionals in creative industries or those developing AI tools can leverage this approach to build more robust and adaptable image generation systems, reducing manual intervention and improving output quality over time.

How to implement this in your domain

  1. Integrate VLM-based verification into existing generative AI pipelines to identify and diagnose visual errors.
  2. Develop a feedback loop mechanism that captures execution errors and user preferences to inform skill evolution.
  3. Implement a graph-editing interface for workflow construction, allowing agents to programmatically modify and optimize processes.
  4. Experiment with distilling successful workflow patterns into reusable "skills" for future automated tasks.
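The verification, feedback, and distillation steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the paper's method: `Episode`, `SkillLibrary`, and `run_episode` are hypothetical names, the generator and verifier are stand-in functions rather than a real image model or VLM, and promoting a repair hint once it recurs in several episodes is a deliberately simple substitute for whatever distillation COMFYCLAW performs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Episode:
    prompt: str
    error: Optional[str]   # execution error, if the workflow failed to run
    feedback: List[str]    # repair suggestions from the verifier
    score: float           # verifier score in [0, 1]


@dataclass
class SkillLibrary:
    skills: List[str] = field(default_factory=list)

    def evolve(self, episodes: List[Episode], min_count: int = 2) -> List[str]:
        """Promote repair hints that recur across episodes into reusable skills."""
        counts = Counter(hint for ep in episodes for hint in set(ep.feedback))
        for hint, n in counts.most_common():
            if n >= min_count and hint not in self.skills:
                self.skills.append(hint)
        return self.skills


def run_episode(prompt, generate, verify, library: SkillLibrary) -> Episode:
    """One generate-then-verify round; current skills are passed to the generator."""
    try:
        image = generate(prompt, list(library.skills))
    except RuntimeError as exc:
        # Execution errors are recorded so they can inform later skills.
        return Episode(prompt, error=str(exc), feedback=[], score=0.0)
    score, feedback = verify(prompt, image)
    return Episode(prompt, error=None, feedback=feedback, score=score)


# Stand-ins for a real generator and a VLM verifier (assumptions for the demo).
def fake_generate(prompt, skills):
    return {"prompt": prompt, "skills": skills}


def fake_verify(prompt, image):
    if "add a hand-detail pass" in image["skills"]:
        return 0.9, []
    return 0.4, ["add a hand-detail pass"]


library = SkillLibrary()
first = [run_episode(p, fake_generate, fake_verify, library)
         for p in ("a pianist", "a waving child")]
library.evolve(first)
second = run_episode("a chef", fake_generate, fake_verify, library)
```

In this toy run the same repair hint appears in both early episodes, is promoted to a skill, and is then available to the generator on the next prompt. A real pipeline would replace the stand-ins with an actual workflow executor and verifier and would likely need a richer notion of when a hint generalizes.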

Who benefits

Creative Arts, Gaming, Advertising, Product Design, AI Development

Key takeaways

  • In the reported evaluations, skill evolution improves agent reliability and image generation scores over baselines without it.
  • VLM verifiers can translate visual failures into actionable repair suggestions for AI agents.
  • Treating workflow construction as typed graph editing enables structured and automated optimization.
  • Continuous learning from past executions, errors, and feedback is crucial for agent improvement.

Original post by Zongxia Li, Dawei Liu, Fuxiao Liu, Yuhang Zhou, Xiyang Wu, Jingxi Chen, Jing Xie, Xiaomin Wu, Lichao Sun

"arXiv:2607.01709v1 Announce Type: new Abstract: Agents are increasingly used to construct workflows and assist humans in completing recurring tasks more efficiently. As these workflows become repeated and domain-specific, agent memory and reusable skills become increasingly impor…"

