New Framework Improves Zero-Shot Composed Image Retrieval.

Gunho Jung, Jeong-Woo Park, Seon Bin Kim, Seong-Whan Lee · July 1, 2026

Summary

PEC-CIR is a training-free framework that enhances zero-shot composed image retrieval by structuring query construction as a multi-stage reasoning pipeline. It uses a Planner-Executor-Critic architecture to extract constraints, generate candidates, and evaluate them, reducing generative errors and improving retrieval stability.

Composed image retrieval, which involves finding an image based on a reference image and a textual modification, is challenging in a training-free, zero-shot setting. Existing methods typically generate a single, unified textual query, often leading to semantic distortions or omissions because preserving reference attributes and integrating textual modifications can conflict. This research introduces PEC-CIR, a novel training-free framework designed to make zero-shot composed image retrieval more robust. PEC-CIR reframes query construction as a multi-stage reasoning process, moving beyond single-pass generation. It employs a Planner-Executor-Critic architecture: the Planner identifies explicit constraints, the Executor generates multiple potential target descriptions, and the Critic then evaluates these candidates for compliance with the identified constraints. By explicitly evaluating candidate queries before retrieval, PEC-CIR significantly reduces the propagation of generative errors, leading to improved retrieval precision and stability. This strategic planning and self-criticism approach offers a more reliable way to integrate visual and textual information for complex image search tasks.
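The Planner-Executor-Critic flow described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: the names Constraints, critic_score, build_query, toy_planner, and toy_executor are hypothetical, the planner and executor are hard-coded stand-ins for what would presumably be language-model calls, and the Critic here uses naive keyword matching rather than any scoring method confirmed by the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Constraints:
    """Explicit constraints a Planner might extract (hypothetical structure)."""
    keep: List[str]    # reference attributes that must be preserved
    change: List[str]  # attributes requested by the modification text
    drop: List[str]    # attributes the modification text removes


def critic_score(candidate: str, c: Constraints) -> float:
    """Toy Critic: fraction of constraints satisfied, via naive substring checks."""
    text = candidate.lower()
    checks = [term in text for term in c.keep + c.change]
    checks += [term not in text for term in c.drop]
    return sum(checks) / len(checks) if checks else 0.0


def build_query(
    reference_caption: str,
    modification: str,
    planner: Callable[[str, str], Constraints],
    executor: Callable[[str, str, int], List[str]],
    n_candidates: int = 3,
) -> str:
    """Plan, generate several candidates, then keep the best-scoring one."""
    constraints = planner(reference_caption, modification)
    candidates = executor(reference_caption, modification, n_candidates)
    return max(candidates, key=lambda q: critic_score(q, constraints))


# Hard-coded stand-ins (assumptions) so the sketch runs without any model.
def toy_planner(reference_caption: str, modification: str) -> Constraints:
    return Constraints(keep=["dress", "long sleeves"], change=["blue"], drop=["red"])


def toy_executor(reference_caption: str, modification: str, n: int) -> List[str]:
    pool = [
        "a red dress with long sleeves",   # ignores the modification
        "a blue dress",                    # omits a reference attribute
        "a blue dress with long sleeves",  # satisfies every constraint
    ]
    return pool[:n]


best = build_query(
    "a red dress with long sleeves", "make it blue", toy_planner, toy_executor
)
print(best)
```

The point of the sketch is the control flow: candidates that distort or omit a constraint score lower and are filtered out before the selected description is handed to the retrieval step, which is where a single-pass generator would have no second chance.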

Why it matters

This advancement provides a more robust and accurate method for image retrieval based on complex, multi-modal queries, which is crucial for applications like e-commerce, content management, and visual search engines. It enhances the ability of AI to understand nuanced visual and textual instructions.

How to implement this in your domain

1. Integrate PEC-CIR's multi-stage reasoning into visual search engines for more precise results.
2. Apply the Planner-Executor-Critic architecture to other complex multi-modal generation tasks.
3. Develop tools that allow users to provide more nuanced, constrained queries for image and video content.
4. Enhance content management systems with advanced retrieval capabilities based on combined visual and textual attributes.

Who benefits

E-commerce, Digital Asset Management, Media & Entertainment, Advertising, Healthcare (medical image search)

Key takeaways

PEC-CIR improves zero-shot composed image retrieval using a multi-stage reasoning pipeline.
Its Planner-Executor-Critic architecture extracts constraints, generates candidates, and evaluates them.
This framework reduces generative errors and enhances retrieval stability.
Strategic planning and self-criticism are key to robust multi-modal query construction.

Original post by Gunho Jung, Jeong-Woo Park, Seon Bin Kim, Seong-Whan Lee

"arXiv:2606.31222v1 Announce Type: new Abstract: Composed image retrieval requires identifying a target image from a gallery by integrating a reference image with a textual modification instruction. In a training-free zero-shot setting, this task relies on constructing a retrieval…"

