New Framework Improves Zero-Shot Composed Image Retrieval

Gunho Jung, Jeong-Woo Park, Seon Bin Kim, Seong-Whan Lee · July 1, 2026

Summary

PEC-CIR is a training-free framework that enhances zero-shot composed image retrieval by structuring query construction as a multi-stage reasoning pipeline. It uses a Planner-Executor-Critic architecture to extract constraints, generate candidates, and evaluate them, reducing generative errors and improving retrieval stability.

Composed image retrieval finds a target image from a reference image plus a textual modification, a task that is especially challenging in a training-free, zero-shot setting. Existing methods typically generate a single, unified textual query, which often distorts or omits semantics because preserving the reference image's attributes and applying the requested modification can conflict.

This research introduces PEC-CIR, a training-free framework that makes zero-shot composed image retrieval more robust by reframing query construction as a multi-stage reasoning process instead of single-pass generation. It uses a Planner-Executor-Critic architecture: the Planner identifies explicit constraints, the Executor generates multiple candidate target descriptions, and the Critic evaluates each candidate for compliance with those constraints. Because candidate queries are checked before retrieval, fewer generative errors propagate into the search, which the authors report improves retrieval precision and stability. This combination of explicit planning and self-critique offers a more reliable way to integrate visual and textual information in complex image search.
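To make the division of labor concrete, here is a minimal Python sketch of a plan, generate, critique loop. It is not the authors' implementation: the source does not say how each stage is realized, so `plan`, `execute`, and `critique` are hypothetical rule-based stand-ins for what would presumably be model calls. The `add X; remove Y` instruction format, the attribute-set representation, and the `PREFIX` query template are also assumptions for illustration, and picking the single highest-scoring candidate is a simplification.

```python
from dataclasses import dataclass, field

PREFIX = "a photo of "  # hypothetical template for target descriptions


@dataclass
class Constraints:
    """Constraints a Planner might extract (hypothetical representation)."""
    keep: set[str] = field(default_factory=set)     # reference attributes to preserve
    include: set[str] = field(default_factory=set)  # attributes the instruction adds
    exclude: set[str] = field(default_factory=set)  # attributes the instruction removes


def describe(attrs: set[str]) -> str:
    """Render an attribute set as a text query, e.g. 'a photo of dress, red'."""
    return PREFIX + ", ".join(sorted(attrs))


def mentioned(description: str) -> set[str]:
    """Recover the attributes a description mentions (inverse of describe)."""
    return set(description.removeprefix(PREFIX).split(", "))


def plan(reference_attrs: set[str], modification: str) -> Constraints:
    """Planner stand-in: parses 'add X; remove Y' clauses into explicit constraints."""
    c = Constraints()
    for clause in modification.split(";"):
        verb, _, attr = clause.strip().partition(" ")
        if verb == "add":
            c.include.add(attr)
        elif verb == "remove":
            c.exclude.add(attr)
    c.keep = reference_attrs - c.exclude
    return c


def execute(reference_attrs: set[str], c: Constraints) -> list[str]:
    """Executor stand-in: proposes several candidate target descriptions,
    two of them deliberately flawed to mimic single-pass generation errors."""
    candidates = [describe(c.keep | c.include)]  # faithful composition
    if c.keep:
        dropped = sorted(c.keep)[0]
        candidates.append(describe((c.keep - {dropped}) | c.include))  # omission
    candidates.append(describe(reference_attrs))  # ignores the modification
    return candidates


def critique(description: str, c: Constraints) -> float:
    """Critic stand-in: fraction of constraints the description satisfies."""
    said = mentioned(description)
    checks = [a in said for a in c.keep | c.include] + [a not in said for a in c.exclude]
    return sum(checks) / len(checks) if checks else 1.0


def pec_query(reference_attrs: set[str], modification: str) -> tuple[str, float]:
    """Plan, generate candidates, and keep the one the Critic scores highest."""
    constraints = plan(reference_attrs, modification)
    scored = [(critique(d, constraints), d) for d in execute(reference_attrs, constraints)]
    best_score, best = max(scored)
    return best, best_score


if __name__ == "__main__":
    query, score = pec_query({"dress", "red", "long sleeves"},
                             "remove long sleeves; add polka dots")
    print(query, score)  # a photo of dress, polka dots, red 1.0
```

In this run the Executor's flawed candidates (one drops `dress`, one ignores the instruction and keeps `long sleeves`) score 0.75 and 0.5, so the Critic selects the faithful description. The point of the design is that such errors are caught while they are still text, before any retrieval happens.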

Why it matters

This advancement provides a more robust and accurate method for image retrieval based on complex, multi-modal queries, which is crucial for applications like e-commerce, content management, and visual search engines. It enhances the ability of AI to understand nuanced visual and textual instructions.

How to implement this in your domain

  1. Integrate PEC-CIR's multi-stage reasoning into visual search engines for more precise results.
  2. Apply the Planner-Executor-Critic architecture to other complex multi-modal generation tasks.
  3. Develop tools that allow users to provide more nuanced, constrained queries for image and video content.
  4. Enhance content management systems with advanced retrieval capabilities based on combined visual and textual attributes.
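The first item, wiring PEC-CIR into a visual search engine, still leaves the engine to rank its gallery against whatever query is selected. The sketch below assumes a shared embedding space in which the query and gallery images are compared by cosine similarity, a common choice for zero-shot retrieval but not a detail the source confirms. `VOCAB`, `GALLERY`, `embed`, and `rank_gallery` are hypothetical: a toy multi-hot encoding over a fixed attribute vocabulary stands in for a real image/text encoder.

```python
import math

# Hypothetical attribute vocabulary: a toy stand-in for a learned joint
# image/text embedding space (e.g. one produced by a CLIP-style encoder).
VOCAB = ["dress", "red", "blue", "long sleeves", "polka dots", "stripes"]

# Hypothetical gallery: image IDs mapped to the attributes each image shows.
GALLERY = {
    "img_a": {"dress", "red", "polka dots"},
    "img_b": {"dress", "red", "long sleeves"},
    "img_c": {"blue", "polka dots"},
    "img_d": {"dress", "blue", "stripes"},
}


def embed(attributes: set[str]) -> list[float]:
    """Toy multi-hot embedding: 1.0 for each vocabulary term present."""
    return [1.0 if term in attributes else 0.0 for term in VOCAB]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(query_attrs: set[str], gallery: dict[str, set[str]],
                 top_k: int = 3) -> list[tuple[str, float]]:
    """Score every gallery image against the query and return the top_k."""
    q = embed(query_attrs)
    scored = [(image_id, cosine(q, embed(attrs))) for image_id, attrs in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Query attributes as a constraint-compliant description would specify them.
    for image_id, score in rank_gallery({"dress", "red", "polka dots"}, GALLERY):
        print(f"{image_id}: {score:.3f}")
```

Querying instead with the unmodified reference attributes, `rank_gallery({"dress", "red", "long sleeves"}, GALLERY)`, puts the reference-like `img_b` first: exactly the kind of error that an unchecked query would carry into the results.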

Who benefits

  • E-commerce
  • Digital Asset Management
  • Media & Entertainment
  • Advertising
  • Healthcare (medical image search)

Key takeaways

  • PEC-CIR improves zero-shot composed image retrieval using a multi-stage reasoning pipeline.
  • Its Planner-Executor-Critic architecture extracts constraints, generates candidates, and evaluates them.
  • This framework reduces generative errors and enhances retrieval stability.
  • Strategic planning and self-criticism are key to robust multi-modal query construction.

Original post by Gunho Jung, Jeong-Woo Park, Seon Bin Kim, Seong-Whan Lee on X

"arXiv:2606.31222v1 Announce Type: new Abstract: Composed image retrieval requires identifying a target image from a gallery by integrating a reference image with a textual modification instruction. In a training-free zero-shot setting, this task relies on constructing a retrieval…"


More in AI Research

AI Research · AI Engineering & DevTools

Philosophical Foundations for Explainable AI in Healthcare Explored

This paper critically reviews the intersection of philosophy of science and explainable AI (XAI) in health sciences, examining what constitutes an adequate medical explanation. It identifies causality, trust, and epistemic adequacy as central axes for designing robust XAI systems in clinical decision-making.

Martina Mattioli, Marcello Pelillo · Jul 1, 2026

AI Research · AI Engineering & DevTools

New Metric Improves LLM Reinforcement Learning with Verifiable Rewards

This research introduces the Relative Surprisal Index (RSI), an information-theoretic metric for adaptive token selection in Reinforcement Learning with Verifiable Rewards (RLVR) for LLMs. RSI-S, an entropy-adaptive filtering method based on RSI, improves reasoning accuracy by 2-3 percentage points by retaining tokens within a stable surprisal interval.

Outongyi Lv, Yanzhao Zheng, Yuanwei Zhang, Zhenghao Huang, Xingjun Wang, Baohua Dong, Hangcheng Zhu, Yingda Chen · Jul 1, 2026

AI Engineering & DevTools · AI Research

New ACE Module Boosts LLM Agent Context Management

Researchers introduce ACE (Adaptive Context Elasticizer), a plug-and-play module that dynamically manages historical information for LLM-based agents. ACE maintains a lossless message layer and adaptively orchestrates context, significantly improving performance across various agent frameworks without architectural changes.

Ning Liao, Zihao Long, Xiaoxing Wang, Xue Yang, Yaoming Wang, Ziyuan Zhuang, Xunliang Cai, Rongxiang Weng, Junchi Yan · Jul 1, 2026