MLLMs Offer Low-Cost, Training-Free Concept Explanations

Darian Fernández-Gutiérrez, Rafael Bello, Marilyn Bello, Natalia Díaz-Rodríguez · June 30, 2026

Summary

Researchers evaluated mid-scale Multimodal Large Language Models (MLLMs) on localized concept naming with no task-specific training and report up to 88% object-level accuracy when assigning semantic labels to image regions. The result points to cost-effective, concept-based Explainable AI (C-XAI) built on existing MLLMs.

Concept-based Explainable AI (C-XAI) is hard to validate because fine-grained concept annotations are scarce. This paper asks whether existing Multimodal Large Language Models (MLLMs) can provide localized, concept-based explanations without additional training. The study introduces a zero-shot evaluation protocol, Concept Naming (CoNa), which tests how well an MLLM labels bounding-box regions at both the object level and the part level. Across the MLLMs tested, performance was strong, reaching up to 88% object-level accuracy. This suggests that MLLMs can be a practical, low-cost way to generate human-understandable concept annotations directly from localized image regions.
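To make the evaluation idea concrete, here is a minimal Python sketch of scoring zero-shot concept naming separately at the object and part levels. It is not the paper's CoNa implementation: the `Region` record, the `name_region` callable that stands in for an MLLM call, and the case-insensitive exact-match rule are all assumptions chosen for illustration, and the demo data is invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Region:
    """One annotated bounding box (hypothetical record layout)."""
    image_id: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    level: str                      # "object" or "part"
    label: str                      # ground-truth concept name


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(name.lower().split())


def accuracy_by_level(
    regions: Iterable[Region],
    name_region: Callable[[Region], str],
) -> Dict[str, float]:
    """Exact-match accuracy per level. The matching rule is an assumption."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for region in regions:
        totals[region.level] = totals.get(region.level, 0) + 1
        if normalize(name_region(region)) == normalize(region.label):
            hits[region.level] = hits.get(region.level, 0) + 1
    return {level: hits.get(level, 0) / totals[level] for level in totals}


# Invented demo data; a real run would load annotated boxes from a dataset.
demo_regions = [
    Region("img_1", (10, 20, 200, 180), "object", "car"),
    Region("img_1", (30, 120, 80, 170), "part", "wheel"),
    Region("img_2", (5, 5, 90, 60), "part", "beak"),
]


def stub_namer(region: Region) -> str:
    """Stands in for an MLLM call; returns canned answers for the demo."""
    answers = {
        ("img_1", "object"): "Car",
        ("img_1", "part"): " wheel ",
        ("img_2", "part"): "wing",
    }
    return answers[(region.image_id, region.level)]


scores = accuracy_by_level(demo_regions, stub_namer)
print(scores)  # {'object': 1.0, 'part': 0.5}
```

Replacing `stub_namer` with a function that sends the image and box to an actual model would leave the scoring code unchanged. Exact match is the strictest reasonable rule; a synonym-tolerant comparison would give higher scores.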

Why it matters

Practitioners can use off-the-shelf MLLMs to generate concept-based explanations, reducing the need for expensive, fine-grained concept annotations and speeding up the development of more transparent AI systems.

How to implement this in your domain

  1. Integrate mid-scale MLLMs into existing computer vision pipelines to automatically generate localized concept explanations for model predictions.
  2. Utilize zero-shot concept naming protocols to quickly prototype and evaluate concept-based XAI features without extensive data labeling.
  3. Explore MLLM capabilities for data annotation, using them to generate initial concept labels for new datasets, reducing manual effort.
  4. Apply this approach in domains requiring high transparency, such as medical imaging or autonomous driving, to better understand model decisions.
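As one way to picture the annotation step in the list above, the following Python sketch drafts a concept label for a bounding box and flags it for human review when the model's answer is unusable. Everything here is hypothetical: the prompt wording, the closed vocabulary, and the `ask_model` callable (assumed to already have the image attached and to return plain text) are illustrative choices, not something described in the paper.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def build_prompt(box: Box, level: str, vocabulary: Sequence[str]) -> str:
    """Compose a text prompt asking for one concept name (wording is an assumption)."""
    x_min, y_min, x_max, y_max = box
    options = ", ".join(sorted(vocabulary))
    return (
        f"Name the {level} inside the bounding box "
        f"(x_min={x_min}, y_min={y_min}, x_max={x_max}, y_max={y_max}). "
        f"Answer with exactly one of: {options}."
    )


def parse_answer(answer: str, vocabulary: Sequence[str]) -> Optional[str]:
    """Map a free-text answer onto the vocabulary, or return None if it does not match."""
    cleaned = answer.strip().rstrip(".").strip().lower()
    for term in vocabulary:
        if term.lower() == cleaned:
            return term
    return None


def draft_annotation(
    box: Box,
    level: str,
    vocabulary: Sequence[str],
    ask_model: Callable[[str], str],
) -> Dict[str, object]:
    """Produce a draft label; anything outside the vocabulary is sent to a human."""
    raw = ask_model(build_prompt(box, level, vocabulary))
    label = parse_answer(raw, vocabulary)
    return {
        "box": box,
        "level": level,
        "label": label,
        "raw_answer": raw,
        "needs_review": label is None,
    }


# Demo with a canned reply in place of a real model call.
draft = draft_annotation((30, 120, 80, 170), "part", ["wheel", "door"], lambda prompt: "Wheel.")
print(draft["label"], draft["needs_review"])  # wheel False
```

Keeping the raw answer alongside the parsed label lets reviewers see what the model actually said. Restricting answers to a fixed vocabulary is a design choice made for this sketch; an open-vocabulary setup would need a different matching step.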

Who benefits

Healthcare, Automotive, Manufacturing, Retail, AI/Tech

Key takeaways

  • Mid-scale MLLMs can perform localized concept naming in a zero-shot manner.
  • Training-free approaches offer a low-cost solution for concept-based XAI.
  • MLLMs can achieve high accuracy in assigning semantic labels to image regions.
  • This method reduces the need for extensive, fine-grained concept annotations.

Original post by Darian Fernández-Gutiérrez, Rafael Bello, Marilyn Bello, Natalia Díaz-Rodríguez

"arXiv:2606.29069v1 Announce Type: new Abstract: Concept-based Explainable AI (C-XAI) seeks human-understandable explanations grounded in semantic concepts, yet validation is limited by the scarcity of fine-grained concept annotations. We evaluate whether mid-scale Multimodal Larg…"

