Multimodal LLMs Suffer Hidden Forgetting, Losing Evidence Grounding

Qianyu Chen, Canran Xiao, Runxuan Tang · July 3, 2026

Summary

Research reveals "hidden evidence-use forgetting" in continually adapted multimodal LLMs, where models retain answer accuracy but silently shift away from using appropriate visual or textual evidence. A new framework, RCL, is proposed to preserve both task learning and evidence reliance without replay or inference-time cost.

Multimodal large language models (MLLMs) are expected to adapt continually to new tasks and domains. Standard continual learning metrics, however, mainly measure whether the model's earlier answers remain correct and largely overlook whether the model still grounds those answers in the same multimodal evidence. The research identifies a failure mode it terms "hidden evidence-use forgetting": an MLLM keeps its answer accuracy while quietly changing how much it relies on visual, textual, or other evidence. The model can still give the right answer, but for the wrong or less grounded reasons, which accuracy-only evaluation does not detect.

To address this, the researchers propose Reliance-Constrained Continual Learning (RCL), a replay-free framework. RCL freezes a previous model checkpoint as a behavioral reference and then jointly optimizes three objectives: task learning, prediction preservation, and reliance preservation. According to the authors, the method improves performance and reduces evidence-reliance drift across multiple multimodal benchmarks, with no added inference-time overhead.
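The three-part objective described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's formulation: the summary does not give RCL's actual loss, so the reliance score (how far the prediction moves when one evidence channel is masked), the squared penalty, and the names `reliance`, `rcl_style_loss`, `lam_pred`, and `lam_rel` are all hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0) in the toy distributions


def cross_entropy(probs, target):
    """Task term: negative log-probability of the correct class."""
    return -math.log(max(probs[target], EPS))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)


def reliance(full_probs, masked_probs):
    """Hypothetical reliance score (an assumption, not from the paper):
    how far the prediction shifts when one evidence channel is masked."""
    return kl_divergence(full_probs, masked_probs)


def rcl_style_loss(cur_full, cur_masked, ref_full, ref_masked, target,
                   lam_pred=1.0, lam_rel=1.0):
    """Joint objective: task learning + prediction preservation
    + reliance preservation, measured against a frozen reference."""
    task = cross_entropy(cur_full, target)
    # Stay close to the frozen checkpoint's output distribution.
    pred_keep = kl_divergence(ref_full, cur_full)
    # Keep the same dependence on the masked channel as the reference.
    rel_gap = reliance(cur_full, cur_masked) - reliance(ref_full, ref_masked)
    return task + lam_pred * pred_keep + lam_rel * rel_gap ** 2


# Frozen reference: masking the image changes its prediction noticeably.
ref_full, ref_masked = [0.7, 0.2, 0.1], [0.4, 0.3, 0.3]

# Case 1: current model behaves exactly like the reference.
unchanged = rcl_style_loss(ref_full, ref_masked, ref_full, ref_masked, target=0)

# Case 2: same full prediction, but the model now ignores the image
# (masking changes nothing). Accuracy is identical; reliance is not.
drifted = rcl_style_loss(ref_full, ref_full, ref_full, ref_masked, target=0)

print(drifted > unchanged)
```

In the second case the prediction-preservation term is zero because the output distributions match, so only the reliance term registers the change. That is the gap the summary attributes to accuracy-only evaluation.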

Why it matters

For professionals deploying MLLMs in critical applications, ensuring not just correct answers but also transparent and stable evidence grounding is vital for trust, reliability, and auditability. Hidden forgetting poses a significant risk to model integrity.

How to implement this in your domain

1. Evaluate existing MLLM deployments for "hidden evidence-use forgetting" by analyzing their reliance on different evidence channels over time.
2. Consider integrating reliance-preserving techniques like RCL into continual learning pipelines for MLLMs.
3. Prioritize model development that focuses on the stability of evidence grounding alongside accuracy metrics.
4. Develop new internal metrics to track and mitigate modality reliance drift in continually updated multimodal systems.
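One possible shape for such an internal metric, as a minimal sketch: compare two checkpoints on the same evaluation set, and flag cases where accuracy is stable but the accuracy drop caused by ablating each evidence channel has shifted. The ablation-based reliance measure, the function names, and the thresholds `acc_tol` and `drift_tol` are assumptions chosen for illustration; they are not taken from the paper.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def snapshot(preds_full, preds_by_ablation, labels):
    """Summarize one checkpoint: overall accuracy plus, per evidence
    channel, the accuracy lost when that channel is ablated."""
    base = accuracy(preds_full, labels)
    reliance = {ch: base - accuracy(p, labels)
                for ch, p in preds_by_ablation.items()}
    return {"accuracy": base, "reliance": reliance}


def hidden_forgetting(old, new, acc_tol=0.02, drift_tol=0.1):
    """Flag the case where accuracy holds but channel reliance shifts.
    Thresholds are illustrative placeholders, not recommended values."""
    acc_stable = abs(new["accuracy"] - old["accuracy"]) <= acc_tol
    drift = max(abs(new["reliance"][ch] - old["reliance"][ch])
                for ch in old["reliance"])
    return acc_stable and drift > drift_tol, drift


labels = [1, 0, 1, 1]

# Before the update: removing the image hurts most.
old = snapshot(
    preds_full=[1, 0, 1, 1],
    preds_by_ablation={"image": [0, 0, 0, 1], "text": [1, 0, 1, 0]},
    labels=labels,
)

# After the update: same accuracy, but the image no longer matters
# and the model leans on the text instead.
new = snapshot(
    preds_full=[1, 0, 1, 1],
    preds_by_ablation={"image": [1, 0, 1, 1], "text": [0, 1, 0, 1]},
    labels=labels,
)

flagged, drift = hidden_forgetting(old, new)
print(flagged, drift)
```

Running a check like this after every continual-learning update gives a reliance time series alongside the usual accuracy curve. The ablation strategy (blank image, empty text, and so on) would need to be chosen per deployment.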

Who benefits

Healthcare, Autonomous Vehicles, BFSI, Content Moderation, Legal

Key takeaways

Continual learning in MLLMs can lead to "hidden evidence-use forgetting."
Models may retain accuracy but lose stable grounding in multimodal evidence.
The RCL framework is designed to preserve both task learning and evidence reliance.
Maintaining evidence paths is crucial for robust multimodal learning.

Original post by Qianyu Chen, Canran Xiao, Runxuan Tang

"arXiv:2607.02020v1 Announce Type: new Abstract: Multimodal large language models must continually adapt to evolving tasks and domains, yet standard continual learning metrics mainly measure whether old answers remain correct, leaving the stability of multimodal grounding largely…"



