Adversarial Attack Manipulates MLLM Cascade Routing Decisions

Zhongye Liu, Yaopei Zeng, Yurui Chang, Lu Lin · June 16, 2026

Summary

Researchers have introduced the Forced Deferral Attack (FDA), an adversarial image attack designed to manipulate multimodal large language model (MLLM) cascades. This attack lowers the confidence of the weaker, cheaper model, thereby forcing queries to be routed to the stronger, more computationally expensive model, without directly affecting the correctness of the answer.

MLLM cascades cut serving cost by querying a weaker, cheaper model first and deferring to a stronger model only when the weak model expresses low confidence. This cost-saving mechanism introduces a security vulnerability: because the weak model's confidence directly dictates compute allocation, that confidence becomes an attack surface. An adversary who can lower it can ensure that queries are consistently processed by the more expensive strong model.

The Forced Deferral Attack (FDA) exploits this weakness. It is an adversarial image attack that reduces the weak model's confidence so that the cascade routes queries to the strong model. FDA learns a universal border trigger by optimizing an objective that pushes the weak model's token distribution on triggered inputs toward less concentrated targets derived from its clean responses.

Across datasets, model families, and deferral metrics, FDA consistently increases strong-model routing, and it outperforms baseline attacks such as image perturbation and prompt injection. The findings show that compute allocation in MLLM cascades can be manipulated to force unintended strong-model usage, even without directly compromising answer accuracy.
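The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Model` interface, the `mean_max_prob` confidence score, the `0.7` threshold, and both stub models are assumptions made for the example, and a plain string stands in for the image input.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Per-step token probability distributions returned alongside an answer.
TokenDists = List[List[float]]
# Hypothetical model interface: takes a query, returns (answer, token distributions).
Model = Callable[[str], Tuple[str, TokenDists]]


def mean_max_prob(token_dists: TokenDists) -> float:
    """Assumed confidence score: mean of the top token probability per step."""
    return sum(max(dist) for dist in token_dists) / len(token_dists)


@dataclass
class CascadeResult:
    answer: str
    routed_to: str  # "weak" or "strong"
    weak_confidence: float


def run_cascade(query: str, weak: Model, strong: Model,
                threshold: float = 0.7) -> CascadeResult:
    """Answer with the weak model unless its confidence falls below threshold."""
    answer, dists = weak(query)
    confidence = mean_max_prob(dists)
    if confidence >= threshold:
        return CascadeResult(answer, "weak", confidence)
    # Low confidence: pay for the strong model and use its answer instead.
    strong_answer, _ = strong(query)
    return CascadeResult(strong_answer, "strong", confidence)


# Stub models for illustration only.
def weak_stub(query: str) -> Tuple[str, TokenDists]:
    if "[trigger]" in query:
        # Flatter distributions: same answer, lower confidence.
        return "cat", [[0.4, 0.3, 0.3]] * 3
    return "cat", [[0.9, 0.05, 0.05]] * 3


def strong_stub(query: str) -> Tuple[str, TokenDists]:
    return "cat", [[0.98, 0.01, 0.01]] * 3


clean = run_cascade("photo of a cat", weak_stub, strong_stub)
triggered = run_cascade("photo of a cat [trigger]", weak_stub, strong_stub)
print(clean.routed_to, triggered.routed_to)
```

In this toy setup the weak model gives the same answer for both inputs, yet only the triggered input is deferred. That is the behaviour the attack targets: routing changes while the answer itself does not.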

Why it matters

For organizations deploying MLLM cascades, this research exposes a security and cost-management vulnerability. Attackers who manipulate the weak model's confidence can drive up operational costs or degrade service quality, so confidence-based routing needs its own defenses.
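A back-of-the-envelope way to size the cost exposure: the weak model runs on every query and the strong model runs only on deferred ones, so the expected cost per query grows linearly with the deferral rate. The function name and the cost units below are hypothetical, chosen only to make the arithmetic concrete.

```python
def expected_cost_per_query(weak_cost: float, strong_cost: float,
                            deferral_rate: float) -> float:
    """Expected cost when every query hits the weak model and a fraction
    `deferral_rate` of queries is additionally sent to the strong model."""
    if not 0.0 <= deferral_rate <= 1.0:
        raise ValueError("deferral_rate must be between 0 and 1")
    return weak_cost + deferral_rate * strong_cost


# Illustrative numbers only: weak call costs 1 unit, strong call costs 10.
baseline = expected_cost_per_query(1.0, 10.0, deferral_rate=0.2)
under_attack = expected_cost_per_query(1.0, 10.0, deferral_rate=0.9)
print(baseline, under_attack, under_attack / baseline)
```

With these made-up figures, raising the deferral rate from 20% to 90% roughly triples the per-query cost. Real ratios depend on the actual model prices and baseline deferral rate.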

How to implement this in your domain

  1. Assess your MLLM cascade systems for vulnerabilities related to confidence-based routing decisions.
  2. Develop monitoring systems to detect unusual patterns in query deferral rates to stronger models.
  3. Implement adversarial training or robustness techniques to make weak models less susceptible to confidence manipulation.
  4. Explore alternative or supplementary routing mechanisms that are not solely dependent on a single model's confidence score.
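As one possible starting point for the monitoring step in the list above, the sketch below tracks the deferral rate over a sliding window and flags it when it exceeds an expected baseline by a margin. The class name, window size, tolerance, and baseline are all assumptions for illustration; a production system would likely segment by client or traffic source and use a proper statistical test.

```python
from collections import deque


class DeferralRateMonitor:
    """Sliding-window monitor for the fraction of queries sent to the strong model."""

    def __init__(self, baseline_rate: float, window: int = 200,
                 tolerance: float = 0.15, min_samples: int = 50) -> None:
        self.baseline_rate = baseline_rate  # expected deferral rate on normal traffic
        self.tolerance = tolerance          # allowed excess before flagging
        self.min_samples = min_samples      # avoid alerts on tiny samples
        self._events = deque(maxlen=window)  # True means the query was deferred

    def rate(self) -> float:
        """Current deferral rate over the window (0.0 if no data yet)."""
        if not self._events:
            return 0.0
        return sum(self._events) / len(self._events)

    def record(self, deferred: bool) -> bool:
        """Record one routing decision; return True if the rate looks anomalous."""
        self._events.append(deferred)
        if len(self._events) < self.min_samples:
            return False
        return self.rate() > self.baseline_rate + self.tolerance


monitor = DeferralRateMonitor(baseline_rate=0.2, window=100,
                              tolerance=0.15, min_samples=20)

# Normal traffic: 4 of 20 queries deferred, matching the baseline.
normal_alerts = [monitor.record(i % 5 == 0) for i in range(20)]

# Suspicious burst: 30 consecutive deferrals.
burst_alerts = [monitor.record(True) for _ in range(30)]

print(any(normal_alerts), burst_alerts[-1], round(monitor.rate(), 2))
```

A rate alarm like this only detects the symptom. It does not distinguish an attack from a genuine shift toward harder queries, so flagged windows would still need inspection.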

Who benefits

Cloud Computing, Cybersecurity, AI Engineering, Telecommunications, Media & Entertainment

Key takeaways

  • MLLM cascades are vulnerable to attacks that manipulate routing decisions.
  • The Forced Deferral Attack (FDA) lowers weak model confidence to force strong model usage.
  • FDA is an adversarial image attack that doesn't target answer correctness directly.
  • This vulnerability can lead to increased computational costs and potential service degradation.

Original post by Zhongye Liu, Yaopei Zeng, Yurui Chang, Lu Lin

"arXiv:2606.15308v1 Announce Type: new Abstract: While multimodal large language models (MLLMs) have shown strong visual reasoning abilities, serving a large model for every query is computationally expensive. MLLM cascades mitigate this cost by first querying a weak but cheaper m…"

