New Attack Method Targets RAG Systems by Editing Retriever Models

Xinru Liu, Xianglong Zhang, Di Cai, Zhumin Chen, Pengfei Hu, Xin Xin · June 18, 2026

Summary

This paper introduces CAREATTACK, a model-centric attack framework that injects malicious knowledge into Retrieval-Augmented Generation (RAG) systems by directly editing open-source retriever model parameters. This method manipulates retrieved evidence to mislead LLM generation.

Retrieval-Augmented Generation (RAG) systems are vulnerable to malicious knowledge injection, which manipulates the retrieved information and in turn misleads the large language model's output. Traditional attacks often craft malicious entries in the external knowledge base, but such data-centric methods can be detected. A different threat comes from model-centric attacks, since many RAG systems rely on open-source retriever models. The researchers propose CAREATTACK, a framework that directly edits the parameters of these retriever models to inject malicious knowledge. CAREATTACK operates in two stages. The first, conflict-aware retriever editing, uses efficient parameter editing to promote malicious knowledge over benign passages while resolving conflicts. The second, attack-preserving anchor repair, fine-tunes the edited retriever to maintain attack effectiveness while minimizing the impact on non-target prompts. The method has been demonstrated on popular embedding models and reveals a significant, practical attack surface for RAG systems.
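
To see why a change inside the retriever can alter what the LLM reads, consider a toy dense-retrieval sketch. It is not CAREATTACK's editing procedure, which this summary does not specify. The two-dimensional vectors and the names `passages`, `clean_query`, `edited_query`, and `top_k` are all made up for illustration; the only point is that ranking depends entirely on the embeddings the retriever produces.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, passage_vecs, k=1):
    """Return the names of the k passages most similar to the query."""
    ranked = sorted(
        passage_vecs,
        key=lambda name: cosine(query_vec, passage_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical passage embeddings (illustrative values only).
passages = {
    "benign": [1.0, 0.0],
    "malicious": [0.0, 1.0],
}

# The same query text, embedded by an unmodified retriever and by a
# retriever whose parameters have been altered (assumed values).
clean_query = [0.9, 0.1]
edited_query = [0.2, 0.8]

print(top_k(clean_query, passages))   # ['benign']
print(top_k(edited_query, passages))  # ['malicious']
```

The knowledge base is identical in both calls. Only the query embedding differs, which is why inspecting the stored documents alone would not reveal this kind of manipulation.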

Why it matters

For professionals deploying RAG systems, this research highlights a critical security vulnerability that goes beyond data manipulation. Understanding model-centric attacks like CAREATTACK is essential for developing robust defenses and ensuring the integrity and trustworthiness of AI applications that rely on external knowledge retrieval.

How to implement this in your domain

  1. Conduct security audits on RAG systems, specifically focusing on the integrity of open-source retriever models.
  2. Implement robust monitoring for unusual behavior or outputs in RAG systems that could indicate knowledge injection.
  3. Develop and deploy defense mechanisms that detect and mitigate parameter-level manipulations in retriever models.
  4. Stay informed about new attack vectors and research in AI security to proactively protect RAG deployments.
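
The first two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a defense proposed in the paper: the helper names `sha256_of_file`, `verify_weights`, and `topk_overlap`, the pinned-hash dictionary, and the idea of canary queries are all hypothetical choices made here. The hash check only detects weights that changed after the hashes were pinned, so the pinned values must come from a source you trust.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(model_dir, pinned_hashes):
    """Return the files that are missing or whose hash differs from the pin.

    pinned_hashes maps a file name relative to model_dir to the SHA-256
    hex digest recorded when the model was first vetted (an assumption).
    """
    mismatches = []
    for relative_name, expected in pinned_hashes.items():
        candidate = Path(model_dir) / relative_name
        if not candidate.is_file() or sha256_of_file(candidate) != expected:
            mismatches.append(relative_name)
    return mismatches


def topk_overlap(reference_ids, deployed_ids):
    """Fraction of the reference top-k that the deployed retriever also returns."""
    reference, deployed = set(reference_ids), set(deployed_ids)
    if not reference:
        return 1.0
    return len(reference & deployed) / len(reference)
```

A deployment script could refuse to load the retriever when `verify_weights` returns a non-empty list, and could raise an alert when `topk_overlap` on a fixed set of canary queries drops below a chosen threshold. The overlap check has a clear limit: because the anchor-repair stage is described as minimizing impact on non-target prompts, canary queries unrelated to the attacker's targets may show no drift at all.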

Who benefits

Cybersecurity, AI Development, Cloud Services, Finance, Government

Key takeaways

  • RAG systems are vulnerable to model-centric knowledge injection attacks.
  • CAREATTACK directly edits retriever model parameters to inject malicious knowledge.
  • This method manipulates retrieved evidence to mislead LLM generation.
  • It reveals a practical and underexplored attack surface for RAG systems.

Original post by Xinru Liu, Xianglong Zhang, Di Cai, Zhumin Chen, Pengfei Hu, Xin Xin

"arXiv:2606.18310v1 Announce Type: cross Abstract: Injecting malicious knowledge into retrieval-augmented generation (RAG) systems can manipulate retrieved evidence and mislead downstream generation, posing a serious security threat for AI applications. Existing RAG injection atta…"
