Self-Evolving AI Agent Boosts Legal Case Retrieval Accuracy

Mingxu Tao, Jiawei Hu, Xian Zhou, Wenpeng Hu, Jiajun Cheng, Yunbo Cao, Zhunchen Luo, Guotong Geng · June 17, 2026

Summary

A new self-evolving framework enhances BM25 for legal case retrieval by using an LLM-based agent to iteratively create, validate, and refine query rewriting rules. This approach, which requires no parameter training, significantly outperforms non-evolutionary baselines on Chinese legal case retrieval benchmarks.

Legal case retrieval is difficult because legal language is intricate and relevant cases often have to match the wording of the query precisely. Dense retrieval models have made progress, but traditional lexical methods such as BM25 remain strong baselines in this domain. That observation motivated the researchers to build a self-evolving framework that improves BM25 without any parameter training.

The framework pairs an LLM-based agent with an automatic evaluation environment. The agent iteratively generates query rewriting rules, plans validation experiments that test different rule combinations, and eliminates ineffective rules based on feedback from earlier experiments, so the rule set is refined over successive rounds.

On the Chinese legal case retrieval benchmark LeCaRD-v2, the framework significantly outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection. The gains were most notable when a high-capacity LLM served as the agent's core. Detailed analyses indicate that the LLM's ability to draw on past experimental results, together with its intrinsic knowledge when eliminating rules, is crucial to the success of the self-evolving mechanism.
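The generate, validate, and eliminate loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the corpus, queries, rules, and the `StubAgent` class are all invented for the example, the LLM calls are replaced by hard-coded stubs, and mean reciprocal rank stands in for whatever metric the paper's evaluation environment reports.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Rule = Callable[[List[str]], List[str]]  # a rewriting rule maps query tokens to query tokens

# Toy corpus, queries, and relevance labels (invented; not taken from LeCaRD-v2).
DOCS = [
    ["defendant", "theft", "property", "night"],
    ["defendant", "fraud", "contract", "forged"],
    ["traffic", "accident", "injury", "driver"],
]
QUERIES = {"q1": ["driver", "stole", "wallet"], "q2": ["fake", "contract"]}
RELEVANT = {"q1": {0}, "q2": {1}}

# Hypothetical candidate rules; in the paper these come from the LLM agent.
LEGAL_TERMS = {"stole": "theft", "wallet": "property", "fake": "forged"}
RULE_POOL: Dict[str, Rule] = {
    "expand_legal_terms": lambda q: q + [LEGAL_TERMS[t] for t in q if t in LEGAL_TERMS],
    "drop_last_term": lambda q: q[:-1],
}

def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term in tf:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def evaluate(rule_names: Tuple[str, ...]) -> float:
    """Automatic evaluation environment: mean reciprocal rank under a rule set."""
    total = 0.0
    for qid, query in QUERIES.items():
        for name in rule_names:
            query = RULE_POOL[name](query)
        scores = bm25_scores(query, DOCS)
        ranking = sorted(range(len(DOCS)), key=lambda i: -scores[i])
        rank = next(r for r, i in enumerate(ranking, 1) if i in RELEVANT[qid])
        total += 1.0 / rank
    return total / len(QUERIES)

class StubAgent:
    """Hypothetical stand-in for the LLM agent; a real system would prompt an LLM here."""

    def propose(self, history):
        # Generate rules: offer every pooled rule that no experiment has used yet.
        tried = {name for combo, _ in history for name in combo}
        return [name for name in RULE_POOL if name not in tried]

    def plan(self, active, candidates):
        # Plan experiments: the active rule set extended by one candidate at a time.
        return [tuple(sorted(set(active) | {c})) for c in candidates]

    def eliminate(self, history):
        # Eliminate: keep the best-scoring combination seen so far, preferring fewer rules.
        return max(history, key=lambda h: (h[1], -len(h[0])))[0]

def evolve(agent: StubAgent, rounds: int = 2):
    history = [((), evaluate(()))]  # start from plain BM25 with no rules
    active: Tuple[str, ...] = ()
    for _ in range(rounds):
        for combo in agent.plan(active, agent.propose(history)):
            history.append((combo, evaluate(combo)))
        active = agent.eliminate(history)
    return active, history

active, history = evolve(StubAgent())
print("kept rules:", active)
for combo, score in history:
    print(combo, round(score, 3))
```

The stub's planner only tries one new rule at a time, which is close to greedy selection, one of the baselines the framework is reported to beat. The point of the sketch is the shape of the loop and the shared experiment history, which is where an LLM would contribute its planning and its prior knowledge.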

Why it matters

The work is relevant to legal professionals and legal tech developers: it offers a self-optimizing way to improve legal case retrieval accuracy without training a model. It could reduce the manual effort of refining search queries and make relevant precedents easier to find.

How to implement this in your domain

  1. Explore integrating self-evolving query rewriting agents into existing legal research platforms or document management systems.
  2. Pilot the framework on a specific legal domain or case type to assess its performance and rule generation capabilities.
  3. Develop internal guidelines for human oversight and feedback loops to guide the LLM agent's rule evolution process.
  4. Leverage the improved retrieval accuracy to enhance legal research, e-discovery, and compliance workflows.
  5. Investigate how similar self-evolving agent frameworks could be applied to other complex information retrieval tasks within an organization.
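For the pilot in step 2, one simple way to assess performance is to score the same queries with and without the rewriting rules and compare a ranking metric. The sketch below uses NDCG@k with linear gains; this summary does not say which metrics the paper reports, so the metric choice, the case identifiers, and the relevance grades are all assumptions made for illustration.

```python
import math
from typing import Dict, List

def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranking: List[str], relevance: Dict[str, int], k: int) -> float:
    """NDCG@k of a ranked list of case ids against graded relevance labels."""
    gains = [relevance.get(case_id, 0) for case_id in ranking]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical pilot data: graded labels for one query and two system outputs.
labels = {"case_1": 3, "case_2": 1}
baseline_ranking = ["case_9", "case_1", "case_2"]   # plain BM25 (invented output)
rewritten_ranking = ["case_1", "case_2", "case_9"]  # BM25 plus rewriting rules (invented output)

baseline_ndcg = ndcg_at_k(baseline_ranking, labels, k=3)
rewritten_ndcg = ndcg_at_k(rewritten_ranking, labels, k=3)
print(f"baseline NDCG@3:  {baseline_ndcg:.3f}")
print(f"rewritten NDCG@3: {rewritten_ndcg:.3f}")
```

In a real pilot the two rankings would come from the retrieval system itself, and the scores would be averaged over a held-out set of queries so that rules are not judged on the same data used to select them.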

Who benefits

LegalTech · Law Firms · Government (Legal Departments) · Compliance · Research & Development

Key takeaways

  • A self-evolving LLM agent enhances legal case retrieval by refining query rewriting rules.
  • The framework improves BM25 performance without requiring parameter training.
  • It iteratively creates, validates, and eliminates rules based on experimental feedback.
  • The approach significantly outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection.

Original post by Mingxu Tao, Jiawei Hu, Xian Zhou, Wenpeng Hu, Jiajun Cheng, Yunbo Cao, Zhunchen Luo, Guotong Geng

"arXiv:2606.17220v1 Announce Type: new Abstract: Legal case retrieval remains challenging due to the complexity of legal language and the need for precise lexical alignment between queries and relevant cases. Although dense retrieval models have achieved notable progress, empirica…"
