GRACE-RAG Enhances Closed-Domain RAG with Graph-Augmented Retrieval

Asit Desai, Aman Kumar, Prashant Devadiga · July 2, 2026

Summary

This paper introduces GRACE-RAG, a Retrieval-Augmented Generation (RAG) architecture that uses graph-augmented retrieval to move structural reasoning out of the LLM and into the retrieval layer, improving the completeness and depth of answers in closed-domain institutional settings. Because the retrieval layer does more of the work, the system can run on lightweight, self-hosted models rather than depending on large proprietary ones.

Retrieval-Augmented Generation (RAG) systems are widely used for institutional question answering, where responses must be grounded in authoritative documents. In complex, entity-dense domains where relevant information is spread across many documents, standard vector-only retrieval often returns fragmented evidence, leaving the LLM to reconstruct the relationships between passages at inference time. GRACE-RAG addresses this with a retrieval-governed, graph-augmented architecture that shifts structural reasoning from the generative stage to a dedicated structured retrieval layer, where ambiguities are resolved offline. This design allows deployment on lightweight, self-hosted models calibrated to the institution's vocabulary.

Experiments across model capacities, including Mistral 24B, GPT OSS 120B, and Gemini 2.5 Flash, consistently showed GRACE-RAG improving response completeness, depth, and anticipatory coverage. Quality gains reached up to 20% with mid-scale models, which suggests that the quality of the retrieval architecture matters more than model scale. The practical result is a smaller computational and latency footprint without reliance on proprietary solutions.
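To make the idea of graph-augmented retrieval concrete, here is a minimal Python sketch. It is not the GRACE-RAG implementation: the toy corpus, the hand-labeled entity map, the bag-of-words similarity, and the function names (`vector_retrieve`, `graph_expand`, `build_entity_index`) are all assumptions made for illustration. It shows only the general pattern of seeding with similarity search and then following shared entities to pull in related documents.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus (not taken from the paper).
DOCS = {
    "d1": "Tuition fee waiver applies to students holding a merit scholarship.",
    "d2": "Merit scholarship eligibility requires a minimum GPA of 3.5.",
    "d3": "The library opens at 8 am on weekdays.",
    "d4": "GPA is computed from all graded courses each semester.",
}

# Assumed offline step: entities per document, hand-labeled here.
DOC_ENTITIES = {
    "d1": {"fee waiver", "merit scholarship"},
    "d2": {"merit scholarship", "gpa"},
    "d3": {"library"},
    "d4": {"gpa"},
}


def bow(text):
    """Bag-of-words counts; a stand-in for a real embedding model."""
    counts = defaultdict(int)
    for token in text.split():
        counts[token.strip(".,?").lower()] += 1
    return counts


def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_entity_index(doc_entities):
    """Invert the doc -> entities map into entity -> docs (built offline)."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index


def vector_retrieve(query, k=1):
    """Plain similarity search: the top-k documents for the query."""
    q = bow(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, bow(DOCS[d])), reverse=True)
    return ranked[:k]


def graph_expand(seeds, entity_index, hops=1):
    """Add documents that share an entity with the current frontier."""
    selected = list(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        neighbors = set()
        for doc_id in frontier:
            for entity in DOC_ENTITIES[doc_id]:
                neighbors |= entity_index[entity]
        neighbors -= set(selected)
        selected.extend(sorted(neighbors))
        frontier = neighbors
    return selected


if __name__ == "__main__":
    index = build_entity_index(DOC_ENTITIES)
    seeds = vector_retrieve("Who gets a tuition fee waiver?", k=1)
    print("vector only:   ", seeds)
    print("graph-expanded:", graph_expand(seeds, index, hops=2))
```

In this toy setup the similarity search alone finds only the fee-waiver document, while the expansion step also collects the scholarship and GPA documents that a complete answer depends on. A production system would presumably replace the bag-of-words scorer with an embedding index and the hand-labeled map with an extracted knowledge graph.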

Why it matters

For organizations needing to deploy accurate, grounded RAG systems within their own infrastructure, GRACE-RAG offers a path to achieve high quality with smaller, more controllable models, reducing costs and data privacy concerns.

How to implement this in your domain

  1. Assess current RAG system limitations in handling entity-dense, heterogeneous document sets within closed domains.
  2. Explore the feasibility of integrating graph databases and knowledge graphs into your RAG retrieval pipeline.
  3. Pilot GRACE-RAG's graph-augmented retrieval approach with a specific institutional dataset to evaluate performance gains.
  4. Calibrate a lightweight, self-hosted LLM to your organization's specific vocabulary and domain knowledge.
  5. Develop metrics to measure the completeness, depth, and anticipatory coverage of RAG-generated responses.
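For the last step, one simple starting point is a checklist-based completeness score. The sketch below is an assumption for illustration, not the paper's evaluation method: the function name `completeness`, the keyword-tuple checklist format, and the sample answer are all hypothetical. Each required point is a tuple of terms, and a point counts as covered only when every term appears in the response.

```python
def completeness(response, required_points):
    """Fraction of checklist points whose terms all appear in the response.

    required_points: list of tuples of keywords (hypothetical format).
    Substring matching is crude; it is used here only to keep the sketch small.
    """
    if not required_points:
        return 0.0
    text = response.lower()
    covered = [
        point for point in required_points
        if all(term.lower() in text for term in point)
    ]
    return len(covered) / len(required_points)


if __name__ == "__main__":
    # Hypothetical checklist for a fee-waiver question.
    checklist = [
        ("fee waiver",),
        ("merit scholarship",),
        ("gpa", "3.5"),
        ("deadline",),
    ]
    answer = (
        "Students with a merit scholarship get a fee waiver; "
        "the scholarship needs a GPA of 3.5."
    )
    print(completeness(answer, checklist))
```

Running the same checklist against answers from a vector-only baseline and from a graph-augmented pipeline gives a rough side-by-side comparison. Depth and anticipatory coverage would need their own rubrics, and keyword matching would likely be replaced by human or model-based grading in practice.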

Who benefits

Government, Healthcare, Financial Services, Legal, Education

Key takeaways

  • Traditional RAG struggles with fragmented evidence in entity-dense, closed domains.
  • GRACE-RAG uses graph-augmented retrieval to externalize structural reasoning.
  • This architecture enables high-quality RAG with lightweight, self-hosted models.
  • It reduces computational costs and reliance on large proprietary LLMs.

Original post by Asit Desai, Aman Kumar, Prashant Devadiga

"arXiv:2607.00013v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems are widely used in institutional question answering settings where responses must be grounded in authoritative documentation (Gao et al., 2023). In entity-dense domains where relevant i…"

