New Agentic Framework Automates Context-Aware Data Quality Assessment

Hadi Fadlallah, Ibrahim Dhaini, Fatima Mubarak, Rima Kilany · June 15, 2026

Summary

Researchers propose an agentic-retrieval framework that uses large language models to assess data quality autonomously, based on natural-language descriptions of how the data will be used. The system generates executable validation logic and checks that logic for feasibility before running it, so the resulting data quality checks are both context-dependent and reliable.

Data quality assessment is a critical but challenging task, often limited by its context-dependent nature and reliance on static rules or manual efforts. While large language models (LLMs) offer potential for automation, concerns about reliability and execution safety persist. This paper introduces a unified agentic-retrieval framework designed for autonomous, context-aware data quality assessment. The framework interprets natural-language descriptions of how data will be used, then employs a multi-agent workflow to derive context-specific assessment strategies and generate executable validation logic. To ensure operational reliability, the framework includes a feasibility validation stage. This stage evaluates the realism and executability of the generated assessment specifications, allowing for iterative refinement before execution. Once accepted, the validation logic runs deterministically, guaranteeing reproducible and auditable results. An end-to-end prototype demonstrated that assessment outcomes adapt meaningfully to different intended uses, while the feasibility gate reduced unrealistic rule generation.
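The loop described above (generate validation logic, pass it through a feasibility gate, refine on rejection, then execute deterministically) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Rule` format, the two check types, and `stub_generator` (which stands in for the multi-agent LLM workflow) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical rule format; the summary does not describe the paper's actual specification schema.
@dataclass(frozen=True)
class Rule:
    column: str
    check: str                  # "not_null" or "range"
    low: Optional[float] = None
    high: Optional[float] = None


def is_feasible(rule: Rule, columns: set) -> tuple:
    """Feasibility gate: reject rules that cannot run on this data or are unrealistic."""
    if rule.column not in columns:
        return False, f"unknown column {rule.column!r}"
    if rule.check == "range":
        if rule.low is None or rule.high is None:
            return False, "range check needs both bounds"
        if rule.low > rule.high:
            return False, "empty range"
    elif rule.check != "not_null":
        return False, f"unsupported check {rule.check!r}"
    return True, "ok"


def run_rule(rule: Rule, rows: list) -> bool:
    """Deterministic execution: the same rule on the same rows always gives the same answer."""
    values = [row.get(rule.column) for row in rows]
    if rule.check == "not_null":
        return all(v is not None for v in values)
    return all(v is not None and rule.low <= v <= rule.high for v in values)


def stub_generator(usage: str, feedback: list) -> list:
    """Hypothetical stand-in for the LLM agents. The first draft references a column
    that does not exist; once feedback is supplied, the refined draft drops it."""
    draft = [Rule("age", "range", 0, 120), Rule("income", "not_null")]
    if not feedback:
        draft.append(Rule("birth_date", "not_null"))
    return draft


def assess(usage: str, rows: list, generate: Callable, max_rounds: int = 3) -> dict:
    columns = set().union(*(row.keys() for row in rows))
    feedback: list = []
    for _ in range(max_rounds):
        rules = generate(usage, feedback)
        problems = []
        for rule in rules:
            ok, reason = is_feasible(rule, columns)
            if not ok:
                problems.append(f"{rule.column}: {reason}")
        if not problems:
            # Only an accepted rule set is executed.
            return {f"{r.column}:{r.check}": run_rule(r, rows) for r in rules}
        feedback = problems  # fed back to the generator for refinement
    raise RuntimeError(f"no feasible rule set after {max_rounds} rounds: {feedback}")
```

With `rows = [{"age": 34, "income": 52000}, {"age": 130, "income": None}]`, the first draft is rejected because `birth_date` is not a column, the refined draft passes the gate, and both remaining checks then fail on the second row (an age of 130 is outside 0 to 120, and income is missing). Keeping the gate separate from execution means nothing generated by the model runs until it has been accepted.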

Why it matters

For data professionals, this framework is a step toward automating and scaling data quality checks, so that data used for analytics and decision-making is judged against the purpose it will actually serve. By adapting checks to each usage context, it can reduce manual rule-writing and make data-driven processes more reliable.

How to implement this in your domain

  1. Adopt the agentic-retrieval framework to automate data quality assessment in data pipelines.
  2. Define data usage scenarios in natural language to generate context-aware validation rules.
  3. Integrate the feasibility validation stage to ensure generated rules are executable and realistic.
  4. Use the framework for reproducible, auditable data quality reporting in data governance.
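Steps 2 and 4 can be illustrated together: the same rows are assessed under two usage descriptions, and each report carries a hash of the rule set so that a rerun can be audited. This is a sketch under stated assumptions. The keyword table `STRATEGIES` is a hypothetical placeholder for the framework's LLM-driven strategy derivation, and `derive_rules`, `run_check`, and `audit_report` are illustrative names, not part of any published API.

```python
import hashlib
import json

# Hypothetical keyword table standing in for the LLM agents that derive
# a context-specific assessment strategy from the usage description.
STRATEGIES = {
    "marketing": [{"column": "email", "check": "not_null"}],
    "billing": [
        {"column": "amount", "check": "not_null"},
        {"column": "amount", "check": "non_negative"},
    ],
}


def derive_rules(usage: str) -> list:
    """Pick checks whose keyword appears in the natural-language usage description."""
    text = usage.lower()
    rules = []
    for keyword, checks in STRATEGIES.items():
        if keyword in text:
            rules.extend(checks)
    return rules


def run_check(rule: dict, rows: list) -> int:
    """Return the number of rows that fail the check."""
    col = rule["column"]
    if rule["check"] == "not_null":
        return sum(1 for row in rows if row.get(col) is None)
    if rule["check"] == "non_negative":
        return sum(1 for row in rows if row.get(col) is not None and row[col] < 0)
    raise ValueError(f"unsupported check {rule['check']!r}")


def audit_report(usage: str, rows: list) -> dict:
    rules = derive_rules(usage)
    results = [{**rule, "failures": run_check(rule, rows)} for rule in rules]
    # Fingerprint the exact rule set so an auditor can confirm a rerun used the same logic.
    digest = hashlib.sha256(json.dumps(rules, sort_keys=True).encode("utf-8")).hexdigest()
    return {"usage": usage, "rules_sha256": digest, "results": results}
```

For rows `[{"email": None, "amount": 20.0}, {"email": "a@example.com", "amount": -5.0}]`, a "Q3 billing reconciliation" report checks only `amount` (one negative value), while a "Spring marketing campaign" report checks only `email` (one missing value). The same input always yields the same report and digest, which is the property that makes the results reproducible and auditable.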

Who benefits

Data Analytics, BFSI, Healthcare, E-commerce, Government

Key takeaways

  • Data quality assessment is challenging due to its context-dependent nature and manual processes.
  • An agentic-retrieval framework uses LLMs to automate context-aware data quality assessment.
  • It generates executable validation logic from natural-language usage descriptions.
  • A feasibility validation stage ensures reliability and allows for iterative refinement.

Original post by Hadi Fadlallah, Ibrahim Dhaini, Fatima Mubarak, Rima Kilany

"arXiv:2606.13692v1 Announce Type: cross Abstract: Data quality assessment is a critical prerequisite for effective data analytics and data-driven decision-making, yet it remains a challenging task due to the inherently context-dependent nature of data quality. Existing approaches…"