HistoriQA: New Multi-Hop QA Dataset for French History.

Aur\'elien Pellet (LRE), Julien Perez (EPITA, LRE), Marie Puren (LRE, CJM)· July 1, 2026 View original

▶ The 2-minute explainer

Summary

HistoriQA-ThirdRepublic is a new French-language multi-hop question answering dataset derived from parliamentary debates and newspapers of the French Third Republic (1870-1940). Developed with historians, it captures complex reasoning patterns like cross-source synthesis and temporal reasoning, providing a resource to evaluate retrieval-augmented and large language models in domain-specific historical contexts.

Historical research often involves complex reasoning, requiring the synthesis of information from multiple, heterogeneous sources and an understanding of temporal relationships. Traditional NLP benchmarks frequently fall short in capturing these nuanced patterns, creating a gap between general language model capabilities and the specific needs of historical scholarship. To address this, researchers have introduced HistoriQA-ThirdRepublic, a novel French-language dataset specifically designed for multi-hop historical question answering. This corpus is built from parliamentary debates and newspapers spanning the French Third Republic (1870-1940) and was developed in close collaboration with a historian to ensure its relevance and accuracy. The dataset comprises 1782 questions that emphasize multi-hop connections across diverse historical documents, demanding cross-source synthesis, temporal reasoning, and the integration of sparse evidence. HistoriQA-ThirdRepublic serves as a valuable resource for evaluating the performance of retrieval-augmented and large language models in domain-specific contexts, demonstrating how the methodology can be adapted for other languages and national corpora, thereby bridging the divide between advanced NLP and the practical demands of historical inquiry.

Why it matters

For professionals in AI/NLP development, digital humanities, and government archives, this dataset offers a unique resource for training and evaluating advanced language models on complex, multi-hop historical reasoning, pushing the boundaries of AI applications in specialized domains.

How to implement this in your domain

1Utilize HistoriQA-ThirdRepublic to benchmark and fine-tune retrieval-augmented and large language models for domain-specific historical inquiry.
2Adapt the methodology for constructing multi-hop QA datasets to other languages or national historical corpora.
3Collaborate with historians or domain experts to design datasets that capture complex reasoning patterns relevant to specific fields.
4Explore the application of multi-hop QA systems for internal knowledge management or archival research within organizations.
5Develop AI tools that can synthesize information from heterogeneous sources, including text and potentially other media, for comprehensive analysis.

Who benefits

AcademiaGovernmentMediaConsultingEdTech

Key takeaways

Historical research requires complex multi-hop reasoning across diverse sources.
HistoriQA-ThirdRepublic is a new dataset for French historical QA.
It evaluates LLMs on cross-source synthesis and temporal reasoning.
The methodology is adaptable for other languages and historical contexts.

Original post by Aur\'elien Pellet (LRE), Julien Perez (EPITA, LRE), Marie Puren (LRE, CJM)

"arXiv:2606.31325v1 Announce Type: new Abstract: We present HistoriQA-ThirdRepublic: a French-language dataset of multi-hop historical questions derived from parliamentary debates and newspapers of the French Third Republic. Designed in collaboration with a historian, the corpus c…"

View on X

Originally posted by Aur\'elien Pellet (LRE), Julien Perez (EPITA, LRE), Marie Puren (LRE, CJM) on X · view source

Want to go deeper?

Turn these trends into skills with Learnijoy's hands-on AI & tech courses.

Explore courses

HistoriQA: New Multi-Hop QA Dataset for French History.

Why it matters

How to implement this in your domain

Who benefits

Key takeaways

Want to go deeper?

More in AI Research

Philosophical Foundations for Explainable AI in Healthcare Explored

New Metric Improves LLM Reinforcement Learning with Verifiable Rewards.

New ACE Module Boosts LLM Agent Context Management