Preregistration Protocol Mitigates p-Hacking in LLM Research

Maria Thomas, Kristina Gligoric, Nihar B. Shah · June 29, 2026

Summary

Researchers propose a preregistration protocol to combat p-hacking in LLM-based research, in which experimenters tune prompts or parameters until they obtain the result they want. The researcher preregisters the analysis plan together with a set of eligible future models and runs the confirmatory analysis on a model released after that commitment, which blocks most p-hacks from transferring to the newly released LLM.

Large Language Models (LLMs) are increasingly used in research to generate, classify, and annotate data, and this introduces a significant risk of "p-hacking": a researcher iteratively adjusts prompts, decoding parameters, or output formats until a statistically significant or otherwise desired result appears, which compromises the validity of the findings.

To counter this, the authors propose preregistering the experiment together with a set of eligible future LLMs. Researchers finalize their experimental procedure on current models, then publicly commit to an analysis plan and to a list of models that will be considered for the confirmatory analysis. The confirmatory analysis is then run on the *first eligible LLM released after* the preregistration. Because this future model does not exist at the time of commitment, it cannot be "hacked against." Furthermore, configurations that successfully p-hack one model often do not transfer to a different, newer model.

The protocol was evaluated on two tasks with known true values, where it blocked the transfer of p-hacks in over 70% of cases across various models and configurations. The result suggests that committing to a not-yet-released model can make LLM-based findings more rigorous and trustworthy.
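The selection rule described above (run the confirmatory analysis on the first eligible model released after preregistration) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `ModelRelease` record, the `pick_confirmatory_model` function, the model names and dates, and the prefix-based eligibility rule are all hypothetical, and how eligibility is actually defined is up to the preregistration itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ModelRelease:
    """Hypothetical record of a model release (not taken from the paper)."""
    name: str
    released: date


def pick_confirmatory_model(
    releases: Iterable[ModelRelease],
    preregistered_on: date,
    is_eligible: Callable[[ModelRelease], bool],
) -> Optional[ModelRelease]:
    """Return the earliest eligible model released strictly after preregistration.

    Returns None while no such model exists yet, in which case the
    confirmatory analysis has to wait.
    """
    candidates = [
        m for m in releases
        if m.released > preregistered_on and is_eligible(m)
    ]
    # Ties on release date are broken by name so the choice stays deterministic.
    return min(candidates, key=lambda m: (m.released, m.name), default=None)


# Illustrative data only: these model names and dates are made up.
releases = [
    ModelRelease("family-a-v1", date(2026, 5, 1)),    # already exists at preregistration
    ModelRelease("family-b-v3", date(2026, 7, 10)),   # later, but not eligible
    ModelRelease("family-a-v2", date(2026, 8, 2)),    # first eligible release afterwards
    ModelRelease("family-a-v3", date(2026, 11, 20)),
]
preregistered_on = date(2026, 6, 29)


def eligible(model: ModelRelease) -> bool:
    # Assumed eligibility rule: only models from "family-a" count.
    return model.name.startswith("family-a")


chosen = pick_confirmatory_model(releases, preregistered_on, eligible)
print(chosen.name if chosen else "no eligible model released yet")
```

Using a strict `>` comparison on the release date encodes the key property of the protocol: a model that already existed when the plan was frozen can never be selected, however well the tuned configuration performs on it.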

Why it matters

Professionals who conduct or rely on LLM-based research can adopt this protocol to protect the integrity and reproducibility of their findings, which in turn supports greater trust in LLM-derived results.

How to implement this in your domain

  1. Adopt a preregistration protocol for all LLM-based research projects, specifying prompts, parameters, and analysis plans.
  2. Commit to using a future, unreleased LLM for confirmatory analysis to prevent p-hacking.
  3. Educate research teams on the risks of p-hacking in LLM experiments and the benefits of preregistration.
  4. Integrate preregistration platforms into research workflows to formalize commitment to experimental designs.
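One way to make the commitment in the steps above checkable is to freeze the plan as data and publish a digest of it. The sketch below is an assumption about how a team might do this, not something the protocol prescribes: the field names in `plan`, the `plan_digest` and `matches_commitment` helpers, and the choice of SHA-256 over canonical JSON are all illustrative.

```python
import hashlib
import hmac
import json


def plan_digest(plan: dict) -> str:
    """Hash a canonical JSON encoding of the plan (sorted keys, fixed separators)."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_commitment(plan: dict, committed_digest: str) -> bool:
    """Check that the plan used at analysis time is the one that was committed."""
    return hmac.compare_digest(plan_digest(plan), committed_digest)


# Hypothetical frozen plan; every value here is a placeholder.
plan = {
    "prompt": "Classify the sentiment of the following review as positive or negative.",
    "decoding": {"temperature": 0.0, "max_tokens": 8},
    "output_format": "single word",
    "analysis": "two-sided test of the difference in positive rates, alpha = 0.05",
    "eligible_models": "first release after preregistration from the listed model families",
}

# Published alongside the preregistration.
committed = plan_digest(plan)

# Any later change to the prompt, parameters, or analysis changes the digest.
tampered = dict(plan, prompt=plan["prompt"] + " Answer carefully.")

print(matches_commitment(plan, committed))
print(matches_commitment(tampered, committed))
```

Sorting keys before hashing means two people who write the same plan with fields in a different order still get the same digest, so a mismatch signals a real change rather than a formatting difference.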

Who benefits

Academic Research, AI Development, Market Research, Data Science

Key takeaways

  • LLM-based research is susceptible to p-hacking through iterative tuning of prompts and parameters.
  • A preregistration protocol can mitigate p-hacking by committing to future, unreleased LLMs.
  • P-hacks often do not transfer effectively across different LLM versions.
  • This protocol enhances the scientific rigor and trustworthiness of LLM research.

Original post by Maria Thomas, Kristina Gligoric, Nihar B. Shah

"arXiv:2606.27687v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to generate, classify, and annotate data whose outputs feed downstream hypothesis tests. However, LLM-based research is easy to p-hack: a researcher can tune the prompts, decoding…"

