VERITAS Enhances Zero-Shot Theorem Proving with Verifier Feedback

Manish Acharya, Zhenyu Liao, Yueke Zhang, Kevin Leach, Yu Huang, Yifan Zhang · June 19, 2026

Summary

VERITAS is a zero-shot framework for formal theorem proving that feeds detailed verifier signals back into proof search instead of collapsing them into a pass/fail bit. It uses a two-phase protocol: Best-of-N sampling, followed by a critic-guided MCTS pass that treats the first phase's failures as negative examples.

This paper introduces VERITAS, a zero-shot framework for formal theorem proving with large language models. Where LLM-based provers often collapse verifier feedback into a binary pass/fail signal, VERITAS routes rich verifier signals (syntax errors, type mismatches, partial goal progress) back into its proof search. The framework runs a two-phase protocol. First, Best-of-N sampling generates candidate proofs. Then a critic-guided Monte Carlo Tree Search (MCTS) pass continues the search, using the failures from the first phase as explicit negative examples to guide exploration. VERITAS reaches a 40.6% success rate on the miniF2F benchmark, outperforming unguided sampling. It also performs well on a new combinatorics benchmark, VERITAS-CombiBench, where it iteratively recovers correct lemma names from detailed verifier feedback.
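The two-phase idea can be illustrated with a toy sketch. This is not the authors' implementation: the tactic list, the hidden target proof, the Feedback fields, and every function name below are hypothetical, and the "verifier" is a stand-in that only compares steps against a fixed sequence. The sketch shows a verifier that reports where and how a candidate failed, a Best-of-N phase that keeps every failure, and a small MCTS phase whose critic scores partial progress and which never retries a step the verifier has already rejected.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical toy setting, not from the paper: a "proof" is a sequence of
# tactic names and the verifier accepts exactly one hidden sequence.
TACTICS = ["intro", "simp", "ring", "linarith", "exact"]
TARGET = ["intro", "simp", "ring"]

@dataclass(frozen=True)
class Feedback:
    ok: bool
    kind: str       # "ok", "syntax_error", "wrong_step", or "incomplete"
    progress: int   # number of leading steps the verifier accepted
    failed_at: int  # index of the first rejected step, or -1

def verify(proof):
    """Toy verifier that returns structured feedback instead of a bare bool."""
    for i, step in enumerate(proof):
        if step not in TACTICS:
            return Feedback(False, "syntax_error", i, i)
        if i >= len(TARGET) or step != TARGET[i]:
            return Feedback(False, "wrong_step", i, i)
    if len(proof) < len(TARGET):
        return Feedback(False, "incomplete", len(proof), -1)
    return Feedback(True, "ok", len(proof), -1)

def sample_phase(n, rng):
    """Phase 1: Best-of-N sampling that keeps every failed candidate."""
    failures = []
    for _ in range(n):
        cand = [rng.choice(TACTICS) for _ in TARGET]
        fb = verify(cand)
        if fb.ok:
            return cand, failures
        failures.append((cand, fb))
    return None, failures

def negatives_from(failures):
    """Turn failures into (accepted prefix, rejected step) negative examples."""
    return {(tuple(c[:fb.failed_at]), c[fb.failed_at])
            for c, fb in failures if fb.failed_at >= 0}

class Node:
    def __init__(self, prefix):
        self.prefix, self.visits, self.value, self.children = prefix, 0, 0.0, {}

def ucb(parent, child, c):
    explore = math.sqrt(math.log(parent.visits) / child.visits)
    return child.value / child.visits + c * explore

def search_phase(banned, rng, iterations=50, c=1.4):
    """Phase 2: MCTS whose critic rewards progress and skips banned steps."""
    root = Node(())
    for _ in range(iterations):
        node, path = root, [root]
        while len(node.prefix) < len(TARGET):  # selection, then one expansion
            allowed = [t for t in TACTICS if (node.prefix, t) not in banned]
            if not allowed:
                break
            untried = [t for t in allowed if t not in node.children]
            step = rng.choice(untried) if untried else max(
                allowed, key=lambda t: ucb(node, node.children[t], c))
            if untried:
                node.children[step] = Node(node.prefix + (step,))
            node = node.children[step]
            path.append(node)
            if untried:
                break
        proof = list(node.prefix)
        while len(proof) < len(TARGET):  # rollout that avoids known negatives
            allowed = [t for t in TACTICS if (tuple(proof), t) not in banned]
            if not allowed:
                break
            proof.append(rng.choice(allowed))
        fb = verify(proof)
        if fb.ok:
            return proof
        if fb.failed_at >= 0:  # every new failure becomes a negative example
            banned.add((tuple(proof[:fb.failed_at]), proof[fb.failed_at]))
        for visited in path:  # critic score: fraction of steps accepted
            visited.visits += 1
            visited.value += fb.progress / len(TARGET)
    return None

def prove(n=8, seed=0):
    rng = random.Random(seed)
    proof, failures = sample_phase(n, rng)
    if proof is None:
        proof = search_phase(negatives_from(failures), rng)
    return proof, failures
```

In this toy, each rejected step is ruled out for good, so the second phase needs only a handful of verifier calls. A binary pass/fail signal would give the search nothing to rule out, which is the contrast the summary draws.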

Why it matters

Improving automated theorem proving can accelerate research in mathematics, computer science, and formal verification, leading to more robust software and hardware systems.

How to implement this in your domain

  1. Explore integrating detailed feedback mechanisms from verification tools into AI-driven code generation or testing pipelines.
  2. Adopt a multi-phase generation and refinement strategy, similar to VERITAS, for complex problem-solving tasks in AI.
  3. Leverage negative examples derived from failed attempts to guide subsequent AI model exploration and improvement.
  4. Consider developing domain-specific verifiers that provide granular feedback for AI systems operating in critical applications.
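As one way to picture the first and last steps above in a code-generation pipeline, the sketch below wraps Python's built-in compile and exec in a checker that reports which stage a candidate failed at and how many test cases it passed. The Report fields and the check function are hypothetical names chosen for illustration, and the example is a minimal sketch rather than a production verifier.

```python
from dataclasses import dataclass

@dataclass
class Report:
    stage: str   # "syntax", "runtime", "tests", or "ok"
    detail: str  # message that could be fed back into the next attempt
    passed: int  # number of test cases that passed
    total: int   # number of test cases supplied

def check(source, func_name, cases):
    """Hypothetical verifier: classify how a candidate fails, not just whether.

    Caution: exec runs arbitrary code. A real pipeline should sandbox it.
    """
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError as err:
        return Report("syntax", f"line {err.lineno}: {err.msg}", 0, len(cases))
    namespace = {}
    try:
        exec(code, namespace)
        func = namespace[func_name]
    except Exception as err:
        return Report("runtime", f"{type(err).__name__}: {err}", 0, len(cases))
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception as err:
            detail = f"{type(err).__name__}: {err}"
            return Report("runtime", detail, passed, len(cases))
    stage = "ok" if passed == len(cases) else "tests"
    return Report(stage, f"{passed}/{len(cases)} cases passed", passed, len(cases))

# Usage: three flawed candidates fail at three different stages.
cases = [((1, 2), 3), ((0, 0), 0)]
reports = [
    check("def add(a, b) return a + b", "add", cases),   # missing colon
    check("def add(a, b): return a + c", "add", cases),  # undefined name
    check("def add(a, b): return a - b", "add", cases),  # wrong operator
    check("def add(a, b): return a + b", "add", cases),  # correct
]
```

Keeping the failed reports instead of discarding them gives a later generation pass concrete negative examples to avoid, in the spirit of the third step.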

Who benefits

Software Engineering, Cybersecurity, Academia, AI/ML Development, Hardware Design

Key takeaways

  • Detailed verifier feedback significantly improves LLM-based theorem proving.
  • VERITAS uses a two-phase protocol: sampling followed by critic-guided MCTS.
  • Failed proof attempts are used as explicit negative examples to guide search.
  • The framework reaches a 40.6% success rate on miniF2F, ahead of unguided sampling.

Original post by Manish Acharya, Zhenyu Liao, Yueke Zhang, Kevin Leach, Yu Huang, Yifan Zhang

"arXiv:2606.19399v1 Announce Type: new Abstract: LLM-based formal provers often collapse rich verifier signals (syntax errors, type mismatches, partial goal progress) into a binary pass/fail bit. We present VERITAS, a zero-shot framework that routes every verifier signal back into…"

