Distillation Improves Compact LLM Math Reasoning Accuracy

Gaurab Baral, Aaditya Khanal, Yangyang Tao, Junxiu Zhou · July 1, 2026

Summary

This paper reports that knowledge distillation from a large reasoning model (DeepSeek-R1) to a compact student model (Qwen2.5-7B) improves the student's mathematical reasoning accuracy. Fine-tuned on a Chain-of-Thought training corpus, the student gained 4.76 percentage points on competition problems and reached 73.1% accuracy on the MATH-500 benchmark. Response length proved to be a critical factor in reasoning quality.

The researchers tested whether knowledge distillation can strengthen the mathematical reasoning of smaller language models. They transferred knowledge from a large reasoning model, DeepSeek-R1, to a compact student model, Qwen2.5-7B. To do so, they used a dual-agent framework to build a Chain-of-Thought (CoT) training dataset from historical mathematics competition problems, then fine-tuned the student on this corpus with Low-Rank Adaptation (LoRA).

On the competition dataset, the base Qwen2.5-7B model scored 64.67% and the teacher DeepSeek-R1 scored 91.40%. The fine-tuned student reached a mean accuracy of 69.43%, a gain of 4.76 percentage points over the base model. It also generalized to the MATH-500 benchmark, where it achieved 73.1% accuracy. Response length had a strong effect on reasoning quality: accuracy declined significantly as the allowed response length was reduced, particularly at the more complex reasoning levels.
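To put the reported numbers in context, the short Python sketch below works through the arithmetic behind them. The three accuracy values come from the summary above; the function name `gain_summary` and the "share of the gap closed" metric are illustrative assumptions, not quantities reported by the authors.

```python
# Accuracy figures quoted in the summary above (percent).
BASE_ACC = 64.67      # Qwen2.5-7B before fine-tuning
STUDENT_ACC = 69.43   # mean accuracy after LoRA fine-tuning on the CoT corpus
TEACHER_ACC = 91.40   # DeepSeek-R1


def gain_summary(base: float, student: float, teacher: float) -> dict:
    """Return absolute gain, relative gain, and share of the base-to-teacher gap closed.

    Hypothetical helper for illustration; not part of the paper.
    """
    absolute = student - base                       # percentage points
    relative = absolute / base * 100                # percent of the base accuracy
    gap_closed = absolute / (teacher - base) * 100  # percent of the base-to-teacher gap
    return {
        "absolute_pp": round(absolute, 2),
        "relative_pct": round(relative, 2),
        "gap_closed_pct": round(gap_closed, 2),
    }


summary = gain_summary(BASE_ACC, STUDENT_ACC, TEACHER_ACC)
print(summary)
```

Read this way, the 4.76-point gain is about a 7% relative improvement over the base model and closes a little under a fifth of the distance to the teacher, so distillation narrows the gap without removing it.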

Why it matters

This research provides a practical method for deploying more capable, yet smaller and more efficient, LLMs for complex reasoning tasks, reducing computational costs and improving accessibility for various applications.

How to implement this in your domain

  1. Identify specific reasoning tasks where smaller LLMs underperform compared to larger models.
  2. Explore knowledge distillation techniques, particularly Chain-of-Thought (CoT) distillation, to improve compact models.
  3. Develop or acquire high-quality, reasoning-focused datasets for fine-tuning student models.
  4. Tune the response length allowed to compact LLMs to balance efficiency and reasoning quality.
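As one way to approach the dataset step above, the following Python sketch turns teacher reasoning traces into prompt/completion records for supervised fine-tuning. It is a minimal illustration under stated assumptions: the `TeacherTrace` schema, the prompt wording, and the rule of discarding traces whose final answer is wrong are all hypothetical. The summary does not describe how the paper's dual-agent framework works, so this is not the authors' pipeline.

```python
import json
from dataclasses import dataclass


@dataclass
class TeacherTrace:
    """One teacher response to one problem (hypothetical schema)."""
    problem: str
    reasoning: str         # the teacher's step-by-step chain of thought
    final_answer: str      # the answer the teacher committed to
    reference_answer: str  # the known-correct answer for the problem


def normalize(answer: str) -> str:
    """Crude normalization: trim, lowercase, drop spaces and trailing periods."""
    return answer.strip().lower().replace(" ", "").rstrip(".")


def build_sft_records(traces: list[TeacherTrace]) -> list[dict]:
    """Keep traces whose final answer matches the reference, as prompt/completion pairs."""
    records = []
    for t in traces:
        if normalize(t.final_answer) != normalize(t.reference_answer):
            continue  # assumption: discard traces that end in a wrong answer
        records.append({
            "prompt": f"Solve the problem step by step.\n\nProblem: {t.problem}",
            "completion": f"{t.reasoning}\n\nFinal answer: {t.final_answer}",
        })
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Toy traces, invented for illustration only.
traces = [
    TeacherTrace("What is 12 * 12?", "12 * 12 = 120 + 24 = 144.", "144", "144"),
    TeacherTrace("What is 7 + 8?", "7 + 8 = 16.", "16", "15"),  # wrong answer: filtered out
]
records = build_sft_records(traces)
print(to_jsonl(records))
```

Filtering on the final answer is a common sanity check for distilled data, but it does not verify the intermediate steps. A real pipeline would likely need a stronger answer matcher for mathematical expressions than the string comparison used here.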

Who benefits

EdTech, Software Development, AI Development, Finance, Research & Development

Key takeaways

  • Knowledge distillation significantly improves compact LLM mathematical reasoning.
  • A Chain-of-Thought corpus from a large teacher model enhances student performance.
  • Fine-tuned student models show notable accuracy gains on competition problems and benchmarks.
  • Response length is a critical factor influencing the quality of mathematical reasoning.
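To check the response-length takeaway on your own model, one simple approach is to evaluate the same problems under several response-length caps and compare accuracy per cap. The Python sketch below shows only the bookkeeping. The function name `accuracy_by_budget` and the token budgets are assumptions, and the evaluation log is made-up toy data, not results from the paper.

```python
from collections import defaultdict


def accuracy_by_budget(results: list[tuple[int, bool]]) -> dict[int, float]:
    """Map each token budget to the percentage of correct answers under that budget."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for budget, is_correct in results:
        total[budget] += 1
        correct[budget] += int(is_correct)
    return {b: 100.0 * correct[b] / total[b] for b in sorted(total)}


# Made-up evaluation log: (max response tokens allowed, whether the answer was correct).
# These values are invented for illustration and are NOT from the paper.
toy_results = [
    (256, False), (256, False), (256, True), (256, False),
    (1024, True), (1024, False), (1024, True), (1024, False),
    (4096, True), (4096, True), (4096, True), (4096, False),
]

for budget, acc in accuracy_by_budget(toy_results).items():
    print(f"max_tokens={budget:>5}  accuracy={acc:.1f}%")
```

Grouping the same log by problem difficulty as well as by budget would show whether harder problems lose more accuracy when responses are cut short, which is the pattern the summary describes.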

Original post by Gaurab Baral, Aaditya Khanal, Yangyang Tao, Junxiu Zhou

"arXiv:2606.31048v1 Announce Type: new Abstract: This paper investigates knowledge distillation from a large reasoning model (DeepSeek-R1) to a compact student model (Qwen2.5-7B). Using historical problems from the John O'Bryan Mathematics Competition at Northern Kentucky Universi…"

