Pruning Pretrained LLMs Outperforms Training Small Models from Scratch

Yufeng Xu, Taiming Lu, Kunjun Li, Jiachen Zhu, Mingjie Sun, Zhuang Liu · June 15, 2026

The 60-second brief

Summary

This research compares pruning large language models with training smaller models from scratch, using Llama-3.1-8B as the base model. It concludes that pruning provides a stronger starting point, especially when training budgets are limited, because it transfers knowledge that additional training alone cannot fully recover.

The study compares two ways to obtain an efficient small language model: pruning a larger pre-trained model, or training a model of similar size from scratch. The authors pruned Llama-3.1-8B at pruning ratios of 0.5 to 0.8 using six methods spanning depth, width, and sparse granularities, then compared the resulting models with from-scratch baselines under controlled training conditions. When the training token budget is limited, initializing a small model by pruning the larger parent consistently outperforms training a model of similar size from random initialization. This suggests that the pre-trained parent supplies knowledge that is hard to replicate from a fresh start with limited resources.

When the training budget is not a constraint, training from scratch can become competitive, particularly at coarser pruning granularities. The authors conclude that pruning is generally the better choice when a large pre-trained model is available and training resources are constrained, because it transfers knowledge that additional training tokens alone cannot fully recover, especially at finer pruning granularities.
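To put the pruning ratios in perspective, the sketch below converts a ratio into an approximate parameter count for the pruned model. It assumes that the ratio quoted in the abstract (0.5–0.8) is the fraction of the parent's parameters that is removed, and it uses a nominal 8.0B parameters for Llama-3.1-8B. The paper's exact accounting (for example, whether embeddings are counted) may differ, and `remaining_params` and `NOMINAL_PARENT_PARAMS` are hypothetical names used only for illustration.

```python
# Hypothetical helper. Assumes "pruning ratio" = fraction of the parent's
# parameters that is removed; the paper's exact accounting may differ.
NOMINAL_PARENT_PARAMS = 8.0e9  # nominal size of Llama-3.1-8B (assumption)


def remaining_params(parent_params: float, pruning_ratio: float) -> float:
    """Return the parameter count left after removing `pruning_ratio` of it."""
    if not 0.0 <= pruning_ratio < 1.0:
        raise ValueError("pruning_ratio must be in [0, 1)")
    return parent_params * (1.0 - pruning_ratio)


if __name__ == "__main__":
    # Endpoints and intermediate points of the 0.5-0.8 range quoted in the abstract.
    for ratio in (0.5, 0.6, 0.7, 0.8):
        billions = remaining_params(NOMINAL_PARENT_PARAMS, ratio) / 1e9
        print(f"ratio {ratio:.1f}: ~{billions:.1f}B parameters kept")
```

Under these assumptions, ratios of 0.5 and 0.8 leave roughly 4.0B and 1.6B parameters, which is also roughly the size of the from-scratch model each pruned model would be compared against.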

Why it matters

For AI engineers and developers, understanding the most efficient way to create performant smaller LLMs is critical for resource optimization and deployment on edge devices. This research offers guidance on when to prune an existing model and when to train a new one, a choice that affects development timelines and computational costs.

How to implement this in your domain

  1. Consider pruning a larger, pre-trained model if your project has a limited training token budget for smaller LLMs.
  2. Experiment with different pruning granularities (depth, width, or sparse) to find the optimal balance for your specific use case.
  3. Evaluate the trade-offs between pruning and training from scratch based on available computational resources and desired model performance.
  4. Leverage existing large models as strong initialization points for smaller, specialized models to accelerate development.
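The three pruning granularities in step 2 differ in what they remove. The following toy sketch, in plain Python with no ML framework, illustrates each one: depth pruning drops whole layers, width pruning removes hidden units (a row of an up-projection and the matching column of a down-projection), and sparse pruning zeroes individual weights while leaving every matrix's shape intact. The scoring rules here (a caller-supplied per-layer score, L2 norms of rows, and weight magnitude) are simple illustrative stand-ins, not the six methods evaluated in the paper, and all function and variable names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores, returned in their original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def prune_depth(layers: list, scores: Sequence[float], ratio: float) -> list:
    """Depth pruning: drop whole layers, keeping the highest-scoring ones.

    `scores` is a hypothetical per-layer importance signal supplied by the caller.
    """
    n_keep = max(1, round(len(layers) * (1 - ratio)))
    return [layers[i] for i in _top_k_indices(scores, n_keep)]


def prune_width(w_up: Matrix, w_down: Matrix, ratio: float) -> Tuple[Matrix, Matrix]:
    """Width pruning: remove hidden units from a two-matrix MLP block.

    w_up has one row per hidden unit and w_down has one column per hidden unit,
    so removing a unit deletes a row of w_up and the matching column of w_down.
    Units are ranked by the L2 norm of their w_up row (a simple magnitude proxy).
    """
    n_keep = max(1, round(len(w_up) * (1 - ratio)))
    norms = [math.sqrt(sum(x * x for x in row)) for row in w_up]
    keep = _top_k_indices(norms, n_keep)
    new_up = [list(w_up[j]) for j in keep]
    new_down = [[row[j] for j in keep] for row in w_down]
    return new_up, new_down


def prune_sparse(w: Matrix, ratio: float) -> Matrix:
    """Sparse (unstructured) pruning: zero the smallest-magnitude weights.

    The matrix keeps its shape; round(size * ratio) entries are set to 0.0.
    """
    cols = len(w[0])
    order = sorted(range(len(w) * cols), key=lambda i: abs(w[i // cols][i % cols]))
    out = [list(row) for row in w]
    for i in order[: round(len(order) * ratio)]:
        out[i // cols][i % cols] = 0.0
    return out


if __name__ == "__main__":
    # Depth: 4 layers, prune half, keep the two highest-scoring (L0 and L3).
    print(prune_depth(["L0", "L1", "L2", "L3"], [0.9, 0.1, 0.5, 0.7], ratio=0.5))
    # Width: 4 hidden units, model dimension 2; keep the 2 units with the largest rows.
    w_up = [[1.0, 0.0], [0.1, 0.1], [2.0, 2.0], [0.0, 0.2]]
    w_down = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    print(prune_width(w_up, w_down, ratio=0.5))
    # Sparse: zero the 2 smallest-magnitude entries of a 2x2 matrix.
    print(prune_sparse([[0.1, -2.0], [0.5, -0.05]], ratio=0.5))
```

Only depth and width pruning actually shrink the matrices. Sparse pruning keeps every shape and generally saves compute only when the runtime or hardware exploits the zeros.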

Who benefits

AI Development, Edge Computing, Software Engineering, Cloud Services

Key takeaways

  • Pruning large LLMs generally outperforms training small models from scratch with limited token budgets.
  • Pre-trained models transfer valuable knowledge that is hard to recover through new training alone.
  • The advantage of pruning narrows with larger training budgets and higher pruning ratios.
  • With an unconstrained training budget, training from scratch can be competitive at coarser pruning granularities.

Original post by Yufeng Xu, Taiming Lu, Kunjun Li, Jiachen Zhu, Mingjie Sun, Zhuang Liu

"arXiv:2606.14150v1. Abstract: Pruning promises a shortcut to strong small language models. In this work, we examine this promise by pruning Llama-3.1-8B at pruning ratios of 0.5–0.8 with six methods spanning depth, width, and sparse granularities, under two con…"
