PHF Improves LLM Reasoning by Distilling Teacher's Internal States

Yuhan Li, Mingxu Zhang, Dazhong Shen, Ying Sun · June 30, 2026

Summary

Researchers propose Privileged Hidden Flow (PHF), a method for on-policy self-distillation (OPSD) that improves LLM reasoning. Beyond matching the privileged teacher's output distribution, PHF distills how the teacher's hidden states evolve and the geometry of that trajectory, and it reports consistent aggregate gains over existing OPSD baselines on the Qwen models evaluated.

On-policy self-distillation (OPSD) trains a reasoning model on rollouts sampled from its own policy, using a "privileged teacher" that also sees verified reference solutions as the target to match. Current OPSD methods supervise only the output distribution, so the computation the teacher performs internally is never used as a training signal. Privileged Hidden Flow (PHF) adds that signal by distilling how the privileged teacher's hidden states evolve along the same generated sequence. Rather than forcing the student's hidden states to match the teacher's exactly, PHF aligns the token-to-token transition directions and the overall geometry of the hidden-state trajectory. The method includes an all-layer recipe and adjacent-layer relations, and the authors report consistent improvements in aggregate performance across several Qwen models.
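To make the two alignment ideas concrete, here is a minimal pure-Python sketch. It is not the paper's loss: the summary above does not give the formulas, so both functions are assumptions chosen for illustration. `transition_direction_loss` compares the direction of each token-to-token step with cosine similarity, and `trajectory_geometry_loss` compares the pairwise cosine-similarity matrices of the two trajectories. All function names are hypothetical, and the hidden states are assumed to be one layer's vectors of equal dimension for student and teacher.

```python
import math

def _cosine(u, v, eps=1e-8):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def _transitions(states):
    """Token-to-token steps h[t+1] - h[t] along one sequence."""
    return [[b - a for a, b in zip(h0, h1)] for h0, h1 in zip(states, states[1:])]

def transition_direction_loss(student, teacher):
    """Assumed form: mean (1 - cosine) between matching transition vectors."""
    steps_s, steps_t = _transitions(student), _transitions(teacher)
    return sum(1.0 - _cosine(a, b) for a, b in zip(steps_s, steps_t)) / len(steps_s)

def trajectory_geometry_loss(student, teacher):
    """Assumed form: mean squared gap between pairwise-cosine matrices."""
    n = len(student)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gap = _cosine(student[i], student[j]) - _cosine(teacher[i], teacher[j])
            total += gap * gap
    return total / (n * n)

# Toy hidden states: 3 token positions, 2 dimensions (illustrative values only).
teacher = [[1.0, 0.0], [2.0, 0.0], [2.0, 1.0]]
scaled = [[2.0, 0.0], [4.0, 0.0], [4.0, 2.0]]        # same path, twice the magnitude
flipped = [[-1.0, 0.0], [-2.0, 0.0], [-2.0, -1.0]]   # every step points the opposite way

print(transition_direction_loss(scaled, teacher))    # near 0: directions agree
print(transition_direction_loss(flipped, teacher))   # near 2: directions are opposed
print(trajectory_geometry_loss(scaled, teacher))     # near 0: same relative geometry
```

The scaled trajectory scores near zero on both terms even though no hidden state matches the teacher's exactly, which is the sense in which direction and geometry alignment is looser than exact state matching. In a real training setup these terms would be computed on tensors with an autodiff framework rather than on Python lists.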

Why it matters

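This research offers a richer training signal for reasoning models: in addition to the privileged teacher's outputs, the student learns from how the teacher's hidden states evolve. Because the extra supervision applies only during training, any reasoning gains come without increasing the size of the deployed model.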

How to implement this in your domain

1. Investigate integrating PHF into existing self-distillation or knowledge distillation pipelines for LLMs.
2. Experiment with PHF to improve the reasoning capabilities of smaller LLMs for specific tasks.
3. Evaluate the trade-offs between computational cost and performance gains when applying PHF.
4. Consider PHF where the teacher's hidden states are accessible, for example with open-weight models; a proprietary model served only through an API does not expose the internal states that PHF distills.
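One way to start on the integration and trade-off steps above is to treat the hidden-state term as a weighted auxiliary loss added to an existing output-distribution objective. The sketch below is an assumption, not the paper's recipe: it uses the forward KL divergence from teacher to student token distributions as the output loss, and `combined_objective`, `hidden_loss`, and `weight` are hypothetical names. `hidden_loss` stands for whatever hidden-state alignment value your pipeline computes.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def combined_objective(teacher_probs, student_probs, hidden_loss, weight=0.1):
    """Mean per-token KL plus a weighted auxiliary hidden-state term (assumed form)."""
    per_token = [kl_divergence(p, q) for p, q in zip(teacher_probs, student_probs)]
    output_loss = sum(per_token) / len(per_token)
    return output_loss + weight * hidden_loss

# Toy next-token distributions for two positions over a 2-word vocabulary.
teacher_probs = [[0.5, 0.5], [0.9, 0.1]]
student_probs = [[0.9, 0.1], [0.9, 0.1]]

# Sweeping the weight shows how much the auxiliary term contributes to the total.
for w in (0.0, 0.1, 1.0):
    print(w, combined_objective(teacher_probs, student_probs, hidden_loss=0.5, weight=w))
```

Setting `weight=0.0` recovers the output-only baseline, which gives a direct comparison point when measuring whether the extra forward-pass bookkeeping for hidden states is worth its cost.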

Who benefits

AI Engineering, Software Development, Research & Development, Cloud Computing, Data Science

Key takeaways

PHF enhances on-policy self-distillation by leveraging a teacher's internal hidden states.
It aligns hidden state transition directions and trajectory geometry, not just output distributions.
PHF consistently improves aggregate reasoning performance across the Qwen models evaluated.
This method offers a more effective way to transfer complex reasoning from teacher to student models.

Original post by Yuhan Li, Mingxu Zhang, Dazhong Shen, Ying Sun

arXiv:2606.29340v1. Abstract: "On-policy self-distillation (OPSD) trains a reasoning model on rollouts sampled from its own policy by matching a privileged teacher that also sees verified reference solutions. Existing OPSD objectives supervise only the output dis…"
