DRIFT Refines LLM Instruction Data for Peak Performance

Zefan Wang, Lincheng Li, Tianyu Yu, Yuan Yao · June 18, 2026

Summary

DRIFT (Data Refinement via On-Policy Influence Functions for Supervised Fine-Tuning) is a new method that refines instruction data for Large Language Models (LLMs) to raise the upper bound of their capability. It uses on-policy influence functions to identify and prioritize the data instances most likely to improve the final model, overcoming limitations of standard attribution methods.

Optimizing the training data distribution is central to maximizing the capabilities of Large Language Models (LLMs) through Supervised Fine-Tuning (SFT). While many data curation techniques focus on accelerating training under budget constraints, refining data to raise the model's ultimate performance ceiling remains a challenge. DRIFT addresses this with instance-level data attribution based on Influence Functions (IF). It identifies and overcomes two key limitations of standard IF formulations: a "proximity gap" arising from off-policy validation targets, and a bias towards gradient norms. The framework uses the model's own on-policy rollouts as validation targets, which minimizes the parameter proximity gap and aligns better with IF's local neighborhood assumption. It further applies signed weighting based on trajectory correctness and debiases the influence scores. This allows a small set of validation queries to reliably attribute the full dataset. DRIFT consistently raises the performance ceiling of 7B-parameter instruction and reasoning models and outperforms existing data curation baselines.
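To make the scoring idea concrete, here is a minimal sketch of how signed, norm-debiased influence scores might be computed. It is not the paper's implementation and rests on several assumptions. It uses a first-order approximation (gradient similarity with the Hessian treated as identity) in place of a full influence function. It reads "debiasing" as normalizing gradients to unit length. It reads "signed weighting" as +1 for correct rollouts and -1 for incorrect ones. The function name, argument names, and precomputed gradient arrays are all hypothetical.

```python
import numpy as np

def debiased_signed_influence(train_grads, rollout_grads, rollout_correct, eps=1e-12):
    """Score training examples against gradients of on-policy rollouts.

    Hypothetical inputs (assumed to be precomputed elsewhere):
      train_grads:     (n_train, d) per-example training-loss gradients
      rollout_grads:   (n_rollouts, d) gradients of the loss on the model's
                       own sampled responses to the validation queries
      rollout_correct: (n_rollouts,) booleans, True if a rollout was judged correct
    Returns an (n_train,) array; higher means "more helpful" under this sketch.
    """
    train_grads = np.asarray(train_grads, dtype=float)
    rollout_grads = np.asarray(rollout_grads, dtype=float)

    # Assumption: remove gradient-norm bias by scaling every gradient to unit length,
    # so an example cannot score highly just because its gradient is large.
    t = train_grads / (np.linalg.norm(train_grads, axis=1, keepdims=True) + eps)
    r = rollout_grads / (np.linalg.norm(rollout_grads, axis=1, keepdims=True) + eps)

    # Assumption: correct rollouts pull the target direction towards them (+1),
    # incorrect rollouts push it away (-1).
    signs = np.where(np.asarray(rollout_correct, dtype=bool), 1.0, -1.0)
    target = (signs[:, None] * r).mean(axis=0)

    # First-order influence proxy: alignment of each training gradient with the target.
    return t @ target


# Toy usage with 2-dimensional "gradients".
scores = debiased_signed_influence(
    train_grads=[[1, 0], [0, 1], [10, 0], [-1, 0]],
    rollout_grads=[[2, 0], [0, 3]],
    rollout_correct=[True, False],
)
```

In the toy call, the first and third training gradients point the same way but differ tenfold in norm, and they receive the same score. That is the behaviour the normalization step is meant to produce.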

Why it matters

This research offers a way to maximize LLM performance by refining the training data itself. For practitioners, this means more capable and robust models, which in turn leads to better product performance and more effective AI applications.

How to implement this in your domain

  1. Explore DRIFT's on-policy data attribution for optimizing instruction datasets used in LLM fine-tuning.
  2. Investigate using influence functions to identify high-impact training examples for your AI models.
  3. Consider adapting the concept of "on-policy rollouts" as validation targets for more accurate data attribution.
  4. Implement data refinement strategies to push the performance ceiling of your organization's language models.
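As a starting point for the last step, the sketch below shows one simple refinement strategy: given a per-example influence score (however it was computed), keep the highest-scoring fraction of the dataset. This is an illustrative assumption, not the selection rule the DRIFT paper describes. The function name and the `keep_fraction` parameter are hypothetical.

```python
import numpy as np

def select_top_examples(scores, keep_fraction=0.5):
    """Return indices of the highest-scoring examples, best first.

    scores:        one attribution score per training example (higher = more helpful)
    keep_fraction: hypothetical knob for the share of the dataset to keep, in (0, 1]
    """
    scores = np.asarray(scores, dtype=float)
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Keep at least one example, even for very small fractions.
    k = max(1, int(round(keep_fraction * len(scores))))
    # Stable sort on negated scores: descending order, ties keep dataset order.
    order = np.argsort(-scores, kind="stable")
    return order[:k]


kept = select_top_examples([0.1, 0.9, -0.3, 0.5], keep_fraction=0.5)
```

In practice the kept fraction would need to be tuned against held-out performance, since a fixed cutoff is only a rough proxy for which examples help.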

Who benefits

AI/ML Research, Software Development, Content Creation, Customer Service, Marketing

Key takeaways

  • The SFT training data distribution is crucial to LLM capability, especially to raising the performance ceiling.
  • DRIFT refines instruction data using on-policy influence functions.
  • It overcomes limitations like proximity gaps and gradient norm bias in attribution.
  • The method consistently raises the performance ceiling of LLMs.

Original post by Zefan Wang, Lincheng Li, Tianyu Yu, Yuan Yao

"arXiv:2606.18307v1 Announce Type: new Abstract: Optimizing the training data distribution for Supervised Fine-Tuning (SFT) dictates the capability of Large Language Models (LLMs). While existing data curation methods excel at accelerating training under constrained budgets, they…"
