FASTMIX Optimizes Data Mixtures for Large Models Efficiently

Haoru Tan, Sitong Wu, Yanfeng Chen, Jun Xia, Ruobing Xie, Bin Xia, Xingwu Sun, Xiaojuan Qi · June 16, 2026

Summary

FASTMIX is a new framework that automates data mixture discovery for pre-training and post-training large models by jointly optimizing mixture coefficients and model parameters. It reformulates mixture selection as a bilevel optimization problem, enabling efficient gradient-based optimization and outperforming baselines with reduced search costs.

The performance of large models depends heavily on large and diverse datasets, but identifying the optimal data mixture for pre-training and post-training remains an open problem. Traditional methods often rely on heuristics or resource-intensive simulations. FASTMIX instead automates data mixture discovery while training only a single proxy model, which makes the search cheaper and easier to scale than approaches that train many candidate models.

The core idea is to reformulate mixture selection as a bilevel optimization problem. Under this reformulation, optimizing mixture ratios is mathematically equivalent to assigning per-source loss weights under uniform source sampling, so the mixture coefficients appear directly in a differentiable training objective. FASTMIX solves the problem with an approximate iterative procedure that alternates between two steps: updating the model parameters under the current mixture ratios, and updating the ratios based on validation feedback. According to the authors, this gradient-based approach outperforms baseline methods on both pre-training and post-training tasks while reducing the cost of searching for a good mixture.
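The equivalence between mixture ratios and per-source loss weights can be written out in a few lines. The notation below is an assumption made for illustration and is not taken from the paper: there are K sources D_1, ..., D_K, a per-sample loss ℓ, model parameters θ, and mixture weights w on the probability simplex.

```latex
% Assumed notation (not from the source): K data sources D_1..D_K,
% per-sample loss \ell, parameters \theta, mixture weights w in the simplex.

% Sampling a source with probability w_i gives the same expected loss as
% sampling sources uniformly and scaling each source's loss by K w_i.
\[
L_{\mathrm{train}}(\theta, w)
  = \mathbb{E}_{i \sim w}\, \mathbb{E}_{x \sim D_i}\big[\ell(\theta; x)\big]
  = \sum_{i=1}^{K} w_i\, \mathbb{E}_{x \sim D_i}\big[\ell(\theta; x)\big]
  = \mathbb{E}_{i \sim \mathrm{Unif}\{1,\dots,K\}}
      \Big[ K w_i\, \mathbb{E}_{x \sim D_i}\big[\ell(\theta; x)\big] \Big].
\]

% Bilevel problem: the outer level picks w to minimize validation loss,
% the inner level trains the model under that weighting.
\[
\min_{w \in \Delta^{K-1}} \; L_{\mathrm{val}}\big(\theta^{*}(w)\big)
\quad \text{s.t.} \quad
\theta^{*}(w) \in \arg\min_{\theta} L_{\mathrm{train}}(\theta, w).
\]
```

Because w enters the training loss as a set of multiplicative weights rather than as a sampling distribution, the validation loss can in principle be differentiated with respect to w, which is what makes a gradient-based search possible.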

Why it matters

Because the search requires only a single proxy model, teams training large models can tune their data mixture automatically instead of relying on heuristics or costly simulation runs. This can improve model quality while lowering the compute spent on finding a good mixture.

How to implement this in your domain

  1. Integrate FASTMIX or similar data mixture optimization frameworks into large model training pipelines.
  2. Reformulate data selection as a bilevel optimization problem for gradient-based mixture tuning.
  3. Implement iterative optimization procedures that alternate between model parameter and mixture ratio updates.
  4. Utilize validation feedback to dynamically adjust data mixture coefficients during training.
  5. Explore the provided code repository to apply FASTMIX to custom pre-training and fine-tuning tasks.
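Steps 2 to 4 can be sketched on a toy problem. The following is a minimal illustration, not the FASTMIX implementation: it assumes a linear regression model, a softmax parameterization of the mixture weights, and a one-step lookahead approximation of the outer gradient. All function names (`alternate_mixture_search`, `make_toy_problem`, and the helpers) and hyperparameters are hypothetical.

```python
import numpy as np


def softmax(a):
    # Map unconstrained logits to mixture weights on the simplex.
    z = np.exp(a - a.max())
    return z / z.sum()


def mse(theta, X, y):
    # Loss: half the mean squared error of a linear model.
    return 0.5 * np.mean((X @ theta - y) ** 2)


def mse_grad(theta, X, y):
    # Gradient of mse() with respect to theta.
    return X.T @ (X @ theta - y) / len(y)


def alternate_mixture_search(sources, val, steps=300, lr=0.1, lr_mix=1.0):
    """Alternate model updates and mixture updates (illustrative only)."""
    X_val, y_val = val
    theta = np.zeros(X_val.shape[1])
    alpha = np.zeros(len(sources))  # mixture logits, start uniform
    for _ in range(steps):
        w = softmax(alpha)
        # Per-source training gradients, shape (K, d).
        grads = np.stack([mse_grad(theta, X, y) for X, y in sources])
        # Inner step: update the model under the current mixture weights.
        theta_next = theta - lr * (w @ grads)
        # Outer step (assumed one-step lookahead): since
        # theta_next = theta - lr * sum_i w_i * grads[i],
        # dL_val/dw_i = -lr * grads[i] . grad L_val(theta_next).
        g_val = mse_grad(theta_next, X_val, y_val)
        h = -lr * (grads @ g_val)
        # Chain rule through the softmax: dL/dalpha_j = w_j * (h_j - w.h).
        alpha = alpha - lr_mix * w * (h - w @ h)
        theta = theta_next
    return softmax(alpha), theta


def make_toy_problem(seed=0, d=5, n=200):
    # Source 0 matches the validation task, source 1 has flipped labels,
    # source 2 has pure-noise labels.
    rng = np.random.default_rng(seed)
    target = np.ones(d)

    def sample(true_theta):
        X = rng.normal(size=(n, d))
        return X, X @ true_theta + 0.01 * rng.normal(size=n)

    sources = [sample(target), sample(-target), sample(np.zeros(d))]
    return sources, sample(target)


if __name__ == "__main__":
    sources, val = make_toy_problem()
    weights, theta = alternate_mixture_search(sources, val)
    print("learned mixture weights:", np.round(weights, 3))
    print("validation loss:", mse(theta, *val))
```

In this toy setup the validation feedback should push weight toward the source that matches the validation task. A real pipeline would replace the closed-form gradients with minibatch gradients from an autodiff framework, and the paper's actual update rule may differ from the one-step approximation assumed here.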

Who benefits

AI Development, Data Science, Cloud Computing, Research & Development

Key takeaways

  • Optimal data mixture is crucial for large model performance.
  • FASTMIX automates data mixture discovery through gradient-based optimization.
  • It reformulates mixture selection as a bilevel optimization problem.
  • FASTMIX significantly reduces search costs and outperforms traditional methods.

Original post by Haoru Tan, Sitong Wu, Yanfeng Chen, Jun Xia, Ruobing Xie, Bin Xia, Xingwu Sun, Xiaojuan Qi

"arXiv:2606.14971v1 Announce Type: new Abstract: While large and diverse datasets have driven recent advances in large models, identifying the optimal data mixture for pre-training and post-training remains a significant open problem. We address this challenge with FASTMIX, a nove…"


