Holistic Data Scheduler Boosts LLM Pre-training Efficiency and Capability.

Chenhao Dang, Jing Ma, Mingjie Liao · June 24, 2026

Summary

The Holistic Data Scheduler (HDS) is a new multi-objective reinforcement learning framework that optimizes data mixing for LLM pre-training. By considering data quality, inter-domain influence, and model weight norms, HDS significantly improves training efficiency and final model performance.

Researchers have introduced the Holistic Data Scheduler (HDS), a framework designed to optimize data composition during Large Language Model (LLM) pre-training. Existing online data mixing methods often rely on a single optimization perspective, which fails to account for the complex, multi-dimensional requirements of LLM training.

HDS addresses this by formulating data scheduling as a continuous-control reinforcement learning problem. The framework uses the Soft Actor-Critic (SAC) algorithm for its stability and sample efficiency in exploring high-dimensional policy spaces. Its key innovation is a multi-objective, holistic reward function that integrates three perspectives: a data-driven reward for quality, a loss-driven reward capturing inter-domain influence, and a model-driven reward based on weight norms.

Systematic experiments on LLMs of various sizes, using The Pile benchmark, demonstrated HDS's effectiveness. It reached the same final validation perplexity as the next best method with 44% fewer training iterations. HDS also delivered a 7.2% improvement on the MMLU 0-shot task and consistent gains across other benchmarks, indicating that it improves both training efficiency and the final capabilities of LLMs.
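As a rough illustration of the formulation described above, the sketch below maps an unconstrained continuous action (the kind of output an SAC policy produces) to domain mixing proportions and folds three reward signals into one scalar. The softmax mapping, the weighted-sum combination, and all names (`action_to_mixture`, `holistic_reward`, `RewardWeights`) are assumptions made for illustration; the summary does not specify how HDS actually combines its rewards.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical coefficients for the three reward perspectives."""
    data: float = 1.0   # data-driven (quality) term
    loss: float = 1.0   # loss-driven (inter-domain influence) term
    model: float = 1.0  # model-driven (weight-norm) term


def action_to_mixture(action):
    """Map an unconstrained action vector to proportions that sum to 1 (softmax)."""
    peak = max(action)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in action]
    total = sum(exps)
    return [e / total for e in exps]


def holistic_reward(data_r, loss_r, model_r, weights=RewardWeights()):
    """Combine three scalar rewards with a weighted sum (an assumed form)."""
    return weights.data * data_r + weights.loss * loss_r + weights.model * model_r


if __name__ == "__main__":
    # Made-up action and reward values, purely for demonstration.
    mixture = action_to_mixture([0.2, -1.0, 0.5])
    print("mixture:", [round(p, 3) for p in mixture])
    print("reward:", holistic_reward(0.5, 0.2, 0.1))
```

A weighted sum is only the simplest way to scalarize several objectives; the coefficients would need tuning, and the actual HDS reward may be structured differently.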

Why it matters

For teams pre-training large language models, the reported results point to gains in both computational efficiency and model quality. Matching the next best method's validation perplexity with 44% fewer training iterations means shorter training runs, and better data scheduling can yield more capable LLMs, which directly affects the cost and performance of AI applications.

How to implement this in your domain

  1. Investigate integrating the Holistic Data Scheduler (HDS) framework into your LLM pre-training pipelines.
  2. Experiment with the multi-objective reward function, adapting its components (data-driven, loss-driven, model-driven) to your specific LLM training goals.
  3. Utilize the Soft Actor-Critic (SAC) algorithm for stable and efficient exploration of data mixing policies.
  4. Benchmark HDS against current online data mixing strategies to quantify efficiency gains and performance improvements on your datasets.
  5. Consider how dynamic data composition can be further tailored for specialized LLMs or specific downstream tasks.
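Whatever scheduler produces the mixing proportions, a pipeline still has to turn them into batches. The sketch below shows one simple way to do that, drawing the source domain of each sequence in a batch according to the current mixture. The function name `sample_domains`, the proportions, and the choice of three subsets of The Pile are illustrative assumptions, not details taken from HDS.

```python
import random
from collections import Counter


def sample_domains(mixture, batch_size, rng):
    """Draw the source domain of each sequence in a batch from the mixture."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    # random.Random.choices samples with replacement, proportional to weights.
    return rng.choices(names, weights=weights, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Hypothetical proportions; an online scheduler would update these
    # as training progresses.
    mixture = {"ArXiv": 0.5, "Github": 0.3, "Wikipedia (en)": 0.2}
    batch = sample_domains(mixture, batch_size=1000, rng=rng)
    print(Counter(batch))
```

In an online setting the `mixture` dictionary would be refreshed at each scheduling step, so the same sampling call can serve both a static baseline and a dynamic scheduler when benchmarking them against each other.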

Who benefits

AI Development · Cloud Computing · Research & Development · Software Engineering

Key takeaways

  • The Holistic Data Scheduler (HDS) optimizes LLM pre-training data mixing using multi-objective reinforcement learning.
  • HDS integrates data quality, inter-domain influence, and model weight norms into its reward function.
  • It significantly reduces training iterations (44% fewer) while improving final model capabilities (e.g., 7.2% MMLU gain).
  • This framework enhances both the efficiency and performance of large language model development.
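To put the 44% figure in concrete terms, the short sketch below converts a fractional reduction in iterations into an iteration count and an equivalent speedup factor. The baseline of 100,000 iterations is a hypothetical number chosen for illustration, and the calculation assumes each iteration costs the same under both methods, ignoring any overhead from the scheduler itself.

```python
def iterations_needed(baseline_iterations, fraction_saved):
    """Iterations required after cutting a given fraction from the baseline."""
    return round(baseline_iterations * (1.0 - fraction_saved))


def speedup(fraction_saved):
    """Equivalent speedup factor in iterations to reach the same perplexity."""
    return 1.0 / (1.0 - fraction_saved)


if __name__ == "__main__":
    baseline = 100_000  # hypothetical baseline iteration count
    print("iterations:", iterations_needed(baseline, 0.44))
    print("speedup: %.2fx" % speedup(0.44))
```

Under these assumptions, 44% fewer iterations corresponds to roughly a 1.8x speedup in iterations to the same validation perplexity.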

Original post by Chenhao Dang, Jing Ma, Mingjie Liao

"arXiv:2606.24133v1 Announce Type: new Abstract: The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data mixt…"
