EMA-FS Accelerates GBDT Training with Gain-Informed Feature Screening

Yan Song · June 26, 2026

Summary

EMA-based Feature Screening (EMA-FS) is an algorithm-level optimization that speeds up training of Gradient Boosted Decision Trees (GBDTs) in frameworks such as LightGBM. It keeps an exponential moving average of per-feature split gains and uses it to screen out low-utility features during histogram construction. Because the screening is informed by each feature's gain history, it is reported to outperform random feature subsampling, delivering speedups while maintaining or, in some cases, improving accuracy.

GBDT implementations such as LightGBM spend a dominant share of their training time, typically 65-70%, constructing per-feature histograms. Existing ways to reduce that cost, such as random feature subsampling, discard features without regard to their predictive value.

EMA-FS is proposed as a gain-informed alternative. It maintains an exponential moving average (EMA) of each feature's split gain across boosting iterations. After an initial warmup phase, it restricts histogram construction to the top-K features ranked by that historical gain, so high-utility features are retained and low-utility ones are screened out. Screening is applied per tree, which keeps the method compatible with LightGBM's existing histogram subtraction trick and requires no changes to core routines.

The method was evaluated on several datasets, including financial fraud detection and advertising click-through prediction. EMA-FS achieved a 2.61x speedup on a synthetic benchmark and a 1.45x speedup on the IEEE-CIS Fraud dataset, and in some cases it improved AUC as well. A stochastic variant, S-EMA-FS, was also introduced; it provides a unified framework covering both deterministic screening and random subsampling.
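The screening loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `EmaFeatureScreener`, the `decay` and `warmup_iters` parameters, the standard EMA update form, and the choice to feed in one summed gain per feature per tree are all assumptions made for the example. It does not hook into LightGBM; it only shows the bookkeeping that decides which features would get histograms for the next tree.

```python
import numpy as np


class EmaFeatureScreener:
    """Hypothetical helper: tracks an EMA of per-feature split gains."""

    def __init__(self, n_features, top_k, warmup_iters=10, decay=0.9):
        if not 1 <= top_k <= n_features:
            raise ValueError("top_k must be between 1 and n_features")
        self.ema = np.zeros(n_features)
        self.top_k = top_k
        self.warmup_iters = warmup_iters  # assumed: trees built on all features first
        self.decay = decay                # assumed smoothing factor
        self.iteration = 0

    def update(self, tree_gains):
        """Fold one tree's total split gain per feature into the EMA."""
        gains = np.asarray(tree_gains, dtype=float)
        self.ema = self.decay * self.ema + (1.0 - self.decay) * gains
        self.iteration += 1

    def active_features(self):
        """Sorted indices of features to build histograms for in the next tree."""
        n = self.ema.size
        if self.iteration < self.warmup_iters or self.top_k == n:
            return np.arange(n)  # warmup or no screening: keep every feature
        # argpartition places the top_k largest EMA values in the last top_k slots.
        top = np.argpartition(self.ema, n - self.top_k)[n - self.top_k:]
        return np.sort(top)


# Toy usage: five features, keep the best two after a three-tree warmup.
screener = EmaFeatureScreener(n_features=5, top_k=2, warmup_iters=3)
toy_gains = [0.1, 5.0, 0.0, 2.0, 0.3]  # made-up gains, identical for every tree
for _ in range(3):
    screener.update(toy_gains)
selected = screener.active_features()
```

With these made-up gains, `selected` holds features 1 and 3, the two with the largest accumulated gain. In a real trainer the gains would come from the splits chosen in each newly built tree, and a feature that is screened out receives zero gain for that tree, so its EMA decays over time.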

Why it matters

Accelerating GBDT training without sacrificing accuracy is crucial for data scientists and machine learning engineers, especially in industries dealing with large, high-dimensional datasets where model training time is a bottleneck. EMA-FS offers a practical, compatible, and effective solution for faster model development and deployment.

How to implement this in your domain

  1. Integrate EMA-FS into your GBDT training workflows, particularly when using LightGBM.
  2. Experiment with different K values (number of top features retained) and retention rates to balance speed and accuracy.
  3. Consider applying the stochastic variant (S-EMA-FS) for more flexible control over feature selection.
  4. Evaluate the performance gains on your specific high-dimensional datasets, especially in fraud detection or advertising.
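Before sweeping K values empirically, a back-of-envelope estimate can show roughly what speedup to expect. The sketch below applies an Amdahl's-law style bound, speedup = 1 / ((1 - f) + f * K / d), where f is the fraction of training time spent on histograms, d is the total number of features, and K is the number retained. This formula is not taken from the paper. It assumes histogram cost scales linearly with the number of features, ignores the warmup phase and any screening overhead, and the function name `estimated_speedup` is hypothetical.

```python
def estimated_speedup(hist_fraction, n_features, top_k):
    """Rough speedup bound when only top_k of n_features get histograms.

    Assumes histogram time is proportional to the number of features and
    that everything else in training is unaffected by screening.
    """
    if not 0.0 <= hist_fraction <= 1.0:
        raise ValueError("hist_fraction must be in [0, 1]")
    if not 1 <= top_k <= n_features:
        raise ValueError("top_k must be between 1 and n_features")
    kept = top_k / n_features
    return 1.0 / ((1.0 - hist_fraction) + hist_fraction * kept)


# Sweep a few retention levels for a 100-feature dataset where histograms
# take 70% of training time (the upper end of the range quoted in the article).
for k in (100, 50, 25, 10):
    print(k, round(estimated_speedup(0.70, 100, k), 2))
```

Under these assumptions, even aggressive screening cannot exceed 1 / (1 - f), which is about 3.3x when f = 0.70. Measured speedups on real data depend on the warmup length, the dataset, and the accuracy you are willing to trade, so treat the estimate as a starting point for experiments rather than a prediction.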

Who benefits

BFSI, AdTech, E-commerce, Manufacturing, Data Science

Key takeaways

  • EMA-FS significantly accelerates GBDT training by screening features based on their historical gain.
  • It outperforms random feature subsampling, retaining high-utility features for histogram construction.
  • The method is compatible with LightGBM and offers substantial speedups, sometimes with improved accuracy.
  • This optimization is highly valuable for data scientists working with large, high-dimensional datasets.

Original post by Yan Song

"arXiv:2606.26337v1 Announce Type: new Abstract: Gradient Boosted Decision Trees (GBDT), exemplified by LightGBM, spend a dominant fraction of training time -- typically 65-70% -- constructing per-feature histograms. Existing approaches such as random feature subsampling (feature_…"

Originally posted by Yan Song on X
