Sparse Bagging Framework Compresses Ensembles, Improves Calibration and Speed.

Meher Sai Preetam, Meher Bhaskar · June 15, 2026

Summary

Simplex-Constrained Sparse Bagging (SCSB) is a new framework for compressing and calibrating bootstrap-based bagging ensembles like Random Forests. It optimizes ensemble pruning and calibration by minimizing Out-Of-Bag loss with a concave quadratic penalty, leading to significant compression, faster inference, and better probability calibration.

Standard ensemble methods, such as Random Forests or bagged neural networks, typically give every constituent model equal voting power. Uniform weighting can make the ensemble overconfident and ignores differences in the local competence of individual estimators.

Simplex-Constrained Sparse Bagging (SCSB) is presented as a mathematically rigorous framework that addresses these limitations through post-training compression and probability calibration of bootstrap-based bagging ensembles. It reframes ensemble pruning and calibration as a joint optimization problem over the probability simplex, with the objective of minimizing the Out-Of-Bag (OOB) loss. To induce sparsity and prune less useful estimators, the framework adds a concave quadratic penalty. This sidesteps the "L1-simplex paradox": weights on the simplex are nonnegative and sum to 1, so their L1 norm is always exactly 1 and an L1 penalty cannot favor sparse solutions.

The framework is model-agnostic and is reported to achieve up to 96% ensemble compression. Because inference cost grows with the number of estimators evaluated, this compression translates directly into linear inference speedups. The authors also report better probability calibration, measured as lower Expected Calibration Error (ECE), while generalization accuracy is maintained or even improved.
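The L1-simplex paradox is easy to check numerically. The sketch below compares the L1 norm with one possible concave quadratic penalty on three weight vectors. The penalty form `sum_i w_i * (1 - w_i)` is an assumption chosen for illustration, since the summary does not state the exact penalty used in the paper, and the function names are hypothetical.

```python
def l1_norm(w):
    """L1 norm: sum of absolute values."""
    return sum(abs(x) for x in w)


def concave_quadratic_penalty(w):
    """Assumed penalty form: sum_i w_i * (1 - w_i), i.e. 1 - ||w||_2^2 on the simplex."""
    return sum(x * (1.0 - x) for x in w)


# Three weight vectors on the 4-estimator simplex (nonnegative, summing to 1).
candidates = {
    "uniform": [0.25, 0.25, 0.25, 0.25],
    "two_active": [0.5, 0.5, 0.0, 0.0],
    "one_active": [1.0, 0.0, 0.0, 0.0],
}

for name, w in candidates.items():
    print(f"{name:>10}  L1={l1_norm(w):.2f}  penalty={concave_quadratic_penalty(w):.2f}")
```

The L1 column is 1.00 for all three vectors, so it cannot rank them. The assumed concave penalty falls from 0.75 (uniform) to 0.50 (two active estimators) to 0.00 (a single estimator), so minimizing it pulls the weights toward sparse corners of the simplex.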

Why it matters

Data scientists and machine learning engineers can leverage SCSB to deploy more efficient, faster, and better-calibrated ensemble models without sacrificing accuracy. This is particularly valuable in resource-constrained environments or applications requiring high-speed inference and reliable probability estimates.

How to implement this in your domain

  1. Evaluate SCSB for compressing and calibrating existing bagging ensembles in production.
  2. Integrate SCSB into machine learning pipelines to improve inference speed and reduce model footprint.
  3. Apply SCSB to enhance the reliability of probability predictions in classification tasks.
  4. Benchmark SCSB against traditional bagging methods for performance, compression, and calibration metrics.
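As a starting point for the evaluation and benchmarking steps above, the following is a minimal sketch of simplex-constrained weight fitting in plain Python. It is not the authors' algorithm: it assumes a Brier (squared) loss, the illustrative penalty `lam * sum_i w_i * (1 - w_i)`, plain projected gradient descent, and a dense matrix of synthetic probabilities standing in for OOB predictions. A real pipeline would mask samples that were in-bag for a given estimator. All function names and hyperparameters (`lam`, `lr`, `steps`) are hypothetical.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        running += u_k
        shift = (running - 1.0) / k
        if u_k - shift > 0:
            theta = shift  # the last k that passes gives the final shift
    return [max(x - theta, 0.0) for x in v]


def brier(w, preds, y):
    """Mean squared error of the weighted ensemble probability."""
    n = len(y)
    return sum(
        (sum(w_i * p[j] for w_i, p in zip(w, preds)) - y[j]) ** 2 for j in range(n)
    ) / n


def fit_sparse_weights(preds, y, lam=0.05, lr=0.2, steps=300):
    """Projected gradient descent on Brier loss + lam * sum_i w_i * (1 - w_i)."""
    m, n = len(preds), len(y)
    w = [1.0 / m] * m  # start from uniform bagging weights
    for _ in range(steps):
        mix = [sum(w[i] * preds[i][j] for i in range(m)) for j in range(n)]
        resid = [mix[j] - y[j] for j in range(n)]
        grad = [
            2.0 * sum(resid[j] * preds[i][j] for j in range(n)) / n
            + lam * (1.0 - 2.0 * w[i])  # gradient of the assumed concave penalty
            for i in range(m)
        ]
        w = project_to_simplex([w[i] - lr * grad[i] for i in range(m)])
    return w


# Synthetic stand-in for OOB probabilities: 3 informative estimators, 7 noisy ones.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(200)]
informative = [
    [min(max(0.1 + 0.8 * label + rng.gauss(0, 0.05), 0.0), 1.0) for label in y]
    for _ in range(3)
]
noisy = [[rng.random() for _ in y] for _ in range(7)]
preds = informative + noisy

uniform = [1.0 / len(preds)] * len(preds)
weights = fit_sparse_weights(preds, y)
kept = [i for i, w_i in enumerate(weights) if w_i > 1e-9]

print("kept estimators:", kept)
print("Brier uniform:", round(brier(uniform, preds, y), 4))
print("Brier sparse: ", round(brier(weights, preds, y), 4))
```

Only the estimators listed in `kept` need to be evaluated at inference time, which is where the speedup comes from. At the reported upper bound of 96% compression, a 100-estimator ensemble would keep 4 members, so each prediction would evaluate 25 times fewer estimators. Because the penalty is concave, the objective is non-convex and this simple solver only finds a local solution.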

Who benefits

Finance, Healthcare, E-commerce, Autonomous Systems, Predictive Analytics

Key takeaways

  • SCSB compresses bagging ensembles by up to 96%, leading to linear inference speedups.
  • It improves probability calibration by minimizing OOB loss with a concave quadratic penalty.
  • The framework is model-agnostic and maintains or enhances generalization accuracy.
  • SCSB offers a rigorous approach to address overconfidence and inefficiency in ensemble learning.
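The calibration claims in the takeaways above are stated in terms of Expected Calibration Error. The sketch below shows one common binned ECE for binary labels; the summary does not say which ECE variant or bin count the paper uses, so the equal-width binning, `n_bins=10`, and the toy data are assumptions for illustration.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary labels: weighted mean of |mean prob - positive rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_prob - positive_rate)
    return ece


# Toy data: both predictors are right on 6 of 8 samples,
# but the first one states its predictions far too confidently.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
overconfident = [0.95, 0.95, 0.95, 0.05, 0.05, 0.05, 0.05, 0.95]
hedged = [0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.25, 0.75]

ece_overconfident = expected_calibration_error(overconfident, labels)
ece_hedged = expected_calibration_error(hedged, labels)
print("ECE overconfident:", round(ece_overconfident, 3))
print("ECE hedged:       ", round(ece_hedged, 3))
```

Both toy predictors make the same hard decisions, yet the hedged one has a lower ECE because its stated probabilities match the observed positive rates. Comparing this metric for the uniform and the pruned ensemble on held-out data is one way to check the calibration claim on your own models.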

Original post by Meher Sai Preetam, Meher Bhaskar

"arXiv:2606.13589v1 Announce Type: cross Abstract: We present Simplex-Constrained Sparse Bagging (SCSB), a mathematically rigorous framework for post-training compression and probability calibration of bootstrap-based bagging ensembles. Standard bagging ensembles (such as Random F…"
