New Benchmark Reveals Label Noise Challenges for ML Models

Shadman Islam, Agustinus Kristiadi, Mostafa Milani · June 16, 2026

Summary

This paper introduces CILN, a new benchmark generation framework that creates instance-dependent label noise through controlled input corruptions. It demonstrates that the structure of noise, not just its rate, significantly impacts the performance and failure modes of noisy-label learning methods.

Methods for learning from noisy labels are often evaluated on synthetic instance-dependent label noise (IDN) benchmarks. Existing generators typically produce this noise by simulating imperfect annotators or classifiers, which leaves the underlying source of ambiguity unclear. This research presents CILN, a framework that generates IDN benchmarks by applying controlled corruptions to the inputs. A diverse group of "voters" then labels the corrupted instances, so both the origin and the intensity of the ambiguity are explicit and controllable. The authors used the framework to create 90 benchmark settings across the CIFAR-10, MNIST, and Adult datasets, covering a range of corruption types and severity levels.

Experiments confirmed that CILN-generated benchmarks exhibit genuine instance-dependent noise and diverse confusion structures. On CIFAR-10, they produced label distributions closer to human uncertainty than existing synthetic IDN benchmarks. Crucially, corruption-mediated IDN exposed failure modes in popular noisy-label learning methods such as Co-Teaching and DivideMix that were not apparent under comparable levels of rater-fallibility noise. The structure of the noise, not just its rate, therefore shapes benchmark difficulty and algorithm behavior, and CILN offers a complementary framework for studying noisy-label learning.
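The generation process described above (corrupt the input, let several voters label the corrupted instance, derive a label from their votes) can be illustrated with a toy sketch. This is not the authors' implementation: the Gaussian corruption, the threshold-based voters, the sampling of one noisy label from the vote distribution, and every function name here are assumptions made for illustration, and the data is synthetic rather than CIFAR-10, MNIST, or Adult.

```python
import random
from collections import Counter


def corrupt(x, severity, rng):
    """Additive Gaussian corruption; `severity` is the noise std (an assumed choice)."""
    return [v + rng.gauss(0.0, severity) for v in x]


def make_voters(thresholds):
    """Hypothetical voters: each compares the feature mean to its own threshold."""
    return [lambda x, t=t: int(sum(x) / len(x) > t) for t in thresholds]


def vote_distribution(x, voters, n_classes=2):
    """Fraction of voters choosing each class for one (corrupted) input."""
    votes = Counter(voter(x) for voter in voters)
    return [votes.get(c, 0) / len(voters) for c in range(n_classes)]


def generate_noisy_labels(clean, severity, voters, seed=0):
    """Corrupt each input, collect votes, and sample one noisy label from them."""
    rng = random.Random(seed)
    records = []
    for x, y in clean:
        dist = vote_distribution(corrupt(x, severity, rng), voters)
        noisy_y = rng.choices(range(len(dist)), weights=dist)[0]
        records.append({"x": x, "clean": y, "noisy": noisy_y, "votes": dist})
    return records


def noise_rate(records):
    """Share of instances whose sampled label differs from the clean label."""
    return sum(r["clean"] != r["noisy"] for r in records) / len(records)


def make_toy_data(n_per_class=200, seed=42):
    """Two well-separated classes centred at -1 and +1 (toy data, not a real dataset)."""
    rng = random.Random(seed)
    return [([rng.gauss(2 * y - 1, 0.2) for _ in range(4)], y)
            for y in (0, 1) for _ in range(n_per_class)]


clean = make_toy_data()
voters = make_voters([-0.2, -0.1, 0.0, 0.1, 0.2])
for severity in (0.0, 1.0, 3.0):
    records = generate_noisy_labels(clean, severity, voters)
    print(f"severity={severity}: noise rate={noise_rate(records):.3f}")
```

In this sketch the chance of a flipped label depends on how close each corrupted instance falls to the voters' thresholds, so the noise is instance-dependent, and the severity parameter gives direct control over how much ambiguity is injected.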

Why it matters

For ML engineers and researchers, this benchmarking framework provides a more realistic and controllable way to test the robustness of noisy-label learning algorithms, which may help produce models that hold up better on real-world data.

How to implement this in your domain

1. Use the CILN framework to generate more realistic instance-dependent label noise benchmarks.
2. Evaluate existing noisy-label learning methods against corruption-mediated IDN to identify hidden failure modes.
3. Develop new algorithms specifically designed to handle diverse noise structures, not just noise rates.
4. Consider the source and severity of ambiguity when designing and testing ML models for real-world data.
5. Integrate controlled input corruptions into data augmentation strategies for model robustness.
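When comparing benchmarks along these lines, it can help to report the confusion structure next to the overall noise rate. The sketch below builds two hand-made label sets with the same noise rate but different structure. The `flip_entropy` statistic is a hypothetical summary chosen for illustration, not a metric taken from the paper, and the label lists are invented toy data.

```python
import math


def confusion_matrix(clean, noisy, n_classes):
    """counts[i][j] = number of instances with clean label i and noisy label j."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for c, n in zip(clean, noisy):
        counts[c][n] += 1
    return counts


def noise_rate(matrix):
    """Fraction of instances that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    wrong = total - sum(matrix[i][i] for i in range(len(matrix)))
    return wrong / total


def flip_entropy(matrix):
    """Mean entropy (bits) of where flipped labels land, over classes with flips.

    Hypothetical statistic: 0 means every flip from a class goes to a single
    other class; higher values mean flips are spread across more classes.
    """
    entropies = []
    for i, row in enumerate(matrix):
        flips = [v for j, v in enumerate(row) if j != i and v > 0]
        total = sum(flips)
        if total:
            entropies.append(sum(v / total * math.log2(total / v) for v in flips))
    return sum(entropies) / len(entropies) if entropies else 0.0


# Toy labels: 3 classes, 10 instances each, 6 flipped labels in both cases.
clean = [0] * 10 + [1] * 10 + [2] * 10
spread = [0] * 8 + [1, 2] + [1] * 8 + [0, 2] + [2] * 8 + [0, 1]
concentrated = [0] * 4 + [1] * 6 + [1] * 10 + [2] * 10

for name, noisy in (("spread", spread), ("concentrated", concentrated)):
    m = confusion_matrix(clean, noisy, 3)
    print(name, "rate:", noise_rate(m), "flip entropy:", flip_entropy(m))
```

Both label sets have six flipped labels out of thirty, yet one spreads the flips evenly across classes and the other sends all of them from class 0 to class 1. A method tuned only against the first kind of noise is not guaranteed to behave the same way on the second, which is the reason to track structure as well as rate.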

Who benefits

AI Development, Data Science, Quality Assurance, Machine Learning Research

Key takeaways

CILN generates instance-dependent label noise through controlled input corruptions.
Noise structure, not just rate, significantly impacts ML model performance.
Corruption-mediated IDN can expose failure modes in popular noisy-label methods.
This framework offers a more explicit and controllable way to benchmark noisy-label learning.

Original post by Shadman Islam, Agustinus Kristiadi, Mostafa Milani

"arXiv:2606.14965v1 Announce Type: new Abstract: Synthetic instance-dependent label noise (IDN) benchmarks are widely used to evaluate noisy-label learning methods, yet existing approaches typically generate noise through imperfect annotators or classifier raters, leaving the sour…"

Originally posted by Shadman Islam, Agustinus Kristiadi, Mostafa Milani on X
