TS-Fault Benchmarks Time Series Forecasters Against Structural Faults

Yuyang Zhao, Lian Xu, Hao Miao, Chenxi Liu, Hao Xue · June 18, 2026

Summary

Researchers introduce TS-Fault, a benchmark that evaluates time series forecasting (TSF) models under explicit, parameterized fault scenarios. The study finds that clean-data accuracy often anti-correlates with robustness, and that foundation models, despite high clean-data accuracy, can be fragile under mechanism-level faults.

Time series forecasting (TSF) is critical for decision-making across many sectors, yet model evaluations rely mainly on average error on clean data, on the assumption that this predicts real-world reliability. The assumption often fails because real faults are not random noise but structured events involving temporal patterns, broken dependencies, regime changes, missing data, and causal propagation.

This paper presents TS-Fault, a benchmark that evaluates TSF models against explicit, parameterized fault scenarios with controllable semantic difficulty. TS-Fault organizes common failures into four modes along two axes: observation-level vs. mechanism-level, and univariate vs. multivariate. Faults are injected into prediction-critical windows selected by a unified importance score, so robustness is tested against the specific structures each model depends on.

Evaluating 21 models across 6 datasets, 4 fault modes, and 5 difficulty levels revealed three counter-intuitive findings. First, clean-data accuracy often anti-correlates with robustness. Second, clean rankings are preserved under observation-level faults but substantially reshuffled under mechanism-level faults. Third, all catastrophic failures occurred under mechanism-level faults, and foundation models, despite their high clean-data accuracy, were the most fragile. The code for TS-Fault is publicly available.
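To make the two-axis taxonomy and window targeting concrete, here is a minimal NumPy sketch. It is not TS-Fault's code: the four fault functions are hypothetical stand-ins for each mode (a missing block, a noise burst, a level-shift regime change, and a lagged channel that breaks cross-channel alignment), severity 1 to 5 mirrors the five difficulty levels, and the occlusion-based `critical_window` helper is an assumed substitute for the paper's unified importance score, whose definition is not given here.

```python
import numpy as np

# Hypothetical stand-ins for the four TS-Fault modes; NOT the benchmark's
# actual fault definitions. x has shape (T, C); win is a slice of time steps;
# severity runs from 1 (mild) to 5 (severe).

def obs_univariate(x, win, severity, rng):
    """Observation-level, univariate: blank out part of the window in one channel."""
    y = np.array(x, dtype=float)
    c = rng.integers(x.shape[1])
    n = max(1, (win.stop - win.start) * severity // 5)
    y[win.start:win.start + n, c] = np.nan
    return y

def obs_multivariate(x, win, severity, rng):
    """Observation-level, multivariate: additive noise burst on every channel."""
    y = np.array(x, dtype=float)
    scale = 0.5 * severity * y.std(axis=0)
    y[win] += rng.normal(size=y[win].shape) * scale
    return y

def mech_univariate(x, win, severity, rng):
    """Mechanism-level, univariate: persistent level shift (regime change) in one channel."""
    y = np.array(x, dtype=float)
    c = rng.integers(x.shape[1])
    y[win.start:, c] += severity * y[:, c].std()
    return y

def mech_multivariate(x, win, severity, rng):
    """Mechanism-level, multivariate: lag one channel inside the window,
    breaking its alignment with the other channels."""
    y = np.array(x, dtype=float)
    c = rng.integers(x.shape[1])
    y[win, c] = np.roll(y[win, c], 2 * severity)
    return y

FAULTS = [obs_univariate, obs_multivariate, mech_univariate, mech_multivariate]

def critical_window(x, forecast_fn, win_len):
    """Assumed occlusion-based importance: pick the non-overlapping window whose
    replacement by the channel means moves the forecast the most."""
    base = forecast_fn(x)
    fill = x.mean(axis=0)
    best_score, best_start = -1.0, 0
    for start in range(0, x.shape[0] - win_len + 1, win_len):
        occluded = np.array(x, dtype=float)
        occluded[start:start + win_len] = fill
        score = float(np.abs(forecast_fn(occluded) - base).mean())
        if score > best_score:
            best_score, best_start = score, start
    return slice(best_start, best_start + win_len)

# Usage: two seasonal channels (period 24) and a seasonal-naive forecaster.
rng = np.random.default_rng(0)
t = np.arange(96)
x = np.stack([np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24)], axis=1)
x += 0.1 * rng.normal(size=x.shape)
seasonal_naive = lambda s: s[-24:]                    # forecast = last observed period
win = critical_window(x, seasonal_naive, win_len=24)  # the last period, slice(72, 96)
faulted = {f.__name__: f(x, win, 3, rng) for f in FAULTS}
```

Occlusion is only one way to score importance; a gradient- or attribution-based score could replace the body of `critical_window` without changing how the faults are applied.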

Why it matters

Relying solely on clean-data accuracy to select time series forecasting models can lead to catastrophic failures in real-world deployments. TS-Fault gives practitioners a way to assess and improve the robustness of TSF models against realistic, structured operational faults before deployment, supporting more reliable decisions in critical applications.

How to implement this in your domain

  1. Integrate TS-Fault or similar structured fault-injection methodologies into time series forecasting model evaluation pipelines.
  2. Prioritize robustness metrics alongside accuracy when selecting and deploying TSF models for critical applications.
  3. Analyze the performance of existing TSF models under mechanism-level faults to identify potential fragility points.
  4. Develop and train TSF models specifically designed to be resilient to structured faults, not just generic noise.
  5. Educate stakeholders on the limitations of clean-data accuracy and the importance of fault-aware benchmarking for TSF.
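Steps 1 to 3 can be sketched as a small evaluation harness that reports worst-case error under each fault next to clean error. Everything here is illustrative: the two toy forecasters, the two faults (a noise burst as an observation-level stand-in, a level shift as a mechanism-level stand-in), and the choice to score forecasts from corrupted inputs against the clean future are assumptions, not TS-Fault's protocol or API.

```python
import numpy as np

# Illustrative harness; models, faults, and metrics are assumptions,
# not TS-Fault's API. Faults corrupt the input history only; forecasts are
# scored against the clean future.
HORIZON = PERIOD = 24

def seasonal_naive(history):
    return history[-PERIOD:][:HORIZON]

def window_mean(history):
    return np.repeat(history[-PERIOD:].mean(axis=0, keepdims=True), HORIZON, axis=0)

def noise_fault(history, severity, rng):
    """Observation-level stand-in: additive noise on the last PERIOD steps."""
    h = history.copy()
    h[-PERIOD:] += rng.normal(scale=0.2 * severity, size=h[-PERIOD:].shape)
    return h

def shift_fault(history, severity, rng):
    """Mechanism-level stand-in: level shift over the last PERIOD steps."""
    h = history.copy()
    h[-PERIOD:] += 0.5 * severity
    return h

def mae(pred, target):
    return float(np.mean(np.abs(pred - target)))

def evaluate(models, faults, history, target, severities=range(1, 6), seed=0):
    """Clean MAE plus worst-case MAE and degradation ratio per fault mode."""
    rng = np.random.default_rng(seed)
    report = {}
    for name, model in models.items():
        clean = mae(model(history), target)
        row = {"clean_mae": clean}
        for fname, fault in faults.items():
            worst = max(mae(model(fault(history, s, rng)), target) for s in severities)
            row[f"{fname}_worst_mae"] = worst
            row[f"{fname}_worst_ratio"] = worst / clean  # > 1 means degradation
        report[name] = row
    return report

# Usage on a noisy sine wave: hold out the last HORIZON steps as the target.
t = np.arange(6 * PERIOD)
series = np.sin(2 * np.pi * t / PERIOD)[:, None]
series = series + 0.05 * np.random.default_rng(1).normal(size=series.shape)
history, target = series[:-HORIZON], series[-HORIZON:]
report = evaluate({"seasonal_naive": seasonal_naive, "window_mean": window_mean},
                  {"noise": noise_fault, "shift": shift_fault}, history, target)
for name, row in report.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

A ratio-based robustness score inflates for models whose clean error is already small, which is why the sketch also reports worst-case absolute error. In this toy setup the more accurate seasonal-naive model shows the larger degradation ratio under the level shift.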

Who benefits

Energy, Transportation, Finance, Healthcare, Manufacturing

Key takeaways

  • TS-Fault benchmarks time series forecasters against realistic structural faults, not just noise.
  • Clean-data accuracy often anti-correlates with robustness under structured faults in TSF models.
  • Mechanism-level faults cause catastrophic failures and reshuffle model rankings.
  • Foundation models, despite high accuracy, can be highly fragile under structural faults.
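The ranking and anti-correlation takeaways above can be checked with rank correlations. The sketch below uses made-up error values for four hypothetical models (not results from the paper) and a tie-free Spearman correlation written in NumPy; `scipy.stats.spearmanr` is the usual choice when ties are possible. A correlation near 1 between clean and faulted errors means the ranking is preserved, while a negative correlation between clean error and the degradation ratio means the most accurate models degrade the most.

```python
import numpy as np

def ranks(values):
    """Rank of each entry (0 = lowest error); assumes there are no ties."""
    return np.argsort(np.argsort(values))

def spearman(a, b):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), tie-free case."""
    ra, rb = ranks(np.asarray(a)), ranks(np.asarray(b))
    n = len(ra)
    return 1 - 6 * np.sum((ra - rb) ** 2) / (n * (n ** 2 - 1))

# Made-up MAE values for four hypothetical models (illustration only).
clean = np.array([0.30, 0.35, 0.40, 0.50])
obs_fault = np.array([0.40, 0.45, 0.52, 0.60])   # same ordering as clean
mech_fault = np.array([1.90, 0.80, 0.60, 0.70])  # ordering reshuffled

print(spearman(clean, obs_fault))            # 1.0: ranking preserved
print(spearman(clean, mech_fault))           # -0.8: ranking largely reversed
print(spearman(clean, mech_fault / clean))   # -1.0: most accurate degrade most
```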

Original post by Yuyang Zhao, Lian Xu, Hao Miao, Chenxi Liu, Hao Xue on X

From the arXiv abstract (arXiv:2606.18539v1): "Time series forecasting (TSF) underpins consequential decisions in energy, transportation, finance, and healthcare, yet TSF models are almost universally ranked by a single number (e.g., average error) on clean held-out data, under…"
