SCARCE Estimates Rare AI System Failure Probabilities

Yingjie Wang, Yi Dong, Edmund Lau, Jie Meng, Taylor T Johnson, Xiaowei Huang · June 30, 2026

Summary

Researchers introduce SCARCE (Scalable Cascade Analysis for Rare-event Characterisation via Embeddings), a method for estimating the probabilities of rare events in AI systems, such as jailbreaks. SCARCE replaces the handcrafted performance function of classical Subset Simulation with learned latent representations and geometric rulers. The authors report 400-500 times lower mean absolute error than classical Subset Simulation on MNIST misclassification tasks, and a PCA-based ruler that transfers from Llama-Guard-3-8B hidden states to a GCG-style corpus.

Rare events are critical to the safety and reliability of modern AI systems, but their probabilities are notoriously hard to estimate: direct Monte Carlo simulation requires prohibitive sample budgets. Subset Simulation (SS) addresses this by decomposing a rare-event probability into a product of larger, more manageable conditional probabilities over nested intermediate events. However, classical SS requires a manually designed scalar performance function, which demands detailed knowledge of the failure mechanism and limits its applicability to new domains.

SCARCE removes this requirement by replacing the performance function with learned latent representations and geometric rulers. The rulers score proximity to failure regions, while adaptive thresholding constructs the nested intermediate events directly from data. The method is formalized with a non-negative supermartingale, which provides a valid upper envelope even with early stopping.

In experiments, SCARCE achieved 400-500 times lower mean absolute error than traditional SS on MNIST misclassification tasks. Applied to LLM jailbreaks under a fleet-level threat model, a PCA-based ruler achieved low mean relative error on Llama-Guard-3-8B hidden states and transferred to a GCG-style corpus.
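The Subset Simulation idea described above can be sketched on a toy problem. This is a minimal illustration of classical SS, not of SCARCE itself: the input law is assumed to be a standard Gaussian, `score` is a handcrafted performance function chosen so the true answer is known, and the conditional sampler (a Gaussian-preserving autoregressive proposal with parameter `rho`) is one common choice made here for brevity. All function and parameter names are illustrative.

```python
import math
import random


def score(x):
    """Toy handcrafted performance function; N(0, 1)-distributed for Gaussian x."""
    return sum(x) / math.sqrt(len(x))


def subset_simulation(score_fn, dim, failure_level, n=2000, p0=0.1, rho=0.8,
                      max_levels=20, seed=0):
    """Estimate P(score_fn(X) >= failure_level) for X ~ N(0, I_dim)."""
    rng = random.Random(seed)
    step = math.sqrt(1.0 - rho * rho)
    n_keep = max(1, round(p0 * n))
    scored = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        scored.append((score_fn(x), x))
    prob = 1.0
    for _ in range(max_levels):
        # Adaptive threshold: the score of the n_keep-th best sample so far.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        threshold = scored[n_keep - 1][0]
        if threshold >= failure_level:
            break  # the next intermediate event would overshoot the target
        prob *= n_keep / n  # estimated conditional probability of this level
        chains = scored[:n_keep]  # seeds already inside the intermediate event
        scored = []
        for _ in range(n // n_keep):
            moved = []
            for s, x in chains:
                cand = [rho * xi + step * rng.gauss(0.0, 1.0) for xi in x]
                cand_s = score_fn(cand)
                # Reject any move that leaves the current intermediate event.
                moved.append((cand_s, cand) if cand_s >= threshold else (s, x))
            chains = moved
            scored.extend(chains)
    hits = sum(1 for s, _ in scored if s >= failure_level)
    return prob * hits / len(scored)


true_p = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # exact tail probability, about 3.2e-5
estimate = subset_simulation(score, dim=10, failure_level=4.0)
print(f"true={true_p:.3e} estimate={estimate:.3e}")
```

The run uses roughly 10,000 score evaluations in total, far fewer than direct Monte Carlo would need to observe even a handful of failures at this probability level. In SCARCE, per the summary above, the handcrafted `score` would be replaced by a ruler computed on learned latent representations.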

Why it matters

For teams building and deploying AI systems, SCARCE offers a way to quantify the risk of rare but critical failures such as jailbreaks without handcrafting a performance function, supporting more rigorous safety evaluations and system designs.

How to implement this in your domain

  1. Integrate SCARCE-like methodologies into the safety and reliability testing pipelines for AI systems, especially for identifying rare failure modes.
  2. Utilize learned latent representations and geometric rulers to characterize failure regions in complex AI models without requiring handcrafted performance functions.
  3. Apply SCARCE to estimate the probability of adversarial attacks or "jailbreaks" in large language models during development and deployment.
  4. Develop adaptive thresholding mechanisms to construct intermediate events for rare-event analysis directly from operational data.
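Steps 2 and 4 above can be illustrated on synthetic data. The following is a minimal sketch, not the authors' implementation: the ruler is a hypothetical one (negative Euclidean distance to the centroid of a few known-failure embeddings, rather than the PCA-based ruler the paper evaluates), the embeddings are random stand-ins for real hidden states, and `make_distance_ruler`, `nested_events`, `p0`, and `min_size` are illustrative names and parameters.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def make_distance_ruler(failure_embeddings):
    """Return a ruler: higher score means closer to the known-failure centroid."""
    center = centroid(failure_embeddings)

    def ruler(z):
        return -math.dist(z, center)

    return ruler


def nested_events(scores, p0=0.2, min_size=5):
    """Build nested index sets by repeatedly keeping the top p0 fraction by score."""
    current = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    levels = []
    while int(p0 * len(current)) >= min_size:
        keep = int(p0 * len(current))
        threshold = scores[current[keep - 1]]  # data-driven threshold for this level
        current = current[:keep]
        levels.append((threshold, set(current)))
    return levels


# Synthetic stand-ins: 1000 "operational" embeddings and 20 known failures.
rng = random.Random(0)
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1000)]
failures = [[rng.gauss(2.0, 0.5) for _ in range(8)] for _ in range(20)]

ruler = make_distance_ruler(failures)
scores = [ruler(z) for z in embeddings]
levels = nested_events(scores)
for threshold, members in levels:
    print(f"threshold={threshold:.3f} size={len(members)}")
```

Each level is a subset of the previous one and its threshold comes from the data rather than from a hand-tuned schedule, which is the property that lets a cascade estimator multiply conditional probabilities level by level.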

Who benefits

Cybersecurity, AI Development, Autonomous Vehicles, Finance, Healthcare

Key takeaways

  • Rare event probability estimation is crucial for AI safety but computationally expensive.
  • SCARCE uses learned latent representations and geometric rulers instead of handcrafted functions.
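  • It achieved 400-500 times lower mean absolute error than traditional Subset Simulation on MNIST misclassification tasks.
  • A PCA-based ruler estimated LLM jailbreak probabilities with low mean relative error on Llama-Guard-3-8B hidden states and transferred to a GCG-style corpus.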
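To make the first takeaway concrete, the cost of direct Monte Carlo can be worked out from the standard result that an estimate of a probability p from n independent samples has relative standard error sqrt((1 - p) / (n p)). The helper below is an illustrative calculation, not something taken from the paper; `mc_samples_needed` and its arguments are hypothetical names.

```python
import math


def mc_samples_needed(p, rel_error):
    """Smallest n with sqrt((1 - p) / (n * p)) <= rel_error for direct Monte Carlo."""
    return math.ceil((1.0 - p) / (p * rel_error ** 2))


for p in (1e-3, 1e-6, 1e-9):
    n = mc_samples_needed(p, rel_error=0.1)
    print(f"p={p:.0e}: about {n:.1e} samples for 10% relative error")
```

For a failure probability of one in a million, this works out to roughly one hundred million model evaluations for 10% relative error, which is the budget that cascade methods such as Subset Simulation aim to avoid.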

Original post by Yingjie Wang, Yi Dong, Edmund Lau, Jie Meng, Taylor T Johnson, Xiaowei Huang

"arXiv:2606.29623v1 Announce Type: new Abstract: Rare events govern the safety profile of modern AI systems, yet their probabilities are extremely difficult to estimate: direct Monte Carlo requires prohibitive sample budgets. Subset Simulation (SS) addresses this by decomposing a…"
