New Algorithm Improves Linear Bandit Exploration Efficiency

Toshinori Kitamura, Shuai Liu, Csaba Szepesvári · June 30, 2026

Summary

This paper introduces Absolute Thompson Sampling (ATS), a modification of Thompson Sampling for stochastic linear bandits that ensures optimism in expectation by using absolute exploration noise. ATS keeps the computational efficiency of Thompson Sampling while simplifying the regret analysis, and its regret bounds are comparable to those of existing methods. An ensemble version, EATS, is also proposed; it converges to UCB behavior as the ensemble grows.

In stochastic linear bandits, Upper Confidence Bound (UCB) algorithms admit a simple frequentist regret analysis but can be computationally demanding, while Thompson Sampling (TS) is computationally cheap but harder to analyze because it is not optimistic. Absolute Thompson Sampling (ATS) aims to close this gap: it is a TS variant that replaces the signed exploration noise with its absolute value. This one change makes the algorithm optimistic in expectation, which permits a UCB-style regret analysis while keeping TS's computational benefits, and the resulting regret bounds match existing TS results. The authors also present Ensemble Absolute Thompson Sampling (EATS), which aggregates multiple absolute perturbations. EATS performs well with moderate ensemble sizes and converges in theory to UCB behavior as the ensemble grows, giving a practical way to balance exploration and exploitation.
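The summary does not give the exact update rule, so the following is only a minimal sketch of one plausible reading: standard linear TS adds a signed Gaussian perturbation to each arm's estimated reward, and the absolute variant takes the magnitude of that same perturbation. The function names, the ridge estimator, and the scale parameter `beta` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ridge_estimate(X, y, lam=1.0):
    """Regularized least-squares estimate and inverse design matrix."""
    d = X.shape[1]
    V_inv = np.linalg.inv(lam * np.eye(d) + X.T @ X)
    return V_inv @ X.T @ y, V_inv

def arm_noise(arms, V_inv, rng):
    """One shared Gaussian draw, projected onto each arm.

    Entry k is distributed as N(0, a_k^T V_inv a_k).
    """
    L = np.linalg.cholesky(V_inv)              # L @ L.T == V_inv
    eta = rng.standard_normal(V_inv.shape[0])
    return arms @ (L @ eta)

def ts_scores(arms, theta_hat, V_inv, beta, rng):
    # Standard linear TS: signed noise, so a score can fall below the estimate.
    return arms @ theta_hat + beta * arm_noise(arms, V_inv, rng)

def ats_scores(arms, theta_hat, V_inv, beta, rng):
    # Assumed ATS form: absolute noise, so no arm scores below its estimate.
    return arms @ theta_hat + beta * np.abs(arm_noise(arms, V_inv, rng))
```

In both cases the arm played is the one with the highest score. Under this reading the per-round cost is the same as for TS (one Gaussian draw), and only the sign handling differs.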

Why it matters

For practitioners building online decision-making systems, ATS offers exploration that is cheaper to compute than UCB while still admitting a UCB-style regret analysis. This could mean faster and more effective learning in applications such as recommendation systems or A/B testing.

How to implement this in your domain

  1. Evaluate ATS/EATS as an alternative to UCB or standard TS for online learning tasks.
  2. Implement ATS in A/B testing frameworks to potentially reduce computational overhead.
  3. Experiment with EATS to find optimal ensemble sizes for specific application contexts.
  4. Compare the performance of ATS/EATS against current bandit algorithms in production.
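Before a production comparison, the steps above can be rehearsed offline on synthetic data. The harness below is a hypothetical sketch: `run_bandit`, the three bonus functions, and all default parameters are assumptions made for illustration, and the absolute-noise rule is the same assumed reading of ATS (the magnitude of the TS perturbation), not the paper's verified algorithm.

```python
import numpy as np

def run_bandit(bonus_fn, theta_star, arms, horizon,
               beta=1.0, lam=1.0, noise_sd=0.1, seed=0):
    """Play one synthetic linear bandit; return cumulative pseudo-regret."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)              # regularized design matrix
    b = np.zeros(d)                  # running sum of reward-weighted arms
    best = np.max(arms @ theta_star)
    regret = 0.0
    for _ in range(horizon):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b
        L = np.linalg.cholesky(V_inv)
        noise = arms @ (L @ rng.standard_normal(d))
        widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
        scores = arms @ theta_hat + beta * bonus_fn(noise, widths)
        a = arms[np.argmax(scores)]
        reward = a @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(a, a)
        b += reward * a
        regret += best - a @ theta_star
    return regret

def signed_ts(noise, widths):      # standard TS perturbation
    return noise

def absolute_ts(noise, widths):    # assumed ATS perturbation
    return np.abs(noise)

def ucb(noise, widths):            # deterministic confidence width
    return widths
```

Calling `run_bandit` once per bonus function, with the same `arms`, `theta_star`, and a range of seeds, gives a like-for-like regret comparison. The appropriate value of `beta` for each method is problem-dependent and would need tuning.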

Who benefits

E-commerce, AdTech, Marketing, Personalization, Healthcare (clinical trials)

Key takeaways

  • Absolute Thompson Sampling (ATS) offers an efficient and analyzable alternative for linear bandits.
  • ATS ensures optimism in expectation by using absolute exploration noise.
  • Ensemble Absolute Thompson Sampling (EATS) converges to UCB behavior with growing ensemble size.
  • The new algorithms provide a balance between computational efficiency and strong theoretical guarantees.
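The convergence of EATS to UCB can be illustrated numerically, assuming that "aggregating" means averaging the absolute perturbations (the summary does not say which aggregate the paper uses). For a Gaussian Z with standard deviation s, E|Z| = s·sqrt(2/π), so an average of many absolute perturbations approaches a fixed multiple of the UCB confidence width. The names `eats_bonus` and `ucb_width` are hypothetical.

```python
import numpy as np

def ucb_width(arms, V_inv):
    """Per-arm confidence width sqrt(a^T V_inv a)."""
    return np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))

def eats_bonus(arms, V_inv, ensemble_size, rng):
    """Average of |perturbation| over an ensemble of Gaussian draws.

    Averaging is an assumption; with ensemble_size == 1 this reduces
    to a single absolute perturbation.
    """
    L = np.linalg.cholesky(V_inv)                        # L @ L.T == V_inv
    eta = rng.standard_normal((ensemble_size, V_inv.shape[0]))
    noise = (arms @ L) @ eta.T                           # shape (n_arms, ensemble_size)
    return np.abs(noise).mean(axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
V_inv = np.linalg.inv(np.eye(3) + X.T @ X)
arms = rng.standard_normal((4, 3))

# Ratio of ensemble bonus to UCB width for growing ensembles;
# under the averaging assumption it should settle near sqrt(2/pi).
for m in (1, 10, 1000, 100_000):
    ratio = eats_bonus(arms, V_inv, m, rng) / ucb_width(arms, V_inv)
    print(m, np.round(ratio, 3))
```

For small ensembles the ratio varies from arm to arm and round to round, which is the randomized exploration of ATS. For large ensembles it is nearly constant across arms, so the arm ranking coincides with that of a UCB rule with a rescaled width.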

Original post by Toshinori Kitamura, Shuai Liu, Csaba Szepesvári

"arXiv:2606.28616v1 Announce Type: new Abstract: In stochastic linear bandits, the canonical Upper Confidence Bound (UCB) algorithm admits a simple frequentist regret analysis but can be computationally demanding, while Thompson Sampling (TS) is computationally attractive yet typi…"
