New Algorithms Optimize Contextual Slate Bandits with Limited Adaptivity

Tanmay Goyal, Sukruta Prakash Midigeshi, Gaurav Sinha · July 1, 2026

Summary

This research introduces B-SlateGLinCB and RS-SlateGLinCB, two new algorithms for contextual slate bandit problems with generalized linear rewards under limited adaptivity. Their regret bounds do not depend on the GLM's non-linearity parameter, they need only polynomial time per round, and they outperform baselines both in simulations and in a practical task: selecting in-context examples for language models.

This paper studies the contextual slate bandit problem. Here a learner selects a slate of items drawn from multiple sets to maximize reward, and the reward follows a Generalized Linear Model (GLM). The focus is on limited adaptivity, meaning the system cannot update its policy after every round. The researchers propose two algorithms: B-SlateGLinCB for batched adaptivity and RS-SlateGLinCB for rarely-switching adaptivity.

B-SlateGLinCB divides the learning horizon into a logarithmic number of batches and updates its policy only between batches. RS-SlateGLinCB instead performs a limited number of parameter updates over the course of learning. Both algorithms achieve regret bounds that are independent of the GLM's non-linearity parameter, a quantity that commonly scales the regret bounds of other GLM bandit algorithms.

Crucially, both algorithms are computationally efficient. They require only polynomial time per round even though the number of possible slates is exponential. In simulations, they outperform existing limited-adaptivity baselines and remain competitive with fully adaptive state-of-the-art methods. They also performed well on a practical in-context example selection task for language models, which supports their real-world applicability.
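To make the "exponential number of slates" point concrete, here is a minimal Python sketch of why picking a good slate need not enumerate every slate. It rests on assumptions the summary does not confirm: each slate takes one item from each of the N sets, and the expected reward is a logistic link applied to θ dotted with the sum of the chosen items' features. Under that assumption the link is increasing and the index is additive across sets. So choosing the top-scoring item in each set (N·K score evaluations) finds the same slate as brute-force search over all K^N slates. All function names are hypothetical. The sketch covers only exploitation with a known θ. It does not reproduce the exploration or the polynomial-time procedure of B-SlateGLinCB or RS-SlateGLinCB.

```python
import itertools
import math


def sigmoid(z: float) -> float:
    """Logistic link; an assumed choice of GLM link function."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def slate_mean_reward(theta, slate) -> float:
    """Assumed reward model: sigmoid(theta . sum of the chosen items' features)."""
    summed = [sum(column) for column in zip(*slate)]
    return sigmoid(dot(theta, summed))


def best_slate_brute_force(theta, item_sets):
    """Scores every slate (one item per set): K**N candidates for N sets of K items."""
    return max(itertools.product(*item_sets),
               key=lambda slate: slate_mean_reward(theta, slate))


def best_slate_per_set(theta, item_sets):
    """Scores each item once (N * K evaluations) and keeps the best item in each set.

    Agrees with brute force under the assumed model because the link is
    increasing and the linear index is additive across sets.
    """
    return tuple(max(items, key=lambda x: dot(theta, x)) for items in item_sets)


# Hypothetical example: d = 2 features, N = 3 sets of K = 2 items each.
theta = [0.5, -1.0]
item_sets = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(0.2, 0.3), (0.9, -0.4)],
    [(0.0, 0.0), (-1.0, -2.0)],
]
print(best_slate_per_set(theta, item_sets))  # ((1.0, 0.0), (0.9, -0.4), (-1.0, -2.0))
```

Both functions return the same slate in this example. The brute-force version scores 8 slates, while the per-set version scores 6 items, and the gap widens exponentially as N grows.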

Why it matters

Professionals designing recommendation systems, ad placement engines, or content personalization platforms can leverage these algorithms to achieve efficient and effective decision-making with reduced computational overhead and fewer policy updates.

How to implement this in your domain

  1. Assess current recommendation or content selection systems for opportunities to apply contextual slate bandits.
  2. Consider implementing B-SlateGLinCB for scenarios where batched policy updates are feasible and desirable.
  3. Explore RS-SlateGLinCB for systems requiring very infrequent policy switches to minimize computational cost.
  4. Evaluate the algorithms' performance against existing baselines using A/B testing or simulation with real-world data.
  5. Integrate the chosen algorithm into production systems, particularly for tasks like in-context example selection for LLMs.
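The batched and rarely-switching options in these steps differ mainly in when the model is refit. The Python sketch below contrasts two standard schedules. Both are illustrative assumptions, not the paper's exact rules. The first is a doubling batch grid, which gives roughly log2(T) batches over a horizon of T rounds and matches the spirit of B-SlateGLinCB's logarithmic number of batches. The second is the determinant-doubling trigger from the rarely-switching variant of OFUL (Abbasi-Yadkori et al., 2011). It refits only when det(V_t) of the regularized design matrix V_t = λI + Σ x xᵀ has more than doubled since the last refit. The names doubling_batch_grid and DeterminantSwitch are hypothetical.

```python
def doubling_batch_grid(horizon: int) -> list[int]:
    """Hypothetical batch schedule: batch end-rounds 1, 2, 4, ..., capped at horizon.

    Gives about log2(horizon) + 1 batches; the policy is refit only at these rounds.
    """
    ends, t = [], 1
    while t < horizon:
        ends.append(t)
        t *= 2
    ends.append(horizon)
    return ends


class DeterminantSwitch:
    """Hypothetical rarely-switching trigger (determinant-doubling rule).

    Tracks V_t = lam * I + sum_s x_s x_s^T and signals a refit whenever
    det(V_t) exceeds `factor` times its value at the previous refit.
    """

    def __init__(self, dim: int, lam: float = 1.0, factor: float = 2.0):
        # Inverse and determinant of the initial regularized matrix lam * I.
        self.v_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.det = lam ** dim
        self.det_at_last_refit = self.det
        self.factor = factor

    def observe(self, x: list[float]) -> bool:
        """Add x x^T to V_t; return True if the parameters should be refit now."""
        d = len(x)
        vx = [sum(self.v_inv[i][j] * x[j] for j in range(d)) for i in range(d)]  # V^{-1} x
        scale = 1.0 + sum(x[i] * vx[i] for i in range(d))  # 1 + x^T V^{-1} x
        # Matrix determinant lemma: det(V + x x^T) = det(V) * (1 + x^T V^{-1} x).
        self.det *= scale
        # Sherman-Morrison update of V^{-1}; V is symmetric, so x^T V^{-1} = (V^{-1} x)^T.
        self.v_inv = [[self.v_inv[i][j] - vx[i] * vx[j] / scale for j in range(d)]
                      for i in range(d)]
        if self.det > self.factor * self.det_at_last_refit:
            self.det_at_last_refit = self.det
            return True
        return False
```

For example, feeding DeterminantSwitch(dim=2) the vector [1.0, 0.0] 1,000 times triggers only about eight refits. The determinant grows linearly along that direction, and each refit requires it to double again. This is why the number of switches under this rule grows only logarithmically with the horizon.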

Who benefits

E-commerce, Media & Entertainment, AdTech, EdTech, Social Media

Key takeaways

  • New algorithms, B-SlateGLinCB and RS-SlateGLinCB, address contextual slate bandit problems with limited adaptivity.
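  • Their regret bounds do not depend on the GLM's non-linearity parameter, and they run in polynomial time per round despite the exponential number of possible slates.
  • The algorithms are suitable for scenarios like recommendation systems and content personalization.
  • They outperform limited-adaptivity baselines in simulations, remain competitive with fully adaptive methods, and performed well on in-context example selection for language models.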

Original post by Tanmay Goyal, Sukruta Prakash Midigeshi, Gaurav Sinha

"arXiv:2606.31449v1 Announce Type: new Abstract: We investigate the contextual slate bandit problem with generalized linear rewards under limited adaptivity. At each round, the learner is presented with $N$ sets of items, where each item is represented by a $d$-dimensional feature…"
