New Algorithm Optimizes Embedding Model Routing in Recommenders

Yan Dai, Negin Golrezaei, Patrick Jaillet · June 16, 2026

Summary

This research formalizes embedding model routing in recommendation systems as an adversarial contextual linear bandit problem with low-rank experts. It introduces Hypentropy Policy Gradient (HPG), a policy gradient algorithm that provably adapts to unknown low-rank structures and achieves efficient policy regret, offering a computationally efficient solution for dynamic query routing.

Modern recommendation systems frequently route diverse user queries to multiple embedding models to provide relevant suggestions. This paper addresses the challenge of optimizing this dynamic routing under realistic conditions: adversarial queries, bandit feedback (only the outcome of the chosen recommendation is observed), and limited model observability. The authors formalize the problem as an adversarial contextual linear bandit in which queries are contexts, items are actions, and embedding models act as low-rank experts. They argue that standard regret notions are insufficient for this setting and propose a log-quadratic policy class that is expressive yet structured enough for efficient online learning. To learn within this class, they introduce Hypentropy Policy Gradient (HPG), a policy gradient algorithm. HPG is proven to adapt to unknown low-rank structures even with incomplete information, achieving efficient linearized policy regret that avoids the curse of dimensionality. The paper also provides a computationally efficient and parameter-free implementation of HPG, making it a practical candidate for dynamic embedding model routing.
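To make the interaction protocol concrete, here is a minimal sketch of one round of routing under bandit feedback. It is an illustration only, not the paper's formal model: the bilinear scoring rule used for a "low-rank expert", and the names `project`, `low_rank_score`, `recommend`, `play_round`, and `dot_reward`, are all assumptions introduced for this example.

```python
# Illustrative sketch only: the scoring rule and every name below are
# assumptions made for this example, not definitions from the paper.

def project(vec, M):
    """Return M^T vec, where M is a matrix stored as a list of rows."""
    return [sum(vec[i] * M[i][j] for i in range(len(vec)))
            for j in range(len(M[0]))]

def low_rank_score(query, item, model):
    """Score (U^T query) . (V^T item): a bilinear form of rank at most r."""
    U, V = model
    return sum(x * y for x, y in zip(project(query, U), project(item, V)))

def recommend(query, items, model):
    """The chosen embedding model returns the item it scores highest."""
    return max(range(len(items)),
               key=lambda a: low_rank_score(query, items[a], model))

def play_round(query, items, models, route, true_reward):
    """One round: the router picks a model, the model picks an item, and
    only that item's reward is revealed (bandit feedback)."""
    k = route(query)
    a = recommend(query, items, models[k])
    return k, a, true_reward(query, items[a])

def dot_reward(query, item):
    """Hypothetical ground-truth reward, linear in the item features."""
    return sum(q * x for q, x in zip(query, item))

# Two items in 2 dimensions and two rank-1 models, each of which only
# "sees" one coordinate of the query and item.
items = [[1.0, 0.0], [0.0, 1.0]]
models = [
    ([[1.0], [0.0]], [[1.0], [0.0]]),  # model 0 uses coordinate 0
    ([[0.0], [1.0]], [[0.0], [1.0]]),  # model 1 uses coordinate 1
]
query = [1.0, 2.0]
to_model_0 = play_round(query, items, models, lambda q: 0, dot_reward)
to_model_1 = play_round(query, items, models, lambda q: 1, dot_reward)
```

For this query, routing to model 1 yields a higher reward than routing to model 0, but the learner only ever sees the reward of the route it actually took. That is why the routing policy has to be learned online rather than read off from logged comparisons.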

Why it matters

For professionals building and optimizing recommendation systems, HPG offers a robust and efficient method for dynamically routing queries to the most appropriate embedding models. This can lead to improved recommendation quality, better resource utilization, and enhanced user experience in complex, real-world scenarios.

How to implement this in your domain

  1. Evaluate current embedding model routing strategies in recommendation systems for potential inefficiencies.
  2. Consider implementing the Hypentropy Policy Gradient (HPG) algorithm for dynamic query routing.
  3. Utilize HPG to adapt to unknown low-rank structures in embedding models for improved performance.
  4. Benchmark HPG against existing routing algorithms to measure improvements in recommendation quality and computational efficiency.
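For the benchmarking step, one reasonable baseline is a context-free adversarial bandit router. The sketch below implements standard EXP3, which is not HPG and ignores the query context entirely; it is meant only as a reference point to compare a candidate router against. The class name `Exp3Router`, the helper `run_baseline`, and the Bernoulli reward simulation are hypothetical and chosen for illustration.

```python
import math
import random

class Exp3Router:
    """Context-free EXP3 over embedding models (a baseline, not HPG)."""

    def __init__(self, n_models, gamma=0.1, seed=0):
        self.n = n_models
        self.gamma = gamma                # exploration rate in (0, 1]
        self.weights = [1.0] * n_models
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the normalized weights with uniform exploration."""
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def choose(self):
        """Sample a model index according to the current probabilities."""
        return self.rng.choices(range(self.n), weights=self.probabilities())[0]

    def update(self, chosen, reward):
        """Importance-weighted update; reward must lie in [0, 1]."""
        p = self.probabilities()[chosen]
        estimate = reward / p             # unbiased estimate of the reward
        self.weights[chosen] *= math.exp(self.gamma * estimate / self.n)
        # Rescale to avoid overflow; the probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]

def run_baseline(mean_rewards, rounds=2000, seed=0):
    """Simulate Bernoulli rewards per model (an assumption for this demo)."""
    router = Exp3Router(len(mean_rewards), seed=seed)
    env = random.Random(seed + 1)
    total = 0.0
    for _ in range(rounds):
        k = router.choose()
        reward = 1.0 if env.random() < mean_rewards[k] else 0.0
        router.update(k, reward)
        total += reward
    return router, total / rounds

router, average_reward = run_baseline([0.2, 0.8])
final_probs = router.probabilities()
```

In a real benchmark, the simulated rewards would be replaced by logged or live feedback, and the same loop would be run with each candidate router so that average reward and per-round compute cost can be compared on equal terms.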

Who benefits

E-commerce, Media & Entertainment, Social Media, Advertising, AI/ML Development

Key takeaways

  • Embedding model routing in recommendation systems can be optimized using contextual bandits.
  • HPG is a new policy gradient algorithm for efficient dynamic routing.
  • HPG adapts to unknown low-rank structures and avoids the curse of dimensionality.
  • This method offers a computationally efficient solution for improving recommendation quality.

Original post by Yan Dai, Negin Golrezaei, Patrick Jaillet

"arXiv:2606.14929v1 Announce Type: new Abstract: Modern recommendation systems increasingly rely on dynamically routing diverse queries to multiple embedding models. Despite its practical significance, this problem remains poorly understood under realistic conditions like adversar…"

