SLARouter Optimizes LLM Routing with Cost and SLA Guarantees.

Herbert Woisetschläger, Arastun Mammadli, Ryan Zhang, Shiqiang Wang · June 19, 2026

Summary

SLARouter is a new online routing algorithm that learns a cost-optimal policy for LLM requests from sparse user feedback, while providing theoretical guarantees for both cost optimality and strict Service Level Agreement (SLA) compliance. It reduces operating costs by up to 2.2x over baselines without per-benchmark tuning.

Inference costs for large language model (LLM) applications are growing rapidly, driven by rising demand and infrastructure expenses. At the same time, users expect high-quality responses, and in commercial settings that expectation is often codified in Service Level Agreements (SLAs). The result is a direct tension between controlling operating costs and keeping users satisfied. Existing cost-aware LLM routing methods show promise, but they typically rely on complete feedback, offline training, and extensive per-workload tuning, and they often lack explicit SLA guarantees or real-time adaptability.

SLARouter is an online routing algorithm designed to address these limitations. It learns a cost-optimal routing policy using only the sparse, one-sided user feedback commonly available in production systems, and it comes with theoretical guarantees for both cost optimality and strict SLA compliance. In experiments across a range of LLM benchmarks, SLARouter meets SLA constraints without per-benchmark tuning and reduces operating costs by up to 2.2x compared with baseline methods.
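One way to read the cost-versus-SLA trade-off described above is as a constrained optimization problem. The formulation below is an illustrative assumption and not the paper's own notation: $\pi$ is a routing policy that maps a request $x$ to one of $K$ models, $c_k$ is the cost of serving a request with model $k$, $s(x, k) \in \{0, 1\}$ indicates whether the user is satisfied with model $k$'s response, and $\tau$ is the satisfaction rate the SLA requires.

```latex
\min_{\pi} \; \mathbb{E}_{x}\!\left[ c_{\pi(x)} \right]
\quad \text{subject to} \quad
\mathbb{E}_{x}\!\left[ s\big(x, \pi(x)\big) \right] \ge \tau
```

Under this reading, the difficulty of the online setting is that $s(x, k)$ is observed only occasionally, and only for the model that actually served the request.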

Why it matters

For businesses deploying LLM-powered applications, SLARouter offers a way to balance performance, user satisfaction, and operating costs. It allocates requests to models more efficiently while keeping response quality within the agreed SLA, which bears directly on profitability and customer retention.

How to implement this in your domain

  1. Evaluate current LLM inference costs and SLA compliance metrics within your applications.
  2. Investigate integrating SLARouter or similar online, cost-optimal routing algorithms into your LLM serving infrastructure.
  3. Configure SLARouter to leverage existing sparse user feedback signals for policy learning.
  4. Monitor and compare the cost savings and SLA adherence achieved with SLARouter against current routing strategies.
  5. Adapt routing policies dynamically based on real-time performance and cost data to maintain optimal balance.
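To make the steps above concrete, here is a minimal Python sketch of an online, cost-aware router that learns from sparse feedback. It is not SLARouter's algorithm, which this summary does not describe. It is a simple conservative heuristic: route to the cheapest model whose Hoeffding lower confidence bound on the satisfaction rate clears the SLA target, and fall back to the most expensive model otherwise. All names (`ModelStats`, `ConservativeRouter`, `simulate`), the flat per-request costs, the assumption that the most expensive model is the safest fallback, and the feedback model (a binary signal that arrives for only a fraction of requests, and only for the model that served them) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    cost_per_request: float  # hypothetical flat cost, arbitrary units
    satisfied: int = 0       # feedback events marked "satisfied"
    observed: int = 0        # feedback events received at all

    def lower_bound(self, delta: float = 0.05) -> float:
        """Hoeffding lower confidence bound on the satisfaction rate."""
        if self.observed == 0:
            return 0.0
        mean = self.satisfied / self.observed
        radius = math.sqrt(math.log(1.0 / delta) / (2.0 * self.observed))
        return max(0.0, mean - radius)


class ConservativeRouter:
    """Toy router: cheapest model whose satisfaction bound clears the SLA."""

    def __init__(self, models, sla_target, explore_prob=0.05, seed=0):
        self.models = sorted(models, key=lambda m: m.cost_per_request)
        self.sla_target = sla_target
        self.explore_prob = explore_prob
        self.rng = random.Random(seed)

    def route(self) -> ModelStats:
        # Occasional random exploration so cheaper models can gather feedback.
        if self.rng.random() < self.explore_prob:
            return self.rng.choice(self.models)
        for model in self.models:  # ordered cheapest first
            if model.lower_bound() >= self.sla_target:
                return model
        return self.models[-1]  # fallback: most expensive model

    def record_feedback(self, model: ModelStats, satisfied: bool) -> None:
        model.observed += 1
        model.satisfied += int(satisfied)


def simulate(router, true_rates, n_requests, feedback_prob, seed=1):
    """Run a synthetic workload; returns (total_cost, satisfaction_rate)."""
    rng = random.Random(seed)
    total_cost, satisfied_count = 0.0, 0
    for _ in range(n_requests):
        model = router.route()
        total_cost += model.cost_per_request
        satisfied = rng.random() < true_rates[model.name]
        satisfied_count += int(satisfied)
        if rng.random() < feedback_prob:  # feedback is sparse
            router.record_feedback(model, satisfied)
    return total_cost, satisfied_count / n_requests


if __name__ == "__main__":
    models = [
        ModelStats("small", 1.0),
        ModelStats("medium", 4.0),
        ModelStats("large", 10.0),
    ]
    router = ConservativeRouter(models, sla_target=0.90)
    rates = {"small": 0.70, "medium": 0.93, "large": 0.97}  # synthetic
    cost, rate = simulate(router, rates, n_requests=50_000, feedback_prob=0.2)
    print(f"total cost: {cost:.0f}, satisfaction rate: {rate:.3f}")
```

Comparing `total_cost` and the satisfaction rate against an always-use-the-largest-model baseline corresponds to the monitoring in step 4. Because random exploration occasionally sends traffic to models that miss the target, this heuristic offers no formal SLA guarantee, which is the gap the paper's theoretical results are meant to close.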

Who benefits

Cloud Computing, AI/ML Platforms, Software-as-a-Service (SaaS), E-commerce, Customer Service

Key takeaways

  • SLARouter optimizes LLM routing to balance inference costs and user satisfaction.
  • It learns cost-optimal policies online from sparse, one-sided user feedback.
  • The algorithm provides theoretical guarantees for both cost optimality and SLA compliance.
  • SLARouter can reduce operating costs by up to 2.2x without extensive tuning.
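To put the "up to 2.2x" figure in budget terms, the short Python sketch below converts a cost ratio into a savings fraction. It assumes that "2.2x" means the baseline's cost divided by SLARouter's cost, which is the natural reading but is not spelled out in this summary. The function name `savings_fraction` and the baseline spend are hypothetical.

```python
def savings_fraction(cost_ratio: float) -> float:
    """Fraction of baseline spend saved when baseline_cost / new_cost == cost_ratio."""
    if cost_ratio < 1.0:
        raise ValueError("cost_ratio below 1 means the new router costs more")
    return 1.0 - 1.0 / cost_ratio


baseline_spend = 10_000.0  # hypothetical monthly baseline spend, arbitrary currency
best_case_spend = baseline_spend / 2.2  # 2.2x is the reported best case, not a typical value
print(f"best-case spend: {best_case_spend:.2f}")
print(f"best-case savings: {savings_fraction(2.2):.1%}")
```

Under that reading, a 2.2x reduction corresponds to saving roughly 55% of baseline spend in the best reported case. Results on other workloads may be lower.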

Original post by Herbert Woisetschläger, Arastun Mammadli, Ryan Zhang, Shiqiang Wang

"arXiv:2606.19376v1 Announce Type: new Abstract: Inference costs for large language model (LLM) applications are rapidly growing, driven by surging demand and rising infrastructure cost. Users expect high-quality responses, and in commercial settings this is formally codified in S…"

