ScaleToT Generalizes LLM Reasoning for Billion-Scale User Modeling

Tianbao Ma, Chang Xi, Yichuan Zou, Chengen Li, Linxun Chen, Zilong Lu, Yanan Niu, Zhaojie Liu, Han Li, Kun Gai · June 24, 2026

Summary

ScaleToT enables structured LLM reasoning for billions of low-activity users by learning from a small LLM-processed subset and extending what it learns to the broader population. It uses Tree-of-Thought refinement and a two-stage training process to infer latent user states from sparse profiles, and it avoids per-user LLM inference to reduce compute costs.

Accurate user modeling typically relies on extensive interaction histories, which are unavailable for a vast number of low-activity users. Large Language Models (LLMs) can infer user states from static profiles, but this approach becomes unreliable with sparse data and prohibitively expensive at billion-user scale. ScaleToT addresses both problems by learning structured reasoning from a small, LLM-processed user subset and then generalizing it to the entire low-activity user base.

The method has three parts. First, ScaleToT improves reasoning reliability by constructing typed user-state chains with an entropy-guided Tree-of-Thought (ToT) refinement procedure. Second, this teacher-curated reasoning is used to train a student model on static profiles in two stages: supervised fine-tuning (SFT) and a novel Outcome-Driven Segment-Aware Implicit Reward Policy Optimization (OSIPO). Third, ScaleToT transfers the student model's reasoning representations to a lightweight profile encoder, which provides shared reasoning signals for the remaining users without direct LLM inference and so cuts computational cost.

Evaluated in a billion-scale advertising deployment for lifetime value (LTV) prediction, ScaleToT achieved a 6.738% increase in LT30.
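The summary does not spell out how the entropy-guided refinement works, so the following is only a minimal sketch of one plausible reading: each typed step in a user-state chain carries a probability distribution over candidate values, and steps whose Shannon entropy exceeds a threshold are flagged for further Tree-of-Thought expansion. The step names, the probabilities, the threshold, and the function names `shannon_entropy` and `pick_steps_to_refine` are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List


def shannon_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pick_steps_to_refine(chain: Dict[str, List[float]], threshold: float) -> List[str]:
    """Return the steps whose entropy exceeds the threshold, most uncertain first.

    Assumption: high entropy over a step's candidates means the reasoning at
    that step is unreliable and worth expanding into further branches.
    """
    scored = [(shannon_entropy(probs), name) for name, probs in chain.items()]
    return [name for entropy, name in sorted(scored, reverse=True) if entropy > threshold]


# Hypothetical typed user-state chain with made-up candidate probabilities.
chain = {
    "life_stage": [0.90, 0.05, 0.05],      # confident step, low entropy
    "intent": [0.40, 0.35, 0.25],          # uncertain step, high entropy
    "spending_power": [0.70, 0.20, 0.10],  # moderately confident step
}

to_refine = pick_steps_to_refine(chain, threshold=0.9)
print(to_refine)  # prints ['intent']
```

In this toy chain only the `intent` step is uncertain enough to be refined. Lowering the threshold to 0.5 would also flag `spending_power`.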

Why it matters

For professionals in marketing, advertising, and product development, ScaleToT offers a way to model billions of low-activity users from sparse profile data. This can support more personalized experiences and better prediction metrics such as lifetime value, while avoiding the cost of running LLM inference for every user.

How to implement this in your domain

  1. Assess existing user modeling pipelines for low-activity user segments and data sparsity challenges.
  2. Explore implementing a teacher-student model architecture to distill LLM reasoning into lightweight models.
  3. Investigate Tree-of-Thought (ToT) or similar structured reasoning techniques for improving inference reliability.
  4. Pilot ScaleToT's approach for specific use cases like LTV prediction or personalized recommendations.
  5. Collaborate with data science and engineering teams to integrate and optimize such models for large-scale deployment.
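Step 2 above can be prototyped on a small scale before committing to a full pipeline. The sketch below is not ScaleToT's encoder: it assumes a single linear layer trained with a mean-squared-error loss to match the representations a larger model produced for a processed subset of users, then applies that layer to an unseen profile with no LLM call. The data is random placeholder data, and `encode`, `mse`, and `distill` are hypothetical helper names.

```python
import random
from typing import List

Vector = List[float]


def encode(weights: List[Vector], profile: Vector) -> Vector:
    """Lightweight encoder: one linear layer without bias (an assumption)."""
    return [sum(w * x for w, x in zip(row, profile)) for row in weights]


def mse(weights: List[Vector], profiles: List[Vector], targets: List[Vector]) -> float:
    """Mean squared error between encoder outputs and target representations."""
    errors = [
        (y - t) ** 2
        for x, target in zip(profiles, targets)
        for y, t in zip(encode(weights, x), target)
    ]
    return sum(errors) / len(errors)


def distill(profiles: List[Vector], targets: List[Vector],
            lr: float = 0.05, epochs: int = 50) -> List[Vector]:
    """Fit the encoder to the target representations with per-sample SGD."""
    dim_in, dim_out = len(profiles[0]), len(targets[0])
    weights = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        for x, target in zip(profiles, targets):
            y = encode(weights, x)
            for i in range(dim_out):
                grad_scale = 2.0 * (y[i] - target[i])  # d(error^2)/d(output)
                for j in range(dim_in):
                    weights[i][j] -= lr * grad_scale * x[j]
    return weights


random.seed(0)
# Placeholder data: 50 users with 4 numeric profile features each.
profiles = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(50)]
# Stand-in for the larger model's 2-dimensional representations of those users.
hidden_map = [[0.5, -1.0, 0.25, 0.0], [1.5, 0.0, -0.5, 2.0]]
targets = [encode(hidden_map, p) for p in profiles]

initial_loss = mse([[0.0] * 4 for _ in range(2)], profiles, targets)
weights = distill(profiles, targets)
final_loss = mse(weights, profiles, targets)

# A user outside the processed subset gets a signal from the small encoder alone.
signal = encode(weights, [0.2, -0.4, 0.9, 0.1])
print(len(signal), final_loss < initial_loss)
```

A linear map recovers these synthetic targets almost exactly because they were generated by a linear map. Real model representations would likely need a deeper encoder and a held-out evaluation set.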

Who benefits

Advertising, E-commerce, Social Media, Marketing, Fintech

Key takeaways

  • ScaleToT enables accurate user modeling for billions of low-activity users with sparse data.
  • It generalizes structured LLM reasoning from a small subset to a large population.
  • The method significantly reduces compute costs compared to full LLM inference.
  • Online A/B tests in a billion-scale advertising deployment for lifetime value (LTV) prediction showed a 6.738% increase in LT30.

Original post by Tianbao Ma, Chang Xi, Yichuan Zou, Chengen Li, Linxun Chen, Zilong Lu, Yanan Niu, Zhaojie Liu, Han Li, Kun Gai

"arXiv:2606.24605v1 Announce Type: new Abstract: Accurate user modeling often depends on rich interaction histories, which are unavailable for billions of low-activity users. Large Language Models (LLMs) can infer latent user states from static profiles, but this reasoning becomes…"
