SuperThoughts Speeds Up LLM Chain-of-Thought Reasoning

Zheyang Xiong, Shivam Garg, Max Yu, Vaishnavi Shrivastava, Haoyu Zhao, Anastasios Kyrillidis, Dimitris Papailiopoulos · June 15, 2026

Summary

SuperThoughts is a method that compresses pairs of consecutive Chain-of-Thought (CoT) tokens into single latent representations, allowing LLMs to decode two tokens per step. This doubles the number of tokens produced per decoding step. With a confidence-based fallback to standard decoding, it shortens CoT by roughly 20-30% while largely maintaining accuracy.

Long Chain-of-Thought (CoT) reasoning improves the problem-solving ability of large language models (LLMs), but generating tokens one at a time makes it slow and computationally expensive. Some prior work reasons in continuous latent spaces to bypass discrete token generation, but those methods are often unstable to train and struggle to scale to complex, long-horizon tasks because they lack sufficient supervision signals.

To address these limitations, the authors propose SuperThoughts, which compresses pairs of consecutive CoT tokens into single latent representations. A lightweight Multi-Token Prediction (MTP) module then decodes two tokens per step, doubling the number of tokens produced per decoding step. Because training still uses discrete token supervision, it remains stable and scalable.

The authors fine-tuned Qwen2.5-Math-Instruct models (1.5B, 7B, and 14B) with the method and evaluated them on MATH500, AMC, OlympiadBench, and GPQA-Diamond. With a confidence-based adaptive mechanism that reverts to standard decoding when uncertainty is high, SuperThoughts reduced CoT length by approximately 20-30% at the cost of a 1-2 point accuracy drop on most tasks.
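The confidence-based fallback described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`softmax`, `adaptive_step`), the 0.9 threshold, and the use of the smaller of the two top-token probabilities as the confidence score are all assumptions, since the summary does not say how confidence is measured.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_step(
    first_logits: Sequence[float],
    second_logits: Sequence[float],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> List[int]:
    """Return the token ids emitted by one decoding step.

    Assumption: a multi-token head proposes logits for the next two tokens.
    Both tokens are kept only if the less certain of the two clears the
    threshold; otherwise the step falls back to emitting a single token.
    """
    p1 = softmax(first_logits)
    p2 = softmax(second_logits)
    t1 = max(range(len(p1)), key=p1.__getitem__)  # greedy pick, token 1
    t2 = max(range(len(p2)), key=p2.__getitem__)  # greedy pick, token 2
    confidence = min(p1[t1], p2[t2])
    if confidence >= threshold:
        return [t1, t2]  # confident: two tokens in one step
    return [t1]          # uncertain: standard one-token decoding


# Sharp distributions for both positions: the pair is accepted.
confident = adaptive_step([10.0, 0.0, 0.0], [0.0, 10.0, 0.0])
# Nearly flat second distribution: only the first token is emitted.
uncertain = adaptive_step([10.0, 0.0, 0.0], [1.0, 1.1, 1.0])
```

Raising the threshold trades speed for caution: more steps fall back to one token, so fewer steps are saved but fewer low-confidence pairs are accepted.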

Why it matters

For professionals deploying LLMs in applications requiring complex reasoning, SuperThoughts offers a way to significantly improve inference speed and reduce computational costs without sacrificing much accuracy. This can enable faster responses and more efficient use of resources.

How to implement this in your domain

  1. Evaluate LLM inference bottlenecks: Identify where Chain-of-Thought reasoning is slowing down your LLM applications.
  2. Explore multi-token prediction: Investigate integrating multi-token prediction modules like SuperThoughts into your LLM inference pipeline.
  3. Fine-tune for efficiency: Consider fine-tuning your LLMs with techniques that compress reasoning steps for improved throughput.
  4. Implement adaptive decoding: Utilize confidence-based adaptive mechanisms to balance speed and accuracy in LLM reasoning.
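For the first and last steps above, a back-of-the-envelope estimate relates how often a decoder emits two tokens to how many decoding steps it saves. The sketch assumes that every step emits either one or two tokens, that the reported CoT length corresponds to decoding steps, and that the extra cost of the prediction module is negligible. The function names are hypothetical.

```python
def decode_steps(num_tokens: int, pair_fraction: float) -> float:
    """Expected decoding steps when `pair_fraction` of steps emit two tokens.

    Each step emits 1 + pair_fraction tokens on average, so covering
    num_tokens tokens takes num_tokens / (1 + pair_fraction) steps.
    """
    if not 0.0 <= pair_fraction <= 1.0:
        raise ValueError("pair_fraction must be between 0 and 1")
    return num_tokens / (1.0 + pair_fraction)


def step_reduction(pair_fraction: float) -> float:
    """Fraction of steps saved relative to one-token-per-step decoding."""
    if not 0.0 <= pair_fraction <= 1.0:
        raise ValueError("pair_fraction must be between 0 and 1")
    return pair_fraction / (1.0 + pair_fraction)


def pair_fraction_for_reduction(reduction: float) -> float:
    """Invert step_reduction: r = p / (1 + p)  =>  p = r / (1 - r)."""
    if not 0.0 <= reduction <= 0.5:
        raise ValueError("reduction must be between 0 and 0.5")
    return reduction / (1.0 - reduction)


# Share of two-token steps implied by a 20% and a 30% step reduction.
low = pair_fraction_for_reduction(0.20)
high = pair_fraction_for_reduction(0.30)
```

Under these assumptions, emitting two tokens on every step halves the step count, and a 20-30% reduction corresponds to roughly 25-43% of steps emitting a pair.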

Who benefits

AI Development, Software Engineering, Customer Service, Research & Development, Content Generation

Key takeaways

  • SuperThoughts significantly speeds up LLM Chain-of-Thought reasoning.
  • It compresses two CoT tokens into one latent representation, doubling throughput.
  • The method maintains accuracy with minimal degradation, especially with adaptive decoding.
  • This improves efficiency and reduces computational costs for complex LLM tasks.

Original post by Zheyang Xiong, Shivam Garg, Max Yu, Vaishnavi Shrivastava, Haoyu Zhao, Anastasios Kyrillidis, Dimitris Papailiopoulos

"arXiv:2606.13862v1 Announce Type: new Abstract: Long Chain-of-Thought (CoT) reasoning improves LLM problem-solving but is computationally expensive due to sequential token generation. While recent works explore reasoning in continuous latent spaces to bypass discrete token genera…"
