OTCache Accelerates Diffusion Models with Geometry-Aware Caching

Huanlin Gao, Fang Zhao, Qiang Hui, Fuyuan Shi, Shaoan Zhao, Yantao Li, Chao Tan, Ting Lu, Yuren You, Kai Wang, Shiguo Lian · July 1, 2026

Summary

OTCache is a training-free framework that uses Optimal Transport to predict caching schedules for diffusion models. It accelerates sampling while improving generation fidelity, with reported speedups of 3.66x to 4.7x on the image and video models tested, and it optimizes caching across a range of inference budgets.
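The summary does not give OTCache's formulas. As background only, the standard one-dimensional Optimal Transport identities below show why quantile interpolation is a natural operation on distributions. Suppose, purely for illustration, that a caching schedule is viewed as a distribution μ over timesteps with CDF F and quantile function F⁻¹. In 1D, the optimal plan between two such distributions matches their quantiles monotonically, and the Wasserstein geodesic between them interpolates the quantile functions linearly:

```latex
\begin{aligned}
W_p^p(\mu_0, \mu_1) &= \int_0^1 \bigl| F_0^{-1}(u) - F_1^{-1}(u) \bigr|^p \, du, && p \ge 1, \\
F_t^{-1}(u) &= (1 - t)\, F_0^{-1}(u) + t\, F_1^{-1}(u), && t \in [0, 1].
\end{aligned}
```

Here F_t⁻¹ is the quantile function of the displacement interpolation μ_t, which traces the W_2 geodesic from μ_0 to μ_1. This summary does not state how OTCache maps inference budgets to a parameter like t.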

OTCache is a new framework that speeds up diffusion model sampling without additional training. Existing graph-based caching methods reduce redundant computation by optimizing shortest-path objectives. This assumes that the costs of individual caching decisions are additively independent. That assumption does not always hold, and these methods often struggle in low-NFE (Number of Function Evaluations) regimes. OTCache instead models caching schedules as a smooth evolution in policy space, drawing on Optimal Transport theory. The method has three steps:

  1. Establish a high-fidelity reference schedule.
  2. Run a lightweight anchor search for extreme low-budget settings.
  3. Predict schedules for target budgets through quantile interpolation.

In experiments, OTCache reaches 4.5x acceleration on FLUX.1, 4.7x on Qwen-Image, and 3.66x on HunyuanVideo. At these speedups it consistently improves generation quality over existing state-of-the-art caching methods. The work offers a new perspective on accelerating diffusion models through schedule modeling.
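The third step, quantile interpolation, can be illustrated with a minimal sketch. This is not the authors' code, and it rests on assumptions the summary does not state:

- A schedule is the sorted list of timesteps where the full model runs, and all other steps reuse cached features.
- The target budget lies between the budgets of a reference schedule and an anchor schedule.
- The prediction averages the two schedules' quantile functions, with a weight set by the budget.

The names `empirical_quantile` and `interpolate_schedule` are hypothetical.

```python
import math
from typing import List


def empirical_quantile(sorted_steps: List[int], u: float) -> float:
    """Quantile function of the uniform distribution over `sorted_steps`.

    For u in (0, 1) this returns the ceil(u * n)-th smallest timestep (1-based).
    """
    n = len(sorted_steps)
    idx = min(n - 1, max(0, math.ceil(u * n) - 1))
    return float(sorted_steps[idx])


def interpolate_schedule(reference: List[int], anchor: List[int],
                         target_budget: int, num_steps: int) -> List[int]:
    """Predict a schedule with `target_budget` full-compute steps (hypothetical setup).

    `reference` is a high-budget schedule and `anchor` a low-budget one; both are
    lists of timesteps in 0..num_steps-1 where the full model runs.
    """
    ref, anc = sorted(reference), sorted(anchor)
    k_ref, k_anc = len(ref), len(anc)
    if not (1 <= k_anc <= target_budget <= k_ref <= num_steps):
        raise ValueError("need 1 <= len(anchor) <= target_budget <= len(reference) <= num_steps")

    # Interpolation weight: t = 0 reproduces the reference, t = 1 the anchor.
    t = 0.0 if k_ref == k_anc else (k_ref - target_budget) / (k_ref - k_anc)

    # 1D displacement interpolation: blend the two quantile functions level by level.
    levels = [(i + 0.5) / target_budget for i in range(target_budget)]
    raw = [(1 - t) * empirical_quantile(ref, u) + t * empirical_quantile(anc, u)
           for u in levels]

    # Round to integer timesteps, then repair into a strictly increasing, in-range schedule.
    steps = [int(round(x)) for x in raw]
    steps[0] = 0  # assumption: step 0 always runs the full model (the cache starts empty)
    for i in range(1, len(steps)):
        steps[i] = max(steps[i], steps[i - 1] + 1)
    steps[-1] = min(steps[-1], num_steps - 1)
    for i in range(len(steps) - 2, -1, -1):
        steps[i] = min(steps[i], steps[i + 1] - 1)
    return steps


ref = list(range(0, 50, 2))   # hypothetical 25-step reference: every other step
anc = [0, 10, 20, 30, 40]     # hypothetical 5-step anchor
print(interpolate_schedule(ref, anc, 15, 50))
# [0, 2, 4, 10, 12, 14, 20, 22, 24, 30, 32, 34, 40, 42, 44]
```

A 15-step budget sits halfway between the two hypothetical schedules (t = 0.5). The result keeps the reference's dense early steps inside each of the anchor's coarse blocks. The repair pass is a design choice of this sketch: without it, rounding two nearby quantiles could place two compute steps on the same timestep.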

Why it matters

For teams building on generative AI, particularly diffusion models, OTCache offers a way to cut inference time (by 3.66x to 4.7x in the reported experiments) and the associated compute cost, without retraining and without sacrificing output quality. Faster sampling can shorten development cycles and make high-quality generation more accessible.

How to implement this in your domain

  1. Explore integrating OTCache into existing diffusion model pipelines to evaluate potential speedups and quality improvements.
  2. Benchmark current diffusion model inference times against OTCache's reported performance on similar tasks.
  3. Consider the computational savings from reduced NFEs and how they affect infrastructure costs for generative AI applications.
  4. Investigate the applicability of Optimal Transport principles to other areas of AI optimization beyond diffusion models.
  5. Review the provided code on GitHub to understand the implementation details and potential for customization.
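Step 3 above can be roughed out with back-of-the-envelope arithmetic. The sketch rests on three assumptions:

- Sampling time is dominated by full model evaluations.
- A cache-reuse step costs a fixed fraction of a full step (`cached_step_cost`, a hypothetical parameter).
- Serving cost scales linearly with GPU time.

The function names and workload figures are made up for illustration and do not come from the paper. Real speedups depend on overheads this model ignores.

```python
def estimated_speedup(num_steps: int, compute_steps: int,
                      cached_step_cost: float = 0.0) -> float:
    """Idealized speedup of a cached sampler over running the full model at every step.

    cached_step_cost is the (hypothetical) cost of a cache-reuse step as a fraction
    of a full model evaluation; 0.0 means reuse is free.
    """
    if not 0 < compute_steps <= num_steps:
        raise ValueError("compute_steps must be in (0, num_steps]")
    if not 0.0 <= cached_step_cost <= 1.0:
        raise ValueError("cached_step_cost must be in [0, 1]")
    cached_steps = num_steps - compute_steps
    return num_steps / (compute_steps + cached_steps * cached_step_cost)


def monthly_gpu_cost(images_per_month: int, seconds_per_image: float,
                     usd_per_gpu_hour: float, speedup: float = 1.0) -> float:
    """GPU spend in USD, assuming cost scales linearly with GPU time."""
    gpu_hours = images_per_month * seconds_per_image / speedup / 3600.0
    return gpu_hours * usd_per_gpu_hour


# Hypothetical workload: 50 sampling steps, 10 full evaluations, reuse costs 5% of a step.
s = estimated_speedup(50, 10, cached_step_cost=0.05)
print(f"{s:.2f}x")                                                  # 4.17x
print(f"{monthly_gpu_cost(1_000_000, 20.0, 2.0):.2f}")             # 11111.11
print(f"{monthly_gpu_cost(1_000_000, 20.0, 2.0, speedup=s):.2f}")  # 2666.67
```

Even when reuse is free, the speedup in this model is capped at num_steps / compute_steps. On a hypothetical 50-step sampler, a 4.5x speedup would therefore allow at most 11 full evaluations.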

Who benefits

AI Development, Creative Industries, Gaming, E-commerce, Media & Entertainment

Key takeaways

  • OTCache accelerates diffusion model sampling by 3.66x to 4.7x in the reported experiments, without retraining.
  • It uses Optimal Transport to create geometry-aware caching schedules.
  • The framework improves generation fidelity while achieving substantial speedups.
  • This offers a new approach to optimizing generative AI inference.

Original post by Huanlin Gao, Fang Zhao, Qiang Hui, Fuyuan Shi, Shaoan Zhao, Yantao Li, Chao Tan, Ting Lu, Yuren You, Kai Wang, Shiguo Lian

arXiv:2606.31026v1. Abstract: "We propose OTCache, a training-free framework for accelerating diffusion sampling via caching schedule prediction. Existing graph-based caching methods reduce redundant computation by optimizing shortest-path objectives, but rely on…"
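The abstract contrasts OTCache with graph-based methods that optimize shortest-path objectives. A minimal sketch of that baseline idea shows where additive independence enters. In this model, timesteps are graph nodes, and an edge (i, j) means computing at step i and reusing its cache until step j. A schedule's total cost is the sum of its edge costs. The edge cost function, the name `shortest_path_schedule`, and the exact-budget dynamic program are illustrative assumptions. They are not taken from the paper or from any specific prior method.

```python
import math
from typing import Callable, List, Tuple


def shortest_path_schedule(num_steps: int, budget: int,
                           edge_cost: Callable[[int, int], float]) -> Tuple[float, List[int]]:
    """Choose `budget` full-compute steps minimising a sum of per-segment costs.

    Nodes are timesteps 0..num_steps, where node num_steps is a terminal sentinel.
    Edge (i, j): run the full model at step i, reuse its cache for steps i+1..j-1,
    then run the full model again at j (or stop if j == num_steps).
    Summing edge costs is exactly the additive-independence assumption.
    """
    if not 1 <= budget <= num_steps:
        raise ValueError("budget must be in [1, num_steps]")
    inf = math.inf
    # best[e][j]: cheapest path from node 0 to node j using exactly e edges
    best = [[inf] * (num_steps + 1) for _ in range(budget + 1)]
    parent = [[-1] * (num_steps + 1) for _ in range(budget + 1)]
    best[0][0] = 0.0
    for e in range(1, budget + 1):
        for j in range(1, num_steps + 1):
            for i in range(j):
                if best[e - 1][i] == inf:
                    continue
                cand = best[e - 1][i] + edge_cost(i, j)
                if cand < best[e][j]:
                    best[e][j], parent[e][j] = cand, i
    # Walk parent pointers back from the terminal node to recover the compute steps.
    schedule, node = [], num_steps
    for e in range(budget, 0, -1):
        node = parent[e][node]
        schedule.append(node)
    schedule.reverse()
    return best[budget][num_steps], schedule


# Hypothetical cost: stale-cache error grows quadratically with the number of reused steps.
cost, schedule = shortest_path_schedule(10, 5, lambda i, j: (j - i - 1) ** 2)
print(cost, schedule)  # 5.0 [0, 2, 4, 6, 8]
```

Because the objective is a plain sum, the cost charged to one reuse segment cannot depend on errors accumulated in earlier segments. That is the independence the summary says often fails in low-NFE regimes.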