Zeta Optimizer Improves Neural Network Training with Dual Whitening

Kaiwen Chen, Shuhai Zhang, Qiuwu Chen, Zimo Liu, Linxiao Li, Ying Sun, Yuchen Li, Yifan Zhang, Bo Han, Mingkui Tan · June 15, 2026

Summary

Researchers introduce Zeta, a new optimizer for large-scale neural network training that applies a dual whitening process to the momentum matrix. It targets scale heterogeneity in momentum matrices, and the authors report faster convergence and better generalization on language modeling, mixture-of-experts, and vision tasks.

Training large neural networks often relies on matrix-aware optimizers that exploit the structure of weight parameters rather than adapting each element independently. A significant weakness of these methods, Muon among them, is their sensitivity to the conditioning of the matrices they are given, in particular the severe scale heterogeneity found in raw momentum matrices. Zeta is a new optimizer designed to address this problem. It uses a dual whitening pipeline: coordinate whitening first corrects scale imbalances within the matrix, and spectral whitening is applied afterwards. The ordering matters because coordinate whitening establishes the statistical isotropy that spectral whitening needs in order to operate effectively and reliably. The authors prove that this dual approach reduces orthogonalization error and improves the condition number of the input matrices. In their empirical evaluations, Zeta matches or surpasses strong baselines in language modeling, mixture-of-experts architectures, and vision tasks, with faster convergence and improved generalization.
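To make the two-stage idea concrete, here is a minimal NumPy sketch. It is not the authors' algorithm: the summary does not give Zeta's formulas, so the coordinate step is assumed to be an element-wise division by the root of a second-moment estimate, and the spectral step is an exact SVD orthogonalization standing in for whatever approximation the paper uses. The function names (coordinate_whiten, spectral_whiten, dual_whiten) and the toy data are hypothetical.

```python
import numpy as np

def coordinate_whiten(momentum, second_moment, eps=1e-8):
    # ASSUMED form: rescale each entry by the root of a per-entry
    # second-moment estimate to even out scales inside the matrix.
    return momentum / (np.sqrt(second_moment) + eps)

def spectral_whiten(matrix):
    # Exact orthogonalization via SVD: every singular value becomes 1.
    # This is a stand-in chosen for clarity, not the paper's procedure.
    u, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ vt

def dual_whiten(momentum, second_moment):
    # Coordinate whitening first, spectral whitening second.
    return spectral_whiten(coordinate_whiten(momentum, second_moment))

# Toy momentum matrix whose rows span six orders of magnitude.
rng = np.random.default_rng(0)
row_scales = np.logspace(-3, 3, 8)[:, None]
momentum = rng.standard_normal((8, 16)) * row_scales

# Hypothetical second-moment estimate; in training this would be a
# running average of squared gradients rather than a known constant.
second_moment = np.broadcast_to(row_scales**2, momentum.shape)

cond_raw = np.linalg.cond(momentum)
cond_whitened = np.linalg.cond(coordinate_whiten(momentum, second_moment))
update = dual_whiten(momentum, second_moment)

print("condition number, raw momentum:      ", cond_raw)
print("condition number, after coordinate:  ", cond_whitened)
print("rows orthonormal after spectral step:",
      np.allclose(update @ update.T, np.eye(8)))
```

In this toy setup the coordinate step removes the row-scale spread before orthogonalization, which is the conditioning effect the summary describes. Whether the same holds for real training momentum depends on details the summary does not give.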

Why it matters

Zeta offers a more robust optimization method for training large-scale neural networks. If the reported results carry over to other settings, practitioners working with deep learning models may see faster training and better final performance, especially for complex architectures such as Transformers.

How to implement this in your domain

  1. Evaluate Zeta as an alternative optimizer for training large-scale neural networks, particularly Transformer-based models.
  2. Integrate the Zeta optimizer into existing deep learning frameworks to leverage its dual whitening capabilities.
  3. Benchmark Zeta's performance against current state-of-the-art optimizers on specific language modeling or vision tasks.
  4. Consider the implications of improved convergence and generalization for deploying more efficient and accurate AI models in production.

Who benefits

AI/ML Development, Cloud Computing, Autonomous Systems, Natural Language Processing, Computer Vision

Key takeaways

  • Zeta is a new optimizer that uses dual whitening to improve large-scale neural network training.
  • It addresses scale heterogeneity in momentum matrices, a common vulnerability in matrix-aware optimizers.
  • The specific ordering of coordinate and spectral whitening is critical for its effectiveness.
  • In the reported evaluations, Zeta matches or surpasses strong baselines in language modeling, mixture-of-experts, and vision tasks, with faster convergence and better generalization.

Original post by Kaiwen Chen, Shuhai Zhang, Qiuwu Chen, Zimo Liu, Linxiao Li, Ying Sun, Yuchen Li, Yifan Zhang, Bo Han, Mingkui Tan

"arXiv:2606.14187v1 Announce Type: new Abstract: Large-scale neural network training increasingly relies on matrix-aware optimizers that exploit the structure of weight parameters beyond element-wise adaptation. However, existing matrix-aware methods such as Muon have an underappr…"
