Transformers Learn Modular Multiplication via Discrete-Log Clock Algorithm

Huu Danh Nguyen (Stanford University) · June 17, 2026

Summary

This research reveals that small transformers learn modular multiplication by reducing it to addition in discrete-log space, implementing a "Discrete-Log Clock" algorithm. When the learned embedding is analyzed in the multiplicative character basis, its spectrum becomes sparse, and neurons turn out to be tuned to specific multiplicative frequencies.

Previous studies of how small transformers learn modular multiplication reported a "dense" Fourier spectrum in the learned embeddings, suggesting that all frequencies were needed. This contrasted with modular addition, which requires only a sparse set of key frequencies. New research demonstrates that the density is an artifact of analyzing the embeddings in the wrong basis. Under the multiplicative character transform, which is the natural Fourier transform for multiplication, the embedding spectrum becomes highly sparse. This indicates that the transformer is not learning multiplication directly but is converting it into an additive problem.

Specifically, the study found that 96.9% of MLP neurons are tuned to a single multiplicative frequency, and neuron activation heatmaps show 2D-periodic structure when inputs are reordered by discrete logarithm. This suggests that the transformer implements a "Discrete-Log Clock" algorithm, analogous to the "Clock algorithm" for addition: because the discrete log of a product is the sum of the discrete logs (modulo the order of the multiplicative group), multiplication becomes addition in discrete-log space. More broadly, the result shows the value of matching the analysis basis to the algebraic structure of the task when looking for interpretable patterns.
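The reduction from multiplication to addition can be made concrete with a minimal Python sketch. It assumes a prime modulus p (113 below is only an illustrative choice, not a value taken from the paper), finds a generator by brute force, and builds a discrete-log table. The function names and the `freqs` parameter are hypothetical, and the readout is a generic clock-style scoring rule in log space, not the circuit recovered from the trained model.

```python
import numpy as np

def primitive_root(p):
    """Smallest generator g of the nonzero residues mod an odd prime p (brute force)."""
    for g in range(2, p):
        seen, x = set(), 1
        for _ in range(p - 1):
            x = (x * g) % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; p should be an odd prime")

def dlog_table(p, g):
    """Map each nonzero residue a to the exponent k with pow(g, k, p) == a."""
    table, x = {}, 1
    for k in range(p - 1):
        table[x] = k
        x = (x * g) % p
    return table

def clock_multiply(a, b, p, log, freqs):
    """Score every candidate c by phase agreement of log a + log b with log c."""
    cands = np.arange(1, p)
    log_c = np.array([log[int(c)] for c in cands])
    # Difference in log space, reduced modulo the group order p - 1.
    diff = (log[a] + log[b] - log_c) % (p - 1)
    phase = 2 * np.pi * diff / (p - 1)
    # Each frequency contributes a cosine that peaks where diff == 0.
    logits = sum(np.cos(k * phase) for k in freqs)
    return int(cands[np.argmax(logits)])

p = 113  # illustrative prime modulus (assumption)
log = dlog_table(p, primitive_root(p))
print(clock_multiply(7, 45, p, log, freqs=[1, 4, 9]), (7 * 45) % p)
```

Including frequency 1 in `freqs` guarantees a unique maximum, since its cosine equals 1 only when the log-space difference is zero. A trained model need not use that particular frequency.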

Why it matters

Understanding how transformers learn complex mathematical operations provides crucial insights into their internal mechanisms, which can inform the design of more efficient, interpretable, and robust AI models for various computational tasks.

How to implement this in your domain

  1. Apply basis-matching analysis techniques to interpret the internal workings of transformers on other algebraic tasks.
  2. Design transformer architectures that explicitly leverage discrete-logarithm-like transformations for multiplicative operations.
  3. Investigate if similar "clock" algorithms are implicitly learned for other complex functions within neural networks.
  4. Develop diagnostic tools that use algebraic structure to reveal hidden learning mechanisms in AI models.
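As a starting point for the first and fourth items, the sketch below compares how concentrated a signal's spectrum is under the ordinary (additive) DFT versus a multiplicative character transform. It is an illustration under stated assumptions: the "embedding column" is synthetic (a single multiplicative frequency, not data from a trained model), the modulus 113 and frequency 5 are arbitrary choices, and all function names are hypothetical. The character transform is computed here as a DFT over the nonzero residues after reordering them by discrete log.

```python
import numpy as np

def dlog_order(p):
    """Return [g**0, g**1, ..., g**(p-2)] mod p for the smallest generator g."""
    for g in range(2, p):
        order, x = [], 1
        for _ in range(p - 1):
            order.append(x)
            x = (x * g) % p
        if len(set(order)) == p - 1:
            return np.array(order)
    raise ValueError("no generator found; p should be an odd prime")

def additive_spectrum(col):
    """Power spectrum under the ordinary DFT, indexing entries by residue."""
    return np.abs(np.fft.fft(col)) ** 2

def character_spectrum(col, order):
    """Power spectrum after reordering nonzero residues by discrete log."""
    return np.abs(np.fft.fft(col[order])) ** 2

def top_energy_fraction(power, n=2):
    """Share of total spectral energy carried by the n largest bins."""
    return float(np.sort(power)[-n:].sum() / power.sum())

p, k = 113, 5  # illustrative modulus and multiplicative frequency (assumptions)
order = dlog_order(p)

# Synthetic column: col[g**j % p] = cos(2*pi*k*j / (p-1)), with col[0] = 0.
col = np.zeros(p)
col[order] = np.cos(2 * np.pi * k * np.arange(p - 1) / (p - 1))

add_frac = top_energy_fraction(additive_spectrum(col))
chr_frac = top_energy_fraction(character_spectrum(col, order))
print(f"additive basis:  top-2 energy share = {add_frac:.3f}")
print(f"character basis: top-2 energy share = {chr_frac:.3f}")
```

For a real model, `col` would be replaced by a column of the learned embedding matrix (or a neuron's activation profile over inputs), and the same energy-share statistic could be computed per column to see which basis yields the sparser description.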

Who benefits

AI Research, Theoretical Computer Science, Cryptography, Machine Learning Engineering

Key takeaways

  • Small transformers learn modular multiplication by converting it to addition in discrete-log space.
  • The "dense" Fourier spectrum observed previously was an artifact of using the wrong analysis basis.
  • The multiplicative character transform reveals sparse, interpretable learning patterns.
  • Matching the analysis basis to the algebraic structure of the task is crucial for understanding transformer mechanisms.

Original post by Huu Danh Nguyen (Stanford University)

"arXiv:2606.17399v1 Announce Type: new Abstract: When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key freque…"

Originally posted by Huu Danh Nguyen (Stanford University) on X
