Cosine-Scored Sparse Autoencoders Improve Feature Learning in AI Models

Silen Naihin, Lev Stambler · June 16, 2026

Summary

A new approach for sparse autoencoders replaces the traditional inner product score with a learned blend of cosine similarity and input magnitude. This method prevents high-norm tokens from dominating feature detection, leading to more human-recognizable features.

Sparse autoencoders (SAEs) are used to detect features in AI models, and they typically score each feature with an inner product. Because an inner product grows with the input's norm as well as with its directional alignment, high-norm input tokens can activate features disproportionately, regardless of how well their content actually aligns, especially after sublayer normalization has removed magnitude information.

The researchers propose replacing the inner product with a learned blend of cosine similarity and input magnitude, which leaves the optimizer to set the balance between the two. According to the authors, features never choose more than half-magnitude dependence, indicating a preference for content over raw magnitude. At matched reconstruction performance, the cosine-scored autoencoders learn features that align significantly more often with human-recognizable concepts. This suggests that the traditional inner product wastes dictionary slots on "norm detectors," while cosine-based scoring yields more meaningful and interpretable features.
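
To make the idea of a blend between cosine similarity and input magnitude concrete, here is a minimal sketch. This summary does not give the paper's exact parametrization, so the form below (cosine multiplied by the input norm raised to an exponent) is an assumption chosen for illustration, as are the names blended_score and alpha. Under this assumed form, alpha = 1 recovers the ordinary inner product against a unit-norm feature direction and alpha = 0 is pure cosine similarity. In a real SAE, alpha would presumably be a trainable per-feature parameter, whereas here it is passed in by hand.

```python
import math

def blended_score(x, w, alpha):
    """Assumed blend: cos(x, w) * ||x|| ** alpha.

    alpha = 1.0 -> inner product with the unit-normalized feature direction
    alpha = 0.0 -> pure cosine similarity (input norm ignored)
    This parametrization is an illustrative assumption, not the paper's formula.
    """
    x_norm = math.sqrt(sum(v * v for v in x))
    w_norm = math.sqrt(sum(v * v for v in w))
    if x_norm == 0.0 or w_norm == 0.0:
        return 0.0  # cosine is undefined for zero vectors; treat as no match
    cosine = sum(a * b for a, b in zip(x, w)) / (x_norm * w_norm)
    return cosine * x_norm ** alpha

feature = [1.0, 0.0]          # hypothetical feature direction
aligned_small = [1.0, 0.0]    # norm 1, perfectly aligned with the feature
misaligned_big = [6.0, 8.0]   # norm 10, cosine 0.6 with the feature

for alpha in (1.0, 0.5, 0.0):
    small = blended_score(aligned_small, feature, alpha)
    big = blended_score(misaligned_big, feature, alpha)
    print(f"alpha={alpha}: aligned_small={small:.3f} misaligned_big={big:.3f}")
```

With alpha = 1 the poorly aligned high-norm input outscores the perfectly aligned one (about 6.0 versus 1.0). With alpha = 0 the order flips (0.6 versus 1.0). Under this assumed form, the reported cap at "half-magnitude dependence" would correspond to alpha staying at or below 0.5, but that mapping is an interpretation and not something the summary states.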

Why it matters

For AI engineers and researchers, this approach offers a way to extract more interpretable features from AI models. More meaningful features can make models easier to understand and debug, and may help applications that rely on sparse representations.

How to implement this in your domain

  1. Adopt cosine-scored sparse autoencoders in new AI model development for improved feature learning.
  2. Experiment with this scoring method in existing sparse autoencoder architectures.
  3. Evaluate the interpretability of features learned by cosine-scored SAEs compared to traditional methods.
  4. Apply this technique in domains where feature interpretability and robustness are critical.
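
As a starting point for the experiments suggested above, the toy sketch below swaps the scoring rule inside a BatchTopK-style selection, the selection scheme named in the paper's abstract. It keeps the k × batch_size largest scores across the whole batch and counts how many active slots each token receives. The score form (cosine times norm raised to alpha), the helper names, and the three-token batch are all assumptions made for illustration. This is not the authors' implementation.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def score(x, w, alpha):
    """Assumed blend: cos(x, w) * ||x|| ** alpha (alpha=1 inner product, alpha=0 cosine)."""
    nx, nw = norm(x), norm(w)
    if nx == 0.0 or nw == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, w)) / (nx * nw) * nx ** alpha

def batch_topk_counts(tokens, features, alpha, k):
    """Keep the k * len(tokens) largest scores in the batch.

    Returns how many kept (positive) activations each token ends up with.
    """
    scored = [(score(x, w, alpha), t)
              for t, x in enumerate(tokens)
              for w in features]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    counts = [0] * len(tokens)
    for s, t in scored[: k * len(tokens)]:
        if s > 0.0:
            counts[t] += 1
    return counts

# Hypothetical dictionary of three unit-norm feature directions.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Two low-norm tokens that each match one feature exactly, plus one
# high-norm token that is only partially aligned with every feature.
tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]

print("inner product:", batch_topk_counts(tokens, features, alpha=1.0, k=1))
print("cosine:       ", batch_topk_counts(tokens, features, alpha=0.0, k=1))
```

In this toy batch, inner-product scoring hands all three active slots to the high-norm token and leaves the two well-aligned tokens with none, while cosine scoring gives each token one slot. Real activations are far less clean, so treat this as an intuition check and not as evidence for the paper's results.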

Who benefits

AI Development, Machine Learning Research, Natural Language Processing, Computer Vision

Key takeaways

  • Cosine-scored sparse autoencoders learn more interpretable features.
  • Traditional inner product scoring can lead to suboptimal feature detection.
  • The new method uses a learned blend of cosine similarity and input magnitude, so features respond to content alignment more than to raw norm.
  • Improved feature learning can enhance model understanding and debugging.

Original post by Silen Naihin, Lev Stambler

"arXiv:2606.15054v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) detect features via inner product, so a feature's activation scales with both its directional alignment and the input's norm. Under BatchTopK, high-norm tokens inflate all pre-activations simultaneously, c…"
