Mahalanobis Cosine Similarity Improves Linear Probe Comparison

Zhuofan Josh Ying, Peter Hase, Nikolaus Kriegeskorte · June 19, 2026

Summary

This research extends the empirical finding that a linear probe's Mahalanobis cosine similarity (MCS) to a reference probe closely predicts its out-of-distribution (OOD) AUROC. The study derives this linear relationship in closed form and links it to the probe's signal-to-noise ratio.

Linear probes are a common tool in AI interpretability research, and they are often compared using standard (Euclidean) cosine similarity to judge how well they capture a given concept. Mahalanobis cosine similarity (MCS) is a more task-aware alternative: it reweights the inner product between two probe directions by the covariance of the test data. Previous work suggested that a probe's MCS to a reference probe trained on out-of-distribution (OOD) data could almost perfectly predict that probe's OOD AUROC (Area Under the Receiver Operating Characteristic curve). This research extends the empirical observation across models, layers, and concept domains, and explains it with a closed-form result: for balanced classes with Gaussian projections, both OOD AUROC and MCS are sigmoid-shaped functions of the probe's signal-to-noise ratio (SNR) on the test data, so the two are close to linear in each other. The theory also predicts conditions under which the linearity should fail, and the authors verify those predictions empirically. MCS is therefore a theoretically grounded and empirically effective alternative to Euclidean cosine similarity for comparing linear probes.
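The metric described above can be sketched in a few lines. This is a minimal illustration, assuming MCS is the cosine similarity induced by the inner product u^T Σ v, where Σ is the test-data covariance; the summary says only that the inner product is "reweighted by the test data covariance", so the exact form used in the paper may differ. The function name `mahalanobis_cosine` and the toy vectors are hypothetical.

```python
import numpy as np

def mahalanobis_cosine(u, v, cov):
    """Cosine similarity under the inner product <u, v> = u^T cov v.

    Assumption of this sketch: `cov` is the (positive definite) covariance
    of the test data, and u, v are probe weight vectors.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cov = np.asarray(cov, dtype=float)
    numerator = u @ cov @ v
    denominator = np.sqrt((u @ cov @ u) * (v @ cov @ v))
    return numerator / denominator

# Toy example: two probe directions, and test data whose variance is
# concentrated on the first axis.
u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
cov = np.diag([9.0, 1.0])

# With an identity covariance, MCS reduces to ordinary cosine similarity.
euclidean_cos = mahalanobis_cosine(u, v, np.eye(2))
mcs = mahalanobis_cosine(u, v, cov)

print(f"Euclidean cosine: {euclidean_cos:.4f}")
print(f"MCS:              {mcs:.4f}")
```

In this toy case the two probes disagree only along a low-variance axis, so MCS rates them as more similar than the Euclidean cosine does. That is the sense in which the metric is task-aware: directions in which the test data barely varies contribute little to the comparison.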

Why it matters

For practitioners working on model interpretability and evaluation, MCS offers a theoretically grounded way to compare linear probes. Because a probe's MCS to an OOD reference probe tracks its OOD AUROC, the metric gives a more accurate picture of how well a probe direction captures a concept on out-of-distribution data than Euclidean cosine similarity does.

How to implement this in your domain

  1. Adopt Mahalanobis Cosine Similarity (MCS) as a standard metric for comparing linear probes in your interpretability research.
  2. Use MCS to evaluate the robustness of your model's internal representations when facing out-of-distribution data.
  3. Leverage the theoretical insights to understand when MCS is most effective and when its linearity might break down.
  4. Integrate MCS into your model development pipeline to guide the creation of more interpretable and reliable AI systems.
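The steps above can be tried on synthetic data before touching a real model. The sketch below is illustrative only: the data is simulated Gaussian data rather than model activations, the reference probe is a simple Fisher-style direction fit on the simulated OOD set, and the "candidate probes" are noisy copies of that reference standing in for probes trained elsewhere. All names (`X_ood`, `w_ref`, `candidates`) are hypothetical, and none of this reproduces the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def auroc(scores, labels):
    """AUROC via the Mann-Whitney rank statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u_stat = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)

def mcs(u, v, cov):
    """Cosine similarity under the covariance-weighted inner product."""
    return (u @ cov @ v) / np.sqrt((u @ cov @ u) * (v @ cov @ v))

# Simulated OOD test set: balanced classes, anisotropic Gaussian features.
d, n = 16, 4000
labels = np.repeat([0, 1], n // 2)
delta = np.full(d, 0.4)                  # class mean difference
scales = np.linspace(0.5, 3.0, d)        # per-feature standard deviation
X_ood = rng.normal(size=(n, d)) * scales + np.outer(labels, delta)

cov_ood = np.cov(X_ood, rowvar=False)    # test-data covariance

# Reference probe fit on the OOD data (Fisher-style direction).
mean_diff = X_ood[labels == 1].mean(axis=0) - X_ood[labels == 0].mean(axis=0)
w_ref = np.linalg.solve(cov_ood, mean_diff)

# Candidate probes: the reference direction plus increasing random noise.
noise_levels = np.linspace(0.0, 3.0, 30)
candidates = [
    w_ref + eps * np.linalg.norm(w_ref) * rng.normal(size=d) / np.sqrt(d)
    for eps in noise_levels
]

mcs_vals = np.array([mcs(w, w_ref, cov_ood) for w in candidates])
auroc_vals = np.array([auroc(X_ood @ w, labels) for w in candidates])

# How well does MCS to the reference track OOD AUROC across candidates?
r = np.corrcoef(mcs_vals, auroc_vals)[0, 1]
print(f"Pearson r between MCS and OOD AUROC: {r:.3f}")
```

With real probes, `X_ood` would be replaced by held-out activations from the layer being probed and `candidates` by probes trained on other distributions. The simulation here satisfies the Gaussian, balanced-class conditions by construction, so it shows the favourable case rather than the failure modes the theory predicts.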

Who benefits

AI Research, Software Development, Healthcare, Finance, Autonomous Systems

Key takeaways

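  • Mahalanobis Cosine Similarity (MCS) is a more task-aware metric than Euclidean cosine similarity for comparing linear probes, because it weights directions by the test data covariance.
  • A probe's MCS to a reference probe trained on OOD data closely predicts that probe's out-of-distribution AUROC.
  • The linear relationship between MCS and OOD AUROC is derived in closed form for balanced classes with Gaussian projections, and is linked to the probe's signal-to-noise ratio.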
  • MCS offers a theoretically grounded alternative to Euclidean cosine similarity for interpretability research.
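The link between AUROC and signal-to-noise ratio in the takeaways can be checked numerically. The sketch below uses the standard binormal result: if the two classes project to Gaussians with equal variance, AUROC = Φ(SNR / √2), where SNR is the mean separation divided by the standard deviation. The `mcs_shape` curve is this sketch's own algebra, not a formula quoted from the paper: it assumes a shared within-class covariance, balanced classes, and a reference probe along the Fisher direction, under which MCS is proportional to s / √(1 + s²/4) for a probe with SNR s.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binormal_auroc(snr):
    """AUROC for projections N(m0, s^2) and N(m1, s^2), snr = (m1 - m0) / s."""
    return normal_cdf(snr / math.sqrt(2.0))

# Monte Carlo check of the binormal formula at one SNR value.
rng = np.random.default_rng(0)
snr = 1.5
neg = rng.normal(0.0, 1.0, size=2000)
pos = rng.normal(snr, 1.0, size=2000)
# Empirical AUROC: fraction of (positive, negative) pairs ranked correctly.
empirical = (pos[:, None] > neg[None, :]).mean()
theoretical = binormal_auroc(snr)
print(f"empirical AUROC:   {empirical:.3f}")
print(f"theoretical AUROC: {theoretical:.3f}")

# Both curves are sigmoid-shaped in the SNR. Over a grid of SNR values
# they are close to linear in each other.
grid = np.linspace(0.0, 4.0, 41)
auroc_curve = np.array([binormal_auroc(s) for s in grid])
# Assumed shape of MCS vs. SNR (up to a probe-independent constant),
# derived for this sketch under the assumptions stated above.
mcs_shape = grid / np.sqrt(1.0 + grid**2 / 4.0)
r = np.corrcoef(auroc_curve, mcs_shape)[0, 1]
print(f"Pearson r between the two curves: {r:.4f}")
```

The grid covers SNR values from 0 to 4. Both curves saturate at large SNR, though not at the same rate, so the agreement may loosen over other ranges.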

Original post by Zhuofan Josh Ying, Peter Hase, Nikolaus Kriegeskorte

"arXiv:2606.19603v1 Announce Type: new Abstract: Linear probes are widely used in interpretability research and often compared by cosine similarity. The Mahalanobis cosine similarity (MCS) between two directions, which reweights the inner product by test data covariance, is a natu…"
