Singular Learning Explains Simplicity Bias in Deep Networks

Kathlén Kohn, Giovanni Luca Marchetti, Farhan Shabir, Vahid Shahverdi, Weisheng Wang · June 30, 2026

Summary

This paper investigates critical points in deep fully-connected networks with monomial activations. It shows that, for sufficiently high activation degrees, these critical points (the singularities studied in Singular Learning Theory) occur precisely at subnetworks, that is, at parameters where some neurons are inactive or redundant. This provides a mathematical explanation for the implicit bias of deep neural networks towards simpler functions.

When a neural network is trained, its gradient dynamics are influenced by critical points that arise from the model's architecture. These are the parameters at which the Jacobian of the model's parametrization (the map from weights to the function the network computes) is rank-deficient, and they are the singularities central to Singular Learning Theory. The paper studies such critical points in deep fully-connected networks with monomial activations, σ(z) = z^d. Using tools from polynomial algebra, including Mason's Theorem, it shows that for sufficiently high activation degrees, a parameter is critical exactly when it realizes a subnetwork: at such a parameter, some neurons are inactive or redundant. This gives a rigorous account of the implicit bias observed in deep neural networks and explains why these models tend to converge towards simpler, more parsimonious functions, in line with Occam's Razor.
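The rank condition described above can be explored on small examples. The sketch below is our own illustration, not the authors' code, and it rests on assumptions the summary does not state: a bias-free network with σ(z) = z^d on hidden layers and a linear, scalar output layer. The function names (`forward_and_grad`, `exact_rank`, `jacobian_rank`) are hypothetical. The sketch stacks the gradients of the output with respect to all weights at random integer inputs and computes the rank exactly in rational arithmetic, so no floating-point tolerance is needed. The result is a randomized estimate: it equals the rank of the parametrization's Jacobian unless the sampled inputs happen to be special.

```python
from fractions import Fraction
import random


def forward_and_grad(weights, x, d):
    """Gradient of a scalar-output monomial MLP with respect to all weights at input x.

    Assumed (hypothetical) setup: weights = [W_1, ..., W_L] as nested lists
    (one row per neuron), no biases, sigma(z) = z**d on hidden layers, and a
    linear output layer with a single row. Returns one row of the Jacobian.
    """
    hs = [list(x)]  # inputs to each layer: h_0, ..., h_{L-1}
    zs = []         # hidden pre-activations: z_1, ..., z_{L-1}
    for W in weights[:-1]:
        z = [sum(w * h for w, h in zip(row, hs[-1])) for row in W]
        zs.append(z)
        hs.append([zi ** d for zi in z])
    grads = [None] * len(weights)
    grads[-1] = [list(hs[-1])]  # df/dW_L = h_{L-1}
    g = list(weights[-1][0])    # df/dh_{L-1}
    for l in range(len(weights) - 2, -1, -1):
        # Chain rule through h = z**d: df/dz = df/dh * d * z**(d-1)
        delta = [gi * d * zi ** (d - 1) for gi, zi in zip(g, zs[l])]
        grads[l] = [[di * hj for hj in hs[l]] for di in delta]
        # Backpropagate to the previous layer's output: df/dh = W^T delta
        g = [sum(delta[i] * weights[l][i][j] for i in range(len(delta)))
             for j in range(len(hs[l]))]
    return [v for G in grads for row in G for v in row]


def exact_rank(rows):
    """Matrix rank via Gauss-Jordan elimination in exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(M[0]) if M else 0
    for c in range(n_cols):
        pivot = next((i for i in range(rank, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for i in range(len(M)):
            if i != rank and M[i][c] != 0:
                factor = M[i][c] / M[rank][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank


def jacobian_rank(weights, d, n_samples=None, seed=0):
    """Randomized estimate of the Jacobian rank of the parametrization.

    Each sampled input contributes one gradient row. The rank equals the
    dimension of the span of the partial derivatives (polynomials in x)
    unless some nonzero polynomial in that span vanishes at every sample.
    """
    weights = [[[Fraction(v) for v in row] for row in W] for W in weights]
    n_params = sum(len(W) * len(W[0]) for W in weights)
    n_samples = n_samples or n_params + 10
    rng = random.Random(seed)
    n_in = len(weights[0][0])
    rows = []
    for _ in range(n_samples):
        x = [Fraction(rng.randint(-20, 20)) for _ in range(n_in)]
        rows.append(forward_and_grad(weights, x, d))
    return exact_rank(rows)


if __name__ == "__main__":
    # One hidden layer, 2 inputs, 2 hidden neurons, d = 3: 6 parameters.
    generic = [[[1, 2], [-1, 3]], [[1, -2]]]
    inactive = [[[1, 2], [-1, 3]], [[0, -2]]]   # neuron 0 has zero outgoing weight
    redundant = [[[1, 2], [1, 2]], [[1, -2]]]   # both neurons share incoming weights
    print("generic", jacobian_rank(generic, d=3))      # generic 4
    print("inactive", jacobian_rank(inactive, d=3))    # inactive 3
    print("redundant", jacobian_rank(redundant, d=3))  # redundant 2
```

In this 6-parameter toy case the generic rank is 4 rather than 6: each hidden neuron carries a one-parameter rescaling symmetry (w_j → c·w_j, a_j → c^(-d)·a_j) that never changes the function. Zeroing an outgoing weight (an inactive neuron) drops the rank to 3, and duplicating an incoming weight vector (a redundant neuron) drops it to 2. The same functions accept deeper weight lists, but degree 3 is only illustrative and does not test the paper's "sufficiently high degree" condition.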

Why it matters

Understanding the implicit bias of neural networks can help AI researchers and engineers design more predictable models, interpret their behavior, and potentially steer optimization towards simpler solutions, which may improve generalization.

How to implement this in your domain

  1. Consider the implications of implicit bias when designing and training deep neural networks, especially for interpretability.
  2. Explore how architectural choices and activation functions might influence the tendency towards simpler solutions.
  3. Investigate methods to control or leverage this implicit bias to improve model generalization or reduce complexity.
  4. Apply insights from Singular Learning Theory to diagnose and understand optimization challenges in complex models.
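As a first diagnostic in the spirit of the last step above, one can scan a layer for the two kinds of neurons the paper associates with critical points. The sketch below is an illustration under our own assumptions: no biases, a monomial activation, "inactive" meaning a zero incoming or zero outgoing weight vector, "redundant" meaning parallel incoming weight vectors, and a hypothetical tolerance `tol`. The parallel test is justified because, for σ(z) = z^d, (c·w·h)^d = c^d·(w·h)^d, so two neurons with proportional incoming weights produce proportional outputs and can be merged into one. A flagged neuron suggests the parameters sit at or near a subnetwork; it is not a full criticality test.

```python
import math


def _norm(v):
    return math.sqrt(sum(a * a for a in v))


def subnetwork_neurons(W, W_next, tol=1e-9):
    """Flag inactive and redundant neurons in one hidden layer (hypothetical helper).

    W:      incoming weights of the layer, one row per neuron.
    W_next: weights of the following layer; column j holds neuron j's outgoing weights.
    Returns (sorted list of inactive neurons, list of redundant (i, j) pairs).
    """
    n = len(W)
    outgoing = [[row[j] for row in W_next] for j in range(n)]
    # Inactive: the neuron always outputs 0, or its output is never used downstream.
    inactive = {j for j in range(n)
                if _norm(W[j]) <= tol or _norm(outgoing[j]) <= tol}
    redundant = []
    for i in range(n):
        for j in range(i + 1, n):
            if i in inactive or j in inactive:
                continue
            # Redundant: incoming weight vectors are parallel (|cos| == 1), so for
            # sigma(z) = z**d the two neurons compute proportional outputs.
            dot = sum(a * b for a, b in zip(W[i], W[j]))
            cos = dot / (_norm(W[i]) * _norm(W[j]))
            if abs(abs(cos) - 1.0) <= tol:
                redundant.append((i, j))
    return sorted(inactive), redundant


if __name__ == "__main__":
    W1 = [[1.0, 0.5], [2.0, 1.0], [-0.3, 1.2], [0.0, 0.0]]
    W2 = [[0.8, -1.1, 0.4, 0.7]]
    print(subnetwork_neurons(W1, W2))  # ([3], [(0, 1)])
```

Trained weights rarely reach exact zeros or exact parallelism, so a looser `tol` flags near-subnetworks instead; the right threshold is problem-dependent and is an assumption of this sketch.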

Who benefits

AI/ML Research · Software Development · Data Science · Academia

Key takeaways

  • Critical points in neural networks are linked to singularities in Singular Learning Theory.
  • In deep networks with monomial activations of sufficiently high degree, these critical points correspond exactly to subnetworks with inactive or redundant neurons.
  • This provides a mathematical basis for the implicit bias towards simpler functions.
  • Understanding this bias can aid in designing more predictable and generalizable models.

Original post by Kathlén Kohn, Giovanni Luca Marchetti, Farhan Shabir, Vahid Shahverdi, Weisheng Wang on X

"arXiv:2606.28464v1. Abstract: In the optimization of neural networks, gradient dynamics are influenced by critical points that arise from the model's architecture. These critical points occur where the Jacobian of the model's parametrization is rank-deficient, a…"
