DC Programming in Wasserstein Space Optimizes MMD and Energy Distance

Clément Bonet, Pierre-Cyril Aubin-Frankowski, Youssef Mroueh · June 29, 2026

The 2-minute explainer

Summary

This research extends the convex-concave procedure (CCCP) to the Wasserstein space to optimize non-convex functionals, in particular Maximum Mean Discrepancy (MMD) and Energy Distance (ED). It provides explicit difference-of-convex (DC) decompositions of these functionals. In experiments on MMD objectives, these decompositions lead to faster and more stable convergence than standard Wasserstein gradient descent.
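For orientation, here is one way to write a CCCP step in the Wasserstein setting. It is the analogue of the Euclidean update x_{k+1} ∈ argmin_x G(x) - ⟨∇H(x_k), x⟩. This is our own sketch of a "lifted" update, not the paper's exact scheme. It assumes F = G - H, with G and H convex along the interpolations ((1-t)id + tT)#μ_k and H having a Wasserstein gradient ∇_{W_2}H. The second line is the standard CCCP descent argument. Convexity of H gives the first inequality. Comparing the minimized objective at T_{k+1} with the candidate T = id gives the second.

```latex
\begin{align*}
% Hypothetical lifted CCCP step for F = G - H (notation ours)
T_{k+1} &\in \operatorname*{arg\,min}_{T \in L^2(\mu_k)}
  \; G\big(T_{\#}\mu_k\big)
  - \int \big\langle \nabla_{W_2} H(\mu_k)(x),\, T(x) \big\rangle \,\mathrm{d}\mu_k(x),
  \qquad \mu_{k+1} = (T_{k+1})_{\#}\mu_k, \\
% Descent: convexity of H, then optimality of T_{k+1} against T = \mathrm{id}
F(\mu_{k+1}) &\le G(\mu_{k+1}) - H(\mu_k)
  - \int \big\langle \nabla_{W_2} H(\mu_k),\, T_{k+1} - \mathrm{id} \big\rangle \,\mathrm{d}\mu_k
  \le G(\mu_k) - H(\mu_k) = F(\mu_k).
\end{align*}
```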

Optimizing functionals over probability measures is a common task in machine learning, often carried out directly in the Wasserstein space. However, many objectives of practical interest, such as Maximum Mean Discrepancy (MMD) and Energy Distance (ED), are not convex along Wasserstein geodesics. This non-convexity makes it difficult to analyze standard first-order optimization methods and to guarantee their performance.

The paper addresses this challenge by studying objectives on the Wasserstein space that admit a difference-of-convex (DC) decomposition. It then adapts the well-known convex-concave procedure (CCCP) to this setting. Under smoothness and strong convexity assumptions on the convex components, the authors prove that the iterates of the resulting algorithm become approximately stationary.

The main focus is on MMD and ED functionals, for which the authors derive explicit Wasserstein DC decompositions and show that the scheme converges locally under mild assumptions. In experiments on MMD objectives, these carefully chosen DC decompositions give faster and more stable convergence than standard Wasserstein gradient descent. The result is a more robust way to optimize these statistical distances in machine learning applications.
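The energy distance admits a particularly direct DC split, sketched below for empirical measures. Write ED(μ, ν) = 2E|X - Y| - E|X - X'| - E|Y - Y'|, with X, X' ~ μ and Y, Y' ~ ν. The μ-dependent part is then a potential energy with convex potential x ↦ 2E|x - Y|, minus an interaction energy with convex kernel |x - x'|. Both are convex along generalized Wasserstein geodesics. This split and the helper names (pairwise_dist, energy_distance, ed_dc_parts) are our own illustration. Because |·| is not smooth at the origin, the split would not satisfy the smoothness assumptions mentioned above as-is, so the paper's decomposition may differ.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances ||a_i - b_j|| for a of shape (n, d) and b of shape (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))


def energy_distance(x, y):
    """ED between the uniform empirical measures on x and y (V-statistic form)."""
    return (2 * pairwise_dist(x, y).mean()
            - pairwise_dist(x, x).mean()
            - pairwise_dist(y, y).mean())


def ed_dc_parts(x, y):
    """Hypothetical DC split ED = G - H, viewed as a functional of mu (particles x)."""
    # G: potential energy 2 E|X - Y| (convex in each particle) minus the constant E|Y - Y'|
    g = 2 * pairwise_dist(x, y).mean() - pairwise_dist(y, y).mean()
    # H: interaction energy E|X - X'| with the convex kernel |x - x'|
    h = pairwise_dist(x, x).mean()
    return g, h


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))            # particles for mu
y = rng.normal(loc=1.0, size=(300, 2))   # samples from nu
g, h = ed_dc_parts(x, y)
print(g - h, energy_distance(x, y))      # the two values agree up to rounding
```

In a CCCP scheme built on this split, H would be linearized at the current particles and only the convex potential part G would be minimized at each step.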

Why it matters

Machine learning practitioners, especially those working on generative models, domain adaptation, or statistical inference, can use this DC-based optimization scheme to train models that rely on MMD or ED more stably and efficiently.

How to implement this in your domain

  1. Review current optimization strategies for models involving probability measure comparisons (e.g., GANs, domain adaptation).
  2. Investigate the applicability of difference-of-convex (DC) programming to non-convex objectives in Wasserstein space.
  3. Explore implementing the lifted convex-concave procedure (CCCP) for MMD or Energy Distance optimization.
  4. Benchmark the DC-based optimization against standard gradient descent methods on relevant machine learning tasks.
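Steps 3 and 4 can be prototyped on particles. The sketch below is a minimal illustration under our own assumptions, not the authors' code. It uses a Gaussian kernel with bandwidth σ, uniform-weight particles, and a hypothetical DC split F = G - H for F = ½MMD². Here H(μ) = (c/2)E|X|² + (c/4)E|X - X'|² and G = F + H, with c = 1/σ². The Gaussian kernel's Hessian is bounded below by -I/σ², so adding these quadratics makes both the interaction and the potential parts of G convex. The paper's explicit decompositions may be different. Each CCCP step linearizes H at the current particles and solves the convex subproblem only approximately, with a few inner gradient steps. The function wgd is a plain Wasserstein gradient descent baseline. All function names and step sizes are illustrative choices.

```python
import numpy as np


def gauss_kernel(a, b, sigma):
    """Gaussian kernel matrix k(a_i, b_j) and the differences a_i - b_j."""
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2)), diff


def half_mmd2(x, z, sigma):
    """F(mu) = 0.5 * MMD^2(mu, nu) for uniform empirical measures on x and z."""
    kxx, _ = gauss_kernel(x, x, sigma)
    kxz, _ = gauss_kernel(x, z, sigma)
    kzz, _ = gauss_kernel(z, z, sigma)
    return 0.5 * kxx.mean() - kxz.mean() + 0.5 * kzz.mean()


def wgrad_F(x, z, sigma):
    """Wasserstein gradient of F at each particle (n times the Euclidean gradient)."""
    kxx, dxx = gauss_kernel(x, x, sigma)
    kxz, dxz = gauss_kernel(x, z, sigma)
    # grad_1 k(a, b) = -k(a, b) (a - b) / sigma^2, averaged over the second argument
    interaction = -(kxx[..., None] * dxx).mean(axis=1) / sigma ** 2
    potential = -(kxz[..., None] * dxz).mean(axis=1) / sigma ** 2
    return interaction - potential


def wgrad_H(x, c):
    """Wasserstein gradient of the hypothetical quadratic part
    H(mu) = (c/2) E|X|^2 + (c/4) E|X - X'|^2, i.e. c * (2 x_i - mean(x))."""
    return c * (2 * x - x.mean(axis=0))


def wgd(x, z, sigma, step, iters):
    """Baseline: explicit Wasserstein gradient descent on the particles."""
    x, hist = x.copy(), [half_mmd2(x, z, sigma)]
    for _ in range(iters):
        x = x - step * wgrad_F(x, z, sigma)
        hist.append(half_mmd2(x, z, sigma))
    return x, hist


def cccp(x, z, sigma, inner_step, outer_iters, inner_iters):
    """CCCP on the split F = G - H, where G = F + H is convex for c = 1/sigma^2."""
    c = 1.0 / sigma ** 2
    x, hist = x.copy(), [half_mmd2(x, z, sigma)]
    for _ in range(outer_iters):
        lin = wgrad_H(x, c)  # linearize the concave part -H at the current iterate
        y = x.copy()
        for _ in range(inner_iters):
            # Gradient step on the convex subproblem G(y) - <grad H(x_k), y>,
            # using grad G = grad F + grad H.
            y = y - inner_step * (wgrad_F(y, z, sigma) + wgrad_H(y, c) - lin)
        x = y
        hist.append(half_mmd2(x, z, sigma))
    return x, hist


rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, size=(100, 2))  # target samples from nu
x0 = rng.normal(size=(80, 2))           # initial particles for mu
_, hist_wgd = wgd(x0, z, sigma=1.0, step=0.3, iters=50)
_, hist_cccp = cccp(x0, z, sigma=1.0, inner_step=0.15, outer_iters=50, inner_iters=10)
print(f"WGD:  {hist_wgd[0]:.4f} -> {hist_wgd[-1]:.4f}")
print(f"CCCP: {hist_cccp[0]:.4f} -> {hist_cccp[-1]:.4f}")
```

Each inner loop starts from the current particles and only decreases the convex subproblem. By the standard CCCP majorization argument, the recorded values of F therefore stay non-increasing even with an inexact inner solve. The step sizes assume σ = 1 and were chosen conservatively relative to the kernel's curvature (they should scale with σ²). They are not tuned, so this toy run says nothing definitive about which method converges faster.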

Who benefits

AI/Tech · Data Science · Machine Learning Research · Finance

Key takeaways

  • The research extends DC programming and CCCP to the Wasserstein space.
  • It provides explicit DC decompositions for MMD and Energy Distance functionals.
  • In the paper's experiments on MMD objectives, the method converges faster and more stably than Wasserstein gradient descent.
  • This improves optimization for non-convex objectives over probability measures.

Original post by Clément Bonet, Pierre-Cyril Aubin-Frankowski, Youssef Mroueh on X

"arXiv:2606.27767v1. Abstract: Optimizing functionals over the space of probability measures is now ubiquitous in machine learning. A widely used approach is to perform the optimization directly over the Wasserstein space, but many objective functionals of practi…"

