MMD Finetuning Calibrates Generative Models to Feature Distributions

Nathaniel L. Diamant, Brian L. Trippe · June 19, 2026

The 60-second brief

Summary

Kernel Calibrating Generative Models (kCGM) corrects distributional miscalibration in generative models by minimizing the Maximum Mean Discrepancy (MMD) between generated and target feature distributions. Unlike direct finetuning on the target set, it improves feature matching while preserving sample validity.

Generative models can produce individually plausible samples whose overall feature distribution still deviates substantially from a desired target set. Finetuning directly on the target set often overfits and compromises the validity of generated samples. kCGM instead minimizes the MMD between the feature distributions of generated and target data. It uses an unbiased score-function estimator for MMD minimization, together with KL regularization that keeps the finetuned model close to its pretrained state. The method is demonstrated on drug-like molecule generation, protein generation, and DNA generation. In antibiotic generation, for instance, kCGM improves target feature matching while maintaining chemical validity, which is a challenge for direct finetuning. Because it needs only feature-level supervision, kCGM adapts to autoregressive, continuous-space diffusion, and discrete diffusion models.
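To make the central quantity concrete, here is a minimal sketch of the standard unbiased estimator of squared MMD between two sets of feature vectors. It is not taken from the paper: the Gaussian (RBF) kernel, the `bandwidth` value, the function names, and the synthetic 2-D "features" are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(gen_feats, tgt_feats, bandwidth=1.0):
    """Unbiased estimate of squared MMD between two feature samples."""
    n, m = len(gen_feats), len(tgt_feats)
    k_xx = rbf_kernel(gen_feats, gen_feats, bandwidth)
    k_yy = rbf_kernel(tgt_feats, tgt_feats, bandwidth)
    k_xy = rbf_kernel(gen_feats, tgt_feats, bandwidth)
    # Excluding the diagonal k(x_i, x_i) makes the within-set averages unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()

# Synthetic stand-ins for feature vectors (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(500, 2))
matched = rng.normal(0.0, 1.0, size=(500, 2))   # same distribution as target
shifted = rng.normal(1.5, 1.0, size=(500, 2))   # miscalibrated features

mmd_matched = mmd2_unbiased(matched, target)
mmd_shifted = mmd2_unbiased(shifted, target)
print(f"matched: {mmd_matched:.4f}  shifted: {mmd_shifted:.4f}")
```

The estimate stays near zero when the two feature sets come from the same distribution and grows when they differ, which is the signal a calibration objective of this kind can drive down.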

Why it matters

In drug discovery, materials science, and data augmentation, generated outputs need to match desired statistical properties, not just look plausible. Calibrating a model to a target feature distribution makes its output more useful and reliable for these applications.

How to implement this in your domain

  1. Apply kCGM to finetune generative models for domain-specific data generation where feature distribution matching is critical.
  2. Utilize kCGM in drug discovery to generate molecules with specific therapeutic properties while maintaining chemical validity.
  3. Implement kCGM for data augmentation tasks to create synthetic data that accurately reflects the statistical properties of real datasets.
  4. Explore kCGM for calibrating large language models to generate text with specific stylistic or factual distributions.
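As a toy illustration of the recipe behind these steps, the sketch below finetunes a five-category softmax model so that a scalar feature of its samples matches a target feature sample. It combines a score-function (REINFORCE-style) gradient of the unbiased squared-MMD estimate with a KL penalty toward the pretrained model. Everything here is an assumption made for illustration rather than the authors' implementation: the categorical model stands in for a real generative model, the feature is simply the category index, the penalty is KL(finetuned || pretrained) computed exactly, and `kl_weight`, the step size, and the kernel bandwidth are arbitrary.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def rbf(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between 1-D feature arrays a (n,) and b (m,)."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * bandwidth**2))

def calibration_gradient(logits, base_logits, feature_values, target_feats,
                         rng, n_samples=256, kl_weight=0.1):
    """Stochastic gradient of MMD^2 + kl_weight * KL with respect to the logits."""
    probs = softmax(logits)
    base_probs = softmax(base_logits)
    samples = rng.choice(len(probs), size=n_samples, p=probs)
    feats = feature_values[samples]

    # Per-sample weight: leave-one-out similarity to other generated samples
    # minus similarity to the target samples (terms of MMD^2 involving x_i).
    k_xx = rbf(feats, feats)
    np.fill_diagonal(k_xx, 0.0)
    k_xy = rbf(feats, target_feats)
    weights = 2.0 * (k_xx.sum(axis=1) / (n_samples - 1) - k_xy.mean(axis=1))

    # Score function of a softmax model: d log p(x_i) / d logits = onehot(x_i) - probs.
    score = np.eye(len(probs))[samples] - probs
    mmd_grad = (weights[:, None] * score).mean(axis=0)

    # Exact gradient of KL(p_theta || p_base) for a categorical model.
    log_ratio = np.log(probs) - np.log(base_probs)
    kl = np.sum(probs * log_ratio)
    kl_grad = probs * (log_ratio - kl)
    return mmd_grad + kl_weight * kl_grad

rng = np.random.default_rng(0)
feature_values = np.arange(5.0)          # hypothetical feature: the category index
base_logits = np.zeros(5)                # "pretrained" model: uniform over categories
target_probs = np.array([0.05, 0.05, 0.10, 0.30, 0.50])
target_feats = feature_values[rng.choice(5, size=400, p=target_probs)]

logits = base_logits.copy()
model_mean_before = float(softmax(logits) @ feature_values)
for _ in range(300):                     # plain gradient descent, step size 1.0
    logits -= 1.0 * calibration_gradient(
        logits, base_logits, feature_values, target_feats, rng
    )
model_mean_after = float(softmax(logits) @ feature_values)
target_mean = float(target_feats.mean())
print(model_mean_before, model_mean_after, target_mean)
```

Only features of sampled outputs and the model's log-probability gradients enter the update, so no target-set sequences or structures are needed. Raising `kl_weight` keeps the finetuned distribution closer to the pretrained one at the cost of a looser feature match.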

Who benefits

Pharmaceuticals · Biotechnology · Materials Science · AI/ML Development · Data Science

Key takeaways

  • kCGM calibrates generative models to match target feature distributions.
  • It minimizes Maximum Mean Discrepancy (MMD) using an unbiased score-function estimator, with KL regularization toward the pretrained model.
  • kCGM improves feature matching while preserving sample validity, unlike direct finetuning.
  • The method is applicable to various generative model architectures and domains.

Original post by Nathaniel L. Diamant, Brian L. Trippe

"arXiv:2606.19496v1 Announce Type: new Abstract: Generative models can produce individually plausible samples while deviating substantially from a target set in the distribution of key features. For example, a model pretrained on broad drug-like chemical space may generate molecul…"
