WARP Recovers Foundation Model Training Data Portfolios from Weights

Tzu-Heng Huang, Aditya Goyal, John Cooper, Frederic Sala · July 3, 2026

Summary

WARP is a new framework that infers the domain mixture weights used to train fine-tuned foundation models directly from their released weights, addressing the lack of transparency in training data recipes. It analyzes the geometric footprint that training leaves in weight space and outperforms existing methods such as membership inference.

The training data recipes for large foundation models, particularly the domain mixture weights that dictate how different data sources are sampled, are rarely disclosed to the public. This lack of transparency makes it hard for researchers to understand and analyze these models. Traditional methods such as membership inference only identify individual training samples and cannot characterize the overall composition of the training corpus.

To address this, a new framework called WARP (Weight-Space Analysis for Recovering Training Data Portfolios) has been introduced. WARP analyzes the released weights of a fine-tuned model to infer its training data mixture. It interpolates between the base model and the fine-tuned model using model merging techniques, generating "pseudo-checkpoints" that simulate the training trajectory. This process reveals a geometric footprint of the training data within the model's weight space. From these simulated footprints, WARP extracts geometric features and maps them to domain proportions, either through a parameter-free softmax readout or with an MLP projector trained on synthetic mixtures.

In controlled experiments with BERT and GPT-2, WARP recovered domain mixtures with an average Mean Absolute Error (MAE) as low as 0.046 and 0.104, respectively. It outperforms both membership inference and a variant with access to the true training trajectory, which makes it a useful tool for understanding model provenance.
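The interpolation and readout steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes plain linear interpolation as the merging technique (the paper may use others), it represents weights as dictionaries of NumPy arrays, and the function names `pseudo_checkpoints` and `softmax_readout`, the `temperature` parameter, and the per-domain scores are all hypothetical. How WARP actually turns pseudo-checkpoints into geometric features is not specified in this summary, so that step is left as a placeholder.

```python
import numpy as np


def pseudo_checkpoints(base, tuned, num_steps=5):
    """Linearly interpolate from base weights to fine-tuned weights.

    base, tuned: dicts mapping a parameter name to a NumPy array.
    Returns num_steps weight dicts; the first equals base, the last equals tuned.
    Linear interpolation is an assumption made for this sketch.
    """
    if base.keys() != tuned.keys():
        raise ValueError("base and tuned must share the same parameter names")
    alphas = np.linspace(0.0, 1.0, num_steps)
    return [
        {name: (1.0 - a) * base[name] + a * tuned[name] for name in base}
        for a in alphas
    ]


def softmax_readout(scores, temperature=1.0):
    """Map per-domain scores to proportions that are positive and sum to 1.

    The scores stand in for whatever geometric features WARP extracts;
    their definition here is hypothetical.
    """
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


# Toy weights standing in for a real base and fine-tuned model.
rng = np.random.default_rng(0)
base = {"layer.weight": rng.normal(size=(4, 4))}
tuned = {"layer.weight": base["layer.weight"] + 0.1 * rng.normal(size=(4, 4))}

checkpoints = pseudo_checkpoints(base, tuned, num_steps=5)

# Placeholder scores for three domains; in practice these would be derived
# from the pseudo-checkpoints.
domain_scores = [2.0, 1.0, 0.5]
print(softmax_readout(domain_scores))
```

The softmax readout needs no training, which matches the "parameter-free" description above; the MLP projector alternative would replace `softmax_readout` with a small network fitted on synthetic mixtures.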

Why it matters

Practitioners can use WARP to estimate the training data composition of publicly released foundation models, which supports bias detection, intellectual property review, and informed model selection for specific applications.

How to implement this in your domain

  1. Use WARP to analyze the training data composition of third-party foundation models before integration.
  2. Implement WARP internally to audit the data mixture used for fine-tuning proprietary models.
  3. Develop strategies to mitigate potential biases or intellectual property risks identified by WARP's analysis.
  4. Integrate WARP's insights into model governance and responsible AI practices.

Who benefits

AI Research, Cybersecurity, Legal, Software Development, Government

Key takeaways

  • Foundation model training data recipes are often undisclosed, creating transparency issues.
  • WARP infers training data domain mixtures directly from model weights.
  • It uses model merging to create "pseudo-checkpoints" and geometric footprints.
  • WARP significantly outperforms existing methods in recovering data composition.
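Recovery quality in the summary is reported as Mean Absolute Error (MAE) between the true and the recovered domain mixture. The sketch below shows one way to compute it, assuming the error is averaged over domains for a single model; how the paper averages across models or experiments is not stated here. The function name `mixture_mae` and the example mixtures are made up for illustration and are not results from the paper.

```python
import numpy as np


def mixture_mae(true_mix, pred_mix):
    """Mean absolute error between two domain-proportion vectors."""
    t = np.asarray(true_mix, dtype=float)
    p = np.asarray(pred_mix, dtype=float)
    if t.shape != p.shape:
        raise ValueError("mixtures must have the same number of domains")
    return float(np.mean(np.abs(t - p)))


# Illustrative three-domain mixtures (not taken from the paper).
true_mix = [0.5, 0.3, 0.2]
pred_mix = [0.4, 0.4, 0.2]
print(round(mixture_mae(true_mix, pred_mix), 4))
```

Under this definition, a lower value means the recovered proportions sit closer to the true ones, and a perfect recovery scores 0.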

Original post by Tzu-Heng Huang, Aditya Goyal, John Cooper, Frederic Sala on X

"arXiv:2607.01686v1 Announce Type: new Abstract: Foundation models are routinely released to the public, yet the data recipes used to train them -- such as domain mixture weights that determine how different sources are sampled -- are rarely disclosed. This creates an access asymm…"
