Synthetic Data Filtering Boosts Survival Model Training
Summary
This paper introduces FoGS (Filtered Mixture-of-Generators for Survival analysis), a novel method that reframes synthetic data construction as sample selection rather than generation. FoGS draws from multiple generators and filters samples using an ensemble of survival models, significantly improving downstream survival model performance when training on synthetic data in privacy-restricted clinical settings.
Why it matters
Healthcare professionals, pharmaceutical researchers, and data scientists can leverage FoGS to overcome data scarcity and privacy concerns in survival analysis, enabling the development of more robust and accurate predictive models for patient outcomes, drug efficacy, and disease progression using fully synthetic data.
How to implement this in your domain
1. Assess current data privacy challenges and data scarcity issues in your survival analysis projects.
2. Explore implementing a mixture-of-generators approach for synthetic data creation.
3. Develop or integrate a sample filtering mechanism based on plausibility scoring using an ensemble of models.
4. Pilot FoGS or similar synthetic data generation and filtering techniques for specific clinical or research cohorts.
5. Collaborate with data privacy experts to ensure synthetic data generation methods meet regulatory compliance.
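The generate-then-filter idea in the steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the FoGS implementation: the summary does not specify the generators, the ensemble members, or the scoring rule. Here the "generators" are hypothetical exponential samplers, the ensemble is a set of per-group exponential survival models fit on bootstrap resamples of a small simulated "real" cohort, plausibility is the mean log-likelihood across the ensemble, and the 50% keep fraction is arbitrary. All function names are made up for this example.

```python
import math
import random

random.seed(0)

def sample_cohort(n, rates, censor_time=5.0):
    """Draw (group, time, event) rows: exponential event times, censored at censor_time."""
    rows = []
    for _ in range(n):
        g = random.randint(0, 1)                  # one binary covariate
        t = random.expovariate(rates[g])          # latent event time
        event = 1 if t <= censor_time else 0      # 0 means censored
        rows.append((g, min(t, censor_time), event))
    return rows

def fit_exponential(rows):
    """Per-group exponential rate, smoothed so it is always positive and finite."""
    rates = []
    for g in (0, 1):
        events = sum(e for gg, _, e in rows if gg == g)
        exposure = sum(t for gg, t, _ in rows if gg == g)
        rates.append((events + 0.5) / (exposure + 1.0))
    return rates

def log_likelihood(rates, row):
    """Exponential survival log-likelihood of one (group, time, event) row."""
    g, t, e = row
    return e * math.log(rates[g]) - rates[g] * t

def plausibility(ensemble, row):
    """Mean log-likelihood across ensemble members (an assumed scoring rule)."""
    return sum(log_likelihood(r, row) for r in ensemble) / len(ensemble)

# Assumption: a small real cohort is available to fit the scoring ensemble.
real = sample_cohort(200, rates=(0.2, 0.8))
ensemble = [fit_exponential(random.choices(real, k=len(real))) for _ in range(10)]

# Candidate pool drawn from two hypothetical generators of differing quality.
pool = sample_cohort(300, rates=(0.25, 0.7)) + sample_cohort(300, rates=(1.5, 0.1))

# Selection step: rank by plausibility and keep the top half.
scored = sorted(pool, key=lambda row: plausibility(ensemble, row), reverse=True)
keep = int(0.5 * len(scored))
selected, rejected = scored[:keep], scored[keep:]
```

A downstream survival model would then be trained on `selected` only. Raw log-likelihood is a crude plausibility score (it favors short follow-up times, for instance), so a real pipeline would likely need a more careful scoring rule and stronger survival models in the ensemble.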
Key takeaways
- FoGS improves survival model performance by filtering synthetic data from a mixture of generators.
- It addresses data scarcity and privacy concerns in clinical settings by enabling fully synthetic training.
- The method uses an ensemble of survival models to score and select plausible synthetic samples.
- FoGS often matches or exceeds real-data training performance without compromising privacy.
Original post by Niccolò Maria Rizzi, Eugenio Lomurno, Alberto Archetti, Matteo Matteucci
"arXiv:2607.00127v1 Announce Type: new Abstract: Survival analysis models time-to-event data, but in clinical settings training data are costly and scarce: events accrue over years of follow-up, cohorts are small, and privacy regulations restrict sharing across institutions. Tabul…"