PSyGenTAB Generates Privacy-Preserving Synthetic Clinical Data

Arshia Ilaty, Hossein Shirazi, Manasi Chitale, Kedar Hegde, Dhanalakshmi Ramesh, Rashmi S. Manjunath, Amir Rahmani, Hajar Homayouni · June 18, 2026

Summary

Researchers developed PSyGenTAB, a framework that generates synthetic clinical tabular data by formulating the process as a constrained optimization problem. This method explicitly manages the privacy-utility trade-off, preserving clinically meaningful patterns while protecting patient privacy.

The development of medical AI is frequently hampered by restricted access to high-quality clinical data, because of institutional data silos and stringent privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a promising alternative, but current methods often struggle to balance privacy with data utility: they can degrade crucial clinical patterns or leave patients open to re-identification. This paper introduces PSyGenTAB, a privacy-preserving generative framework that reframes synthetic healthcare data generation as a constrained optimization problem and solves it with the Augmented Lagrangian Method. In evaluations across multiple clinical benchmarks, PSyGenTAB preserved essential inter-feature clinical relationships and minority-class diagnostic patterns. Downstream AI models trained on its synthetic data performed comparably to models trained on real patient records, while privacy audits showed reduced exact record reproduction and strong resilience against membership inference attacks. The authors present the framework as a principled basis for secure, cross-institutional AI development in healthcare.
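The abstract names the Augmented Lagrangian Method, but this summary does not give PSyGenTAB's actual objective or constraints. As a rough, non-authoritative illustration of how the method handles a constraint, here is a minimal sketch on a two-variable toy problem: a hypothetical "utility loss" f is minimized subject to a hypothetical "privacy" constraint g(x) <= 0. The functions f_grad, g and augmented_lagrangian, the penalty weight rho and the step size are all illustrative assumptions, not the authors' code.

```python
# Toy sketch of the Augmented Lagrangian Method (ALM) for an
# inequality-constrained problem:  minimize f(x)  subject to  g(x) <= 0.
# f and g are HYPOTHETICAL stand-ins, not PSyGenTAB's objectives.


def f_grad(x):
    """Gradient of the toy 'utility loss' f(x) = (x0 - 3)^2 + (x1 - 2)^2."""
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] - 2.0)]


def g(x):
    """Toy 'privacy constraint', required to satisfy g(x) = x0 + x1 - 3 <= 0."""
    return x[0] + x[1] - 3.0


G_GRAD = [1.0, 1.0]  # gradient of g (constant because g is linear)


def augmented_lagrangian(x, rho=10.0, outer_iters=30, inner_iters=500):
    lam = 0.0                     # Lagrange multiplier estimate
    lr = 1.0 / (2.0 + 2.0 * rho)  # step size that is stable for this quadratic
    for _ in range(outer_iters):
        # Inner loop: gradient descent on the augmented Lagrangian
        # L(x) = f(x) + (rho/2) * max(0, g(x) + lam/rho)^2 - lam^2 / (2*rho)
        for _ in range(inner_iters):
            active = max(0.0, lam + rho * g(x))
            grad = [fg + active * gg for fg, gg in zip(f_grad(x), G_GRAD)]
            x = [xi - lr * gi for xi, gi in zip(x, grad)]
        # Outer step: multiplier update, clipped at zero for an inequality
        lam = max(0.0, lam + rho * g(x))
    return x, lam


x_opt, lam_opt = augmented_lagrangian([0.0, 0.0])
print([round(v, 4) for v in x_opt], round(lam_opt, 4))
```

The unconstrained optimum (3, 2) violates the constraint, so the iterates should settle near the boundary point (2, 1) with a multiplier near 2, which is what the KKT conditions give for this toy problem. The multiplier acts as the "price" of the constraint; in a generator framed this way it would, presumably, weigh a privacy term against the utility loss.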

Why it matters

This framework could accelerate medical AI development by enabling secure data sharing and model training across institutions, easing significant privacy barriers while preserving data utility and protecting patient confidentiality.

How to implement this in your domain

  1. Evaluate PSyGenTAB or similar constrained optimization approaches for generating synthetic data in privacy-sensitive domains.
  2. Implement privacy-preserving synthetic data generation to support AI model development and testing when access to real data is restricted.
  3. Collaborate with legal and compliance teams to define explicit privacy constraints and embed them in data generation pipelines.
  4. Conduct rigorous privacy audits and utility assessments on synthetic datasets before using them in AI projects.
  5. Explore cross-institutional data collaboration opportunities using privacy-preserving synthetic data.
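Step 4 calls for privacy audits, and the abstract reports reduced exact record reproduction and resilience against membership inference. The summary does not describe the audit protocol, so the following is only a minimal sketch under assumed definitions: the share of synthetic rows that copy a training row verbatim, Euclidean distance to closest record (DCR), and a simple distance-based membership inference attack scored by AUC. The function names and toy rows are hypothetical; a real audit would typically also normalize features and handle categorical columns.

```python
import math

# Toy, HYPOTHETICAL data: numeric feature rows only.
train = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]      # records used to fit the generator
holdout = [(1.5, 2.5), (3.5, 4.5), (5.5, 6.5)]    # real records the generator never saw
synthetic = [(1.0, 2.0), (3.2, 4.1), (6.0, 6.0)]  # generator output


def exact_match_rate(synth_rows, train_rows):
    """Fraction of synthetic rows that verbatim copy a training row."""
    train_set = set(train_rows)
    return sum(row in train_set for row in synth_rows) / len(synth_rows)


def dcr(row, reference):
    """Distance to closest record: Euclidean distance to the nearest row."""
    return min(math.dist(row, other) for other in reference)


def membership_auc(members, non_members, synth_rows):
    """AUC of a distance-based attack that flags records close to the
    synthetic data as training members; 0.5 means chance level."""
    member_scores = [-dcr(r, synth_rows) for r in members]
    outsider_scores = [-dcr(r, synth_rows) for r in non_members]
    wins = 0.0
    for m in member_scores:
        for o in outsider_scores:
            # A pair is a win if the member looks "more member-like";
            # ties count as half, as in the Mann-Whitney form of AUC.
            wins += 1.0 if m > o else (0.5 if m == o else 0.0)
    return wins / (len(member_scores) * len(outsider_scores))


print("exact-match rate:", exact_match_rate(synthetic, train))
print("attack AUC:", round(membership_auc(train, holdout, synthetic), 3))
```

Under these definitions, an exact-match rate near zero and an attack AUC near 0.5 are what "reduced exact record reproduction" and "resilience against membership inference" would look like. In this toy data the copied first row pushes both numbers up (1/3 and about 0.67).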

Who benefits

Healthcare, Pharmaceuticals, Medical AI, Insurance, Research Institutions

Key takeaways

  • PSyGenTAB generates high-utility synthetic clinical data while preserving patient privacy.
  • It uses constrained optimization to explicitly manage the privacy-utility trade-off.
  • Synthetic data generated by PSyGenTAB maintains critical clinical patterns and relationships.
  • AI models trained on this synthetic data perform comparably to those trained on real data.
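The claim that models trained on the synthetic data perform comparably to those trained on real data is commonly checked with a "train on synthetic, test on real" (TSTR) comparison against a "train on real, test on real" (TRTR) baseline. The paper's downstream models and metrics are not given in this summary, so this is a hypothetical sketch: a simple nearest-centroid classifier stands in for the downstream model, and the rows and labels are made-up toy values.

```python
from statistics import fmean

# HYPOTHETICAL toy data: two numeric features, binary label.
real_train_x = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (6.0, 7.0), (7.0, 6.0)]
real_train_y = [0, 0, 0, 1, 1, 1]
synth_x = [(1.2, 1.1), (1.8, 1.4), (6.3, 6.1), (6.8, 6.6)]
synth_y = [0, 0, 1, 1]
real_test_x = [(1.5, 1.5), (2.0, 2.0), (6.5, 6.5), (5.5, 6.0)]
real_test_y = [0, 0, 1, 1]


def fit_centroids(rows, labels):
    """Nearest-centroid 'model': the mean feature vector of each class."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = tuple(fmean(col) for col in zip(*members))
    return centroids


def predict(centroids, row):
    """Assign the class whose centroid is nearest (squared Euclidean)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def accuracy(centroids, rows, labels):
    """Share of rows whose predicted class equals the true label."""
    return sum(predict(centroids, r) == y for r, y in zip(rows, labels)) / len(rows)


trtr = accuracy(fit_centroids(real_train_x, real_train_y), real_test_x, real_test_y)
tstr = accuracy(fit_centroids(synth_x, synth_y), real_test_x, real_test_y)
print(f"train-real/test-real: {trtr:.2f}  train-synthetic/test-real: {tstr:.2f}")
```

On this toy data both accuracies come out at 1.00. With real clinical data the quantity of interest is the gap between the two scores, ideally reported per class so that minority-class performance is not hidden by the overall average.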

Original post by Arshia Ilaty, Hossein Shirazi, Manasi Chitale, Kedar Hegde, Dhanalakshmi Ramesh, Rashmi S. Manjunath, Amir Rahmani, Hajar Homayouni on X

"arXiv:2606.18518v1 Announce Type: new Abstract: The development of medical AI is constrained by limited access to high-quality clinical data due to institutional silos and strict privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a potential solution, bu…"
