TDGT Toolkit Generates High-Fidelity, Privacy-Preserving Tabular Data

Vasileios C. Pezoulas, Nikolaos S. Tachos, Eleni Georga, Kostas Marias, Manolis Tsiknakis, Dimitrios I. Fotiadis · July 1, 2026

Summary

A new web-based toolkit, TDGT, facilitates synthetic tabular data generation and fidelity assessment, featuring adaptive Bayesian mixture models and VAE-based latent space learning. It supports GPU acceleration and offers multi-metric evaluation for privacy and data quality.

The growing demand for privacy-preserving data sharing in AI workflows has motivated TDGT, a new Tabular Data Generation Toolkit. This web-based platform brings synthetic tabular data generation and fidelity assessment together in one place. It addresses common limitations of existing tools by adding adaptive generation strategies and multi-metric evaluation. TDGT introduces the Adaptive Bayesian Mixture Synthesizer (ABMS), an algorithm that automatically determines the number of mixture components, so this hyperparameter no longer has to be tuned by hand. For complex, nonlinear data distributions, it offers VAE-ABMS, a hybrid architecture that combines Variational Autoencoders with adaptive Bayesian mixture synthesis. The toolkit also includes a GPU-accelerated variant for large datasets and evaluates synthetic data with eleven statistical fidelity metrics, alongside privacy risk indicators such as k-anonymity.
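The summary does not say how ABMS picks its component count. As a rough illustration only, the sketch below swaps in a different, classical technique: it fits one-dimensional Gaussian mixtures with expectation-maximization (EM) for several values of K, keeps the K with the lowest Bayesian information criterion (BIC), and then samples synthetic values from the chosen mixture. It is not the authors' ABMS, it handles a single numeric column, and every function name here (`fit_gmm`, `select_and_fit`, `sample`) is hypothetical.

```python
import math
import random
from typing import List, Tuple

# Hypothetical stand-in for ABMS: NOT the authors' Bayesian algorithm.
# It fits 1-D Gaussian mixtures with EM for K = 1..max_k and keeps the K
# with the lowest BIC, illustrating automatic choice of component count.

Params = Tuple[List[float], List[float], List[float]]  # weights, means, variances


def _normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def log_likelihood(data: List[float], params: Params) -> float:
    weights, means, variances = params
    total = 0.0
    for x in data:
        p = sum(w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(p or 1e-300)  # guard against underflow to 0
    return total


def fit_gmm(data: List[float], k: int, iters: int = 100) -> Params:
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    n = len(data)
    mu = sum(data) / n
    var0 = sum((x - mu) ** 2 for x in data) / n or 1.0
    ordered = sorted(data)
    # Deterministic init: place the initial means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var0] * k
    weights = [1.0 / k] * k
    floor = 1e-6 * var0  # stop a component collapsing onto a single point
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, floor
            )
    return weights, means, variances


def select_and_fit(data: List[float], max_k: int = 4) -> Tuple[int, Params]:
    """Return the component count with the lowest BIC and its fitted mixture."""
    best = None
    for k in range(1, max_k + 1):
        params = fit_gmm(data, k)
        n_free = 3 * k - 1  # (k - 1) weights + k means + k variances
        bic = n_free * math.log(len(data)) - 2 * log_likelihood(data, params)
        if best is None or bic < best[0]:
            best = (bic, k, params)
    return best[1], best[2]


def sample(params: Params, n: int, seed: int = 1) -> List[float]:
    """Draw n synthetic values: pick a component by weight, then sample from it."""
    weights, means, variances = params
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(means[j], math.sqrt(variances[j])))
    return out


# Usage: a bimodal "real" column; the selection step should prefer K >= 2.
gen = random.Random(42)
real = [gen.gauss(0, 1) for _ in range(200)] + [gen.gauss(10, 1) for _ in range(200)]
k, params = select_and_fit(real)
synthetic = sample(params, 400)
print("selected K =", k, "means =", [round(m, 2) for m in params[1]])
```

BIC trades goodness of fit against the number of free parameters, which is one common way to avoid choosing K by hand. A Bayesian mixture with a sparsity-inducing prior on the weights, which the name ABMS suggests, would instead shrink unneeded components during fitting.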

Why it matters

TDGT is useful for anyone who needs to share or work with sensitive tabular data while preserving privacy, supporting responsible AI development and compliance with data protection regulations.

How to implement this in your domain

  1. Explore TDGT for generating synthetic datasets to train or test AI models without exposing sensitive real data.
  2. Integrate the Adaptive Bayesian Mixture Synthesizer (ABMS) into data pipelines so the number of mixture components is selected automatically rather than tuned by hand.
  3. Use TDGT's multi-metric evaluation to assess both the statistical fidelity and the privacy risk of generated data.
  4. Use the GPU-accelerated variant for efficient synthetic data generation on large datasets.
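The summary does not list TDGT's eleven fidelity metrics or explain how its privacy indicators are computed, so the following is a hypothetical sketch of the kind of check step 3 describes. It uses two standard measures: the two-sample Kolmogorov-Smirnov (KS) statistic per numeric column for fidelity, and k-anonymity over chosen quasi-identifier columns for privacy. The function names, column names, and toy rows are all illustrative assumptions.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence

# Hypothetical evaluation helpers; not TDGT's actual metric implementations.
Row = Dict[str, Any]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples so ties are handled.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def column_fidelity(real: List[Row], synth: List[Row], columns: List[str]) -> Dict[str, float]:
    """Per-column KS (0 = identical marginals, 1 = one sample entirely below the other)."""
    return {c: ks_statistic([r[c] for r in real], [s[c] for s in synth]) for c in columns}


def k_anonymity(rows: List[Row], quasi_identifiers: List[str]) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Usage with toy rows (illustrative values only).
real = [
    {"age": 34, "zip": "10001", "bmi": 22.1},
    {"age": 45, "zip": "10001", "bmi": 27.4},
    {"age": 29, "zip": "10002", "bmi": 24.8},
    {"age": 61, "zip": "10002", "bmi": 30.2},
]
synthetic = [
    {"age": 30, "zip": "10001", "bmi": 23.0},
    {"age": 30, "zip": "10001", "bmi": 26.9},
    {"age": 60, "zip": "10002", "bmi": 29.5},
    {"age": 60, "zip": "10002", "bmi": 25.1},
]
print(column_fidelity(real, synthetic, ["age", "bmi"]))
print("k =", k_anonymity(synthetic, ["age", "zip"]))
```

A lower KS value means a column's synthetic marginal tracks the real one more closely; a larger k means every quasi-identifier combination is shared by more rows. Neither measure captures cross-column dependencies on its own, which is one reason a toolkit would combine several metrics.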

Who benefits

Healthcare, BFSI, Cybersecurity, Marketing, Government

Key takeaways

  • TDGT provides a unified, web-based solution for synthetic tabular data generation and assessment.
  • The Adaptive Bayesian Mixture Synthesizer (ABMS) automatically selects the number of mixture components, removing manual hyperparameter tuning.
  • VAE-ABMS targets complex, nonlinear distributions, and TDGT reports privacy risk indicators such as k-anonymity.
  • GPU acceleration enables efficient processing for large datasets across various domains.

Original post by Vasileios C. Pezoulas, Nikolaos S. Tachos, Eleni Georga, Kostas Marias, Manolis Tsiknakis, Dimitrios I. Fotiadis

"arXiv:2606.31268v1 Announce Type: new Abstract: The growing demand for privacy-preserving data sharing has positioned synthetic data generation as a critical component of responsible AI workflows. Despite notable advances in generative modeling, existing solutions often lack inte…"
