Tabular Foundation Models Lack Robustness to Microbiome Data Shifts

Giulia Perciballi, Ahmad Fall, Federica Granese, Edi Prifti, Jean-Daniel Zucker · June 25, 2026

Summary

This research introduces a benchmark evaluating the robustness of tabular foundation models (TFMs) to biologically inspired distribution shifts in microbiome abundance data. It finds that TFMs are sensitive to perturbations like zero-imputation and sparsification, even when discriminative features are preserved.

Tabular Foundation Models (TFMs) have shown strong performance in analyzing microbiome abundance data, but their resilience to realistic distribution shifts, which are common in biological datasets, remains largely unexplored. This study addresses this gap by presenting a new benchmark designed to assess TFM robustness under biologically inspired perturbations across six gut microbiome datasets, covering four different disease contexts. The evaluation focuses on an in-context learning setting, where models are given unperturbed support sets and then tested on perturbed query samples. To isolate robustness beyond simple "shortcut" features, the researchers preserved the most discriminative taxa while applying three controlled perturbation strategies: removing high-abundance uninformative taxa, increasing zero-inflation (sparsification), and introducing spurious non-zero injections (zero-imputation). The results indicate that merely protecting discriminative features is insufficient to guarantee stability under support-query shifts. All perturbations degraded model performance, with zero-imputation consistently proving the most harmful. This suggests that corrupting the global feature structure significantly impairs generalization, even when key taxa are retained. Furthermore, sparsification disproportionately affected TFMs compared to a classical random forest baseline, highlighting TFMs' greater sensitivity to zero-inflation-type shifts.
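The three perturbation strategies described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the class-mean-difference score used to pick protected taxa, the use of mean abundance to rank taxa, and the choice of the smallest observed positive value as the injected abundance are all hypothetical. The summary does not specify any of these details.

```python
import numpy as np

def protected_mask(X, y, k):
    """Mark the k taxa with the largest absolute class-mean difference as protected.
    Assumption: binary labels in {0, 1}; the paper's selection criterion may differ."""
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[np.argsort(diff)[::-1][:k]] = True
    return mask

def remove_abundant(X, protected, n_remove):
    """Zero out the n_remove unprotected taxa with the highest mean abundance."""
    Xp = X.astype(float)  # astype returns a copy, so X is left untouched
    mean_ab = Xp.mean(axis=0)
    mean_ab[protected] = -np.inf  # protected taxa are never selected
    n = min(n_remove, int((~protected).sum()))
    top = np.argsort(mean_ab)[::-1][:n]
    Xp[:, top] = 0.0
    return Xp

def sparsify(X, protected, rate, rng):
    """Increase zero-inflation: set a random fraction of non-zero,
    unprotected entries to zero."""
    Xp = X.astype(float)
    drop = (rng.random(X.shape) < rate) & (X > 0) & ~protected[None, :]
    Xp[drop] = 0.0
    return Xp

def inject_nonzeros(X, protected, rate, rng):
    """Spurious non-zero injection: replace a random fraction of zero,
    unprotected entries with a small positive abundance (assumed value)."""
    Xp = X.astype(float)
    positive = Xp[Xp > 0]
    eps = positive.min() if positive.size else 1e-6
    fill = (rng.random(X.shape) < rate) & (X == 0) & ~protected[None, :]
    Xp[fill] = eps
    return Xp
```

In the setting the summary describes, these functions would be applied to query samples only, leaving the support set unperturbed. Renormalizing abundances after perturbation is deliberately omitted here, since the summary does not say whether the benchmark does so.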

Why it matters

For professionals in bioinformatics, healthcare, and drug discovery leveraging AI for microbiome analysis, understanding the robustness limitations of TFMs is critical. This research highlights the need for more resilient models and careful validation strategies when deploying AI in real-world biological settings with inherent data variability.

How to implement this in your domain

1. Incorporate robustness benchmarks, similar to the one proposed, into the development and validation pipeline for TFMs in microbiome research.
2. Develop novel TFM architectures or training methodologies specifically designed to enhance resilience against biologically inspired distribution shifts.
3. Implement data augmentation strategies that simulate realistic perturbations to improve model generalization in microbiome datasets.
4. Evaluate the trade-offs between TFM complexity and robustness, potentially favoring simpler models like random forests for certain shift types.
5. Collaborate with domain experts to define and simulate additional realistic distribution shifts relevant to specific biological contexts.
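As a starting point for the first step, a robustness check can be written as a small harness that fits on an unperturbed support set and scores both clean and perturbed query samples. The sketch below is a hypothetical illustration, not the benchmark from the paper: `robustness_report` and its arguments are invented names, and the nearest-centroid classifier is only a dependency-free stand-in for whatever model is actually being validated (a TFM or a random forest would be wrapped in the same `fit_predict` signature).

```python
import numpy as np

def nearest_centroid_fit_predict(X_support, y_support, X_query):
    """Stand-in classifier: assign each query row to the closest class centroid.
    Replace with a wrapper around the model under test."""
    classes = np.unique(y_support)
    centroids = np.stack([X_support[y_support == c].mean(axis=0) for c in classes])
    dists = ((X_query[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def robustness_report(fit_predict, X_support, y_support, X_query, y_query, perturbations):
    """Score the model on clean and perturbed queries.

    `perturbations` maps a name to a function that takes the query matrix and
    returns a perturbed copy. The support set is always left unperturbed.
    Returns {name: {"accuracy": ..., "drop": ...}}, including a "clean" entry.
    """
    def accuracy(Xq):
        preds = fit_predict(X_support, y_support, Xq)
        return float((preds == y_query).mean())

    clean = accuracy(X_query)
    report = {"clean": {"accuracy": clean, "drop": 0.0}}
    for name, perturb in perturbations.items():
        acc = accuracy(perturb(X_query))
        report[name] = {"accuracy": acc, "drop": clean - acc}
    return report
```

Reporting the drop relative to clean accuracy, rather than raw accuracy alone, makes it easier to compare models with different baselines, which is the kind of comparison the study draws between TFMs and a random forest.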

Who benefits

Healthcare, Pharmaceuticals, Biotechnology, Food & Beverage, Agriculture

Key takeaways

Tabular Foundation Models show limited robustness to realistic microbiome data shifts.
Perturbations like zero-imputation and sparsification significantly degrade performance.
Protecting discriminative features alone does not ensure stability under data shifts.
TFMs are more sensitive to zero-inflation shifts than classical baselines.

Original post by Giulia Perciballi, Ahmad Fall, Federica Granese, Edi Prifti, Jean-Daniel Zucker

"arXiv:2606.24995v1. Abstract: Tabular foundation models (TFMs) achieve strong performance on microbiome abundance data, yet their robustness under realistic distribution shift remains poorly characterized. We introduce a benchmark that evaluates the robustness o…"
