Research Probes Memorization in Tabular In-Context Learning Models

Francesco Capano, Jonas Böhler · July 1, 2026

Summary

A new framework, ICLMEM, probes parametric memorization in large tabular models (LTMs) that rely on in-context learning. It detects moderate memorization signals, particularly on low-cardinality tasks, though these signals largely vanish under realistic training conditions.

Large tabular models (LTMs) that leverage in-context learning (ICL) achieve state-of-the-art performance on tabular tasks, but their memorization dynamics remain largely unexplored. This research introduces ICLMEM, a probing framework designed to investigate parametric memorization in these models by separating predictions based on contextual patterns from those derived from the model's internal, memorized knowledge. ICLMEM uses a zero-information multiple-choice context: with valid contextual cues stripped away, the model has to rely on its parametric memory. A controlled fine-tuning setup establishes ground truth for membership and accounts for common pitfalls such as distribution shift and feature contamination. Evaluated on a leading LTM trained on real-world data, the framework detected moderate memorization signals in 8 out of 10 tasks, with stronger signals for low-cardinality and binary tasks. However, these signals largely vanished under more realistic training conditions, suggesting that memorization, while a potential risk, may not be pervasive in typical deployments.
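The summary does not spell out how ICLMEM builds its probes, so the following is only a minimal sketch of the general multiple-choice idea, not the authors' implementation. It hides one column of a record, mixes the true value with distractors drawn from the same column, and checks how often a scorer picks the true value. The names `build_choices`, `probe_accuracy`, and the `score_fn` interface are hypothetical stand-ins for whatever a real model exposes. With `n_choices` options, a scorer that has memorized nothing should land near the chance level of 1/`n_choices`.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]
# Hypothetical interface (an assumption, not a real library API): returns the
# model's score for `candidate` as the value of `column` in the given record.
ScoreFn = Callable[[Record, str, object], float]


def build_choices(true_value: object, pool: Sequence[object], n_choices: int,
                  rng: random.Random) -> Tuple[List[object], int]:
    """Mix the true value with distinct distractors; return (choices, true index)."""
    distractors = [v for v in dict.fromkeys(pool) if v != true_value]
    if len(distractors) < n_choices - 1:
        raise ValueError("column has too few distinct values for n_choices")
    choices = rng.sample(distractors, n_choices - 1) + [true_value]
    rng.shuffle(choices)
    return choices, choices.index(true_value)


def probe_accuracy(records: Sequence[Record], column: str, score_fn: ScoreFn,
                   n_choices: int = 4, seed: int = 0) -> float:
    """Fraction of records where the scorer ranks the true value first."""
    rng = random.Random(seed)
    pool = [r[column] for r in records]
    hits = 0
    for record in records:
        choices, true_idx = build_choices(record[column], pool, n_choices, rng)
        # Hide the probed column so the scorer never sees the answer directly.
        visible = {k: v for k, v in record.items() if k != column}
        scores = [score_fn(visible, column, c) for c in choices]
        picked = max(range(len(choices)), key=scores.__getitem__)
        hits += int(picked == true_idx)
    return hits / len(records)


if __name__ == "__main__":
    cities = ["Oslo", "Lima", "Cairo", "Hanoi", "Quito"]
    rows = [{"id": i, "city": c} for i, c in enumerate(cities)]
    seen = {(r["id"], r["city"]) for r in rows}
    # Toy scorer that has "memorized" every row it was trained on.
    memorizer = lambda visible, column, cand: float((visible["id"], cand) in seen)
    print(probe_accuracy(rows, "city", memorizer))  # 1.0
```

Note that a binary column only supports `n_choices=2`, which puts chance at 50 percent; any probe on low-cardinality columns should be compared against that baseline rather than against zero.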

Why it matters

Understanding memorization in LTMs is critical for professionals concerned with data privacy and security, especially when deploying AI in regulated industries handling sensitive tabular information.

How to implement this in your domain

  1. Implement ICLMEM-like probing techniques to assess parametric memorization in your organization's large tabular models.
  2. Review and adjust fine-tuning strategies for LTMs to mitigate potential memorization risks, especially for sensitive data.
  3. Develop data governance policies that account for the memorization potential of LTMs, particularly for low-cardinality or binary features.
  4. Calibrate model evaluation against pre-trained base models to accurately identify true memorization signals.
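The calibration step in the list above can be illustrated with a small difference-in-differences calculation. This is an assumed formulation for illustration only; the summary does not say which statistic the authors use. The idea is that a gap between member and non-member probe accuracy on the fine-tuned model only counts as a memorization signal to the extent that the pre-trained base model, which never saw the members, does not show the same gap (for example because of distribution shift). `ProbeResult` and `calibrated_signal` are hypothetical names, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Probe accuracy on records seen (members) and not seen during fine-tuning."""
    member_acc: float
    nonmember_acc: float

    @property
    def gap(self) -> float:
        # Raw member vs. non-member difference for one model.
        return self.member_acc - self.nonmember_acc


def calibrated_signal(fine_tuned: ProbeResult, base: ProbeResult) -> float:
    """Fine-tuned gap minus the gap the base model already shows.

    Assumption: any gap present in the base model reflects distribution shift
    or contamination rather than memorization caused by fine-tuning.
    """
    return fine_tuned.gap - base.gap
```

For instance, with illustrative values, `calibrated_signal(ProbeResult(0.62, 0.50), ProbeResult(0.55, 0.50))` is roughly 0.07: the fine-tuned model's raw gap of 0.12 shrinks once the base model's gap of 0.05 is subtracted.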

Who benefits

BFSI, Healthcare, Government, Retail, Legal

Key takeaways

  • Large tabular models can exhibit moderate parametric memorization signals.
  • The ICLMEM framework effectively probes and quantifies memorization in LTMs.
  • Memorization signals are strongest for low-cardinality and binary tasks.
  • Under realistic training conditions, memorization signals largely diminish.

Original post by Francesco Capano, Jonas Böhler

"arXiv:2606.31208v1 Announce Type: new Abstract: Large tabular models (LTMs), i.e., tabular foundation models leveraging in-context learning (ICL), achieve state-of-the-art performance on tabular tasks. While LLMs are known to unintentionally memorize training data, the memorizati…"

