In-Context Learning Explores Intrinsic Curiosity for Data Selection

Eric Elmoznino, Sangnie Bhardwaj, Johannes von Oswald, Rajai Nasser, Blaise Agüera y Arcas, João Sacramento, Rif A. Saurous, Guillaume Lajoie · June 19, 2026

Summary

This research investigates whether the in-context learning (ICL) capabilities of large sequence models can support "intrinsic curiosity" for automated data selection. It shows that ICL-derived rewards cannot give an unbiased estimate of true learning progress in general Markov decision processes, but that in non-temporal settings such as active learning they can bound true learning progress and converge to it asymptotically.

Effective machine learning relies not only on sophisticated data modeling but also on intelligent data selection. The concept of "intrinsic curiosity" aims to automate data selection by rewarding an agent for its "learning progress," which measures how much new observations improve a world model's predictive ability. Traditionally, calculating these rewards requires computationally expensive gradient descent updates within each trajectory, which makes the approach impractical for large-scale applications.

This study explores whether the emergent in-context learning (ICL) abilities of sequence models can overcome this computational bottleneck by acting as immediate, update-free world models. The goal is to train an exploration policy that maximizes learning progress using only an in-context learner's prediction errors and counterfactual context manipulations.

The findings indicate a theoretical limitation: in general Markov decision processes, ICL-derived rewards cannot give an unbiased estimate of true learning progress because of nuisance terms. However, a positive result is shown for a broad subclass of non-temporal settings, such as active learning and Bayesian Experimental Design, where ICL-derived rewards can bound true learning progress and converge to it asymptotically. Experimental validation in continuous and symbolic environments supports these theoretical claims, demonstrating that this ICL-driven framework can train curious data-collection policies that explore optimally in specific contexts.
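To make the idea of a learning-progress reward concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's method: a closed-form Bayesian regression on y = w*x + noise stands in for a sequence model acting as an update-free in-context learner, and the names `predict`, `nll`, `learning_progress` and every numeric setting are hypothetical. The reward is the drop in held-out prediction loss when a new observation is appended to the context, compared with the counterfactual context that lacks it.

```python
import math


def predict(context, x, prior_var=1.0, noise_var=0.1):
    """Toy stand-in for an in-context learner (assumption, not a sequence model).

    Returns the predictive mean and variance for query x given the (x, y)
    pairs in context. No weights are updated; only the context changes.
    """
    precision = 1.0 / prior_var + sum(cx * cx for cx, _ in context) / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * sum(cx * cy for cx, cy in context) / noise_var
    return post_mean * x, post_var * x * x + noise_var


def nll(context, x, y):
    """Gaussian negative log-likelihood of y at query x: the prediction error."""
    mean, var = predict(context, x)
    return 0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)


def learning_progress(context, new_obs, queries):
    """Counterfactual reward: average held-out loss without new_obs in the
    context, minus the average held-out loss with it."""
    before = sum(nll(context, x, y) for x, y in queries) / len(queries)
    after = sum(nll(context + [new_obs], x, y) for x, y in queries) / len(queries)
    return before - after


# Hypothetical held-out queries drawn from y = 2 * x.
queries = [(1.0, 2.0), (-1.0, -2.0)]
informative = learning_progress([], (2.0, 4.0), queries)    # observation far from x = 0
uninformative = learning_progress([], (0.0, 0.0), queries)  # x = 0 says nothing about w
print(informative, uninformative)
```

In this toy setting an observation at x = 0 carries no information about the slope, so its reward is zero, while an observation far from the origin earns a positive reward.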

Why it matters

For professionals developing AI systems that require efficient data collection or active learning, this research offers insights into the capabilities and limitations of using in-context learning for intrinsic curiosity, potentially leading to more scalable and effective data acquisition strategies in certain domains.

How to implement this in your domain

  1. Assess the suitability of in-context learning for data selection in your specific non-temporal active learning or experimental design tasks.
  2. Design exploration policies that leverage ICL-derived prediction errors for intrinsic rewards in appropriate settings.
  3. Implement and test ICL-driven frameworks for automated data collection in scenarios like active learning where theoretical guarantees apply.
  4. Consider the theoretical limitations for general temporal Markov decision processes and explore alternative curiosity mechanisms if needed.
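The steps above can be sketched for a one-step, non-temporal setting as follows. This is a rough illustration under assumptions, not the authors' training procedure: the in-context learner is again a toy closed-form Bayesian regression, the "policy" is a plain average-reward table over a fixed candidate set (a deliberately simple stand-in for policy training), and `TRUE_W`, `reward`, `estimate_action_values` and all constants are hypothetical names and values.

```python
import math
import random

PRIOR_VAR, NOISE_VAR, TRUE_W = 1.0, 0.1, 2.0  # hypothetical toy settings


def nll(context, x, y):
    """Held-out loss of a toy in-context learner (Bayesian regression y = w*x + noise)."""
    precision = 1.0 / PRIOR_VAR + sum(cx * cx for cx, _ in context) / NOISE_VAR
    w_mean = sum(cx * cy for cx, cy in context) / NOISE_VAR / precision
    var = x * x / precision + NOISE_VAR
    return 0.5 * (math.log(2 * math.pi * var) + (y - w_mean * x) ** 2 / var)


def reward(context, obs, queries):
    """Intrinsic reward: drop in average held-out loss when obs joins the context."""
    before = sum(nll(context, x, y) for x, y in queries)
    after = sum(nll(context + [obs], x, y) for x, y in queries)
    return (before - after) / len(queries)


def estimate_action_values(candidates, queries, episodes=200, seed=0):
    """Try every candidate input equally often and average its reward."""
    rng = random.Random(seed)
    totals = {x: 0.0 for x in candidates}
    for _ in range(episodes):
        for x in candidates:
            # Simulated environment response (assumption): noisy linear function.
            y = TRUE_W * x + rng.gauss(0.0, math.sqrt(NOISE_VAR))
            totals[x] += reward([], (x, y), queries)
    return {x: total / episodes for x, total in totals.items()}


candidates = [0.0, 0.5, 2.0]          # inputs the policy may choose to label
queries = [(1.0, 2.0), (-1.0, -2.0)]  # hypothetical held-out evaluation points
values = estimate_action_values(candidates, queries)
best = max(values, key=values.get)    # greedy data-collection choice
print(values, best)
```

The greedy choice here favors the input that most reduces held-out loss. For temporal decision processes, where the summary notes that unbiased estimation is not possible, a sketch like this would not carry the same justification.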

Who benefits

AI/ML Development · Data Science · Robotics · Scientific Research · Autonomous Systems

Key takeaways

  • In-context learning (ICL) can potentially support intrinsic curiosity for automated data selection.
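  • In non-temporal settings like active learning and Bayesian Experimental Design, ICL-derived rewards can bound true learning progress and converge to it asymptotically.
  • In general Markov decision processes, however, ICL-derived rewards cannot give an unbiased estimate of true learning progress because of nuisance terms.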
  • This framework can train curious data-collection policies that explore optimally in specific contexts.

Original post by Eric Elmoznino, Sangnie Bhardwaj, Johannes von Oswald, Rajai Nasser, Blaise Agüera y Arcas, João Sacramento, Rif A. Saurous, Guillaume Lajoie on X

"arXiv:2606.19476v1 Announce Type: new Abstract: Effective machine learning depends not only on how we model data, but also on what data we choose to collect. While large sequence models have revolutionized data modeling, the problem of automated data selection, or "intrinsic curi…"
