LLMs Drive Evolutionary Feature Engineering for Structured Data

Ege Onur Taga, Yilin Zhuang, M. Emrullah Ildiz, Petros Mol, Abhimanyu Das, Karthik Duraisamy, Samet Oymak · July 3, 2026

Summary

Evolutionary Feature Engineering (EFE) uses LLM-based evolution to discover preprocessing transformations for structured data, representing them as Python programs. EFE-Time improves time-series forecasting, and EFE-Tab enhances tabular prediction, boosting accuracy and interpretability.

Large language models (LLMs) are increasingly used as open-ended search operators within evolutionary optimization frameworks. This research introduces Evolutionary Feature Engineering (EFE), a framework that uses LLM-based evolution to automatically discover effective preprocessing transformations for structured data. Each transformation is represented as a Python program with a standardized `fit/transform` interface, so it can be plugged into existing machine learning pipelines. During evolution, candidate programs are iteratively refined using contextual information about the dataset, summary statistics, and performance feedback from a validation set.

The framework is demonstrated in two settings. EFE-Time targets time-series forecasting, where it learns invertible, dataset-specific normalizations. These normalizations improve the performance of off-the-shelf time-series foundation models, reducing forecasting errors (MASE, WQL, MAE) by 3% or more on average across datasets, with improvements of up to 19% on individual datasets such as COVID-Deaths. The gains hold even for recent, advanced foundation models such as Chronos-2.

For tabular prediction, EFE-Tab evolves compact feature programs that both add useful, interpretable features and remove redundant ones. This approach matches or surpasses existing LLM-based feature-engineering methods. EFE-Tab is particularly effective with classical decision trees, where small sets of evolved features yield competitive accuracy while keeping the models highly interpretable.

Overall, EFE shows the potential of LLM-based evolution to improve both the accuracy and the interpretability of machine learning models on structured data.
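To make the `fit/transform` idea concrete, here is a minimal sketch of what an invertible, dataset-specific normalization could look like when wrapped around a forecaster. Everything in it is an illustrative assumption rather than a program from the paper: the class `LogMeanScaler`, the `inverse_transform` method name, and `window_mean_forecaster` (a trivial stand-in for a foundation model such as Chronos-2) are all hypothetical.

```python
import math
from statistics import mean


class LogMeanScaler:
    """Hypothetical EFE-Time-style candidate: log1p, then divide by the mean.

    Assumes a non-negative series (e.g. daily counts). The scale is fitted
    per series, and inverse_transform maps forecasts back to original units.
    """

    def fit(self, history):
        logged = [math.log1p(x) for x in history]
        # Fall back to 1.0 if every log value is zero (all-zero history).
        self.scale_ = mean(logged) or 1.0
        return self

    def transform(self, series):
        return [math.log1p(x) / self.scale_ for x in series]

    def inverse_transform(self, series):
        return [math.expm1(z * self.scale_) for z in series]


def window_mean_forecaster(history, horizon, window=3):
    """Stand-in for a foundation model: repeat the mean of the last values."""
    level = mean(history[-window:])
    return [level] * horizon


def forecast_with_normalization(scaler, history, horizon):
    # Fit on the context window, forecast in normalized space, then invert.
    scaler.fit(history)
    z_forecast = window_mean_forecaster(scaler.transform(history), horizon)
    return scaler.inverse_transform(z_forecast)


history = [0, 1, 3, 7, 15, 31]
print(forecast_with_normalization(LogMeanScaler(), history, horizon=2))
```

Because the stand-in forecaster averages in the normalized space, it returns roughly 15.0 per step here rather than the plain average of 7, 15, and 31 (about 17.67). That is the sense in which a normalization changes what an unmodified model predicts; EFE-Time searches over such programs instead of fixing one by hand.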

Why it matters

Data scientists and machine learning engineers can use EFE to automate and optimize the often time-consuming and manual process of feature engineering, leading to more accurate models and faster development cycles for structured data applications.

How to implement this in your domain

  1. Explore integrating LLM-based evolutionary algorithms into your feature engineering workflows for structured data.
  2. Experiment with EFE-Time on time-series forecasting tasks to automatically discover effective, dataset-specific normalizations.
  3. Apply EFE-Tab to tabular datasets to generate compact, interpretable features for improved model performance.
  4. Develop internal guidelines for using LLMs in automated data preprocessing and feature creation.
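The evolutionary loop behind these steps can be sketched as follows. This is a toy version resting on several assumptions: the LLM call is replaced by a hypothetical `propose_variant` function that perturbs a single parameter, and each candidate "program" is reduced to a power transform scored by the validation MAE of a mean forecaster. Only the overall shape (propose, score on validation data, keep the best) reflects the description above; the selection scheme and population size are illustrative choices.

```python
import random


def propose_variant(parent, parent_error, rng):
    """Hypothetical stand-in for the LLM mutation operator.

    In EFE an LLM refines the parent program using dataset context, summary
    statistics, and validation feedback (here: parent_error). This sketch
    ignores the feedback and just perturbs one numeric parameter.
    """
    return {"power": max(0.05, parent["power"] + rng.uniform(-0.3, 0.3))}


def validation_error(candidate, train, valid):
    """MAE of a mean forecaster fitted in power-transformed space."""
    p = candidate["power"]
    level = sum(x ** p for x in train) / len(train)
    prediction = level ** (1 / p)  # invert the transform
    return sum(abs(y - prediction) for y in valid) / len(valid)


def evolve(train, valid, generations=20, population_size=4, seed=0):
    rng = random.Random(seed)
    seed_program = {"power": 1.0}  # identity transform as the starting point
    # Population of (error, candidate) pairs, kept sorted best-first.
    scored = [(validation_error(seed_program, train, valid), seed_program)]
    for _ in range(generations):
        best_error, parent = scored[0]
        children = [propose_variant(parent, best_error, rng)
                    for _ in range(population_size)]
        scored += [(validation_error(c, train, valid), c) for c in children]
        # Truncation selection: keep the best candidates found so far.
        scored = sorted(scored, key=lambda pair: pair[0])[:population_size]
    return scored[0]


train, valid = [1, 2, 4, 8, 16], [4, 5]
best_error, best = evolve(train, valid)
print(best, best_error)
```

Truncation selection always retains the best candidate seen so far, so the returned error can never exceed that of the identity transform (`power=1.0`). In EFE this proposal step is performed by an LLM that sees the dataset context, summary statistics, and validation feedback; how its output is parsed back into an executable program is not shown here.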

Who benefits

Finance, Healthcare, Retail, Manufacturing, Data Science Consulting

Key takeaways

  • EFE uses LLM-based evolution to automate feature engineering for structured data.
  • It generates Python programs for preprocessing transformations, improving model accuracy and interpretability.
  • EFE-Time reduces forecasting errors by 3% or more on average (up to 19% on individual datasets), even with advanced foundation models such as Chronos-2.
  • EFE-Tab creates compact, interpretable features for tabular prediction, especially effective with decision trees.
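A compact EFE-Tab-style feature program might look like the sketch below: it adds one interpretable feature and drops a column that duplicates another. The column names, the `debt_to_income` feature, the `CompactFeatureProgram` class, and the correlation-based pruning rule (with a hypothetical `corr_threshold` of 0.95) are illustrative assumptions, not features or rules reported in the paper.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


class CompactFeatureProgram:
    """Hypothetical EFE-Tab-style program operating on a list of dict rows."""

    def __init__(self, corr_threshold=0.95):
        self.corr_threshold = corr_threshold

    def fit(self, rows):
        columns = list(rows[0])
        self.dropped_ = set()
        # Drop a column if it is almost perfectly correlated with an
        # earlier column that is being kept (i.e. it is redundant).
        for i, keep in enumerate(columns):
            if keep in self.dropped_:
                continue
            for other in columns[i + 1:]:
                if other in self.dropped_:
                    continue
                r = pearson([row[keep] for row in rows],
                            [row[other] for row in rows])
                if abs(r) >= self.corr_threshold:
                    self.dropped_.add(other)
        return self

    def transform(self, rows):
        out = []
        for row in rows:
            new_row = {k: v for k, v in row.items() if k not in self.dropped_}
            # Added interpretable feature: share of income owed as debt.
            income = row["income"]
            new_row["debt_to_income"] = row["debt"] / income if income else 0.0
            out.append(new_row)
        return out


rows = [
    {"income": 50000, "income_k": 50, "debt": 10000, "age": 40},
    {"income": 80000, "income_k": 80, "debt": 40000, "age": 30},
    {"income": 30000, "income_k": 30, "debt": 15000, "age": 50},
    {"income": 100000, "income_k": 100, "debt": 20000, "age": 35},
]
program = CompactFeatureProgram().fit(rows)
print(program.dropped_)  # -> {'income_k'}
print(program.transform(rows)[0])
```

The transformed rows could then be passed to a shallow decision tree. Keeping the feature set small is what keeps such a tree readable, which matches the accuracy-with-interpretability trade-off described above.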

Original post by Ege Onur Taga, Yilin Zhuang, M. Emrullah Ildiz, Petros Mol, Abhimanyu Das, Karthik Duraisamy, Samet Oymak

"arXiv:2607.01548v1 Announce Type: new Abstract: Large language models are increasingly used as open-ended search operators in evolutionary optimization. We introduce Evolutionary Feature Engineering (EFE), a framework for using LLM-based evolution to discover preprocessing transf…"