LLMs Complement Tabular Models for Industrial Car Retrofit Prediction

Aina Vila Pons, Ioannis Tzachristas, Constantinos Antoniou · June 16, 2026

Summary

A study on industrial car retrofit prediction found that classical tree ensembles remain the strongest standalone models on tabular data, while LLMs work well as complementary components. Direct prompting struggled when the data carried little semantic signal, but LLM-generated embedding features proved useful, and a hybrid ML+LLM stacking approach produced the best manually built multi-class model.

This research investigates the application of Large Language Models (LLMs) to tabular data in industrial car retrofit prediction, a setting where semantic content is limited. The study compares traditional tabular machine learning baselines with three LLM-based strategies: embedding features, direct prompted classification, and an ML+LLM stacking approach. Classical tree ensembles remain the strongest standalone models for binary occurrence prediction, multi-class classification, and regression on this type of structured operational data.

The LLM results are more nuanced. Direct prompting performed poorly when semantic signal was removed (e.g., via hashing), but LLM-generated embeddings proved useful, and a hybrid stacking approach combining ML with LLMs yielded the best manually built multi-class model. This suggests that, in privacy-constrained industrial settings, LLMs are more effective as complementary components than as direct replacements for robust tabular baselines.
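To make the point about hashing concrete, the following is a minimal sketch of how a tabular record might be serialized into prompt text, with and without hashed categorical values. The column names, the salt, and the helper functions `hash_value` and `serialize_row` are illustrative assumptions, not details taken from the study.

```python
import hashlib

def hash_value(value: str, salt: str = "demo-salt", length: int = 8) -> str:
    """Replace a categorical value with a short, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:length]

def serialize_row(row: dict, hashed_columns: set) -> str:
    """Turn one tabular record into a 'column: value' string for a prompt."""
    parts = []
    for column, value in row.items():
        text = str(value)
        if column in hashed_columns:
            text = hash_value(text)  # the original word is no longer visible
        parts.append(f"{column}: {text}")
    return "; ".join(parts)

# Hypothetical record; these columns are assumptions for illustration only.
row = {"model_line": "sedan", "engine_type": "hybrid", "mileage_km": 12500}

plain = serialize_row(row, hashed_columns=set())
hashed = serialize_row(row, hashed_columns={"model_line", "engine_type"})
print(plain)
print(hashed)
```

In the hashed version, words such as "sedan" or "hybrid" become opaque tokens, so a prompted LLM can no longer draw on what those words mean. A tree ensemble, by contrast, only needs the values to be consistent identifiers.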

Why it matters

Data scientists and ML engineers working with enterprise tabular data, especially in manufacturing or logistics, can learn how to effectively integrate LLMs into their workflows. This can lead to improved predictive models for complex operational tasks, even when data lacks rich textual semantics.

How to implement this in your domain

  1. Experiment with LLM-generated embeddings as features for classical tabular machine learning models in industrial prediction tasks.
  2. Implement hybrid ML+LLM stacking approaches to leverage the strengths of both model types for improved performance on structured data.
  3. Avoid direct prompting of LLMs for classification on tabular data with limited semantic content, as it may lead to poor results.
  4. Benchmark LLM-enhanced models against strong tabular baselines to quantify performance gains in specific industrial applications.
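The first step, using embeddings as extra features, could look roughly like the sketch below. It assumes that selected columns are serialized to text, embedded, and concatenated with the numeric columns. The function `toy_embed` is a deterministic stand-in for a real LLM embedding call, and all column names are hypothetical; none of this is taken from the study.

```python
import hashlib

def toy_embed(text: str, dim: int = 4) -> list:
    """Stand-in for an LLM embedding model (an assumption for this sketch).

    Returns a deterministic pseudo-embedding with values in [0, 1].
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]

def row_to_text(row: dict, text_columns: list) -> str:
    """Serialize the chosen columns of one record into a single string."""
    return "; ".join(f"{column}: {row[column]}" for column in text_columns)

def build_features(rows, numeric_columns, text_columns, embed=toy_embed):
    """Concatenate numeric columns with an embedding of the text columns."""
    features = []
    for row in rows:
        numeric = [float(row[column]) for column in numeric_columns]
        embedding = embed(row_to_text(row, text_columns))
        features.append(numeric + embedding)
    return features

# Hypothetical records; the schema is illustrative only.
rows = [
    {"mileage_km": 12500, "age_months": 6, "model_line": "sedan", "note": "prototype A"},
    {"mileage_km": 300, "age_months": 1, "model_line": "suv", "note": "prototype B"},
]

X = build_features(rows, ["mileage_km", "age_months"], ["model_line", "note"])
print(len(X), len(X[0]))  # 2 rows, each with 2 numeric + 4 embedding columns
```

The resulting matrix can then be passed to whatever tabular learner is already in use, such as a gradient-boosted tree ensemble. Comparing that model with and without the embedding columns is a simple way to carry out the benchmarking step.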

Who benefits

Automotive, Manufacturing, Logistics, Supply Chain, Industrial IoT

Key takeaways

  • Classical tree ensembles remain strong baselines for tabular data in industrial prediction.
  • LLMs are more effective as complementary components (e.g., via embeddings or stacking) than as standalone replacements for tabular models.
  • Direct prompting of LLMs struggles when semantic signal is limited in tabular data.
  • A hybrid ML+LLM stacking approach produced the best manually built multi-class model in the study.
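As a rough illustration of stacking, the sketch below trains a small logistic-regression meta-learner on the predicted probabilities of two base models. The numbers are invented toy values standing in for out-of-fold predictions from a tabular model and an LLM. The study's actual stacking setup is not described here, so the binary task and the meta-learner are assumptions.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, epochs=5000):
    """Fit a small logistic-regression meta-learner with batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            z = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(z) - label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, features) -> int:
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if z >= 0.0 else 0

def accuracy(predictions, labels) -> float:
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

# Toy values (assumptions): 1 = retrofit needed, 0 = not needed.
labels = [1, 1, 1, 0, 0, 0]
tab_probs = [0.9, 0.8, 0.4, 0.2, 0.3, 0.1]  # hypothetical tabular-model outputs
llm_probs = [0.6, 0.7, 0.8, 0.6, 0.3, 0.4]  # hypothetical LLM-based outputs

# Meta-features: one column per base model.
meta_X = [[t, l] for t, l in zip(tab_probs, llm_probs)]
weights, bias = fit_logistic(meta_X, labels)

tab_acc = accuracy([int(p >= 0.5) for p in tab_probs], labels)
llm_acc = accuracy([int(p >= 0.5) for p in llm_probs], labels)
stacked_acc = accuracy([predict(weights, bias, x) for x in meta_X], labels)
print(tab_acc, llm_acc, stacked_acc)
```

In this toy data each base model misclassifies a different record, so the meta-learner can separate the classes by combining both signals. The accuracy here is measured on the same six records used for fitting. A real evaluation would train the meta-learner on out-of-fold predictions and score it on held-out data.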

Original post by Aina Vila Pons, Ioannis Tzachristas, Constantinos Antoniou

"arXiv:2606.15314v1 Announce Type: new Abstract: Industrial retrofit planning depends on structured operational data rather than free text: planners must estimate whether a newly registered prototype will require a retrofit, which retrofit package it will need, and how long the wo…"
