LLM Features Can Degrade GNN Performance on Homophilous Graphs

Zhongyuan Wang, Pratyusha Vemuri · June 17, 2026


Summary

A study finds that concatenating LLM-generated node features can systematically degrade Graph Neural Network (GNN) accuracy on homophilous benchmarks, contrary to common belief. This "concatenation interference" appears under pure input concatenation and correlates more strongly with the LLM features' standalone discriminability than with graph homophily.

LLM-generated node features are widely reported to improve Graph Neural Network (GNN) accuracy on standard benchmarks. New research documents a contrasting observation: when LLM features are introduced through pure input concatenation (no joint training, distillation, or prompt conditioning), they can *reduce* accuracy, particularly on homophilous graph benchmarks. With SBERT-encoded GPT-4o-mini TAPE features and an MLP backbone, the study reports drops of 17.0 pp on PubMed and 4.3 pp on Cora.

This "concatenation interference" shrinks with other GNN backbones or random data splits, and it can even reverse on medium-homophily datasets such as WikiCS and ogbn-arxiv.

To predict when the degradation occurs, the researchers propose a simple measure, Delta_sig, which quantifies the LLM features' standalone discriminability. This measure correlates more strongly with the concatenation cost than graph homophily does. The findings suggest that adding LLM features by concatenation is not always beneficial and can introduce interference, especially when those features are not discriminative enough on their own for the task at hand.
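As a minimal sketch of what "pure input concatenation" means here: each node's precomputed LLM-derived vector is appended to its original feature vector before any model sees it, with no joint training of the two parts. The helper name `concat_features` and the toy vectors are illustrative assumptions, not the study's code.

```python
from typing import Sequence


def concat_features(original: Sequence[Sequence[float]],
                    llm: Sequence[Sequence[float]]) -> list:
    """Row-wise concatenation [x_orig || x_llm], one row per node."""
    if len(original) != len(llm):
        raise ValueError("both feature tables need exactly one row per node")
    return [list(a) + list(b) for a, b in zip(original, llm)]


# Two nodes: 2-dimensional original features, 3-dimensional LLM embeddings
# (hypothetical values).
x_orig = [[1.0, 0.0], [0.0, 1.0]]
x_llm = [[0.2, 0.4, 0.6], [0.1, 0.3, 0.5]]

x_concat = concat_features(x_orig, x_llm)
print(len(x_concat), len(x_concat[0]))  # 2 nodes, 5 features each
```

The downstream model only ever receives `x_concat`, so any noise in the LLM block enters the input on equal footing with the original features.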

Why it matters

AI engineers and researchers working with GNNs and LLMs must be cautious about simply concatenating LLM features, as it can unexpectedly degrade model performance. Understanding the conditions under which this interference occurs is crucial for designing effective hybrid AI systems.

How to implement this in your domain

  1. Avoid direct concatenation of LLM features to GNN inputs without careful validation, especially on homophilous graphs.
  2. Evaluate the standalone discriminability (Delta_sig) of LLM features before integrating them into GNNs.
  3. Consider alternative integration strategies like joint training, distillation, or prompt conditioning instead of pure concatenation.
  4. Benchmark GNN performance with and without LLM feature concatenation across diverse graph datasets to identify potential interference.
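The steps above can be sketched as a small ablation. This is a hedged illustration under stated assumptions: a nearest-centroid classifier stands in for the GNN or MLP backbone, the `llm_only` score is only a rough proxy for standalone discriminability (it is not the paper's Delta_sig, whose definition this summary does not give), and the function names and toy data are hypothetical. The toy LLM feature is deliberately constructed to mislead on the test nodes.

```python
from collections import defaultdict


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y) -> float:
    """Fit one centroid per class on the training rows, then score test rows."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] += 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(row):
        # Pick the class whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    hits = sum(predict(row) == label for row, label in zip(test_x, test_y))
    return hits / len(test_y)


def concatenation_ablation(orig, llm, labels, train_idx, test_idx) -> dict:
    """Score original, LLM-only, and concatenated features on the same split."""
    concat = [list(a) + list(b) for a, b in zip(orig, llm)]

    def score(features):
        def rows(idx):
            return [features[i] for i in idx]

        def ys(idx):
            return [labels[i] for i in idx]

        return nearest_centroid_accuracy(
            rows(train_idx), ys(train_idx), rows(test_idx), ys(test_idx)
        )

    report = {
        "original": score(orig),
        "llm_only": score(llm),  # rough stand-in for standalone discriminability
        "concat": score(concat),
    }
    # Negative values mean concatenation cost accuracy, in percentage points.
    report["concat_delta_pp"] = 100.0 * (report["concat"] - report["original"])
    return report


# Toy data: nodes 0-3 are training nodes, nodes 4-5 are test nodes.
orig = [[0.0], [0.1], [1.0], [0.9], [0.05], [0.95]]
llm = [[2.0], [0.0], [-2.0], [0.0], [-3.0], [3.0]]
labels = [0, 0, 1, 1, 0, 1]

report = concatenation_ablation(orig, llm, labels, [0, 1, 2, 3], [4, 5])
print(report)
# {'original': 1.0, 'llm_only': 0.0, 'concat': 0.0, 'concat_delta_pp': -100.0}
```

On real data the same report would be produced per dataset and per split, swapping the stand-in classifier for the actual backbone; a low `llm_only` score alongside a negative `concat_delta_pp` is the pattern to watch for.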

Who benefits

Social Network Analysis · Drug Discovery · Recommender Systems · Cybersecurity · Knowledge Graphs

Key takeaways

  • Concatenating LLM features can degrade GNN accuracy on homophilous graphs.
  • This "concatenation interference" is observed with pure input concatenation.
  • The effect correlates with the LLM features' standalone discriminability (Delta_sig) more strongly than with graph homophily.
  • Careful integration strategies beyond simple concatenation are needed for hybrid GNN-LLM systems.

Original post by Zhongyuan Wang, Pratyusha Vemuri

"arXiv:2606.17579v1 Announce Type: new Abstract: Adding LLM-generated node features to graph neural networks (GNNs) is widely reported to improve accuracy on standard benchmarks. We document a contrasting observation: when LLM features are introduced through pure input concatenati…"

Originally posted by Zhongyuan Wang, Pratyusha Vemuri on X
