External Feedback Outperforms Self-Refinement for LLM Improvement

Bartłomiej Cupiał, Jan Łojek, Mikołaj Garstecki, Szymon Pobłocki, Alicja Ziarko, Piotr Miłoś · July 1, 2026

Summary

A study investigates the true impact of natural language feedback on LLM performance, finding that significant improvements beyond mere repeated attempts primarily stem from strong external teachers rather than self-generated feedback. The research emphasizes that a student model's ability to utilize feedback is a key bottleneck.

In multi-turn language agent settings, final accuracy often improves over successive turns, but it is frequently unclear whether the gain comes from genuinely useful feedback or from other factors such as resampling, format correction, or additional test-time computation. This research introduces a controlled student-teacher protocol to isolate when natural-language feedback drives improvement beyond what repeated attempts alone can achieve. The study evaluated thirteen open-weight models across various tasks, examining external feedback, self-feedback, and unguided self-refinement.

The findings show that multi-turn improvement is often not strong evidence of effective feedback use; self-generated feedback, for instance, offered minimal gains beyond unguided self-refinement. The largest feedback-specific improvements appeared when strong external teachers provided guidance that went beyond generic retry instructions. The student model's ability to use feedback was a more critical driver of interactive gains than the identity of the teacher, though teacher quality remains important. These results underscore the need to evaluate feedback-based agents against repeated-attempt baselines and suggest that an agent's capacity to act on feedback is a central bottleneck for interactive improvement.
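The comparison described above can be pictured as a small harness that gives every condition the same number of turns and changes only the source of the feedback. The sketch below is an illustration under stated assumptions, not the authors' protocol: the `student` and `feedback_fn` interfaces, the condition names, and the toy student and teacher are all hypothetical, and the toy is rigged so that only hint-bearing feedback helps. The printed accuracies therefore follow from the construction and are not results.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces (not from the paper):
#   student(task, history) -> answer string
#   feedback_fn(task, answer) -> feedback text, or None for a bare retry
Student = Callable[[str, List[str]], str]
FeedbackFn = Callable[[str, str], Optional[str]]

GENERIC_RETRY = "Try again."


def run_episode(task: str, student: Student, feedback_fn: FeedbackFn,
                max_turns: int) -> str:
    """Give the student max_turns attempts and return its final answer."""
    history: List[str] = []
    answer = ""
    for turn in range(max_turns):
        answer = student(task, history)
        if turn < max_turns - 1:  # no feedback is needed after the last attempt
            note = feedback_fn(task, answer)
            history.append(GENERIC_RETRY if note is None else note)
    return answer


def compare_conditions(tasks: List[Tuple[str, str]], student: Student,
                       conditions: Dict[str, FeedbackFn],
                       max_turns: int = 3) -> Dict[str, float]:
    """Final-answer accuracy per condition, with the same turn budget for all."""
    scores = {}
    for name, feedback_fn in conditions.items():
        correct = sum(
            run_episode(task, student, feedback_fn, max_turns) == target
            for task, target in tasks
        )
        scores[name] = correct / len(tasks)
    return scores


# Toy stand-ins so the sketch runs without a model.
TASKS = [("2+2", "4"), ("3*3", "9")]
ANSWER_KEY = dict(TASKS)


def toy_student(task: str, history: List[str]) -> str:
    # Acts only on feedback that carries a concrete hint; ignores generic notes.
    for note in reversed(history):
        if note.startswith("HINT:"):
            return note[len("HINT:"):]
    return "unknown"


conditions = {
    "retry": lambda task, answer: None,  # repeated-attempt baseline
    "self_feedback": lambda task, answer: "Check your work.",
    "external_teacher": lambda task, answer: "HINT:" + ANSWER_KEY[task],
}

scores = compare_conditions(TASKS, toy_student, conditions)
print(scores)
```

In a real setup, `toy_student` and the lambdas would be replaced by model calls. The point of the structure is that any gap between a feedback condition and `retry` is measured under an identical turn budget.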

Why it matters

Professionals designing and implementing LLM-based interactive systems need to understand that not all feedback is equally valuable. Real performance gains depend on investing in high-quality external feedback mechanisms and on improving a model's ability to act on the feedback it receives.

How to implement this in your domain

1. Design LLM evaluation metrics that differentiate between true feedback-driven improvement and gains from mere retries or format corrections.
2. Prioritize developing robust external feedback mechanisms from human experts or highly capable 'teacher' models.
3. Focus on training LLMs to better interpret and integrate external feedback into their reasoning processes.
4. Implement A/B testing for different feedback strategies to identify what truly drives performance improvements in your applications.
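One way to make the first and last steps concrete is to report a feedback-specific gain: accuracy with feedback minus accuracy of a repeated-attempt baseline on the same tasks with the same turn budget, plus an uncertainty estimate. The sketch below uses a paired percentile bootstrap, which is a swapped-in generic technique and not something the source prescribes. The function names and the outcome lists are hypothetical and made up for illustration.

```python
import random
from typing import List, Tuple


def feedback_specific_gain(feedback_correct: List[bool],
                           retry_correct: List[bool]) -> float:
    """Accuracy with feedback minus accuracy of the retry baseline (same tasks)."""
    if len(feedback_correct) != len(retry_correct) or not feedback_correct:
        raise ValueError("need two non-empty, equal-length outcome lists")
    return (sum(feedback_correct) - sum(retry_correct)) / len(feedback_correct)


def paired_bootstrap_ci(feedback_correct: List[bool],
                        retry_correct: List[bool],
                        n_resamples: int = 2000,
                        alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the gain, resampling tasks in pairs."""
    rng = random.Random(seed)
    n = len(feedback_correct)
    gains = []
    for _ in range(n_resamples):
        # Resample task indices so each task keeps both of its outcomes.
        idx = [rng.randrange(n) for _ in range(n)]
        gains.append(feedback_specific_gain(
            [feedback_correct[i] for i in idx],
            [retry_correct[i] for i in idx],
        ))
    gains.sort()
    lo_idx = max(0, int(round((alpha / 2) * n_resamples)))
    hi_idx = min(n_resamples - 1, int(round((1 - alpha / 2) * n_resamples)) - 1)
    return gains[lo_idx], gains[hi_idx]


# Made-up per-task outcomes: index i is the same task in both lists.
with_feedback = [True] * 8 + [False] * 2
retry_only = [True] * 5 + [False] * 5

gain = feedback_specific_gain(with_feedback, retry_only)
low, high = paired_bootstrap_ci(with_feedback, retry_only)
print(f"gain={gain:.2f}, 95% interval=({low:.2f}, {high:.2f})")
```

If the interval comfortably excludes zero, the feedback strategy is adding something beyond retries. If it does not, the multi-turn improvement may be mostly resampling.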

Who benefits

AI Development, Customer Service, Education, Software Development, Robotics

Key takeaways

Multi-turn LLM improvement often isn't solely due to effective feedback.
Strong external teachers provide significantly more useful feedback than self-generated feedback.
A student LLM's ability to use feedback is a primary bottleneck for interactive improvement.
Feedback-based agents should be evaluated against repeated-attempt baselines.

Original post by Bartłomiej Cupiał, Jan Łojek, Mikołaj Garstecki, Szymon Pobłocki, Alicja Ziarko, Piotr Miłoś

"arXiv:2606.30774v1 Announce Type: new Abstract: We study when natural-language feedback produces improvement beyond the gains obtainable from repeated attempts alone. In multi-turn language agent setting, higher final accuracy can reflect useful feedback, but it can also arise fr…"
