Calibrating LLM Evaluators Mitigates Preference Coupling in Feedback Loops

Zewen Liu · July 1, 2026

Summary

This study shows that applying probability calibration to an LLM evaluator's pairwise judgments significantly reduces "evaluator preference coupling," a phenomenon where evaluator biases propagate into an agent's learned strategy. Calibration reduced the coupling coefficient by 20-49% and Jensen-Shannon divergence by 45-67% in experiments.

This research investigates a critical issue in LLM agent development: "evaluator preference coupling," where systematic biases in an LLM evaluator's feedback propagate into the agent's learned behavior. Previous work identified this problem and provided a diagnostic framework, but this study is the first to explore mitigation strategies through evaluator calibration.

The authors propose applying probability calibration to the evaluator's pairwise judgments to reduce the propagation of spurious preferences. In a controlled experiment, they compared standard binary TTRL (win/loss) with a confidence-calibrated TTRL (using probability-weighted updates). DeepSeek-V4-Pro served as the executor agent, and GLM5.2 as the evaluator.

The results demonstrated a significant reduction in evaluator preference coupling. Calibration decreased the coupling coefficient gamma by 20-49% and the Jensen-Shannon divergence by 45-67%. A symmetric-LR control confirmed that these effects were genuinely due to calibration and not merely reduced update asymmetry. The study releases the calibrated TTRL protocol, recommending it as a lightweight mitigation for LLM-as-judge deployment pipelines.
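The difference between binary and probability-weighted updates can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's protocol: the summary does not give the actual update rule, so the additive logit update, the function names, and the learning rate below are all hypothetical. The point it shows is that a binary win/loss signal treats a near-coin-flip judgment (p = 0.55) the same as a certain one, while a probability-weighted signal scales the update by the evaluator's confidence.

```python
import math


def binary_signal(p_win: float) -> float:
    """Win/loss feedback: +1 if the judge prefers A, else -1 (confidence is discarded)."""
    return 1.0 if p_win >= 0.5 else -1.0


def calibrated_signal(p_win: float) -> float:
    """Probability-weighted feedback in [-1, 1]: shrinks toward 0 as p_win nears 0.5."""
    return 2.0 * p_win - 1.0


def update_preferences(logits, a, b, signal, lr=0.5):
    """Hypothetical toy rule: raise strategy a's logit and lower b's by lr * signal."""
    new = list(logits)
    new[a] += lr * signal
    new[b] -= lr * signal
    return new


def softmax(logits):
    """Convert logits to a probability distribution over strategies."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One weakly confident judgment (p = 0.55) that strategy 0 beats strategy 1.
start = [0.0, 0.0]
after_binary = update_preferences(start, 0, 1, binary_signal(0.55))
after_calibrated = update_preferences(start, 0, 1, calibrated_signal(0.55))

print("binary:    ", softmax(after_binary))
print("calibrated:", softmax(after_calibrated))
```

Under these assumptions, the binary update shifts the strategy distribution much further toward strategy 0 than the weighted update does, even though the judgment barely favored it. If that slight preference reflects an evaluator bias, the binary rule amplifies it.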

Why it matters

Teams building LLM agents that learn from LLM-judge feedback can calibrate the evaluator so that low-confidence judgments carry less weight in updates, which limits how much of the evaluator's bias the agent absorbs.

How to implement this in your domain

  1. Identify LLM agent feedback loops where an LLM acts as an evaluator.
  2. Implement the diagnostic framework (EPC) to measure evaluator preference coupling in existing systems.
  3. Apply probability calibration techniques to the LLM evaluator's pairwise judgments.
  4. Integrate the calibrated TTRL protocol for probability-weighted updates in agent learning.
  5. Monitor the coupling coefficient and Jensen-Shannon divergence to quantify the reduction in bias.
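Steps 3 and 5 above can be sketched in a few lines. The summary does not say which calibration method the authors used, so this example swaps in temperature scaling fitted by grid search on a small labeled set of pairwise judgments. That choice, the helper names, and the sample data are all assumptions for illustration. The Jensen-Shannon divergence function follows the standard definition (average KL divergence of each distribution to their midpoint). Which two distributions the paper compares is not stated here, and the coupling coefficient is omitted because the summary does not define it.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def calibrate(p, temperature):
    """Temperature scaling: divide the log-odds by T (T > 1 softens overconfidence)."""
    return sigmoid(logit(p) / temperature)


def nll(probs, labels, temperature):
    """Mean negative log-likelihood of the labels under temperature-scaled probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        q = calibrate(p, temperature)
        total -= math.log(q) if y == 1 else math.log(1.0 - q)
    return total / len(probs)


def fit_temperature(probs, labels, grid=None):
    """Pick the temperature on a grid (0.25 to 10.0) that minimizes held-out NLL."""
    grid = grid or [0.25 * k for k in range(1, 41)]
    return min(grid, key=lambda t: nll(probs, labels, t))


def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical data: the judge reports 0.95 confidence but is right 7 times out of 10.
raw_probs = [0.95] * 10
labels = [1] * 7 + [0] * 3
T = fit_temperature(raw_probs, labels)
print("fitted temperature:", T)
print("calibrated confidence:", calibrate(0.95, T))

# Monitoring: compare two strategy distributions (illustrative values only).
print("JS divergence:", js_divergence([0.7, 0.2, 0.1], [0.4, 0.4, 0.2]))
```

A grid search keeps the sketch free of dependencies. In practice a library optimizer or an established calibration routine would likely be a better fit, and the labeled pairs should be held out from the data the agent learns on.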

Who benefits

AI/ML Development, Software Development, Customer Service, Content Moderation, Gaming

Key takeaways

  • LLM evaluators can introduce biases (preference coupling) into agent learning.
  • Probability calibration of evaluator judgments can significantly mitigate this coupling.
  • Calibrated TTRL reduces coupling coefficients and Jensen-Shannon divergence.
  • This offers a lightweight and practical mitigation for LLM-as-judge pipelines.

Original post by Zewen Liu

"arXiv:2606.31371v1 Announce Type: new Abstract: When large language model (LLM) agents adapt their behavior through evaluator feedback, systematic evaluator biases propagate into the agent's learned strategy distribution - a phenomenon termed evaluator preference coupling. Prior…"

Originally posted by Zewen Liu on X.
