Distribution-Aware AI Model Predicts Concurrent Go Program Behavior

Kaviru Hapuarachchi · June 17, 2026

Summary

This research introduces a method for training a 7B language model to predict the next event in a concurrent Go program by matching the empirical distribution of observed outcomes instead of a single label. The approach improves accuracy over the unfine-tuned model and improves calibration over cross-entropy training at similar accuracy, which matters because scheduler behavior is nondeterministic.

Predicting the next step in a concurrent program is inherently difficult because the scheduler is nondeterministic: the same program prefix can lead to several valid next events. A model trained with a single label per prefix often ends up guessing one of many possible outcomes. The researchers turn this into a training signal by running concurrent Go programs many times and aggregating the observed next events into an empirical distribution. A 7B language model is then fine-tuned with a Kullback-Leibler (KL) objective to match this distribution, so it learns the probabilities of the different valid next steps.

Evaluated on 798 held-out predictions from real-world Go bugs (e.g., CockroachDB, Kubernetes), the distribution-aware fine-tuning achieved 36.2% accuracy, outperforming Gemini 3.5 Flash zero-shot (34.8%) and the unfine-tuned model (28.6%). Accuracy was similar to that of cross-entropy training, but Expected Calibration Error (ECE) fell from 0.205 to 0.169. The work also formally identifies a goroutine-leak signature. All datasets, trained adapters, and tooling are open-sourced.
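The aggregation and distribution-matching steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' training code: the event names, the two model outputs, and the helper names `empirical_distribution` and `kl_divergence` are all hypothetical, and the KL direction used here (empirical distribution as the target) is an assumption, since the summary does not state it.

```python
import math
from collections import Counter


def empirical_distribution(observed_events):
    """Aggregate next events observed across repeated runs into probabilities."""
    counts = Counter(observed_events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), summed over events the target gives mass to."""
    return sum(
        p * math.log(p / max(predicted.get(event, 0.0), eps))
        for event, p in target.items()
        if p > 0
    )


# Hypothetical next events seen after the same trace prefix in 10 runs.
runs = ["g1:chan_send"] * 6 + ["g2:chan_recv"] * 3 + ["g3:mutex_unlock"] * 1
target = empirical_distribution(runs)

# Two hypothetical model outputs over the same events.
spread = {"g1:chan_send": 0.55, "g2:chan_recv": 0.35, "g3:mutex_unlock": 0.10}
peaked = {"g1:chan_send": 0.98, "g2:chan_recv": 0.01, "g3:mutex_unlock": 0.01}

loss_spread = kl_divergence(target, spread)
loss_peaked = kl_divergence(target, peaked)
print(f"KL vs spread prediction: {loss_spread:.4f}")
print(f"KL vs peaked prediction: {loss_peaked:.4f}")
```

A prediction that puts nearly all its mass on one event is penalized more heavily than one that spreads mass roughly as the runs did, which is the behavior a single-label objective cannot reward.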

Why it matters

Better-calibrated predictions of what a concurrent system may do next are useful for debugging, testing, and ensuring the reliability of high-performance software. Engineers working with concurrent code can use a model that reports how likely each valid next event is, rather than one that commits to a single guess.

How to implement this in your domain

  1. Adopt distribution-aware training techniques for models predicting behavior in other nondeterministic systems.
  2. Utilize the released dataset and tooling to analyze and debug concurrent Go programs more effectively.
  3. Integrate formal goroutine-leak signatures into static analysis tools for Go codebases.
  4. Explore fine-tuning large language models with empirical distributions for complex system modeling tasks.
  5. Apply the concept of reducing Expected Calibration Error to improve the trustworthiness of AI predictions in critical software systems.
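For the last step, it helps to see how Expected Calibration Error is commonly estimated. The sketch below uses the standard equal-width binning estimate; the bin count of 10 and the function name `expected_calibration_error` are assumptions, since the summary does not say how the reported values were computed. Here `confidences` is the probability a model assigns to its top prediction and `correct` records whether that prediction was right.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical predictions: 90% confident each time, right 3 times out of 4.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)

# Hypothetical predictions: 50% confident, right half the time.
calibrated = expected_calibration_error([0.5, 0.5], [True, False])
```

A lower value means stated confidence tracks observed accuracy more closely, which is the property the distribution-aware objective is reported to improve.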

Who benefits

Software Development, Cloud Computing, Cybersecurity, Quality Assurance, DevOps

Key takeaways

  • A distribution-aware training method improves next-step prediction in concurrent Go programs.
  • It addresses nondeterminism by matching empirical distributions of outcomes.
  • The approach reduces Expected Calibration Error from 0.205 to 0.169 while keeping accuracy similar to cross-entropy training.
  • The dataset, trained adapters, and tooling are open-sourced for broader use.

Original post by Kaviru Hapuarachchi

"arXiv:2606.17508v1 Announce Type: new Abstract: Training a model to predict the next step in a concurrent program is harder than it looks: two runs of the same program from the same trace prefix can produce different next events, both valid, because the scheduler is nondeterminis…"

Originally posted by Kaviru Hapuarachchi on X
