Diffusion Language Models Enhance Radiology Report Drafting

Max Van Puyvelde, Halil Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert · July 3, 2026

Summary

A fine-tuned diffusion language model, DiffusionGemma-26B, matches or exceeds its autoregressive counterpart on medical visual question answering and adds bidirectional infill. A radiologist can hold chosen report fragments fixed and have the model fill in the text between them, which could make report drafting more efficient and consistent.

Most medical foundation models rely on autoregressive (AR) text generation; this study examines discrete diffusion language models instead. The authors adapt DiffusionGemma-26B, a mixture-of-experts diffusion model, and benchmark it against its AR counterpart, Gemma-4-26B, using an identical LoRA fine-tuning recipe on medical visual question answering datasets. Answers are scored by an LLM judge that is robust to verbosity.

The diffusion model matches or surpasses the AR models on every evaluated dataset. The fine-tuned diffusion model, with 3.8B active parameters, is also competitive with frontier vision-language models while decoding 3.5 to 4.4 times faster.

Beyond performance parity, the key advantage of the diffusion model is native any-order infill. AR models generate text left to right, whereas diffusion models denoise a token canvas bidirectionally. A radiologist can therefore fix specific report fragments in place and have the model fill in the missing text between them. That matters for real-world radiology report drafting, where initial inputs are often terse or inconsistent.
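The any-order infill idea can be illustrated with a toy sketch. This is not the paper's decoder or DiffusionGemma's API: the lookup tables `FOLLOWS` and `PRECEDES`, the functions `toy_propose` and `infill`, and the "commit the most confident token first" schedule are all assumptions made for illustration, with the tables standing in for a trained denoiser. The sketch only shows the mechanics: some tokens are held fixed, every open position is scored using context on both sides, and the gaps are filled in whatever order the confidences dictate.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"  # marks a canvas position that still has to be filled

# Hypothetical stand-in for a trained denoiser: two tiny lookup tables.
FOLLOWS = {"no": "focal", "consolidation": "or", "or": "pleural"}
PRECEDES = {"consolidation": "focal", "pleural": "or", "effusion": "pleural"}

Proposer = Callable[[List[str], int], Tuple[str, float]]


def toy_propose(canvas: List[str], i: int) -> Tuple[str, float]:
    """Propose (token, confidence) for position i from BOTH neighbours."""
    left = canvas[i - 1] if i > 0 else MASK
    right = canvas[i + 1] if i + 1 < len(canvas) else MASK
    from_left = FOLLOWS.get(left)
    from_right = PRECEDES.get(right)
    if from_left is not None and from_left == from_right:
        return from_left, 1.0   # both sides agree: high confidence
    if from_left is not None:
        return from_left, 0.6   # fall back to the left-context guess
    if from_right is not None:
        return from_right, 0.6  # fall back to the right-context guess
    return "[UNK]", 0.0         # no usable context yet


def infill(canvas: List[str], propose: Proposer,
           tokens_per_step: int = 1) -> Tuple[List[str], int]:
    """Fill every MASK slot; tokens already on the canvas never change."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    work = list(canvas)  # copy so the caller's canvas is left untouched
    steps = 0
    while MASK in work:
        masked = [i for i, tok in enumerate(work) if tok == MASK]
        # Score all open positions against the same snapshot of the canvas.
        proposals = [(i, *propose(work, i)) for i in masked]
        # Most confident first; the stable sort keeps ties in left-to-right order.
        proposals.sort(key=lambda p: -p[2])
        for i, token, _conf in proposals[:tokens_per_step]:
            work[i] = token
        steps += 1
    return work, steps


if __name__ == "__main__":
    # The radiologist has fixed three fragments; three slots are open.
    draft = ["no", MASK, "consolidation", MASK, MASK, "effusion"]
    filled, steps = infill(draft, toy_propose)
    print(" ".join(filled), f"({steps} steps)")
```

In this toy run the slot between "no" and "consolidation" is committed first because both neighbours agree on it, and the last open slot only becomes confident once its left neighbour has been filled. The fixed fragments are never overwritten, which is the property the interactive drafting workflow relies on.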

Why it matters

For healthcare professionals, particularly radiologists, this technology could streamline report generation and improve the accuracy and consistency of reports, supporting more efficient workflows and, ultimately, better patient care.

How to implement this in your domain

  1. Pilot diffusion language models for interactive drafting in radiology departments to assess efficiency gains.
  2. Collaborate with AI developers to integrate bidirectional infill capabilities into existing medical reporting systems.
  3. Train radiologists on new interactive drafting workflows that leverage the unique features of diffusion models.
  4. Evaluate the impact of diffusion models on report consistency and accuracy through clinical studies.

Who benefits

Healthcare, Medical Imaging, AI/ML Engineering, Software Development

Key takeaways

  • Diffusion language models can match or exceed autoregressive models on medical visual question answering.
  • In the study, the fine-tuned diffusion model decoded 3.5 to 4.4 times faster.
  • The unique bidirectional infill capability allows for interactive, any-order text editing, ideal for report drafting.
  • This technology has the potential to greatly improve efficiency and consistency in radiology report generation.

Original post by Max Van Puyvelde, Halil Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert

"arXiv:2607.01436v1 Announce Type: new Abstract: Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, re…"
