DLR Boosts Low-Rank LLM Pre-Training with Zero Inference Cost

Dong Wang, Wenwu Tang, Yun Cheng, Olga Saukh · June 30, 2026

Summary

This paper introduces Duplicated Latent Residual (DLR), a training-only, parameter-free plug-in that improves low-rank pre-training of large language models (LLMs). DLR augments the low-rank output with a fixed structured residual that is absorbed into the up-projection after training, so deployment incurs no additional parameters, FLOPs, or memory, while perplexity improves, especially for larger models.

Pre-training large language models (LLMs) at scale is resource-intensive. Low-rank pre-training factorizes each weight matrix into a rank-r product to reduce parameters and compute, but it often trails full-rank training in model quality. The paper proposes Duplicated Latent Residual (DLR), a plug-in that improves low-rank pre-training at no additional inference cost. During training, DLR augments the standard low-rank output with a fixed, structured residual. After training, the residual is absorbed into the up-projection, so DLR adds no learnable parameters, FLOPs, or memory overhead at deployment. In experiments on LLaMA models from 60M to 7B parameters, DLR consistently strengthens low-rank pre-training, particularly at 130M parameters and above, and the folded checkpoints transfer effectively to supervised fine-tuning.
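The fold described above can be illustrated with a minimal NumPy sketch. The summary does not spell out the residual's exact form, so the sketch assumes one plausible reading of the name: the rank-r latent is tiled (duplicated) across the output dimension through a fixed 0/1 matrix D and scaled by a constant alpha. The names duplication_matrix, forward_train, fold, and forward_deploy, as well as D and alpha, are hypothetical and are not taken from the paper.

```python
import numpy as np


def duplication_matrix(d_out: int, r: int) -> np.ndarray:
    """Fixed, non-learnable (d_out, r) matrix: output row i copies latent
    coordinate i % r. ASSUMPTION: an illustrative stand-in for the paper's
    structured residual, whose exact form the summary does not give."""
    D = np.zeros((d_out, r))
    D[np.arange(d_out), np.arange(d_out) % r] = 1.0
    return D


def forward_train(x, A, B, D, alpha=1.0):
    """Training-time output: low-rank path plus the fixed residual.
    x: (batch, d_in), A: (r, d_in) down-projection, B: (d_out, r) up-projection."""
    z = x @ A.T                          # latent, shape (batch, r)
    return z @ B.T + alpha * (z @ D.T)   # learned path + fixed residual


def fold(B, D, alpha=1.0):
    """Absorb the residual into the up-projection; the shape stays (d_out, r)."""
    return B + alpha * D


def forward_deploy(x, A, B_folded):
    """Deployment-time output: a plain low-rank layer with no residual branch."""
    return (x @ A.T) @ B_folded.T
```

Because the residual is linear in the latent, z @ B.T + alpha * (z @ D.T) equals z @ (B + alpha * D).T, so forward_train and forward_deploy should agree up to floating-point error while the folded matrix keeps the same shape as B. Under these assumptions, that is why the deployed layer carries no extra parameters or FLOPs.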

Why it matters

For AI engineers and product developers, DLR improves the quality of low-rank pre-training without changing the size or inference cost of the deployed model. That could lower pre-training costs and speed up the development of compact, high-quality models.

How to implement this in your domain

  1. Evaluate DLR for pre-training custom low-rank LLMs to reduce computational costs.
  2. Integrate DLR into existing low-rank model training pipelines to improve quality.
  3. Benchmark DLR-enhanced low-rank models against full-rank counterparts for performance and efficiency.
  4. Consider DLR when developing LLMs for edge devices or resource-constrained environments.
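For the benchmarking step, a rough starting point is to compare weight counts per layer. The sketch below counts the parameters of a full-rank d_out x d_in matrix against its rank-r factorization. The function names and the layer sizes in the usage note are hypothetical, not figures from the paper.

```python
def full_rank_params(d_in: int, d_out: int) -> int:
    """Weights in a dense d_out x d_in matrix (bias ignored)."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, r: int) -> int:
    """Weights in a rank-r factorization: (r x d_in) down-projection
    plus (d_out x r) up-projection."""
    return r * (d_in + d_out)


def param_ratio(d_in: int, d_out: int, r: int) -> float:
    """Low-rank weight count as a fraction of the full-rank count."""
    return low_rank_params(d_in, d_out, r) / full_rank_params(d_in, d_out)
```

For a hypothetical 4096 x 4096 layer at rank 512, param_ratio(4096, 4096, 512) gives 0.25. A linear layer performs one multiply-accumulate per weight per token, so the same ratio approximates the per-token compute of that layer. Since DLR is folded away after training, these deployment counts are the same with or without it.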

Who benefits

AI/ML Development · Cloud Computing · Edge AI · Software Development · High-Tech

Key takeaways

  • DLR enhances low-rank LLM pre-training without adding inference cost.
  • It uses a training-only, parameter-free structured residual.
  • The residual is absorbed post-training, maintaining low-rank deployment efficiency.
  • DLR improves perplexity, especially for larger LLaMA models (130M+).
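Perplexity, the metric cited above, is the exponential of the mean per-token negative log-likelihood (in nats). A minimal sketch for scoring a checkpoint on held-out tokens follows; the function name is illustrative and nothing here reproduces results from the paper.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """exp of the mean per-token negative log-likelihood (natural log)."""
    if len(token_nlls) == 0:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))
```

As a sanity check, a model that assigns probability 1/4 to every token has a per-token loss of log 4 and therefore a perplexity of 4. When comparing a DLR-trained checkpoint with a plain low-rank baseline, both should be scored on the same tokens with the same tokenizer; lower is better.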

Original post by Dong Wang, Wenwu Tang, Yun Cheng, Olga Saukh

"arXiv:2606.28932v1 Announce Type: new Abstract: Large language models have driven recent progress in language and multimodal AI, yet pre-training them at scale is prohibitively expensive. Low-rank pre-training, which factorizes each weight matrix into a rank-r product to reduce b…"

