Hard-Routed MoR-LoRA Composes Frozen Reasoning Experts Efficiently.

Seyed Alireza Molavi, Zhan Su, Yan Hu, Peyman Sheikholharam Mashhadi, Stefan Byttner, Prayag Tiwari · July 1, 2026

Summary

Hard-Routed MoR-LoRA is a two-stage framework that efficiently composes independently trained LoRA adapters (experts) into a single LLM using hard selection, preserving expert behavior with fewer trainable parameters than soft-routing methods. This is particularly useful for multi-domain adaptation when original training data is unavailable.

Composing independently trained LoRA (Low-Rank Adaptation) adapters into a single large language model (LLM) is a valuable technique for adapting models to multiple domains, especially when the original training data cannot be shared. A common approach involves Mixture-of-Experts (MoE) style routing, but soft weighted combinations can alter the unit-scale additive update under which each LoRA module was originally trained. This research introduces Hard-Routed MoR-LoRA, a two-stage framework designed for composing frozen reasoning LoRA experts through unit-scale hard selection. In the first stage, domain-specific LoRA adapters are independently trained using reinforcement learning from verifiable feedback to create specialized reasoning experts. In the second stage, all experts are frozen, and their reasoning traces are distilled. Only a lightweight shared router and a small attention LoRA are then trained for integration. The router employs hard top-1 routing, selecting exactly one expert per token, with a straight-through estimator enabling gradient-based training. Experiments across various benchmarks and model scales demonstrate that Hard-Routed MoR-LoRA effectively preserves expert behavior while requiring significantly fewer trainable parameters compared to soft-routing mixture baselines. Analysis also suggests that soft mixtures often concentrate routing mass on a single expert, reinforcing the efficiency of hard unit-scale routing for frozen LoRA expert composition.
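The contrast between soft mixing and hard unit-scale selection can be written out as follows. This is a sketch under assumed notation, not the paper's own formulation: W_0 is the frozen base weight, (A_i, B_i) are the frozen low-rank factors of expert i with any LoRA scaling constant folded into B_i, z(x) are the router logits, and p(x) is their softmax. The straight-through form shown is the common one; the summary does not state which variant the authors use.

```latex
% One expert applied alone, as during its own training (unit-scale additive update):
h = W_0 x + B_i A_i x

% Soft mixture over N experts: each update is rescaled by p_i(x) \le 1,
% so no expert is applied at the scale it was trained with.
p(x) = \operatorname{softmax}\big(z(x)\big), \qquad
h_{\text{soft}} = W_0 x + \sum_{i=1}^{N} p_i(x)\, B_i A_i x

% Hard top-1 selection: exactly one expert, applied at unit scale.
k = \arg\max_{i} p_i(x), \qquad
h_{\text{hard}} = W_0 x + B_k A_k x

% Straight-through estimator: the forward pass uses the one-hot gate g = e_k,
% while the backward pass substitutes the softmax Jacobian for the gate's gradient.
g = e_k, \qquad
\frac{\partial g}{\partial z} := \frac{\partial p}{\partial z}
  = \operatorname{diag}(p) - p\,p^{\top}
```

When p(x) already places nearly all of its mass on one expert, h_soft and h_hard are close, which is consistent with the observation that soft mixtures often concentrate on a single expert.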

Why it matters

AI engineers and researchers can leverage this framework to efficiently combine specialized LLM capabilities for multi-domain applications without retraining entire models, leading to more adaptable and resource-efficient AI systems.

How to implement this in your domain

1. Train domain-specific LoRA adapters independently for different reasoning tasks.
2. Freeze the trained LoRA experts to preserve their specialized behaviors.
3. Implement a lightweight shared router that uses hard top-1 selection to choose exactly one expert per token.
4. Distill reasoning traces from the frozen experts and use them to train the router and a small attention LoRA.
5. Evaluate the composed model on multi-domain benchmarks, comparing against soft-routing methods.
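The routing step in the list above can be illustrated with a minimal, standard-library Python sketch for a single token. Everything here is a hypothetical toy: the helper names (hard_top1_gate, ste_logit_grad, compose), the 2-dimensional weights, and the example gradient are assumptions for illustration and are not taken from the paper. The backward helper shows one common straight-through form, in which the one-hot gate is treated as if it were the softmax output.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def lora_delta(A, B, x):
    """Unit-scale LoRA update B @ (A @ x) for one token."""
    return matvec(B, matvec(A, x))

def hard_top1_gate(logits):
    """Forward pass of the gate: one-hot on the arg-max expert."""
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    gate = [1.0 if i == k else 0.0 for i in range(len(probs))]
    return gate, probs, k

def ste_logit_grad(probs, grad_gate):
    """Straight-through backward (assumed form): push the gradient that
    arrives at the one-hot gate through the softmax Jacobian instead."""
    dot = sum(p * g for p, g in zip(probs, grad_gate))
    return [p * (g - dot) for p, g in zip(probs, grad_gate)]

def compose(base_out, deltas, gate):
    """Add gated expert updates to the frozen base output."""
    out = list(base_out)
    for g, d in zip(gate, deltas):
        out = [o + g * di for o, di in zip(out, d)]
    return out

# Hypothetical toy setup: 2-dim hidden state, two rank-1 frozen experts.
x = [1.0, 2.0]
W0 = [[1.0, 0.0], [0.0, 1.0]]              # frozen base weight
experts = [
    ([[1.0, 0.0]], [[0.5], [0.0]]),        # (A_0, B_0)
    ([[0.0, 1.0]], [[0.0], [0.25]]),       # (A_1, B_1)
]
router_W = [[0.2, 0.1], [0.5, 0.4]]        # trainable router, one row per expert

logits = matvec(router_W, x)
gate, probs, k = hard_top1_gate(logits)
deltas = [lora_delta(A, B, x) for A, B in experts]

hard_out = compose(matvec(W0, x), deltas, gate)   # selected expert at unit scale
soft_out = compose(matvec(W0, x), deltas, probs)  # every expert rescaled by p_i

# Example upstream gradient with respect to the gate (made-up values).
logit_grad = ste_logit_grad(probs, [0.3, -0.1])

print(k, hard_out)  # 1 [1.0, 2.5]
```

In this toy case the hard path applies expert 1's update in full, while the soft path shrinks it by the router probability. In a real setup only the router (and, per the summary, a small attention LoRA) would receive gradient updates; the expert factors stay frozen.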

Who benefits

AI Engineering, Software Development, Natural Language Processing, Research & Development, EdTech

Key takeaways

Hard-Routed MoR-LoRA efficiently combines specialized LoRA experts.
It uses hard selection to preserve expert behavior with fewer parameters.
The framework is beneficial for multi-domain adaptation without shared data.
It preserves expert behavior while training significantly fewer parameters than soft-routing mixture baselines.

Original post by Seyed Alireza Molavi, Zhan Su, Yan Hu, Peyman Sheikholharam Mashhadi, Stefan Byttner, Prayag Tiwari

"arXiv:2606.31413v1 Announce Type: new Abstract: Composing independently trained LoRA adapters into a single large language model is useful for multi-domain adaptation, especially when the original training data cannot be shared. A common approach is to use MoE-style routing over…"

