AETDICE Unifies Nonlinear Multi-Objective Reinforcement Learning

Woosung Kim, Youngjun Suh, Jinho Lee, Jongmin Lee, Byung-Jun Lee · July 1, 2026

Summary

A new Aggregation-Expectation-Transformation (AET) framework unifies nonlinear multi-objective reinforcement learning (MORL) objectives, bridging the gap between Scalarized Expected Return (SER) and Expected Scalarized Return (ESR). AETDICE, an offline RL algorithm, makes these objectives tractable to optimize from static data.
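The two criteria differ in where the nonlinear scalarization f is applied: SER scores the expected vector return, f(E[G]), while ESR scores each episode's vector return and then averages, E[f(G)]. The sketch below uses hypothetical episode returns and an egalitarian f = min (an illustrative choice, not taken from the paper) to show that the two can disagree sharply.

```python
import numpy as np

def fairness_min(v):
    """Illustrative nonlinear scalarization: value of the worst-off objective."""
    return np.min(v, axis=-1)

def ser(returns, f):
    """Scalarized Expected Return: f applied to the mean vector return."""
    return f(returns.mean(axis=0))

def esr(returns, f):
    """Expected Scalarized Return: mean of f applied to each episode's return."""
    return f(returns).mean()

# Hypothetical vector returns: 4 episodes, 2 objectives each.
returns = np.array([[10.0, 0.0],
                    [0.0, 10.0],
                    [10.0, 0.0],
                    [0.0, 10.0]])

print(ser(returns, fairness_min))  # 5.0
print(esr(returns, fairness_min))  # 0.0
```

With a linear f the two values always coincide by linearity of expectation; the split between SER and ESR only arises for nonlinear preferences.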

Optimizing systems with multiple, often conflicting, objectives and nonlinear preferences is a significant challenge in multi-objective reinforcement learning (MORL). Historically, nonlinear MORL has been split into two distinct paradigms, Scalarized Expected Return (SER) and Expected Scalarized Return (ESR), each with its own optimization difficulties. This paper introduces the Aggregation-Expectation-Transformation (AET) framework, which unifies these criteria. It provides a principled foundation for general nonlinear MORL by decomposing scalarization into three stages: aggregation, expectation, and transformation. Building on this, the researchers propose AETDICE, a tractable offline RL algorithm designed for AET objectives. AETDICE applies DICE-style density-ratio estimation within an augmented state space, enabling sample-based optimization from static datasets. By removing the long-standing split between SER and ESR, the approach lets a single method optimize objectives that encode trade-offs such as risk aversion or fairness.
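The framework's name lists three stages, but this summary does not state the exact AET objective. A minimal sketch, assuming the composition J = T(E[A(G)]) (a per-episode aggregation A, then an expectation over episodes, then an outer transformation T), shows how SER and ESR would fall out as special cases. The function `aet_objective` and this precise form are hypothetical illustrations, not the paper's definition.

```python
import numpy as np

def aet_objective(returns, aggregate, transform):
    """ASSUMED AET form: transform(E[aggregate(G)]), estimated from sampled episodes.

    This composition is an illustrative reading of the framework's name,
    not the paper's stated definition.
    """
    # A: aggregate each episode's vector return
    aggregated = np.stack([np.asarray(aggregate(g), dtype=float) for g in returns])
    # E: Monte Carlo mean over episodes
    expected = aggregated.mean(axis=0)
    # T: outer transformation of the expectation
    return transform(expected)

def identity(x):
    return x

def f_min(v):
    """Illustrative nonlinear scalarization: worst-off objective."""
    return np.min(v)

# Hypothetical vector returns: 4 episodes, 2 objectives.
returns = np.array([[10.0, 0.0], [0.0, 10.0], [10.0, 0.0], [0.0, 10.0]])

ser_value = aet_objective(returns, aggregate=identity, transform=f_min)  # SER: f(E[G])
esr_value = aet_objective(returns, aggregate=f_min, transform=identity)  # ESR: E[f(G)]
mixed = aet_objective(returns, aggregate=np.sqrt, transform=f_min)       # nonlinear at both stages
print(ser_value, esr_value)  # 5.0 0.0
```

Under this assumed form, SER puts the identity in the aggregation stage and the scalarization in the transformation stage, ESR does the reverse, and the mixed case applies a nonlinearity at both stages, which neither classical criterion expresses on its own.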

Why it matters

This framework gives practitioners a single, principled way to specify and optimize nonlinear trade-offs among multiple objectives, using only previously collected data rather than live interaction.

How to implement this in your domain

  1. Apply the AETDICE algorithm to optimize complex real-world systems with multiple, conflicting objectives, such as resource allocation or autonomous control.
  2. Integrate the AET framework into existing multi-objective reinforcement learning research to unify and simplify objective definitions.
  3. Use AETDICE's offline RL capabilities to learn optimal policies from historical datasets without requiring live interaction.
  4. Explore how the AET framework can capture specific nonlinear preferences like risk aversion or fairness in your AI models.
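Step 4 comes down to choosing which nonlinear preferences to encode. Below is a sketch of a few standard choices, none of which the summary attributes to the paper: Nash social welfare and the egalitarian minimum as fairness criteria across objectives, and a concave exponential utility as a risk-averse criterion over returns. Which AET stage (aggregation or transformation) each one would occupy is an assumption left to the modeler.

```python
import numpy as np

def nash_welfare(v, eps=1e-8):
    """Fairness across objectives: sum of logs (log of the Nash product)."""
    return np.sum(np.log(np.asarray(v, dtype=float) + eps), axis=-1)

def egalitarian(v):
    """Fairness across objectives: value of the worst-off objective."""
    return np.min(np.asarray(v, dtype=float), axis=-1)

def exp_utility(x, risk_aversion=0.1):
    """Risk aversion: concave exponential utility, applied elementwise to returns."""
    return (1.0 - np.exp(-risk_aversion * np.asarray(x, dtype=float))) / risk_aversion

# Fairness: same total (10), but the balanced vector scores higher.
balanced, skewed = np.array([5.0, 5.0]), np.array([10.0, 0.0])
print(egalitarian(balanced), egalitarian(skewed))     # 5.0 0.0
print(nash_welfare(balanced) > nash_welfare(skewed))  # True

# Risk aversion: same mean return (5), but the certain outcome scores higher.
certain = np.array([5.0, 5.0, 5.0, 5.0])
risky = np.array([0.0, 10.0, 0.0, 10.0])
print(exp_utility(certain).mean() > exp_utility(risky).mean())  # True
```

A linear weighted sum would rate each pair above as equal, which is why these preferences need a nonlinear objective in the first place.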

Who benefits

Robotics, Autonomous Systems, Finance, Logistics, Energy Management

Key takeaways

  • The AET framework unifies previously fragmented nonlinear MORL objectives.
  • AETDICE is an offline RL algorithm for tractable optimization of AET objectives.
  • It enables sample-based optimization from static datasets using density-ratio estimation.
  • The framework addresses complex trade-offs like risk aversion and fairness in AI decision-making.
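The density-ratio idea behind DICE-style methods: if w(s, a) = d^π(s, a) / d^D(s, a) is the ratio between the target policy's state-action distribution and the dataset's, then any expectation under the target policy can be computed as a w-weighted average over dataset samples, with no new interaction. The toy sketch below assumes a known, hypothetical three-pair distribution so the ratio can be computed exactly; actual DICE methods, AETDICE included, learn w from data by solving an optimization problem, and AETDICE does so over an augmented state space that this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete space with 3 (s, a) pairs.
d_data = np.array([0.5, 0.3, 0.2])    # distribution that generated the offline dataset
d_target = np.array([0.2, 0.3, 0.5])  # distribution of the policy we want to evaluate
reward = np.array([[1.0, 0.0],        # vector reward per (s, a) pair, 2 objectives
                   [0.0, 1.0],
                   [1.0, 1.0]])

# Offline dataset: indices of (s, a) pairs sampled from d_data only.
dataset = rng.choice(3, size=100_000, p=d_data)

# Density ratio w = d_target / d_data. DICE-style methods learn this from data;
# here it is computed exactly because the toy distributions are known.
w = d_target / d_data

# The reweighted sample mean approximates the target policy's expected vector reward.
estimate = (w[dataset, None] * reward[dataset]).mean(axis=0)
exact = d_target @ reward
print(estimate, exact)
```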

Original post by Woosung Kim, Youngjun Suh, Jinho Lee, Jongmin Lee, Byung-Jun Lee

"arXiv:2606.31178v1. Abstract: Optimizing nonlinear preferences in multi-objective reinforcement learning (MORL) is essential for capturing complex trade-offs like risk aversion or fairness. However, such non-linearity has historically bifurcated nonlinear MORL o…"

Originally posted by Woosung Kim, Youngjun Suh, Jinho Lee, Jongmin Lee, Byung-Jun Lee on X