New Geometry Quantifies Non-Stationarity Cost in Adversarial MDPs

Kai Hidajat · June 30, 2026

Summary

This paper introduces a normal-fan geometry to analyze non-stationary adversarial Markov Decision Processes (MDPs), distinguishing between consequential and harmless changes in loss. It defines a "face-crossing price" to quantify the minimum regret incurred when the optimal policy shifts due to non-stationarity, allowing dynamic regret to be decomposed into intrinsic priced motion and within-face selection error.

In dynamic decision-making problems, traditional analyses often equate the cost of non-stationarity with the magnitude of change in the loss function. However, a large change in loss might leave the optimal policy untouched, while a small change could force a complete policy shift. This research proposes a more nuanced account.

The paper develops a normal-fan geometry for finite-horizon adversarial MDPs with fixed transitions. The framework treats the set of occupancy measures as a polytope, on which each loss vector exposes an optimal face. Non-stationarity then becomes a path through the normal fan of that polytope: moving within a cone leaves the optimal face unchanged, while crossing a "wall" between cones changes the optimal policy and incurs regret. The key concept is the "face-crossing price," which quantifies the minimum regret incurred by keeping a previously optimal policy under a new loss. This yields an exact decomposition of dynamic regret into the intrinsic cost of policy shifts and within-face selection error, separating consequential non-stationarity from harmless variation.
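
To make the geometry concrete, here is a minimal Python sketch. It is not the paper's construction: the polytope is a hypothetical three-vertex simplex (think of a one-step problem with three actions), and the names `optimal_face` and `face_crossing_price` are illustrative. The price is computed as the smallest regret, under the new loss, among vertices that were optimal under the previous loss, which follows the verbal description above; the paper's formal definition may differ.

```python
import numpy as np

# Hypothetical polytope: each row is a vertex, standing in for the
# occupancy measure of one deterministic policy (an assumption for illustration).
VERTICES = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])


def optimal_face(vertices, loss, tol=1e-9):
    """Indices of the vertices that minimise <loss, q>; they span the exposed face."""
    values = vertices @ loss
    return np.flatnonzero(values <= values.min() + tol)


def face_crossing_price(vertices, prev_loss, new_loss, tol=1e-9):
    """Smallest regret under new_loss among vertices optimal under prev_loss.

    Assumed reading of the "face-crossing price"; zero means the old
    optimal face still contains an optimal policy.
    """
    prev_face = optimal_face(vertices, prev_loss, tol)
    new_values = vertices @ new_loss
    return float(new_values[prev_face].min() - new_values.min())


LOSS_A = np.array([1.0, 2.0, 3.0])     # vertex 0 is optimal
LOSS_B = np.array([10.0, 50.0, 90.0])  # far from LOSS_A, vertex 0 still optimal
LOSS_C = np.array([1.0, 0.9, 3.0])     # close to LOSS_A, vertex 1 now optimal

print("large move, same face:", face_crossing_price(VERTICES, LOSS_A, LOSS_B))
print("small move, new face: ", face_crossing_price(VERTICES, LOSS_A, LOSS_C))
```

In this toy case the large move from `LOSS_A` to `LOSS_B` has price zero, while the much smaller move to `LOSS_C` has a price of about 0.1, which is the distinction between harmless and consequential change that the summary describes.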

Why it matters

For professionals designing adaptive systems in dynamic environments, this research provides a more precise theoretical framework to understand and quantify the true cost of environmental changes, enabling the development of more robust and efficient adaptive algorithms.

How to implement this in your domain

  1. Apply the normal-fan geometry concept to analyze the stability of optimal policies in dynamic control systems.
  2. Develop adaptive algorithms that explicitly account for the "face-crossing price" when responding to environmental changes.
  3. Use the decomposition of dynamic regret to diagnose the sources of performance degradation in non-stationary settings.
  4. Inform the design of robust reinforcement learning agents operating in adversarial or rapidly changing environments.
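
As a rough illustration of step 3, the sketch below splits each round's regret into a price term and a leftover term. This is an arithmetic identity chosen for illustration, not the paper's decomposition: the function name `split_dynamic_regret`, the toy vertices, and the convention that every vertex is admissible before the first loss are all assumptions. The leftover term is simply regret minus price, so it can be negative when the learner leaves the previous face for a better one.

```python
import numpy as np


def split_dynamic_regret(vertices, losses, plays, tol=1e-9):
    """Return one row per round: (regret, price, leftover), with regret = price + leftover.

    vertices: array (n_vertices, d), hypothetical occupancy-polytope vertices.
    losses:   array (T, d), the loss vector in each round.
    plays:    array (T, d), the occupancy measure the learner used in each round.
    """
    rows = []
    prev_face = np.arange(len(vertices))  # assumption: no constraint before round 1
    for loss, play in zip(losses, plays):
        values = vertices @ loss
        best = values.min()
        # Price: best loss achievable while staying on the previous optimal face,
        # measured against the unconstrained optimum.
        price = float(values[prev_face].min() - best)
        regret = float(play @ loss - best)
        rows.append((regret, price, regret - price))
        prev_face = np.flatnonzero(values <= best + tol)
    return np.array(rows)


VERTICES = np.eye(3)  # toy polytope: three deterministic policies
LOSSES = np.array([
    [1.0, 2.0, 3.0],
    [10.0, 50.0, 90.0],  # large change, same optimal vertex
    [1.0, 0.9, 3.0],     # small change, new optimal vertex
])
PLAYS = np.array([
    [0.0, 1.0, 0.0],  # round 1: learner picks a suboptimal vertex
    [1.0, 0.0, 0.0],  # round 2: learner is on the optimal vertex
    [1.0, 0.0, 0.0],  # round 3: learner stays put after the optimum moved
])

for t, (regret, price, leftover) in enumerate(split_dynamic_regret(VERTICES, LOSSES, PLAYS), 1):
    print(f"round {t}: regret={regret:.3f} price={price:.3f} leftover={leftover:.3f}")
```

Read this way, round 1's regret is entirely leftover (a poor choice with no change in the environment), while round 3's regret is entirely price (the optimum moved and the learner did not), which is the kind of diagnosis the step describes.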

Who benefits

Robotics, Autonomous Systems, Finance (algorithmic trading), Logistics, Cybersecurity

Key takeaways

  • A normal-fan geometry analyzes non-stationarity in adversarial MDPs.
  • It distinguishes between consequential and harmless changes in loss.
  • The "face-crossing price" quantifies regret from optimal policy shifts.
  • Dynamic regret can be decomposed into priced motion and within-face selection error.

Original post by Kai Hidajat

"arXiv:2606.29092v1 Announce Type: new Abstract: In a changing decision problem, standard dynamic-regret analyses have often equated the cost of non-stationarity to how far loss moves. However, it is simultaneously possible for a loss sequence to travel far and retain the same opt…"

Originally posted by Kai Hidajat on X