New RL Algorithm Solves Continuous-Time Optimal Stopping Problems

Cosmin Borsa, Michael Ludkovski · June 17, 2026

Summary

CARLOS is a reinforcement learning algorithm that makes optimal stopping decisions with continuous time resolution, addressing the undervaluation that coarse discrete exercise grids can cause. It uses a deep neural network and adaptive sampling to learn exercise rules, yielding higher option price estimates and better computational efficiency.

This research introduces CARLOS (Continuous-time Adaptive Reinforcement Learning for Optimal Stopping), an algorithm for solving optimal stopping problems with continuous time resolution. Simulation-based methods typically discretize the stopping decision, which can lead to undervaluation on coarse exercise grids or to computational inefficiencies. CARLOS addresses these limitations by employing an aggregate deep neural network (ADNN) to learn a joint space-time decision boundary. It starts with a coarse time grid and gradually increases the frequency of stopping opportunities, refining its timing-value estimates at each stage. An adaptive sampling strategy concentrates training effort near the optimal stopping boundary. In the reported benchmarks, CARLOS outperforms existing Bermudan option solvers, achieving prices closer to the theoretical American upper bound. It is also reported to be computationally efficient compared with non-RL alternatives, which makes it a candidate tool for financial modeling and other applications that require precise optimal stopping decisions.
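To see why the exercise grid matters, here is a minimal sketch that is not the CARLOS algorithm itself. It prices a put on a standard Cox-Ross-Rubinstein binomial tree while allowing exercise only every `exercise_every` steps. The function name and all parameter values (spot 100, strike 100, 5% rate, 20% volatility, one year, 240 tree steps) are illustrative assumptions, not taken from the paper. Because the exercise grids below are nested, each finer grid can only raise the price on the tree.

```python
import math


def bermudan_put_crr(s0, strike, rate, sigma, maturity, n_steps, exercise_every):
    """Price a Bermudan put on a CRR binomial tree (illustrative helper).

    Exercise is allowed only at tree steps that are positive multiples of
    `exercise_every`; the payoff at maturity is always collected.
    """
    dt = maturity / n_steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    discount = math.exp(-rate * dt)
    prob_up = (math.exp(rate * dt) - down) / (up - down)

    def payoff(step, ups):
        # Put payoff at the node reached after `ups` up-moves in `step` steps.
        return max(strike - s0 * up ** ups * down ** (step - ups), 0.0)

    # Terminal payoffs at maturity.
    values = [payoff(n_steps, j) for j in range(n_steps + 1)]

    # Backward induction: discount the continuation value, then compare it
    # with immediate exercise only on the allowed exercise dates.
    for step in range(n_steps - 1, -1, -1):
        values = [
            discount * (prob_up * values[j + 1] + (1.0 - prob_up) * values[j])
            for j in range(step + 1)
        ]
        if step > 0 and step % exercise_every == 0:
            values = [max(v, payoff(step, j)) for j, v in enumerate(values)]
    return values[0]


# Assumed toy inputs: nested grids from one exercise date (maturity only)
# up to exercise at every tree step.
grids = [240, 60, 12, 1]
prices = [bermudan_put_crr(100.0, 100.0, 0.05, 0.2, 1.0, 240, k) for k in grids]

for k, price in zip(grids, prices):
    print(f"{240 // k:3d} exercise dates: price = {price:.4f}")
```

With these assumed inputs, the maturity-only price is the lowest and the every-step price is the highest, which illustrates the gap that a coarse exercise grid leaves relative to near-continuous exercise.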

Why it matters

For professionals in finance and other fields dealing with optimal stopping problems (e.g., option pricing, project management), CARLOS offers a more accurate and efficient method to determine optimal exercise rules, potentially leading to better decision-making and increased profitability.

How to implement this in your domain

  1. Evaluate CARLOS for pricing American or Bermudan options in your financial models.
  2. Explore applying continuous-time optimal stopping to real estate investment decisions.
  3. Integrate deep reinforcement learning techniques into your quantitative finance workflows.
  4. Develop adaptive sampling strategies for training neural networks in time-sensitive applications.
  5. Benchmark the performance of your current optimal stopping solvers against this new RL-based approach.
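As a starting point for step 4, the idea of concentrating training points near the stopping boundary can be sketched as weighted resampling. This is a hedged illustration, not the sampling scheme from the paper. The timing value is assumed here to mean continuation value minus immediate payoff, so the boundary sits where it is zero. `toy_timing_value`, its linear boundary, the state ranges, and the `bandwidth` parameter are all hypothetical stand-ins for a learned estimate.

```python
import math
import random


def toy_timing_value(t, x):
    """Hypothetical stand-in for a learned timing-value estimate.

    Assumes a put-style boundary rising linearly from 80 at t=0 to 100 at t=1;
    the value is positive above the boundary (continue) and negative below it.
    """
    boundary = 80.0 + 20.0 * t
    return x - boundary


def adaptive_sample(n_candidates, n_keep, bandwidth, rng):
    """Resample (t, x) candidates with more weight near timing value zero."""
    candidates = [
        (rng.uniform(0.0, 1.0), rng.uniform(50.0, 150.0))
        for _ in range(n_candidates)
    ]
    # Weights decay exponentially with distance from the estimated boundary.
    weights = [
        math.exp(-abs(toy_timing_value(t, x)) / bandwidth) for t, x in candidates
    ]
    kept = rng.choices(candidates, weights=weights, k=n_keep)
    return candidates, kept


def mean_abs_timing_value(points):
    """Average distance of a point set from the estimated boundary."""
    return sum(abs(toy_timing_value(t, x)) for t, x in points) / len(points)


rng = random.Random(0)
candidates, kept = adaptive_sample(20000, 2000, 5.0, rng)
uniform_dist = mean_abs_timing_value(candidates)
adaptive_dist = mean_abs_timing_value(kept)
print(f"mean |timing value|, uniform candidates: {uniform_dist:.2f}")
print(f"mean |timing value|, adaptive sample:    {adaptive_dist:.2f}")
```

A smaller `bandwidth` pulls the training set closer to the estimated boundary, at the cost of less coverage elsewhere. In practice the weights would come from the current network estimate and be refreshed as training proceeds.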

Who benefits

BFSI, Quantitative Finance, Investment Management, Energy Trading, Project Management

Key takeaways

  • CARLOS offers a continuous-time solution for optimal stopping problems.
  • It uses deep reinforcement learning and adaptive sampling for precision.
  • The algorithm outperforms traditional discrete-time Bermudan solvers.
  • It provides higher accuracy and computational efficiency for financial applications.

Original post by Cosmin Borsa, Michael Ludkovski

"arXiv:2606.17545v1 Announce Type: new Abstract: Simulation based solvers for optimal stopping problems must discretize the stopping decision. Under classical dynamic programming, a coarse exercise grid with only a few stopping opportunities can materially undervalue the optimal e…"
