Meta-RL Enhances Spacecraft Safety and Fuel Efficiency in Adversarial Scenarios

Alejandro Posadas-Nava, Richard Linares, Minduli Wijayatunga · June 17, 2026

Summary

This paper investigates memory-efficient meta-reinforcement learning for adaptive, safety-critical control in spacecraft proximity operations, particularly under adversarial conditions. It compares three recurrent architectures (LSTM, GRU, and Mamba) and two training algorithms (PPO and SAC), and finds that Mamba trained with PPO achieves the best task completion, safety, and fuel savings.

Autonomous spacecraft rendezvous and proximity operations (RPO) demand controllers that guarantee safety while respecting thrust constraints and minimizing fuel consumption. Input-constrained control barrier functions (ICCBFs) offer a way to ensure safety in nonlinear systems with actuation limits. Previous work demonstrated that meta-reinforcement learning (meta-RL) could robustly learn ICCBF class-K functions for RPO. This research extends that framework by comparing three recurrent network architectures and two training algorithms. The architectures are Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and the Selective State Space Model (Mamba); the algorithms are Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). The goal was to identify the best configuration for tuning ICCBF class-K functions via meta-RL. Performance was assessed in both cooperative and adversarial scenarios, in which the target spacecraft may act to compromise the chaser spacecraft's safety. The results indicate that state space models such as Mamba, trained with PPO, deliver the best task completion, safety, and fuel efficiency across all tested cooperative and uncooperative scenarios.
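To make the role of the class-K function concrete, here is a minimal barrier-function safety filter for a one-dimensional approach toward a target. It is a sketch under stated assumptions, not the paper's ICCBF formulation: it replaces relative orbital dynamics with a double integrator (range p, range rate v, commanded acceleration u with |u| <= u_max), uses a hypothetical braking-distance barrier h = p - d_min - max(0, -v)^2 / (2 u_max), and lets a single linear class-K function alpha(h) = gamma * h stand in for the class-K functions the meta-RL policy would tune. For a closing approach (v < 0), requiring dh/dt >= -gamma * h gives the lower bound u >= u_max * (1 - gamma * h / |v|), which never exceeds u_max while h >= 0. Here the safety constraint and the thrust limit can always be met together only because of this hand-picked barrier; ICCBFs are designed to provide that property in general. All function names and numbers are illustrative.

```python
import math


def braking_barrier(p: float, v: float, d_min: float, u_max: float) -> float:
    """Hypothetical barrier: h >= 0 means the chaser can still stop before
    reaching range d_min using at most u_max of braking acceleration."""
    closing_speed = max(0.0, -v)
    return p - d_min - closing_speed**2 / (2.0 * u_max)


def safety_filter(u_nom: float, p: float, v: float, gamma: float,
                  d_min: float, u_max: float) -> float:
    """Return the acceleration closest to u_nom that satisfies
    dh/dt >= -gamma * h (linear class-K function) and |u| <= u_max."""
    h = braking_barrier(p, v, d_min, u_max)
    u_lower = -math.inf
    if v < 0.0:
        # For v < 0: dh/dt = v - v * u / u_max, so the CBF condition
        # dh/dt >= -gamma * h rearranges to u >= u_max * (1 - gamma * h / |v|).
        u_lower = u_max * (1.0 - gamma * h / abs(v))
    u = max(u_nom, u_lower)            # 1D projection onto the safe half-line
    return min(max(u, -u_max), u_max)  # enforce the thrust limit


def simulate(gamma: float, filtered: bool = True, p0: float = 10.0,
             v0: float = -2.0, u_nom: float = -0.5, d_min: float = 1.0,
             u_max: float = 1.0, dt: float = 0.01, steps: int = 3000) -> list[float]:
    """Explicit-Euler rollout of a hypothetical nominal controller that keeps
    pushing toward the target; returns the range history."""
    p, v = p0, v0
    ranges = []
    for _ in range(steps):
        u = safety_filter(u_nom, p, v, gamma, d_min, u_max) if filtered else u_nom
        p, v = p + v * dt, v + u * dt
        ranges.append(p)
    return ranges


if __name__ == "__main__":
    for gamma in (0.2, 1.0, 5.0):
        print(f"gamma={gamma}: min range {min(simulate(gamma)):.3f}")
    print(f"unfiltered: min range {min(simulate(1.0, filtered=False)):.3f}")
```

A larger gamma lets the chaser keep closing faster before the filter intervenes, while a smaller gamma starts braking earlier and approaches more conservatively. Choosing that trade-off per scenario is the kind of tuning the paper delegates to meta-RL.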

Why it matters

For aerospace engineers and mission planners, this work offers a more reliable and fuel-efficient way to control spacecraft in complex and potentially hostile environments, a capability that is crucial for docking, servicing, and debris-removal missions.

How to implement this in your domain

  1. Adopt Mamba-based recurrent networks trained with PPO when developing safety-critical controllers for autonomous systems.
  2. Apply the meta-RL framework to design adaptive control systems for spacecraft rendezvous and proximity operations.
  3. Benchmark existing control algorithms against this meta-RL approach in simulated adversarial environments.
  4. Explore the use of ICCBFs in other safety-critical robotic applications beyond space.
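Step 1 pairs the sequence model with PPO. As a hedged illustration of that training signal, the sketch below computes PPO's standard clipped surrogate objective for a Gaussian policy over a single continuous action, which here stands in for a class-K gain. It assumes NumPy. The function names, the clip range of 0.2, and the one-dimensional action are illustrative choices, not details taken from the paper. With a recurrent or state-space policy, the new log-probabilities would typically be recomputed by replaying each stored sequence through the network, for example starting from its stored initial hidden state.

```python
import numpy as np


def gaussian_log_prob(action, mean, std):
    """Log-density of a 1D Gaussian policy evaluated at the taken action."""
    return -0.5 * ((action - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate (to be maximized): mean of
    min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r = pi_new / pi_old."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    actions = rng.normal(1.0, 0.3, size=8)   # hypothetical sampled class-K gains
    advantages = rng.normal(size=8)          # hypothetical advantage estimates
    logp_old = gaussian_log_prob(actions, mean=1.0, std=0.3)
    logp_new = gaussian_log_prob(actions, mean=1.1, std=0.3)
    print(ppo_clip_objective(logp_new, logp_old, advantages))
```

Clipping the probability ratio keeps each update close to the data-collecting policy, which is one reason PPO is often considered easier to tune than off-policy methods such as SAC.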

Who benefits

Aerospace, Defense, Robotics, Autonomous Systems, Space Exploration

Key takeaways

  • Meta-RL can create robust, adaptive safety-critical controllers for spacecraft.
  • Mamba state space models combined with PPO excel in adversarial RPO scenarios.
  • Across all tested scenarios, this combination delivered the best task completion, safety, and fuel efficiency of the configurations compared.
  • ICCBFs are effective for ensuring safety under input constraints.
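The second takeaway singles out Mamba, a selective state space model. As a rough illustration of how such a model can serve as the policy's memory, the sketch below implements a heavily simplified, single-layer selective scan. The step size delta and the projections B and C depend on the current input, and the hidden state keeps a fixed shape (D, N) however long the observation history grows. It assumes NumPy and random, untrained weights. It omits Mamba's convolution, gating, skip connection, and hardware-aware parallel scan, and uses the simplified delta * B input discretization. All names and shapes are illustrative, not the paper's network.

```python
import numpy as np


def softplus(z):
    """Numerically stable log(1 + exp(z)); keeps step sizes positive."""
    return np.logaddexp(0.0, z)


def selective_scan(xs, A, W_delta, b_delta, W_B, W_C):
    """Simplified selective SSM over a sequence.

    xs: (T, D) inputs (e.g. embedded observations)
    A: (D, N) diagonal state entries, negative for stability
    W_delta: (D, D), b_delta: (D,) map inputs to per-channel step sizes
    W_B, W_C: (D, N) map inputs to input-dependent B_t and C_t
    Returns outputs (T, D) and the final hidden state (D, N).
    """
    T, D = xs.shape
    h = np.zeros(A.shape)
    ys = np.empty((T, D))
    for t in range(T):
        x = xs[t]
        delta = softplus(x @ W_delta + b_delta)  # (D,) selective step size
        B = x @ W_B                              # (N,) input-dependent input map
        C = x @ W_C                              # (N,) input-dependent readout
        A_bar = np.exp(delta[:, None] * A)       # (D, N) zero-order-hold decay
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[:, None]
        ys[t] = h @ C                            # (D,) output per channel
    return ys, h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, N = 4, 8
    A = -np.exp(rng.normal(size=(D, N)))         # negative diagonal entries
    params = dict(W_delta=0.1 * rng.normal(size=(D, D)), b_delta=np.zeros(D),
                  W_B=rng.normal(size=(D, N)), W_C=rng.normal(size=(D, N)))
    xs = rng.normal(size=(100, D))               # hypothetical observation history
    ys, h = selective_scan(xs, A, **params)
    print(ys.shape, h.shape)                     # state shape does not depend on T
```

Because each step updates only a fixed-size state, the per-step cost of running such a policy online does not grow with episode length.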

Original post by Alejandro Posadas-Nava, Richard Linares, Minduli Wijayatunga on X

arXiv:2606.17414v1 abstract (excerpt): "Autonomous spacecraft rendezvous and proximity operations (RPO) require controllers that guarantee safety under thrust constraints while minimizing fuel expenditure. Input-constrained control barrier functions (ICCBFs) provide a con…"
