EVOM Automates Actor-Critic Architecture Design for Reinforcement Learning

Boyun Zhang, Chao Wang, Kai Wu · June 26, 2026

Summary

EVOM is an agentic meta-evolution framework that automates the discovery of high-performance actor-critic architectures for reinforcement learning. It uses a bi-level optimization approach where an inner loop trains weights and an outer loop, powered by an LLM-based design agent, iteratively refines architecture programs.
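In generic notation (ours, not necessarily the paper's), the bi-level problem can be written as follows. Here $\mathcal{A}$ is the space of architecture programs, $a$ is one program, $\theta$ are the network weights, $J$ is expected return, and $B$ is the small training budget of the low-fidelity inner loop:

```latex
\begin{aligned}
a^{\star} &= \arg\max_{a \in \mathcal{A}} \; J\big(a, \theta_B^{\star}(a)\big)
  && \text{(outer loop: the LLM design agent edits } a\text{)} \\
\theta_B^{\star}(a) &\approx \arg\max_{\theta} \; J(a, \theta)
  && \text{(inner loop: PPO training with budget } B\text{)}
\end{aligned}
```

Because $\theta_B^{\star}(a)$ comes from a short training run, the outer loop ranks architectures by a cheap, noisy estimate of their performance rather than by their fully trained return.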

The EVOM framework automates the design of neural network architectures for actor-critic reinforcement learning (RL). These architectures are traditionally crafted by hand, which is slow and difficult because the design space is open-ended and every candidate must be trained before it can be evaluated. EVOM addresses this by framing architecture search as a bi-level optimization problem. An inner loop trains the network weights with low-fidelity Proximal Policy Optimization (PPO), while an outer loop drives the meta-evolution of the architectures themselves. The outer loop is powered by a Large Language Model (LLM) acting as a design agent that is fully decoupled from policy execution and environment control, so the LLM can focus purely on architectural design. In experiments, EVOM outperformed manually designed baselines, LLM-guided random search, and other state-of-the-art LLM-guided programmatic policy search methods, achieving superior performance on challenging benchmarks such as Ant-v4 and HalfCheetah-v4. Ablation studies confirmed that both the meta-evolutionary loop and the LLM design agent are essential to the framework's effectiveness.
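The bi-level loop can be illustrated with a minimal, self-contained sketch under a toy representation. `ArchProgram`, `inner_loop_score`, `propose_variant`, and `meta_evolve` are hypothetical names. The inner loop's PPO training is replaced by a made-up synthetic score, and the LLM call is replaced by random local edits, so the example runs without a simulator or model API. It shows the shape of the loop only and is not EVOM's implementation.

```python
import random
from dataclasses import dataclass

# Hypothetical "architecture program": a tiny declarative description of the
# actor and critic networks. EVOM's actual program format is not reproduced here.
@dataclass(frozen=True)
class ArchProgram:
    actor_hidden: tuple[int, ...]
    critic_hidden: tuple[int, ...]
    activation: str

WIDTHS = (32, 64, 128, 256)
ACTIVATIONS = ("tanh", "relu", "elu")

def inner_loop_score(arch: ArchProgram, budget_steps: int, seed: int) -> float:
    """Stand-in for the inner loop (low-fidelity PPO training + evaluation).

    A real version would train the actor-critic weights for `budget_steps`
    environment steps and return mean episode return. This made-up synthetic
    objective exists only so the sketch runs offline.
    """
    rng = random.Random(seed)
    quality = -sum(abs(h - 64) for h in arch.actor_hidden) / 64
    quality -= sum(abs(h - 128) for h in arch.critic_hidden) / 128
    quality += 0.5 if len(arch.critic_hidden) == 3 else 0.0
    quality += 0.3 if arch.activation == "tanh" else 0.0
    # Smaller budgets give noisier estimates, mimicking low-fidelity training.
    return quality + rng.gauss(0.0, 1.0) * (1_000 / budget_steps) ** 0.5

def propose_variant(parent: ArchProgram, history, rng: random.Random) -> ArchProgram:
    """Hypothetical stand-in for the LLM design agent.

    A random local edit replaces the LLM call. Like the design agent, it sees
    only (program, score) pairs and never touches the environment or policy.
    """
    seen = {arch for arch, _ in history}
    for _ in range(10):  # retry a few times to avoid re-proposing old programs
        actor, critic = list(parent.actor_hidden), list(parent.critic_hidden)
        activation = parent.activation
        edit = rng.choice(("resize_actor", "resize_critic", "critic_depth", "activation"))
        if edit == "resize_actor":
            actor[rng.randrange(len(actor))] = rng.choice(WIDTHS)
        elif edit == "resize_critic":
            critic[rng.randrange(len(critic))] = rng.choice(WIDTHS)
        elif edit == "critic_depth":
            # Keep the critic between 1 and 4 hidden layers.
            if len(critic) > 1 and (len(critic) >= 4 or rng.random() < 0.5):
                critic.pop()
            else:
                critic.append(rng.choice(WIDTHS))
        else:
            activation = rng.choice(ACTIVATIONS)
        child = ArchProgram(tuple(actor), tuple(critic), activation)
        if child not in seen:
            break
    return child

def meta_evolve(generations=10, parents_per_gen=4, budget_steps=10_000, seed=0):
    """Outer loop: evolve architecture programs using inner-loop scores."""
    rng = random.Random(seed)
    start = ArchProgram((64, 64), (64, 64), "tanh")
    history = [(start, inner_loop_score(start, budget_steps, rng.randrange(10**9)))]
    for _ in range(generations):
        # Elitist selection: the best programs in the archive become parents.
        ranked = sorted(history, key=lambda pair: pair[1], reverse=True)
        for parent, _score in ranked[:parents_per_gen]:
            child = propose_variant(parent, history, rng)
            score = inner_loop_score(child, budget_steps, rng.randrange(10**9))
            history.append((child, score))
    return max(history, key=lambda pair: pair[1])

if __name__ == "__main__":
    best_arch, best_score = meta_evolve()
    print(best_arch, round(best_score, 3))
```

Because each score is a noisy low-fidelity estimate, the elitist archive can keep a candidate that was merely lucky; re-scoring the top candidates with a larger budget is a common safeguard. In a real pipeline, `propose_variant` would format the history into a prompt and parse the LLM's reply into a new program.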

Why it matters

Automating the design of RL architectures can significantly accelerate the development and deployment of advanced AI agents, reduce the need for expert manual tuning, and may uncover better-performing or novel designs. This matters most for industries that apply RL in complex, dynamic environments.

How to implement this in your domain

  1. Investigate integrating LLM-powered design agents into your automated machine learning (AutoML) pipelines.
  2. Explore bi-level optimization strategies for complex design problems beyond RL architectures.
  3. Consider using low-fidelity training methods in an inner loop to speed up the evaluation of candidate designs.
  4. Apply agentic meta-evolution concepts to other areas of AI model design where manual architecture search is a bottleneck.
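Step 3 suggests screening candidates with cheap, low-fidelity training. One common way to organize that is successive halving, a general multi-fidelity technique that the summary does not attribute to EVOM. The sketch assumes a hypothetical `noisy_eval` function in which smaller training budgets produce noisier scores; the candidate names and qualities are made up.

```python
import math
import random

def noisy_eval(true_quality: float, budget: int, rng: random.Random) -> float:
    """Hypothetical low-fidelity evaluator: smaller budgets give noisier scores."""
    return true_quality + rng.gauss(0.0, 1.0) / math.sqrt(budget)

def successive_halving(candidates: dict[str, float], min_budget: int = 1,
                       eta: int = 2, seed: int = 0) -> tuple[str, int]:
    """Screen many candidates cheaply, then give more budget to survivors.

    `candidates` maps a name to a made-up true quality that the evaluator only
    observes through noise. Returns the winner and the total budget spent.
    """
    rng = random.Random(seed)
    survivors = list(candidates)
    budget, total_cost = min_budget, 0
    while len(survivors) > 1:
        scores = {name: noisy_eval(candidates[name], budget, rng) for name in survivors}
        total_cost += budget * len(survivors)
        keep = max(1, len(survivors) // eta)  # keep the top 1/eta of candidates
        survivors = sorted(survivors, key=scores.get, reverse=True)[:keep]
        budget *= eta  # survivors are re-evaluated at higher fidelity
    return survivors[0], total_cost

if __name__ == "__main__":
    pool = {f"arch_{i}": float(i) for i in range(8)}
    winner, cost = successive_halving(pool)
    print(winner, cost)
```

With 8 candidates, `min_budget=1`, and `eta=2`, the three rounds cost 8 + 8 + 8 = 24 budget units, versus 32 for evaluating all eight at the final fidelity of 4 units; the savings grow as the candidate pool grows.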

Who benefits

  • Robotics
  • Autonomous Systems
  • Gaming
  • Logistics
  • AI Research & Development

Key takeaways

  • EVOM automates actor-critic architecture design for RL using a bi-level optimization framework.
  • An LLM-based design agent drives the meta-evolution of architectures, decoupled from policy execution.
  • The framework outperforms manual designs and other LLM-guided search methods on RL benchmarks.
  • Automated architecture search can accelerate AI development and discover novel, high-performing designs.

Original post by Boyun Zhang, Chao Wang, Kai Wu on X

"arXiv:2606.26327v1 Announce Type: new Abstract: In actor-critic reinforcement learning, network architectures are typically manually designed. Automating this design is challenging because each candidate must be trained before evaluation, and the design space is open-ended. To ad…"
