PROPEL Accelerates AI Task Generator Training by Predicting Solver Success

Lorenz Wolf, Connor Watts, Roger Creus Castanyer, Geoffrey Bradway, Maxwill Lin, Augustine N. Mavor-Parker, Matthew Daborn-Sargent · June 18, 2026

Summary

This paper introduces PROPEL, a framework designed to train task generators for reinforcement learning agents more efficiently by predicting task solvability. It addresses the bottleneck of needing repeated, time-consuming solver rollouts during generator training, especially for complex tasks like software engineering.

Training AI agents through reinforcement learning often hits a bottleneck: a scarcity of tasks that are challenging yet still solvable for the current model. Synthetic task generation frequently produces tasks that are too easy, impossible, or poorly defined, none of which help the model improve. Optimizing a task generator directly with reinforcement learning is computationally expensive because every candidate task needs many solver rollouts, and for complex tasks such as software engineering these rollouts can take tens of minutes. That cost makes solver-in-the-loop training impractical.

PROPEL tackles this by training a lightweight "activation probe" on a pre-labeled dataset of generated tasks and their solver outcomes. During generator optimization the probe stands in for the actual solver pass rate, so evaluating a candidate task costs a single forward pass instead of a batch of rollouts. Experiments show that PROPEL shifts task generation toward the targeted solve rate in math, code, and software engineering, substantially increasing the proportion of learnable tasks.
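To make the probe idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: this summary does not specify the probe architecture, the activations it reads, or the reward shape. The sketch assumes a logistic-regression probe, uses two-dimensional synthetic vectors as stand-ins for model activations, and introduces a hypothetical `proxy_reward` that peaks when the predicted solve rate matches a target.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    """Predicted solve rate for one task's feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_probe(examples, dim, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with SGD on cross-entropy loss.

    examples: list of (features, pass_rate) pairs, pass_rate in [0, 1].
    """
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for features, pass_rate in examples:
            # Gradient of cross-entropy w.r.t. the logit is (prediction - label).
            error = predict(weights, bias, features) - pass_rate
            for i in range(dim):
                weights[i] -= lr * error * features[i]
            bias -= lr * error
    return weights, bias


def proxy_reward(predicted_rate, target_rate=0.5):
    """Hypothetical reward: largest when the predicted rate hits the target."""
    return 1.0 - abs(predicted_rate - target_rate)


# Synthetic stand-in for the pre-labeled dataset of tasks and solver outcomes.
# Real features would be activations; these are random vectors (assumption).
rng = random.Random(0)
dataset = []
for _ in range(50):
    features = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    pass_rate = sigmoid(2.0 * features[0] - 1.0 * features[1])
    dataset.append((features, pass_rate))

weights, bias = train_probe(dataset, dim=2)

# Scoring a new task is one cheap function call instead of solver rollouts.
easy_task = [1.0, -1.0]
hard_task = [-1.0, 1.0]
for name, feats in (("easy", easy_task), ("hard", hard_task)):
    rate = predict(weights, bias, feats)
    print(f"{name}: predicted solve rate {rate:.2f}, reward {proxy_reward(rate):.2f}")
```

The point of the design is that the expensive step (running the solver) happens once, when the labeled dataset is built. After that, the generator can be scored as often as needed at the cost of a probe evaluation.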

Why it matters

PROPEL matters most in domains where generating and evaluating tasks is costly. By replacing repeated solver rollouts with a cheap prediction, it allows faster iteration on task generators and a steadier supply of training data pitched at the right difficulty for the current model.

How to implement this in your domain

  1. Adopt PROPEL to accelerate the development of AI agents in complex environments like robotics or game AI.
  2. Integrate the activation probe concept into custom task generation pipelines for large language models.
  3. Apply this framework to improve the efficiency of training models for automated software development or bug fixing.
  4. Explore using similar proxy models to optimize data generation in other machine learning applications.
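One simple way to plug a solve-rate predictor into an existing pipeline is to filter candidate tasks before they reach training. Note that this is a simpler use than the one described in the paper, where the probe serves as the optimization signal for the generator itself. In the sketch below, `select_learnable`, the band limits `low` and `high`, and the toy scores are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def select_learnable(
    candidates: Sequence[str],
    score_fn: Callable[[str], float],
    low: float = 0.2,
    high: float = 0.8,
) -> List[Tuple[str, float]]:
    """Keep tasks whose predicted solve rate falls inside [low, high].

    Results are ordered by closeness to the middle of the band, so the
    tasks predicted to be most informative come first.
    """
    center = (low + high) / 2
    kept = []
    for task in candidates:
        rate = score_fn(task)
        if low <= rate <= high:
            kept.append((task, rate))
    kept.sort(key=lambda pair: abs(pair[1] - center))
    return kept


# Toy predicted solve rates standing in for real probe outputs (assumption).
toy_scores = {
    "task_a": 0.05,  # predicted nearly impossible
    "task_b": 0.45,
    "task_c": 0.70,
    "task_d": 0.99,  # predicted trivially easy
}

selected = select_learnable(list(toy_scores), toy_scores.get)
for task, rate in selected:
    print(task, rate)
```

In a real pipeline `score_fn` would wrap whatever predictor is available, and the band would be tuned to the solve rate that training actually benefits from.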

Who benefits

Software Development, AI Research, Robotics, Gaming, Education (AI Tutors)

Key takeaways

  • The "solver bottleneck" limits efficient training of RL agents due to costly task evaluation.
  • PROPEL introduces a solver-amortized framework to train task generators more efficiently.
  • A lightweight activation probe predicts task solve rates, replacing expensive solver rollouts.
  • PROPEL significantly increases the generation of learnable tasks across various domains.

Original post by Lorenz Wolf, Connor Watts, Roger Creus Castanyer, Geoffrey Bradway, Maxwill Lin, Augustine N. Mavor-Parker, Matthew Daborn-Sargent

"arXiv:2606.18284v1 Announce Type: cross Abstract: The limiting resource for training agents via reinforcement learning (RL) is increasingly frontier task supply: valid, solvable tasks just difficult enough to train the current model. As reasoning and agentic models improve, fixed…"

