PerturbCellRL Enhances Single-Cell Perturbation Prediction with RL and Verifiers.

Dongxia Wu, Mingyu Li, Yuhui Zhang, Anurendra Kumar, Emma Lundberg, Serena Yeung-Levy, Emily B. Fox · June 29, 2026

Summary

This research introduces PerturbCellRL, a reinforcement learning framework that post-trains single-cell transcriptomic generators, using biological verifiers as reward signals. It targets predictions for individual cells under genetic and chemical interventions, moving beyond population-level accuracy toward biologically consistent single-cell effects.

Predicting how individual cells respond to genetic or chemical interventions matters for drug discovery and biological research, because accurate predictions can reduce the need for expensive lab screening. Current generative models can predict population-level responses, but they often lack explicit checks that each generated cell is biologically consistent.

PerturbCellRL addresses this gap by using reinforcement learning (RL) to post-train an existing, pretrained single-cell transcriptomic generator. A suite of "cell-level verifiers" supplies the reward signal during RL training. The verifiers assess gene similarity, expression proximity, differential expression, and pathway activity, so that generated cells are rewarded for matching known biological responses to perturbations.

Across multiple benchmarks for genetic and chemical perturbations, the authors report that PerturbCellRL significantly improves both the reward-aligned metrics and a held-out evaluation metric relative to its pretrained base generator, while remaining competitive with state-of-the-art methods on population-level metrics. The approach shifts the focus toward more trustworthy single-cell predictions by verifying biological consistency for each individual cell.
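The four verifier families named in the summary can be illustrated with a small sketch. The summary does not give the paper's exact definitions, so everything here is an assumption: the function names, the use of the perturbed-population mean as the reference profile, the per-gene delta against a control mean for the differential-expression score, and the mean-over-gene-set pathway score are all hypothetical. Plain Python lists keep the sketch self-contained; a real pipeline would more likely operate on NumPy arrays.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square error between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def de_spearman(cell: Sequence[float],
                control_mean: Sequence[float],
                perturbed_mean: Sequence[float]) -> float:
    """Spearman correlation between generated and reference expression changes.

    Hypothetical definition: both changes are measured against the control mean.
    """
    generated_delta = [c - m for c, m in zip(cell, control_mean)]
    reference_delta = [p - m for p, m in zip(perturbed_mean, control_mean)]
    return pearson(ranks(generated_delta), ranks(reference_delta))


def pathway_activity_gap(cell: Sequence[float],
                         perturbed_mean: Sequence[float],
                         pathway_genes: Sequence[int]) -> float:
    """Absolute gap in mean expression over one pathway's gene indices."""
    if not pathway_genes:
        raise ValueError("pathway_genes must not be empty")
    generated = sum(cell[i] for i in pathway_genes) / len(pathway_genes)
    reference = sum(perturbed_mean[i] for i in pathway_genes) / len(pathway_genes)
    return abs(generated - reference)
```

In this sketch, higher is better for `pearson` and `de_spearman`, while lower is better for `rmse` and `pathway_activity_gap`, so the latter two would need to be negated or otherwise transformed before being used as rewards.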

Why it matters

Teams in biotech and pharmaceuticals could use this approach to build single-cell perturbation models whose individual predictions are checked for biological consistency, which may speed up drug discovery and reduce wet-lab screening costs.

How to implement this in your domain

  1. Assess current in-silico drug screening or cell perturbation prediction pipelines for biological consistency at the single-cell level.
  2. Explore integrating reinforcement learning frameworks like PerturbCellRL to post-train existing generative models.
  3. Develop or adapt biological verifiers as reward functions to guide model training towards desired biological outcomes.
  4. Benchmark PerturbCellRL's performance against current methods on specific drug targets or genetic interventions.
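As a rough illustration of steps 3 and 4, the sketch below combines several verifier scores into one scalar reward and compares the average reward of two sets of generated cells. It is not the paper's reward or RL algorithm, which this summary does not specify: the weighted average, the two toy verifiers (`closeness`, `direction_agreement`), the `REFERENCE` and `CONTROL` profiles, and the example cells are all hypothetical.

```python
from typing import Callable, Dict, Sequence

Cell = Sequence[float]
Verifier = Callable[[Cell], float]

# Toy reference data (hypothetical): mean expression of three genes.
REFERENCE = [1.0, 2.0, 3.0]  # observed perturbed profile
CONTROL = [1.0, 1.0, 1.0]    # unperturbed profile


def closeness(cell: Cell) -> float:
    """Score in (0, 1]; equals 1.0 when the cell matches REFERENCE exactly."""
    mse = sum((c - r) ** 2 for c, r in zip(cell, REFERENCE)) / len(cell)
    return 1.0 / (1.0 + mse)


def direction_agreement(cell: Cell) -> float:
    """Fraction of genes whose change versus CONTROL has the reference sign."""
    def sign(value: float) -> int:
        return (value > 0) - (value < 0)

    hits = sum(
        sign(c - k) == sign(r - k)
        for c, r, k in zip(cell, REFERENCE, CONTROL)
    )
    return hits / len(cell)


def combined_reward(cell: Cell,
                    verifiers: Dict[str, Verifier],
                    weights: Dict[str, float]) -> float:
    """Weighted average of verifier scores (assumed aggregation rule)."""
    total_weight = sum(weights[name] for name in verifiers)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * fn(cell) for name, fn in verifiers.items())
    return weighted / total_weight


def mean_reward(cells: Sequence[Cell],
                verifiers: Dict[str, Verifier],
                weights: Dict[str, float]) -> float:
    """Average combined reward over a batch of generated cells."""
    return sum(combined_reward(c, verifiers, weights) for c in cells) / len(cells)


VERIFIERS = {"closeness": closeness, "direction": direction_agreement}
WEIGHTS = {"closeness": 1.0, "direction": 1.0}

# Hypothetical outputs from a base generator and a post-trained generator.
base_cells = [[1.0, 1.0, 1.0], [2.0, 0.5, 2.0]]
tuned_cells = [[1.0, 2.0, 3.0], [1.1, 1.8, 2.9]]

if __name__ == "__main__":
    print("base :", mean_reward(base_cells, VERIFIERS, WEIGHTS))
    print("tuned:", mean_reward(tuned_cells, VERIFIERS, WEIGHTS))
```

Keeping each verifier as a separate callable makes it easy to report per-verifier scores alongside the combined reward, and to hold one verifier out of training so it can serve as an independent evaluation metric.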

Who benefits

Biotechnology, Pharmaceuticals, Healthcare, Life Sciences

Key takeaways

  • PerturbCellRL uses RL and biological verifiers to improve single-cell perturbation predictions.
  • It ensures individual generated cells are biologically consistent, not just population-accurate.
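  • The framework employs verifiers based on Pearson similarity, RMSE, differential-expression (DE) Spearman correlation, and pathway activity.
  • PerturbCellRL improves on its pretrained base generator on reward-aligned and held-out metrics, and remains competitive with state-of-the-art methods on population-level metrics.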

Original post by Dongxia Wu, Mingyu Li, Yuhui Zhang, Anurendra Kumar, Emma Lundberg, Serena Yeung-Levy, Emily B. Fox

"arXiv:2606.27752v1 Announce Type: new Abstract: Single-cell perturbation models can reduce costly wet-lab screening by predicting how cells respond transcriptionally to interventions. While recent generative models improve population-level prediction, individual generated cells a…"

