CBD Enables API-Only Black-Box Unlearning for LLMs

Zhiqiang Xie, Yijing Lin, Zhipeng Gao, Dong In Kim · June 29, 2026

Summary

Researchers introduce Controlled Behavioral Divergence (CBD), an API-only black-box unlearning framework for LLMs that removes the influence of undesired data without access to model parameters. By routing unlearning-related prompts away from the LLM, CBD achieves a better unlearning-utility trade-off than eleven baselines, especially when target and retained data share similar patterns.

As large language models (LLMs) are increasingly accessed through API services, removing the influence of sensitive, copyrighted, or harmful data without retraining the entire model becomes critical. Traditional machine unlearning methods often require access to model parameters or internal logits, which API-only black-box LLMs do not expose. Existing techniques also struggle when the data to be forgotten shares strong semantic or structural similarities with data that should be retained.

To address these challenges, the authors propose Controlled Behavioral Divergence (CBD). CBD operates entirely through API access and requires no internal model details. It uses two auxiliary models to create a controlled divergence in behavior between retained inputs and inputs associated with the data to be unlearned, then converts that divergence into an "unlearning relevance score." Based on this score, CBD routes prompts related to the unlearning target away from the main LLM. To sharpen discrimination when forget and retain data are highly similar, CBD constructs a discriminative basis from gradient statistics, which steers the unlearning signal toward target-specific information rather than shared patterns.

In the reported experiments, CBD outperforms eleven unlearning baselines on the balance between forgetting undesired data and preserving overall model utility. It approaches the performance of a retrained reference model on forget sets while maintaining high utility on retained data.
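The score-then-route idea can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the two auxiliary models are replaced by smoothed unigram models (`UnigramModel`), the behavioral divergence is approximated by a per-token log-likelihood gap squashed through a sigmoid, and `fit_aux_models`, `relevance_score`, `route`, `scale`, `threshold`, and `FALLBACK` are hypothetical names and values. The gradient-statistics basis is left out because the summary does not say how it is built.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple

FALLBACK = "This request cannot be answered."  # hypothetical deflection message


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class UnigramModel:
    """Toy stand-in for an auxiliary model (assumption, not from the paper)."""

    def __init__(self, texts: Iterable[str], vocab_size: int) -> None:
        self.counts = Counter(tok for t in texts for tok in tokenize(t))
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def avg_log_prob(self, text: str) -> float:
        """Add-one smoothed average log-probability per token."""
        tokens = tokenize(text)
        if not tokens:
            return 0.0
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[t] + 1) / denom) for t in tokens) / len(tokens)


def fit_aux_models(forget_texts: Iterable[str],
                   retain_texts: Iterable[str]) -> Tuple[UnigramModel, UnigramModel]:
    """Fit one model on forget data and one on retain data over a shared vocabulary."""
    forget_texts, retain_texts = list(forget_texts), list(retain_texts)
    vocab = {tok for t in forget_texts + retain_texts for tok in tokenize(t)}
    size = len(vocab) + 1  # crude extra slot so unseen tokens get nonzero probability
    return UnigramModel(forget_texts, size), UnigramModel(retain_texts, size)


def relevance_score(prompt: str, forget_model: UnigramModel,
                    retain_model: UnigramModel, scale: float = 4.0) -> float:
    """Map the likelihood gap between the two models to a score in (0, 1)."""
    gap = forget_model.avg_log_prob(prompt) - retain_model.avg_log_prob(prompt)
    return 1.0 / (1.0 + math.exp(-scale * gap))


def route(prompt: str, main_llm: Callable[[str], str], forget_model: UnigramModel,
          retain_model: UnigramModel, threshold: float = 0.5) -> str:
    """Deflect prompts that look forget-related; forward the rest to the main LLM."""
    if relevance_score(prompt, forget_model, retain_model) > threshold:
        return FALLBACK
    return main_llm(prompt)
```

Here `main_llm` is any callable that wraps the API request, so the main model is only ever reached through its public interface. In practice the threshold would need tuning on held-out forget and retain prompts.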

Why it matters

Teams that manage LLM deployments gain a way to apply data unlearning to API-only models without access to model parameters, which matters for compliance, data privacy, and mitigating risks from harmful or outdated information.

How to implement this in your domain

  1. Assess current LLM governance policies for handling sensitive or undesirable data and the need for unlearning capabilities.
  2. Investigate the CBD framework for potential integration into existing API-based LLM service architectures.
  3. Develop auxiliary models and a routing mechanism to implement controlled behavioral divergence for unlearning.
  4. Benchmark CBD's performance against existing unlearning methods to ensure compliance and utility preservation.
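
For the benchmarking step, a first sanity check might measure how often a router deflects forget-set prompts and how often it lets retain-set prompts through. The sketch below is an assumption-laden starting point, not the evaluation protocol from the paper: `evaluate_router`, the metric names, and the keyword-based `toy_router` are all hypothetical.

```python
from typing import Callable, Dict, Sequence


def evaluate_router(router: Callable[[str], str],
                    forget_prompts: Sequence[str],
                    retain_prompts: Sequence[str],
                    fallback: str) -> Dict[str, float]:
    """Return the share of forget prompts deflected and retain prompts forwarded."""
    if not forget_prompts or not retain_prompts:
        raise ValueError("both prompt sets must be non-empty")
    blocked = sum(router(p) == fallback for p in forget_prompts)
    passed = sum(router(p) != fallback for p in retain_prompts)
    return {
        "forget_block_rate": blocked / len(forget_prompts),
        "retain_pass_rate": passed / len(retain_prompts),
    }


def toy_router(prompt: str) -> str:
    """Hypothetical keyword router, used only to exercise the metric."""
    return "DEFLECTED" if "alice" in prompt.lower() else "answer: " + prompt
```

A keyword router like `toy_router` shows the failure mode the article describes: a retain prompt that merely shares surface patterns with the forget set gets deflected, which lowers `retain_pass_rate`.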

Who benefits

Cloud Services · AI Development · Legal/Compliance · Data Privacy

Key takeaways

  • Machine unlearning for API-only LLMs is challenging due to black-box access and data similarity.
  • Controlled Behavioral Divergence (CBD) offers an API-only framework for black-box unlearning.
  • CBD uses auxiliary models and behavioral divergence to route unlearning-related prompts.
  • The method achieves a strong unlearning-utility trade-off, outperforming other baselines.

Original post by Zhiqiang Xie, Yijing Lin, Zhipeng Gao, Dong In Kim

"arXiv:2606.27683v1 Announce Type: cross Abstract: Edge devices increasingly invoke large language models (LLMs) through API services for context aware edge intelligence, while edge generated data may be collected to improve LLMs and may introduce sensitive, copyrighted, harmful,…"
