New Framework Enables API-Only Black-Box LLM Unlearning

Zhiqiang Xie, Yijing Lin, Zhipeng Gao, Dong In Kim · June 29, 2026

Summary

Researchers developed Controlled Behavioral Divergence (CBD), an API-only framework for unlearning specific data from black-box LLMs without retraining. CBD uses two auxiliary models to create a behavioral divergence that flags unlearning-related prompts and routes them away from the target LLM, preserving utility on retained data even when it closely resembles the data to be forgotten.

As large language models (LLMs) are increasingly accessed through API services, it becomes critical to remove sensitive, copyrighted, or outdated information from their behavior without full retraining. Existing machine unlearning methods often require access to the model's internal parameters or logits, which is not available for black-box, API-only LLMs. These methods also struggle to preserve general utility when the data to be forgotten is semantically very similar to the data that should be retained.

This paper introduces Controlled Behavioral Divergence (CBD), a framework designed to address both problems. CBD operates entirely through API access. It uses two auxiliary models to generate a controlled divergence in behavior between retained inputs and unlearning-target inputs. That divergence is quantified as an unlearning relevance score, which determines whether a given prompt is routed away from the target LLM. To sharpen discrimination when target and retained data are highly similar, CBD constructs a discriminative basis from gradient statistics by solving a regularized generalized eigenvalue problem, so that the unlearning signal tracks the specific information to be forgotten rather than broad prompt structure.

In the reported experiments, CBD outperforms eleven white-box and gray-box baselines, achieving a better trade-off between unlearning effectiveness and utility preservation. On the TOFU forget10 dataset, CBD approached the performance of a fully retrained model on the forget set while maintaining high utility. On WMDP, it significantly reduced accuracy on hazardous knowledge while preserving general knowledge.
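To make the discriminative-basis step more concrete, here is a minimal NumPy sketch of one way a regularized generalized eigenvalue problem can separate forget-specific directions from directions shared with retained data. It is an illustration under stated assumptions, not the paper's implementation: the summary does not say which gradient statistics are used, so the function `discriminative_basis`, the uncentered second-moment matrices, the regularizer `reg`, and the synthetic data from `make_toy_data` are all hypothetical.

```python
import numpy as np


def discriminative_basis(forget_feats, retain_feats, k=1, reg=1e-3):
    """Return the top-k solutions of S_f v = mu (S_r + reg*I) v.

    forget_feats / retain_feats: (n_samples, dim) arrays of per-example
    gradient features (hypothetical; the source does not define them).
    """
    f = np.asarray(forget_feats, dtype=float)
    r = np.asarray(retain_feats, dtype=float)
    dim = f.shape[1]

    # Uncentered second-moment matrices of each set (an assumed statistic).
    s_f = f.T @ f / len(f)
    s_r = r.T @ r / len(r)

    # Regularize so the right-hand matrix is positive definite.
    b = s_r + reg * np.eye(dim)

    # Whiten with the Cholesky factor: b = chol @ chol.T, so the problem
    # becomes an ordinary symmetric eigenproblem for chol^-1 s_f chol^-T.
    chol = np.linalg.cholesky(b)
    half = np.linalg.solve(chol, s_f)      # chol^-1 s_f
    m = np.linalg.solve(chol, half.T)      # chol^-1 s_f chol^-T
    m = 0.5 * (m + m.T)                    # remove round-off asymmetry

    mu, w = np.linalg.eigh(m)              # eigenvalues in ascending order
    order = np.argsort(mu)[::-1][:k]       # indices of the k largest
    basis = np.linalg.solve(chol.T, w[:, order])  # map back: v = chol^-T w
    return mu[order], basis


def make_toy_data(n=500, seed=0):
    """Synthetic features: axis 0 is shared structure, axis 2 is forget-only."""
    rng = np.random.default_rng(seed)
    retain = 0.5 * rng.standard_normal((n, 4))
    forget = 0.5 * rng.standard_normal((n, 4))
    retain[:, 0] += 5.0 * rng.standard_normal(n)  # shared, high variance
    forget[:, 0] += 5.0 * rng.standard_normal(n)  # shared, high variance
    forget[:, 2] += 3.0 * rng.standard_normal(n)  # forget-specific signal
    return forget, retain


forget, retain = make_toy_data()
mu, basis = discriminative_basis(forget, retain, k=1)
direction = basis[:, 0] / np.linalg.norm(basis[:, 0])
print("largest generalized eigenvalue:", mu[0])
print("unit direction:", np.round(direction, 3))
```

In this toy setup, axis 0 stands in for broad prompt structure that both sets share, and axis 2 carries the forget-specific signal. Although axis 0 has the largest raw variance, dividing by the retained-data matrix cancels it out, so the leading direction should concentrate on axis 2. The regularizer keeps the retained-data matrix invertible when it is estimated from few samples.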

Why it matters

For organizations deploying or using LLMs via APIs, this framework offers a practical solution for data governance, compliance, and mitigating risks associated with sensitive or harmful information without costly full model retraining. It enhances control over model behavior in black-box scenarios.

How to implement this in your domain

  1. Evaluate existing LLM API usage for potential data unlearning requirements, especially concerning sensitive or proprietary information.
  2. Investigate integrating unlearning frameworks like CBD into data governance and compliance strategies for LLM applications.
  3. Explore the use of auxiliary models and behavioral divergence techniques to manage model responses to specific input patterns.
  4. Develop strategies for identifying and categorizing data that may need to be "unlearned" from deployed LLMs.
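As a starting point for step 3, the sketch below shows the general shape of divergence-based routing in plain Python. It is a simplified illustration, not CBD itself: it assumes each auxiliary model can be wrapped as a function that returns a next-token probability distribution, uses Jensen-Shannon divergence as a stand-in relevance score, and applies an arbitrary threshold. The names `aux_exposed`, `aux_clean`, `target_llm`, `fallback`, and `route`, along with the toy prompts and probabilities, are all hypothetical.

```python
import math
from typing import Callable, Dict, Tuple

Distribution = Dict[str, float]  # token -> probability


def js_divergence(p: Distribution, q: Distribution) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a: Distribution) -> float:
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0.0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


def route(
    prompt: str,
    aux_a: Callable[[str], Distribution],
    aux_b: Callable[[str], Distribution],
    target: Callable[[str], str],
    fallback: Callable[[str], str],
    threshold: float = 0.3,  # assumed value; would need tuning in practice
) -> Tuple[str, float]:
    """Send the prompt to `fallback` when the auxiliary models disagree."""
    score = js_divergence(aux_a(prompt), aux_b(prompt))
    handler = fallback if score >= threshold else target
    return handler(prompt), score


# Toy stand-ins for two auxiliary models: one exposed to the data to be
# forgotten, one not. Real auxiliary models would be small local LLMs.
def aux_exposed(prompt: str) -> Distribution:
    if "Alice" in prompt:
        return {"1987": 0.9, "unknown": 0.1}
    return {"Paris": 0.9, "unknown": 0.1}


def aux_clean(prompt: str) -> Distribution:
    if "Alice" in prompt:
        return {"1987": 0.1, "unknown": 0.9}
    return {"Paris": 0.9, "unknown": 0.1}


def target_llm(prompt: str) -> str:
    # Placeholder for the black-box API call.
    return f"[target LLM answer to: {prompt}]"


def fallback(prompt: str) -> str:
    return "I can't help with that request."


for text in ["When was Alice born?", "What is the capital of France?"]:
    reply, score = route(text, aux_exposed, aux_clean, target_llm, fallback)
    print(f"{score:.3f}  {reply}")
```

The routing decision uses only the auxiliary models, so the target LLM is called just for prompts judged unrelated to the forget set, which is what makes an approach of this kind compatible with API-only access.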

Who benefits

BFSI, Healthcare, Legal, AI Development, Media

Key takeaways

  • API-only black-box LLM unlearning is crucial for data governance and compliance.
  • CBD offers a novel framework to remove specific data influence without internal model access.
  • It effectively preserves general model utility even when unlearned and retained data are similar.
  • In the reported experiments, CBD outperforms eleven white-box and gray-box baselines on the trade-off between unlearning effectiveness and utility preservation.

Original post by Zhiqiang Xie, Yijing Lin, Zhipeng Gao, Dong In Kim

"arXiv:2606.27683v1 Announce Type: new Abstract: Edge devices increasingly invoke large language models (LLMs) through API services for context aware edge intelligence, while edge generated data may be collected to improve LLMs and may introduce sensitive, copyrighted, harmful, or…"

