PHANTOM Dataset Released for VLM Adversarial Attack Research

Simone Gallivanone, Hossein Khodadadi, Mauro Dore, Mauro Medda, Nicola Franco · June 24, 2026


Summary

PHANTOM, a large-scale, open-source dataset, has been released with 47,524 pre-generated adversarial attacks for vision-language models (VLMs). It aims to make adversarial data accessible, covering 10 high-level categories and 55 subcategories of harmful intents to support evaluation of VLM robustness and safety.

PHANTOM is a new large-scale, open-source dataset of multimodal adversarial attacks targeting vision-language models (VLMs). Generating adversarial samples at scale is computationally expensive and complex; PHANTOM addresses this by distributing the attacks pre-generated. The dataset is designed to be diverse, representative, and practical, and extends existing benchmarks to cover 10 high-level categories and 55 subcategories of harmful intents. Its 47,524 adversarial samples were generated using state-of-the-art attack strategies from recent literature. It consolidates and expands upon prior benchmarks, offering 7,826 distinct intents and introducing an additional category to broaden coverage. The goal is to give researchers and practitioners realistic evaluation resources so they can systematically assess VLM robustness and safety, fine-tune attack-generation models, and develop or stress-test defensive guardrails under a wide range of adversarial conditions.

Why it matters

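Because the attacks come pre-generated, researchers can evaluate VLM robustness and safety without first paying the computational cost of producing adversarial samples. That lowers the barrier to entry for adversarial research on VLMs and makes evaluations more comprehensive and reproducible, which is critical for deploying reliable AI systems.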

How to implement this in your domain

  1. Download and integrate the PHANTOM dataset into VLM development and testing pipelines.
  2. Use the dataset to benchmark the robustness of existing and new VLM architectures.
  3. Develop and fine-tune defensive guardrails and attack detection mechanisms using the diverse attack samples.
  4. Conduct research on novel adversarial attack strategies by analyzing the dataset's structure and intent categories.
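
As a rough illustration of step 2, the sketch below computes a per-category attack success rate, a common way to summarize robustness against a labeled attack set. Everything named here is an assumption made for illustration: the `AttackSample` fields, the `respond` callable (standing in for a call to the VLM under test), and the `is_unsafe` judge are hypothetical, and the actual PHANTOM schema and loading procedure may look different.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class AttackSample:
    """Hypothetical record layout; the real PHANTOM schema may differ."""
    category: str     # high-level harmful-intent category
    subcategory: str  # finer-grained intent label
    prompt: str       # adversarial text (an image reference would sit alongside it)


def attack_success_rate(
    samples: Iterable[AttackSample],
    respond: Callable[[AttackSample], str],
    is_unsafe: Callable[[str], bool],
) -> Dict[str, float]:
    """Return, per category, the fraction of samples whose response is judged unsafe."""
    totals: Dict[str, int] = defaultdict(int)
    unsafe: Dict[str, int] = defaultdict(int)
    for sample in samples:
        totals[sample.category] += 1
        # `respond` wraps the model under test; `is_unsafe` wraps the judge.
        if is_unsafe(respond(sample)):
            unsafe[sample.category] += 1
    # Only categories that actually appear in `samples` are reported.
    return {category: unsafe[category] / totals[category] for category in totals}
```

Passing the model and the judge in as callables keeps the metric independent of any particular VLM or safety classifier, so the same loop can be reused to compare a model with and without a defensive guardrail (step 3).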

Who benefits

AI Development, Cybersecurity, Autonomous Vehicles, Content Moderation, Robotics

Key takeaways

  • PHANTOM is a large, open-source dataset of adversarial attacks for vision-language models (VLMs).
  • It contains 47,524 samples covering 10 categories and 55 subcategories of harmful intents.
  • The dataset aims to simplify VLM robustness and safety research by providing pre-generated attacks.
  • It enables systematic evaluation, fine-tuning of attack models, and development of defensive guardrails.

Original post by Simone Gallivanone, Hossein Khodadadi, Mauro Dore, Mauro Medda, Nicola Franco

"arXiv:2606.24388v1 Announce Type: new Abstract: We introduce a large-scale, open-source dataset of pre-generated adversarial attacks for vision-language models (VLMs). The dataset is designed to be diverse, representative, and practical, extending existing benchmarks by covering…"
