New Multimodal CAD Dataset Released for AI Design Research

Jizong Zhan · June 17, 2026

Summary

FllumaOne is a new code-native multimodal CAD dataset featuring 100,000 models generated by executable Python programs within the Flluma CAD system. It provides aligned data including programs, feature trees, STEP geometry, point clouds, natural language descriptions, and renderings, supporting various editable CAD research tasks.

Researchers have introduced FllumaOne, a code-native multimodal CAD dataset for editable computer-aided design (CAD) research. Where datasets that provide only final geometry leave out how a part was built, FllumaOne exposes the ordered construction history and modeling operations that determine how a part can be edited. Its models are generated by executable Python programs within the Flluma CAD system, so every model stays tied to a program that can be re-executed. The primary release, FllumaOne-100K, comprises 100,000 validated samples spanning four levels of template complexity.

Each sample aligns its Python program with a structured feature tree, an intermediate representation optimized for training, STEP geometry, a surface point cloud, natural-language descriptions, metadata, and eight canonical visible-edge renderings. Validation includes checks for kernel geometry, solid validity, and export success, and the release reports modality completeness and duplicate tests.

A baseline model, Qwen2.5-Coder-1.5B fine-tuned with LoRA on 80,000 samples, reportedly achieved high Python syntax validity, Flluma build success, and STEP-export validity on the held-out test set. The dataset is intended to support conditioned CAD reconstruction, executable program synthesis, feature-tree prediction, and editable reverse engineering.
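Of the three baseline metrics, Python syntax validity is the only one that can be checked without the Flluma CAD system. The sketch below shows one plausible way to compute such a rate with the standard library's `ast.parse`. It is not the authors' evaluation code, and the sample programs (including `make_box` and `fillet`) are hypothetical placeholders rather than Flluma API calls.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python. The code is never executed."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def syntax_validity_rate(programs: list[str]) -> float:
    """Fraction of programs that parse; 0.0 for an empty list."""
    if not programs:
        return 0.0
    return sum(is_valid_python(p) for p in programs) / len(programs)


# Hypothetical model outputs. `make_box` and `fillet` are illustrative names only.
generated = [
    "part = make_box(10, 20, 5)\npart = fillet(part, radius=1.0)",
    "part = make_box(10, 20, 5\n",  # unbalanced parenthesis, fails to parse
]
rate = syntax_validity_rate(generated)
```

Parsing only confirms that a program is well-formed Python. Build success and STEP-export validity would still require running the program inside the CAD system.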

Why it matters

The dataset targets practitioners in AI and engineering who work on generative design, automated manufacturing, and intelligent CAD systems. Its aligned programs, feature trees, and geometry supply the structured, multimodal data needed to train models for design automation.

How to implement this in your domain

  1. Download and explore the FllumaOne dataset for training custom AI models in generative design or CAD automation.
  2. Develop new algorithms for conditioned CAD reconstruction or executable program synthesis using the dataset's unique code-native structure.
  3. Integrate feature-tree prediction capabilities into existing CAD software workflows to enhance design automation.
  4. Utilize the multimodal data (geometry, point clouds, text) to train AI for intelligent design completion or editable reverse engineering.
  5. Collaborate with academic institutions leveraging FllumaOne to stay abreast of cutting-edge AI applications in engineering design.
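As a starting point for the exploration step above, the sketch below models one sample as a record and checks it for missing modalities and duplicated programs. The field names, file extensions, and hashing approach are all assumptions made for illustration. The article does not describe the dataset's on-disk layout or how its duplicate tests work, and the intermediate representation and metadata are left out for brevity.

```python
import hashlib
from dataclasses import dataclass, field, fields


@dataclass
class CadSample:
    # Field names are assumptions that mirror the modalities listed in the article.
    program: str = ""
    feature_tree: dict = field(default_factory=dict)
    step_path: str = ""
    point_cloud_path: str = ""
    description: str = ""
    rendering_paths: list = field(default_factory=list)


EXPECTED_RENDERINGS = 8  # the article mentions eight canonical renderings


def missing_modalities(sample: CadSample) -> list[str]:
    """Names of empty modalities, plus renderings if the count is wrong."""
    missing = [f.name for f in fields(sample) if not getattr(sample, f.name)]
    if sample.rendering_paths and len(sample.rendering_paths) != EXPECTED_RENDERINGS:
        missing.append("rendering_paths")
    return missing


def program_fingerprint(program: str) -> str:
    """Hash of the program text with trailing whitespace normalized."""
    normalized = "\n".join(line.rstrip() for line in program.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(samples: list[CadSample]) -> list[tuple[int, int]]:
    """Pairs (first_index, later_index) of samples with identical programs."""
    seen: dict[str, int] = {}
    duplicates = []
    for index, sample in enumerate(samples):
        key = program_fingerprint(sample.program)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates


# Hypothetical samples; the program text and paths are placeholders.
complete = CadSample(
    program="part = make_box(1, 2, 3)",
    feature_tree={"op": "box"},
    step_path="a.step",
    point_cloud_path="a.ply",
    description="A box.",
    rendering_paths=[f"a_{i}.png" for i in range(8)],
)
partial = CadSample(program="part = make_box(1, 2, 3)  ", description="A box.")
```

Here `missing_modalities(partial)` lists the four absent fields, and `find_duplicates([complete, partial])` flags the pair because the two programs differ only in trailing whitespace. A text hash catches only near-verbatim copies; geometric duplicates would need a comparison on the STEP or point-cloud data.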

Who benefits

Manufacturing, Automotive, Aerospace, Product Design, AI Engineering

Key takeaways

  • FllumaOne is a new 100K-sample multimodal CAD dataset with executable Python programs.
  • It provides aligned data including feature trees, geometry, point clouds, and natural language.
  • The dataset supports research in generative design, program synthesis, and editable reverse engineering.
  • Validation covers kernel geometry, solid validity, and export success, with reports on modality completeness and duplicate tests.

Original post by Jizong Zhan

"arXiv:2606.17696v1 Announce Type: new Abstract: Parametric computer-aided design records both final geometry and the ordered construction history that determines how a part can be edited. Datasets for editable CAD research should therefore expose modeling operations, parameters,…"

Originally posted by Jizong Zhan on X