UltraFlux Achieves High-Quality Native 4K Text-to-Image Generation

Tian Ye, Song Fei, Lei Zhu · July 2, 2026

Summary

Researchers introduce UltraFlux, a Flux-based Diffusion Transformer trained natively at 4K resolution on a new 1M-image corpus, MultiAspect-4K-1M. This data-model co-design addresses the problems that arise when diffusion transformers are extended to high resolutions and diverse aspect ratios, and it outperforms open-source baselines.

Native 4K text-to-image generation across varied aspect ratios remains difficult for current diffusion transformers. The problems come from positional encoding, VAE compression, and optimization, which are tightly coupled and cannot be solved in isolation. UltraFlux tackles them together through a data-model co-design strategy. It is a Flux-based Diffusion Transformer trained on a specially curated 1-million-image 4K dataset, MultiAspect-4K-1M, which covers diverse aspect ratios and carries rich metadata. UltraFlux combines four innovations: Resonance 2D RoPE for positional encoding, a non-adversarial VAE post-training scheme for better 4K reconstruction, an SNR-Aware Huber Wavelet objective for gradient rebalancing, and a Stage-wise Aesthetic Curriculum Learning strategy. Together, these elements enable stable, detail-preserving 4K image generation that generalizes across wide, square, and tall aspect ratios, surpassing open-source baselines and matching proprietary models.
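To make the positional-encoding issue concrete, here is a minimal sketch of plain axial 2D rotary position embedding (RoPE) in Python. It is not the paper's Resonance 2D RoPE, whose details are not given in this summary. The function names, the base of 10000, and the even split of channels between the row axis and the column axis are all assumptions chosen for illustration.

```python
import math


def rope_angles(pos, dim, base=10000.0):
    """Rotation angles for one axis: pos * base^(-2i/dim) for i in [0, dim/2)."""
    return [pos * base ** (-2.0 * i / dim) for i in range(dim // 2)]


def rotate_pairs(vec, angles):
    """Rotate consecutive (even, odd) pairs of vec by the given angles."""
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_2d(vec, row, col, base=10000.0):
    """Axial 2D RoPE (illustrative): the first half of vec encodes the row,
    the second half encodes the column."""
    if len(vec) % 4 != 0:
        raise ValueError("len(vec) must be divisible by 4")
    half = len(vec) // 2
    row_part = rotate_pairs(vec[:half], rope_angles(row, half, base))
    col_part = rotate_pairs(vec[half:], rope_angles(col, half, base))
    return row_part + col_part


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query/key vectors at two different grid positions.
query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
key = [0.5, -1.0, 2.0, 0.0, 1.5, 1.0, -2.0, 3.0]
score_near_origin = dot(rope_2d(query, 3, 5), rope_2d(key, 1, 2))
score_shifted = dot(rope_2d(query, 10, 20), rope_2d(key, 8, 17))
```

Both scores use the same row and column offsets (2 and 3), so they agree up to floating-point error: under this scheme the attention score depends only on relative position. A 4K token grid produces far larger offsets than a grid at around 1K resolution, which is one plausible reading of why positional encoding needs rework when moving to native 4K.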

Why it matters

Professionals in creative industries and AI development can use this advance to produce higher-fidelity, more versatile AI-generated imagery, reducing the need for manual upscaling or post-processing.

How to implement this in your domain

1. Explore integrating UltraFlux or similar 4K generation models into creative workflows for marketing and design.
2. Evaluate the quality and versatility of 4K AI-generated assets for specific project requirements.
3. Investigate the underlying techniques (e.g., Resonance 2D RoPE, VAE post-training) for potential application in other generative AI tasks.
4. Consider contributing to or utilizing datasets like MultiAspect-4K-1M for training custom high-resolution models.
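When evaluating 4K assets across wide, square, and tall formats, it helps to fix a set of target resolutions up front. The sketch below is a hypothetical helper, not part of UltraFlux: it assumes a pixel budget equal to 3840 x 2160 and snaps each side to a multiple of 16, a common constraint for latent-space models. Both values are illustrative assumptions, so check the requirements of whichever model you use.

```python
import math


def resolution_for_aspect(aspect_w, aspect_h, pixel_budget=3840 * 2160, multiple=16):
    """Return (width, height) close to pixel_budget for the given aspect ratio.

    Each side is rounded to the nearest multiple of `multiple` (assumed value).
    """
    ratio = aspect_w / aspect_h
    height = math.sqrt(pixel_budget / ratio)
    width = height * ratio

    def snap(value):
        # Round to the nearest allowed size, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


# Candidate evaluation sizes for wide, square, and tall outputs.
eval_sizes = {
    "wide 16:9": resolution_for_aspect(16, 9),
    "square 1:1": resolution_for_aspect(1, 1),
    "tall 9:16": resolution_for_aspect(9, 16),
}
```

Holding the pixel count roughly constant across aspect ratios keeps comparisons fair, since each test image then carries a similar amount of detail.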

Who benefits

Creative Agencies, Media & Entertainment, E-commerce, Gaming, Advertising

Key takeaways

UltraFlux significantly improves native 4K text-to-image generation across diverse aspect ratios.
The co-design of data and model is crucial for overcoming high-resolution generation challenges.
New techniques such as Resonance 2D RoPE and the SNR-Aware Huber Wavelet objective contribute to enhanced image quality.
This research enables more stable and detail-preserving AI-generated visuals for professional use.

Original post by Tian Ye, Song Fei, Lei Zhu

"arXiv:2511.18050v1 Announce Type: cross Abstract: Diffusion transformers have recently delivered strong text-to-image generation around 1K resolution, but we show that extending them to native 4K across diverse aspect ratios exposes a tightly coupled failure mode spanning positio…"

