NaviGen Personalizes Multimodal Generation Using User Behavior.

Hengji Zhou, Yufeng Liu, Ye Liu, Yong Xu, Lianghao Xia, Liqiang Nie · June 25, 2026

Summary

This research introduces NaviGen, a framework that translates user interaction history into executable instructions for personalized multimodal content generation. It addresses challenges in encoding user behavior and developing instruction-writing skills in models, leading to more relevant and specific outputs.

Modern AI-generated content (AIGC) systems excel at producing high-quality images and videos, but they typically require precise input instructions. End-users, however, often struggle to articulate detailed visual preferences, leading to a mismatch between their desires and the generated output. This paper explores personalized content generation, aiming to convert a user's past interactions into clear, actionable instructions for synthesis. The NaviGen framework tackles two primary hurdles: effectively encoding user behavior in a format understandable by language models, and enabling models to generate these instructions, a skill not inherent in pre-training or standard behavioral data. NaviGen represents each interaction item with a dual identifier, combining collaborative and textual codes into a single token stream, serving as both a behavioral substrate and a semantic bridge. A two-stage pipeline, involving supervised fine-tuning (SFT) followed by reinforcement learning (RL), is then employed. This pipeline first distills preference reasoning and instruction writing from carefully evolved supervision, and subsequently aligns the generation process with explicit user intent using hierarchical and self-consistent rewards. Evaluations across product, game, and short-video domains demonstrate that NaviGen enhances personalized image and video generation, improves next-item prediction accuracy, and produces instructions that are more specific, relevant, and visually feasible.
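
As a rough illustration of the dual-identifier idea, the sketch below serializes an interaction history into one token stream in which every item carries both a collaborative code and a textual code. The `Item` class, the `<item>`, `<cL_K>`, and `<txt>` token formats, and the sample codes are all assumptions made for this example; the summary does not specify NaviGen's actual tokenization.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Item:
    # Hypothetical: discrete codes derived from collaborative (behavioral) signals.
    collab_code: Sequence[int]
    # Hypothetical: short textual descriptors of the item.
    text_code: Sequence[str]


def item_tokens(item: Item) -> List[str]:
    """Render one item as collaborative tokens followed by textual tokens."""
    collab = [f"<c{level}_{code}>" for level, code in enumerate(item.collab_code)]
    return ["<item>", *collab, "<txt>", *item.text_code, "</item>"]


def history_to_stream(history: Sequence[Item]) -> str:
    """Concatenate all items, in interaction order, into a single token stream."""
    tokens: List[str] = []
    for item in history:
        tokens.extend(item_tokens(item))
    return " ".join(tokens)


history = [
    Item(collab_code=(12, 7), text_code=("red", "sneakers")),
    Item(collab_code=(12, 3), text_code=("trail", "shoes")),
]
stream = history_to_stream(history)
print(stream)
```

In this toy layout the two items share the first collaborative code (12), which is one way a language model could pick up behavioral similarity, while the text tokens keep the stream readable as ordinary language.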

Why it matters

For professionals in product development, marketing, and content creation, NaviGen suggests a way to personalize AI-generated content from interaction history alone, reducing the need for users to write explicit, detailed prompts.

How to implement this in your domain

  1. Analyze user interaction data to identify patterns and implicit preferences for content generation.
  2. Explore dual-identifier representation strategies for user behavior in multimodal AI systems.
  3. Implement a two-stage SFT+RL pipeline to teach models both preference reasoning and instruction writing.
  4. Design hierarchical and self-consistent reward functions to align generated content with user intent.
  5. Integrate personalized generation capabilities into existing product, marketing, or content platforms.
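
The reward-design step can be pictured with a toy function. The sketch below is one possible reading of "hierarchical and self-consistent" rewards, not the paper's formulation: a format gate is checked first, relevance to the user's preferred terms is scored next, and agreement with the model's own next-item prediction is scored last. The name `toy_reward`, the term-overlap scoring, and the weights 0.2, 0.5, and 0.3 are hypothetical.

```python
from typing import Set


def toy_reward(instruction: str,
               preferred_terms: Set[str],
               predicted_item_terms: Set[str]) -> float:
    """Toy hierarchical reward (hypothetical; not NaviGen's actual reward)."""
    words = set(instruction.lower().split())

    # Level 1 (format gate): an empty instruction earns nothing.
    if not words:
        return 0.0
    reward = 0.2  # assumed base credit for a non-empty instruction

    # Level 2 (relevance): share of the user's preferred terms that appear.
    relevance = len(words & preferred_terms) / max(len(preferred_terms), 1)
    if relevance == 0.0:
        return reward  # higher levels are scored only once relevance is met
    reward += 0.5 * relevance

    # Level 3 (self-consistency): agreement between the instruction and the
    # descriptors of the next item predicted by the same model.
    consistency = len(words & predicted_item_terms) / max(len(predicted_item_terms), 1)
    return reward + 0.3 * consistency


full = toy_reward("a red trail shoe on rocks", {"red", "trail"}, {"trail", "shoe"})
off_topic = toy_reward("a blue sofa", {"red", "trail"}, {"trail", "shoe"})
print(full, off_topic)
```

Gating the later terms on the earlier ones is what makes this sketch hierarchical: an off-topic instruction cannot collect consistency credit. A real system would likely replace the word-overlap scores with learned or model-based judgments.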

Who benefits

E-commerce · Media & Entertainment · Social Media · Gaming · Digital Marketing

Key takeaways

  • Personalized multimodal generation can be achieved by translating user behavior into executable instructions.
  • NaviGen uses a dual-identifier system and a two-stage SFT+RL pipeline for this purpose.
  • The framework improves content relevance, specificity, and next-item prediction.
  • It addresses the challenge of implicit user preferences in AIGC.

Original post by Hengji Zhou, Yufeng Liu, Ye Liu, Yong Xu, Lianghao Xia, Liqiang Nie

"arXiv:2606.24196v2. Abstract: Modern AIGC pipelines deliver high-fidelity images and videos but presuppose a well-formed creation instruction, while end users rarely articulate visual details, leaving generators misaligned with user demand. We study personalized…"
