New Benchmark Reveals AI Agents Fail Hidden Social Norms
Summary
Researchers introduce NormAct, a benchmark for embodied social-norm interactions that evaluates multimodal large language models (MLLMs) on their ability to infer and comply with hidden social norms during planning. Experiments show a significant gap between explicit goal achievement and hidden norm compliance in state-of-the-art MLLMs. To address this gap, the authors propose NormPerceptor, a cue generator.
Why it matters
For professionals developing embodied AI, robotics, or virtual assistants, ensuring social appropriateness is as important as task completion. This benchmark and proposed solution highlight a critical area for development, enabling the creation of AI systems that are not only functional but also socially intelligent and acceptable in human environments.
How to implement this in your domain
1. Evaluate existing embodied AI agents against the NormAct benchmark to identify gaps in social norm compliance.
2. Develop context-aware modules that can infer and activate relevant social norms based on environmental cues.
3. Integrate norm-grounding mechanisms into MLLM planning pipelines to ensure social constraints are considered alongside explicit goals.
4. Prioritize training data and fine-tuning strategies that emphasize implicit social understanding for embodied agents.
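As a rough illustration of steps 2 and 3, the sketch below activates norms from observed environmental cues and then checks a candidate plan against them. It is not NormAct or NormPerceptor code: the cue names, the norm table, the forbidden-action table, and the functions `infer_norms` and `check_plan` are all hypothetical, and a real system would likely infer norms with a learned model rather than a lookup table.

```python
from dataclasses import dataclass

# Hypothetical cue-to-norm table (assumption, not from the paper).
NORM_RULES = {
    "person_sleeping": "avoid_loud_actions",
    "wet_floor_sign": "avoid_walking_through_area",
}

# Hypothetical mapping from each norm to the plan actions it forbids.
FORBIDDEN_ACTIONS = {
    "avoid_loud_actions": {"run_vacuum", "play_music"},
    "avoid_walking_through_area": {"cross_hallway"},
}


@dataclass
class PlanCheck:
    active_norms: list  # norms activated by the observed cues
    violations: list    # (plan step, violated norm) pairs


def infer_norms(cues):
    """Return the norms activated by the observed cues, sorted by name."""
    return sorted({NORM_RULES[c] for c in cues if c in NORM_RULES})


def check_plan(plan, cues):
    """Flag every plan step that breaks a norm activated by the cues."""
    norms = infer_norms(cues)
    violations = [
        (step, norm)
        for step in plan
        for norm in norms
        if step in FORBIDDEN_ACTIONS.get(norm, set())
    ]
    return PlanCheck(active_norms=norms, violations=violations)


result = check_plan(["pick_up_cup", "run_vacuum"], ["person_sleeping"])
print(result.violations)  # [('run_vacuum', 'avoid_loud_actions')]
```

Keeping norm inference separate from plan checking means the lookup table could later be swapped for a model-based cue generator without touching the checking logic.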
Key takeaways
- Embodied MLLMs struggle significantly with inferring and complying with hidden social norms.
- The NormAct benchmark reveals a large gap between explicit goal achievement and social appropriateness.
- The issue stems from difficulty in activating and grounding social knowledge in context, not a lack of general knowledge.
- NormPerceptor, a cue generator, can improve norm compliance and overall task success.
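One simple way to quantify the gap described in these takeaways is to score each episode on goal achievement and norm compliance separately, then compare the rates. The sketch below assumes a hypothetical per-episode record with `goal_achieved` and `norm_compliant` flags; the field names and the sample data are made up for illustration and are not the benchmark's actual metrics or results.

```python
def summarize(episodes):
    """Compute goal, norm, and joint success rates over a list of episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    goal_rate = sum(e["goal_achieved"] for e in episodes) / n
    norm_rate = sum(e["norm_compliant"] for e in episodes) / n
    # An episode counts as a joint success only if both conditions hold.
    joint_rate = sum(
        e["goal_achieved"] and e["norm_compliant"] for e in episodes
    ) / n
    return {
        "goal_rate": goal_rate,
        "norm_rate": norm_rate,
        "joint_rate": joint_rate,
        "gap": goal_rate - norm_rate,
    }


# Made-up episodes for illustration only.
episodes = [
    {"goal_achieved": True, "norm_compliant": True},
    {"goal_achieved": True, "norm_compliant": False},
    {"goal_achieved": True, "norm_compliant": False},
    {"goal_achieved": False, "norm_compliant": False},
]
print(summarize(episodes))
# {'goal_rate': 0.75, 'norm_rate': 0.25, 'joint_rate': 0.25, 'gap': 0.5}
```

Reporting the joint rate alongside the gap shows how often an agent completes the task while also behaving appropriately, which neither rate shows on its own.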
Original post by Shiyun Zhao, Xinwei Song, Tianyu Guo, Xiaomeng Gao, Mingyuan Liu, Xu Han, Yuanyuan Zhang, Zhenliang Zhang, Xue Feng, Bo Dai
arXiv:2606.27826v1. Abstract: "Multimodal large language models (MLLMs) are increasingly deployed as embodied planners in egocentric environments, where task success requires not only achieving instructed goals but also acting in socially appropriate ways. While…"