Analyzing Human-Like Behaviors in Large Language Models

Sunnie S. Y. Kim, Margit Bowler, Leon A Gatys· June 18, 2026 View original

Summary

Researchers conducted a multi-dimensional analysis of human-like behaviors in large language models, examining their prevalence, effects, and controllability across various models and user factors. The study found that while such behaviors are pervasive, their perceived appropriateness varies, and system prompting can control them with careful evaluation.

Large language models (LLMs) frequently exhibit a range of human-like behaviors, from expressing emotions and building relationships to setting boundaries and refusing requests. Despite the commonality of these behaviors, there has been a lack of empirical understanding regarding when and how LLMs should display them. This research aims to fill that gap through a comprehensive multi-dimensional analysis. The study utilized both LLM-as-a-judge and human evaluation across 21,000 multi-turn conversations involving four widely used models: GPT-4o, GPT-4.1-mini, Claude-Sonnet-4.6, and Gemini-2.5-flash. Findings indicate that human-like behaviors are widespread but vary significantly depending on the model and user factors, such as conversation goals and user profiles. Human evaluators judged self-referential and relationship-building behaviors as less appropriate from LLMs compared to humans, while boundary-maintaining behaviors were considered more appropriate from LLMs. Crucially, the research demonstrates that system prompting can effectively control these behaviors, though careful evaluation is necessary to prevent unintended consequences. These insights are vital for responsible LLM design and evaluation.

Why it matters

Understanding and controlling human-like behaviors in LLMs is crucial for designing ethical, effective, and user-friendly AI systems, especially in sensitive applications. Professionals can use these insights to fine-tune AI interactions, manage user expectations, and ensure appropriate AI conduct.

How to implement this in your domain

1Define clear guidelines for appropriate human-like behaviors in LLM applications based on user context.
2Utilize system prompts to control and modulate the expression of human-like traits in AI interactions.
3Conduct user testing and ethical reviews to assess the perceived appropriateness of LLM behaviors.
4Train AI developers on the nuances of human-like behavior generation and its implications for user experience.

Who benefits

AI EthicsCustomer ServiceHealthcareSocial MediaSoftware Development

Key takeaways

LLMs exhibit pervasive human-like behaviors, varying by model and user context.
Perceived appropriateness of these behaviors differs between humans and LLMs.
System prompting can control human-like behaviors, but requires careful evaluation.
Responsible LLM design needs to consider the implications of human-like interactions.

Original post by Sunnie S. Y. Kim, Margit Bowler, Leon A Gatys

"arXiv:2606.18258v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit a wide range of human-like behaviors, from expressing thoughts and emotions, to engaging in relationship-building with users, to refusing requests and maintaining boundaries. Despite their prev…"

View on X

Originally posted by Sunnie S. Y. Kim, Margit Bowler, Leon A Gatys on X · view source

Want to go deeper?

Turn these trends into skills with Learnijoy's hands-on AI & tech courses.

Explore courses

Analyzing Human-Like Behaviors in Large Language Models

Why it matters

How to implement this in your domain

Who benefits

Key takeaways

Want to go deeper?

More in AI Engineering & DevTools

MCP and A2A Protocols Standardize Agentic Internet Development

VISReg Enhances JEPA Training with Novel Regularization

Ford's AI-Driven Layoffs Backfire Significantly