LLM Psychological Profiles Found to Be Measurement Artifacts, Not Intrinsic Traits

Jelena Meyer, David Garcia, Dirk U. Wulff · June 19, 2026

Summary

Research indicates that apparent psychological profiles assigned to Large Language Models using human instruments are largely measurement artifacts, driven by a directional response bias rather than actual traits. This bias accounts for 81-90% of between-model variation, challenging the validity of using such profiles for safety or usability assessments.

Psychological instruments designed for humans are increasingly used to assign stable psychological profiles to Large Language Models (LLMs). These profiles then inform assessments of usability and safety, and are used to justify employing LLMs as proxies for human participants in research. A new study, using a formal psychometric framework, suggests that these apparent profiles are largely an artifact of the measurement process itself.

The researchers administered a battery of personality and risk-preference instruments, including self-reports and behavioral tasks, to 56 instruction-tuned LLMs and compared the responses with large human reference samples. Differences between models were driven primarily by a "directional response bias": a tendency for the LLM to respond toward one end of a scale or toward a particular labeled option, irrespective of the item's content. This bias accounted for 81-90% of the variation between models, compared with 9-16% in humans.

The bias decreases with increasing model capability but is not eliminated. The study also introduces "response orthogonality," which predicts an instrument's apparent reliability from the proportion of items where trait and bias point in opposite directions. Crucially, the psychological profile an LLM appears to have shifts with the specific items used and can even be manufactured through item selection. These results suggest that the observed psychological profiles of LLMs are products of the instruments used rather than inherent properties of the models, a finding that calls for dedicated, LLM-specific assessment methods.
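To make the item-selection point concrete, here is a minimal sketch, not the study's method. It assumes a 5-point agreement scale, a hypothetical responder that always picks the highest option regardless of content, and made-up items. The helper `opposed_item_share` computes only the proportion of items where trait and bias point in opposite directions (here, reverse-keyed items); it is not the paper's response-orthogonality statistic itself.

```python
from dataclasses import dataclass

# Assumption: a 5-point agreement scale (1 = strongly disagree, 5 = strongly agree).
SCALE_MIN, SCALE_MAX = 1, 5


@dataclass(frozen=True)
class Item:
    text: str            # illustrative wording, not taken from any real instrument
    reverse_keyed: bool  # True if agreeing indicates LESS of the trait


def keyed_score(raw: int, item: Item) -> int:
    """Map a raw response onto the trait direction (flip reverse-keyed items)."""
    return SCALE_MIN + SCALE_MAX - raw if item.reverse_keyed else raw


def trait_score(raw_responses: list[int], items: list[Item]) -> float:
    """Mean keyed score across items, the usual scale score."""
    scores = [keyed_score(r, it) for r, it in zip(raw_responses, items)]
    return sum(scores) / len(scores)


def opposed_item_share(items: list[Item]) -> float:
    """Share of items where the trait direction opposes an agree-end bias."""
    return sum(it.reverse_keyed for it in items) / len(items)


def always_agree(items: list[Item]) -> list[int]:
    """Hypothetical responder with pure directional bias and no trait signal."""
    return [SCALE_MAX for _ in items]


FORWARD = [Item("I am talkative.", False), Item("I enjoy parties.", False)]
REVERSED = [Item("I am quiet.", True), Item("I avoid crowds.", True)]

for name, items in [
    ("forward-only", FORWARD),
    ("reverse-only", REVERSED),
    ("balanced", FORWARD + REVERSED),
]:
    print(name, opposed_item_share(items), trait_score(always_agree(items), items))
```

The same content-blind responder scores 5.0 on the forward-only set, 1.0 on the reverse-only set, and 3.0 on the balanced set, so the apparent "trait" is determined entirely by which items were selected.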

Why it matters

For professionals involved in AI ethics, safety, and human-AI interaction design, this research is directly relevant. It challenges the notion of stable LLM psychological profiles, urges a re-evaluation of how LLM behavior is assessed and interpreted, and emphasizes the need for robust, LLM-specific evaluation methodologies.

How to implement this in your domain

1. Re-evaluate existing LLM safety and usability assessments that rely on human psychological instruments.
2. Develop new, LLM-specific evaluation frameworks that account for response biases and focus on objective performance metrics.
3. Educate teams on the limitations of applying human psychological concepts directly to AI models.
4. Design LLM prompts and interaction strategies to mitigate the influence of directional response bias.
5. Collaborate with psychometricians and AI ethicists to create valid and reliable assessment tools for AI behavior.
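One way to start on the bias-related steps above is to check whether a model's answers follow option position rather than option content. The sketch below is an illustration under stated assumptions, not a procedure from the study: `ask` is a hypothetical callable standing in for whatever model client you use, and it returns the index of the chosen option. Each prompt is presented with the options in the original and in the reversed order.

```python
from typing import Callable, Sequence

# Hypothetical interface: given a prompt and an ordered list of option labels,
# return the index of the option the model chose.
Ask = Callable[[str, Sequence[str]], int]


def position_bias_rate(ask: Ask, prompts: Sequence[str], options: Sequence[str]) -> float:
    """Fraction of prompts where the same POSITION is chosen under both option
    orders even though that position holds different content."""
    reversed_options = list(reversed(options))
    flagged = 0
    for prompt in prompts:
        first = ask(prompt, options)
        second = ask(prompt, reversed_options)
        # Same index but different label means the choice followed position.
        if first == second and options[first] != reversed_options[second]:
            flagged += 1
    return flagged / len(prompts)


# Illustrative stand-ins for a model, used only to exercise the function.
def always_first(prompt: str, opts: Sequence[str]) -> int:
    return 0  # picks the first listed option regardless of content


def content_driven(prompt: str, opts: Sequence[str]) -> int:
    return list(opts).index("Agree")  # picks by label, wherever it appears


OPTIONS = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
PROMPTS = ["I am talkative.", "I avoid crowds."]

print(position_bias_rate(always_first, PROMPTS, OPTIONS))
print(position_bias_rate(content_driven, PROMPTS, OPTIONS))
```

A rate near 1 suggests responses follow position; a rate near 0 suggests they follow content. Choosing the middle option on an odd-length scale is not flagged, because that position holds the same label under both orders.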

Who benefits

AI Ethics, AI Safety, Human-Computer Interaction, AI Research, Software Development

Key takeaways

LLM psychological profiles derived from human instruments are largely measurement artifacts.
A directional response bias, not intrinsic traits, drives most variation between LLMs.
This bias accounts for 81-90% of between-model differences.
New, LLM-specific assessment methods are needed to accurately evaluate AI behavior.

Original post by Jelena Meyer, David Garcia, Dirk U. Wulff

"arXiv:2606.20205v1 Announce Type: new Abstract: Psychological instruments designed for humans are increasingly used to assign large language models (LLMs) stable psychological profiles that affect their usability, safety assessment, and use as proxies for human participants in re…"

Originally posted by Jelena Meyer, David Garcia, Dirk U. Wulff on X
