Affordance20Q Benchmarks LLM Affordance Reasoning from Physical Properties.

Yifan Jiang, Meige Yang, Zitong Li, Jay Pujara · June 15, 2026

Summary

This paper introduces Affordance20Q, a new benchmark for evaluating LLMs' ability to reason about object affordances based on physical properties, without relying on memorized object identities. It reveals a significant gap between LLM and human performance and proposes KB-Anchored Rule Induction (KARI) to improve open-source LLMs.

Affordance reasoning is the inference of an object's action possibilities from its physical properties, such as shape and material. It is fundamental to human physical understanding and increasingly important for Large Language Models (LLMs). Existing benchmarks often reveal the object's identity, which lets a model fall back on memorized object-affordance mappings instead of reasoning from physical properties. Affordance20Q is designed to close that loophole. It frames affordance reasoning as a 20-Questions game: the model must identify a hidden object's affordance by asking yes/no questions about its physical properties, without ever learning what the object is. The benchmark comprises over 1,000 annotated games spanning hundreds of objects and affordances.

Experiments with 15 state-of-the-art LLMs show a substantial performance gap relative to humans, with models struggling to ask discriminating questions as the game progresses. To narrow the gap, the researchers developed KB-Anchored Rule Induction (KARI), an LLM-based pipeline that generates affordance rules grounded in knowledge bases. KARI significantly improves the performance of open-source LLMs, though its gains are limited by knowledge base coverage.
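To make the 20-Questions framing concrete, here is a minimal Python sketch of one game. It is an illustration only, not the benchmark's actual protocol or data: the rule table, the property names, and the greedy question picker are all hypothetical, and a scripted questioner stands in for the LLM. The questioner asks the yes/no property question that leaves the fewest candidate affordances in the worst case, and it never sees the object's name.

```python
from dataclasses import dataclass

@dataclass
class HiddenObject:
    properties: dict   # property name -> bool; the object's name is never exposed
    affordance: str    # ground-truth answer the questioner must reach

# Hypothetical toy rules: affordance -> required property values.
RULES = {
    "cut": {"has_sharp_edge": True, "is_rigid": True},
    "contain_liquid": {"is_concave": True, "is_waterproof": True},
    "wipe": {"is_rigid": False, "is_absorbent": True},
}

def consistent(rule, prop, value):
    # A rule that does not mention the property is compatible with either answer.
    return rule.get(prop, value) == value

def pick_question(candidates, asked):
    # Consider only properties that appear in remaining rules and were not asked yet.
    props = sorted({p for r in candidates.values() for p in r} - set(asked))
    if not props:
        return None

    def worst_case_remaining(prop):
        yes = sum(consistent(r, prop, True) for r in candidates.values())
        no = sum(consistent(r, prop, False) for r in candidates.values())
        return max(yes, no)

    # Greedy choice: the question whose worse answer still leaves the fewest candidates.
    return min(props, key=worst_case_remaining)

def play(obj, rules, max_questions=20):
    candidates = dict(rules)
    asked = {}
    while len(candidates) > 1 and len(asked) < max_questions:
        prop = pick_question(candidates, asked)
        if prop is None:
            break
        value = obj.properties.get(prop, False)  # the oracle's yes/no answer
        asked[prop] = value
        candidates = {a: r for a, r in candidates.items()
                      if consistent(r, prop, value)}
    guess = next(iter(candidates)) if len(candidates) == 1 else None
    return guess, len(asked)

knife = HiddenObject(
    properties={"has_sharp_edge": True, "is_rigid": True, "is_concave": False,
                "is_waterproof": True, "is_absorbent": False},
    affordance="cut",
)
guess, questions_used = play(knife, RULES)
print(guess, questions_used)
```

In a real evaluation the questioner would be the LLM under test, and the number of questions used alongside the final guess gives a simple view of how discriminating its questions were.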

Why it matters

For professionals developing AI systems that interact with the physical world (e.g., robotics, virtual assistants, simulation), Affordance20Q provides a critical tool for evaluating and improving an LLM's ability to understand object functionality. This is essential for building more intelligent and adaptable AI that can operate effectively in diverse environments.

How to implement this in your domain

1. Utilize the Affordance20Q benchmark to rigorously evaluate the affordance reasoning capabilities of your LLM-powered agents.
2. Integrate knowledge base-anchored rule induction (KARI) techniques to enhance LLMs' ability to infer object affordances from physical properties.
3. Develop LLM training strategies that emphasize reasoning over memorization for physical interaction tasks.
4. Explore methods to expand and refine knowledge bases to improve the coverage and effectiveness of affordance reasoning systems.
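The idea behind steps 2 and 4 can be sketched in a few lines of Python. This is not the paper's KARI pipeline, whose internals are not described here; it is a hypothetical illustration of what "anchoring" rules to a knowledge base could mean and why coverage limits the gains. The knowledge base triples, the candidate rules, and the function names are all assumptions made for the example.

```python
# Hypothetical toy knowledge base of (affordance, property, value) facts.
KB = {
    ("cut", "has_sharp_edge", True),
    ("cut", "is_rigid", True),
    ("contain_liquid", "is_concave", True),
}

# Hypothetical candidate rules, for example proposed by an LLM.
CANDIDATE_RULES = {
    "cut": {"has_sharp_edge": True, "is_rigid": True, "is_red": True},
    "contain_liquid": {"is_concave": True, "is_waterproof": True},
    "wipe": {"is_absorbent": True},
}

def anchor_rules(candidates, kb):
    """Keep only the rule conditions that the knowledge base supports."""
    anchored = {}
    for affordance, conditions in candidates.items():
        kept = {prop: value for prop, value in conditions.items()
                if (affordance, prop, value) in kb}
        if kept:  # drop affordances with no supported condition at all
            anchored[affordance] = kept
    return anchored

def coverage(candidates, anchored):
    """Fraction of candidate affordances that keep at least one anchored condition."""
    return len(anchored) / len(candidates) if candidates else 0.0

anchored = anchor_rules(CANDIDATE_RULES, KB)
print(anchored)
print(coverage(CANDIDATE_RULES, anchored))
```

In this toy setup the unsupported condition is filtered out of the "cut" rule, and "wipe" disappears entirely because the knowledge base says nothing about it. Expanding the knowledge base is the only way to recover such rules, which is one way to read the coverage limitation noted in the summary.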

Who benefits

Robotics, Virtual Reality/Augmented Reality, Smart Home Devices, Manufacturing, Gaming

Key takeaways

Affordance20Q is a new benchmark for evaluating LLM affordance reasoning from physical properties.
Current LLMs show a significant performance gap compared to humans on this task.
Models struggle to ask discriminating questions as the game progresses.
KB-Anchored Rule Induction (KARI) can improve LLM affordance reasoning.

Original post by Yifan Jiang, Meige Yang, Zitong Li, Jay Pujara

"arXiv:2606.14240v1 Announce Type: new Abstract: Affordance reasoning, the inference of an object's action possibilities from its physical properties (e.g., shape and material), is fundamental to human physical understanding and increasingly critical for Large Language Models (LLM…"
