LLMs Exhibit Swarm Intelligence, Improving Estimation Accuracy

Justin Brenne, Christian Meske · July 1, 2026


Summary

This research explores whether large language models can replicate human swarm intelligence effects. It finds that both intra-model sampling and inter-model aggregation consistently reduce estimation error, by up to 37 percentage points in Mean Absolute Percentage Error (MAPE). A positive correlation between confidence-interval width and estimation error suggests that LLMs have some metacognitive awareness of their own uncertainty, which offers guidance for deploying AI swarms in organizational decision-making.

Human swarm intelligence is known for its impressive collective accuracy, but it faces practical limitations related to cost, coordination, and time. This study investigates whether large language models (LLMs) can mimic these swarm intelligence effects, thereby addressing a significant gap in understanding how AI-based aggregation mechanisms function. A controlled experiment was conducted using 960 manually executed prompts across three proprietary LLMs: GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5. The experiment tested both intra-model sampling (multiple queries to the same model) and inter-model aggregation (combining outputs from different models) on eight distinct estimation tasks. The findings consistently showed that both aggregation strategies led to error reduction, with significant improvements of up to 37 percentage points in Mean Absolute Percentage Error (MAPE). Furthermore, the study observed positive correlations between the width of confidence intervals and estimation errors, suggesting that LLMs exhibit a form of metacognitive awareness when assessing their own uncertainty. These results have important implications for both research and practical applications, providing actionable guidance for integrating LLM swarms into organizational decision-making processes.
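To make the two aggregation strategies and the error metric concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the study's procedure: the summary does not say which aggregation function the authors used, so the median is assumed here, and the model names and numbers are made-up toy values rather than results from the paper.

```python
from statistics import median


def ape(estimate: float, truth: float) -> float:
    """Absolute percentage error of a single estimate, in percent."""
    return abs(estimate - truth) / abs(truth) * 100.0


def mape(estimates: list[float], truths: list[float]) -> float:
    """Mean absolute percentage error over paired estimates and truths."""
    return sum(ape(e, t) for e, t in zip(estimates, truths)) / len(truths)


# Toy data (hypothetical): three repeated answers per model for one task
# whose true value is 100. These numbers are invented for illustration.
truth = 100.0
samples = {
    "model_a": [80.0, 120.0, 130.0],
    "model_b": [90.0, 95.0, 150.0],
    "model_c": [60.0, 110.0, 105.0],
}

# Intra-model sampling: aggregate repeated answers from the same model.
# The median is an assumed choice of aggregator.
intra = {name: median(values) for name, values in samples.items()}

# Inter-model aggregation: aggregate the per-model aggregates.
inter = median(intra.values())

# Baseline: average error of a single, unaggregated answer.
all_answers = [v for values in samples.values() for v in values]
single_answer_mape = mape(all_answers, [truth] * len(all_answers))

print(f"per-model medians: {intra}")
print(f"swarm estimate: {inter}, error: {ape(inter, truth):.1f}%")
print(f"mean single-answer error: {single_answer_mape:.1f}%")
```

In this toy case the aggregated estimate lands closer to the true value than a typical single answer, which is the effect the study measures. Whether that holds on a real task depends on how the individual errors are distributed.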

Why it matters

This research offers a novel approach to leveraging LLMs for more accurate decision-making by aggregating their responses, potentially overcoming individual model limitations and enhancing reliability in critical business applications.

How to implement this in your domain

  1. Experiment with querying the same LLM multiple times for a single task and aggregating the responses to improve accuracy.
  2. Implement a multi-LLM strategy, combining outputs from different models (e.g., GPT-5, Gemini, Claude) for critical estimations.
  3. Develop internal benchmarks to test the "swarm intelligence" effect on specific business problems.
  4. Utilize LLM-generated confidence intervals as an indicator of uncertainty when aggregating responses.
  5. Design decision-making workflows that incorporate aggregated LLM insights for improved reliability.
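One way to act on step 4 is sketched below in Python. The study reports only that wider confidence intervals tend to accompany larger errors. The inverse-width weighting, the review threshold, and every name here (`Response`, `weighted_estimate`, `needs_review`) are assumptions introduced for illustration, not the authors' method. The sample responses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One model answer: a point estimate plus a self-reported interval."""
    model: str
    estimate: float
    low: float
    high: float

    @property
    def rel_width(self) -> float:
        """Interval width relative to the size of the point estimate."""
        return (self.high - self.low) / abs(self.estimate)


def weighted_estimate(responses: list[Response]) -> float:
    """Average the estimates, weighting narrow intervals more heavily.

    Inverse-width weighting is an assumed heuristic, not taken from the paper.
    """
    # Guard against a zero-width interval producing a division by zero.
    weights = [1.0 / max(r.rel_width, 1e-9) for r in responses]
    total = sum(weights)
    return sum(w * r.estimate for w, r in zip(weights, responses)) / total


def needs_review(responses: list[Response], max_rel_width: float = 1.0) -> list[str]:
    """Return the models whose interval is wider than the chosen threshold."""
    return [r.model for r in responses if r.rel_width > max_rel_width]


# Hypothetical responses to a single estimation task.
responses = [
    Response("model_a", estimate=100.0, low=90.0, high=110.0),
    Response("model_b", estimate=120.0, low=60.0, high=180.0),
    Response("model_c", estimate=80.0, low=20.0, high=180.0),
]

print(f"weighted estimate: {weighted_estimate(responses):.2f}")
print(f"flag for human review: {needs_review(responses)}")
```

The threshold of 1.0 is arbitrary. An internal benchmark with known answers (step 3) is a reasonable place to check whether interval width actually tracks error for your tasks before relying on any weighting scheme.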

Who benefits

Consulting, Financial Services, Market Research, Healthcare, Government

Key takeaways

  • LLMs can approximate human swarm intelligence, leading to improved collective accuracy.
  • Both intra-model sampling and inter-model aggregation significantly reduce estimation errors.
  • LLMs show signs of metacognitive awareness regarding their uncertainty.
  • Artificial swarm intelligence can enhance organizational decision-making.

Original post by Justin Brenne, Christian Meske

"arXiv:2606.31404v1 Announce Type: new Abstract: Human swarm intelligence demonstrates remarkable collective accuracy but faces scalability constraints in cost, coordination, and time. We investigate whether large language models (LLMs) can approximate swarm intelligence effects t…"


