GPT-5 Achieves High Accuracy on Scrum Certification Questions

Mirko Perkusich, Danyllo Albuquerque, João Paiva, Robson Vilar, Emanuel Dantas, Ademar França de Sousa Neto, Rohit Gheyi, Kyller Gorgônio, Angelo Perkusich · July 2, 2026

Summary

This study evaluates GPT-5's factual accuracy on 993 PSM-aligned Scrum certification questions under three prompting strategies: zero-shot, chain-of-thought, and citation-based prompting. All three exceeded 85% accuracy, with citation-based prompting performing best, though errors clustered in interpretive topics and multi-select questions.

As Large Language Models (LLMs) become integrated into Agile software development for tasks like documentation and coaching, their reliability in reasoning about normative frameworks like Scrum is crucial. This paper specifically investigates how different prompting techniques influence the factual accuracy of GPT-5 when answering Scrum certification-style questions. Researchers used a dataset of 993 validated questions aligned with the Professional Scrum Master (PSM) assessment. GPT-5 was tested with three prompting strategies: zero-shot, chain-of-thought, and a citation-based approach. All methods demonstrated certification-level accuracy, exceeding 85%, with the citation-based variant achieving the highest performance at 89.1% and the lowest error rate. The study found that correct answers were concentrated in well-defined Scrum topics such as "Definition of Done" and "Events," and in single-answer multiple-choice questions. Conversely, multi-select questions and more interpretive areas like "Scrum Team" and "Product Value" proved less stable. Analysis of errors revealed systematic issues, including misalignment with the Scrum Guide, content outside its scope, and outdated or biased interpretations.
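The split between single-answer and multi-select results can be reproduced on your own question set with a small scoring helper. The sketch below is illustrative only: the record keys (`type`, `predicted`, `expected`) are hypothetical, the sample records are made up, and exact-set matching for multi-select questions is an assumption, since the summary does not say how the study graded partially correct answers.

```python
from collections import defaultdict


def is_correct(predicted, expected):
    """Exact-match scoring (assumed rule): option sets must be identical."""
    return set(predicted) == set(expected)


def accuracy_by_type(records):
    """Return {question_type: accuracy} for a list of graded records.

    Each record is a dict with hypothetical keys:
    'type' ('single' or 'multi'), 'predicted', and 'expected'.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["type"]] += 1
        if is_correct(rec["predicted"], rec["expected"]):
            hits[rec["type"]] += 1
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}


# Made-up records for illustration; not data from the study.
sample = [
    {"type": "single", "predicted": ["B"], "expected": ["B"]},
    {"type": "single", "predicted": ["A"], "expected": ["C"]},
    {"type": "multi", "predicted": ["A", "C"], "expected": ["C", "A"]},
    {"type": "multi", "predicted": ["A"], "expected": ["A", "C"]},
]

print(accuracy_by_type(sample))
```

Under exact-set matching, a multi-select answer that picks only some of the correct options counts as wrong, which is one plausible reason multi-select accuracy can trail single-answer accuracy.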

Why it matters

This research provides concrete evidence of GPT-5's capabilities and limitations in understanding and applying the Scrum framework, which is vital for professionals using LLMs for Agile training, coaching, or certification preparation. It highlights the importance of prompt engineering for accuracy.

How to implement this in your domain

1. Utilize GPT-5 with citation-based prompting for generating Scrum-related training materials or answering factual questions.
2. Cross-reference LLM-generated answers with the official Scrum Guide, especially for interpretive or multi-select questions.
3. Develop custom prompts that explicitly instruct the LLM to adhere to the latest Scrum Guide (2020) to mitigate version drift.
4. Integrate LLMs as a supplementary tool for learning, emphasizing human review for critical or nuanced Scrum concepts.
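The steps above can be sketched as a small prompt builder covering the three strategies the study compares. This is a minimal sketch, not the authors' prompts: the instruction wording, the function names `format_question` and `build_prompt`, and the strategy labels are all assumptions made for illustration. The code only builds the prompt text; sending it to a model is left to whichever client you use.

```python
def format_question(question, options):
    """Render a question and its lettered options as plain text."""
    lines = [question]
    for label, text in options.items():
        lines.append(f"{label}) {text}")
    return "\n".join(lines)


def build_prompt(question, options, strategy="citation"):
    """Build a prompt for one of three strategies (wording is hypothetical)."""
    if strategy == "zero_shot":
        instruction = "Answer with the letter(s) of the correct option(s) only."
    elif strategy == "chain_of_thought":
        instruction = (
            "Reason step by step, then give the letter(s) of the correct "
            "option(s) on a final line starting with 'Answer:'."
        )
    elif strategy == "citation":
        # Pins the Scrum Guide version to reduce answers from older editions.
        instruction = (
            "Base your answer only on the 2020 Scrum Guide. Quote the passage "
            "that supports your choice, then give the letter(s) on a final "
            "line starting with 'Answer:'. If the guide does not cover the "
            "question, say so."
        )
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return f"{instruction}\n\n{format_question(question, options)}"


# Illustrative question; not taken from the study's dataset.
prompt = build_prompt(
    "Who is accountable for ordering the Product Backlog?",
    {"A": "Scrum Master", "B": "Product Owner", "C": "Developers"},
)
print(prompt)
```

Asking for a quoted passage gives a human reviewer something concrete to check against the Scrum Guide, which supports the cross-referencing step in item 2.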

Who benefits

Software Development, IT Services, Consulting, EdTech, Project Management

Key takeaways

GPT-5 achieves high accuracy (over 85%) on Scrum certification questions.
Citation-based prompting yields the best results and lowest error rates.
Models perform better on explicit topics and single-choice questions.
Errors often stem from misinterpretation, scope issues, or outdated information.

Original post by Mirko Perkusich, Danyllo Albuquerque, João Paiva, Robson Vilar, Emanuel Dantas, Ademar França de Sousa Neto, Rohit Gheyi, Kyller Gorgônio, Angelo Perkusich

"arXiv:2607.00049v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used in Agile Software Development for documentation, coaching, and training. As practitioners adopt these tools to prepare for certifications such as Professional Scrum Master (PSM),…"
