GPT-5 Achieves High Accuracy on Scrum Certification Questions
Summary
This study evaluates GPT-5's factual accuracy on 993 PSM-aligned Scrum certification questions using zero-shot, chain-of-thought, and with-source citation prompting. All methods achieved over 85% accuracy, with citation-based prompting performing best, though errors clustered in interpretive areas and multi-select questions.
Why it matters
This research provides concrete evidence of GPT-5's capabilities and limitations in understanding and applying the Scrum framework, which is vital for professionals using LLMs for Agile training, coaching, or certification preparation. It highlights the importance of prompt engineering for accuracy.
How to implement this in your domain
- Utilize GPT-5 with citation-based prompting for generating Scrum-related training materials or answering factual questions.
- Cross-reference LLM-generated answers with the official Scrum Guide, especially for interpretive or multi-select questions.
- Develop custom prompts that explicitly instruct the LLM to adhere to the latest Scrum Guide (2020) to mitigate version drift.
- Integrate LLMs as a supplementary tool for learning, emphasizing human review for critical or nuanced Scrum concepts.
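The prompting advice above can be sketched as a small prompt builder. This is a minimal illustration, not the prompts used in the study: the function name `build_prompt`, the style labels, and the instruction wording are all assumptions made for this example. It only assembles text, so it can be passed to whichever model client you already use.

```python
# Hypothetical prompt builder; names and wording are illustrative assumptions,
# not the prompts from the paper.
GUIDE_VERSION = "Scrum Guide 2020"  # pin the version to limit version drift

INSTRUCTIONS = {
    "zero_shot": "Reply with the letter(s) of the correct option(s) only.",
    "chain_of_thought": (
        "Reason step by step, then give the letter(s) of the correct "
        "option(s) on the final line."
    ),
    "with_source": (
        "Quote the passage of the guide that supports your answer, then give "
        "the letter(s) of the correct option(s) on the final line. "
        "If the guide does not address the question, say so."
    ),
}


def build_prompt(question: str, options: list[str], style: str = "with_source") -> str:
    """Return the prompt text for one multiple-choice question."""
    if style not in INSTRUCTIONS:
        raise ValueError(f"unknown prompt style: {style!r}")
    # Label options A, B, C, ... in the order given.
    lettered = "\n".join(
        f"{chr(ord('A') + i)}. {option}" for i, option in enumerate(options)
    )
    return (
        f"Answer strictly according to the {GUIDE_VERSION}.\n"
        f"Question: {question}\n"
        f"Options:\n{lettered}\n"
        f"{INSTRUCTIONS[style]}"
    )


if __name__ == "__main__":
    print(build_prompt(
        "Who manages the Product Backlog?",
        ["The Product Owner", "The Scrum Master"],
    ))
```

Asking for a quoted passage also gives a human reviewer something concrete to check against the Scrum Guide, which fits the cross-referencing step in the list.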
Key takeaways
- GPT-5 achieves high accuracy (over 85%) on Scrum certification questions.
- Citation-based prompting yields the best results and lowest error rates.
- Models perform better on explicit topics and single-choice questions.
- Errors often stem from misinterpretation, scope issues, or outdated information.
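To see whether your own question bank shows the same gap between single-choice and multi-select items, accuracy can be broken down by question type. The sketch below assumes exact-match scoring (a multi-select answer counts only if the full set of letters matches); the summary does not say how the study scored answers, and the helper names and sample records are invented for illustration.

```python
import re
from collections import defaultdict


def normalize(answer: str) -> frozenset[str]:
    """Turn 'a, C' or 'A and C' into {'A', 'C'}.

    Assumes option letters are separated by commas, spaces, or words.
    """
    return frozenset(re.findall(r"\b[A-Z]\b", answer.upper()))


def accuracy_by_type(records):
    """records: iterable of (question_type, model_answer, correct_answer)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for qtype, model_answer, correct_answer in records:
        totals[qtype] += 1
        # Exact match on the whole set: a partially correct
        # multi-select answer scores zero.
        if normalize(model_answer) == normalize(correct_answer):
            hits[qtype] += 1
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        ("single", "B", "B"),
        ("single", "a", "A"),
        ("multi", "A, C", "C,A"),
        ("multi", "A", "A, C"),  # incomplete selection counts as wrong
    ]
    print(accuracy_by_type(sample))
```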
Original post by Mirko Perkusich, Danyllo Albuquerque, João Paiva, Robson Vilar, Emanuel Dantas, Ademar França de Sousa Neto, Rohit Gheyi, Kyller Gorgônio, Angelo Perkusich
arXiv:2607.00049v1. Abstract: "Large Language Models (LLMs) are increasingly used in Agile Software Development for documentation, coaching, and training. As practitioners adopt these tools to prepare for certifications such as Professional Scrum Master (PSM),…"