Semi-CoT Improves Chain-of-Thought Reasoning with Unlabeled Data

Hongyang He, Jiuming Liu, Victor Sanchez · July 3, 2026

Summary

Semi-CoT is a framework for semi-supervised Chain-of-Thought (CoT) learning that uses unlabeled questions to construct pseudo reasoning supervision. It samples multiple pseudo-CoTs, estimates semantic entropy, and selects low-entropy chains as reliable demonstrations, showing small gains on some benchmarks.

Chain-of-Thought (CoT) reasoning has proven effective at activating latent reasoning capabilities in large language models. However, most existing CoT methods use reasoning chains mainly as inference-time prompts and rarely treat generated traces as semi-supervised learning signals. Semi-CoT addresses this gap by defining Semi-supervised Chain-of-Thought Learning, which constructs pseudo reasoning supervision from unlabeled questions. For each unlabeled question, the framework samples multiple pseudo-CoTs, estimates answer-level semantic entropy across them, and selects low-entropy reasoning chains as reliable pseudo-CoT demonstrations. This extends CoT self-training from inference-time refinement to semi-supervised pseudo-supervision.

Pilot experiments on benchmarks such as SVAMP and GSM8K showed small gains, with pseudo-answer precision ranging from 91.36% to 100%. AQuA showed negative transfer, and MultiArith reached a performance ceiling. The results suggest that unlabeled questions can provide reliable pseudo reasoning signals, but that using them effectively may require stronger demonstration selection or student training.
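The selection step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the PseudoCoT record, normalize_answer, answer_entropy, select_demonstrations and the threshold parameter are hypothetical names, and "semantic" equivalence between answers is approximated by numeric or string normalization rather than whatever clustering the paper actually uses. Sampling the chains from an LLM is out of scope; the sketch starts from already-sampled chains with extracted final answers.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class PseudoCoT:
    """One sampled reasoning chain for an unlabeled question (hypothetical record)."""
    question: str
    chain: str   # sampled reasoning text
    answer: str  # final answer extracted from the chain


def normalize_answer(ans: str) -> str:
    """Assumed equivalence rule: answers that parse to the same number fall in
    one cluster; otherwise compare stripped, lowercased text."""
    text = ans.strip().lower().replace(",", "")
    try:
        value = float(text)
    except ValueError:
        return text
    # Render whole numbers without ".0" so "7" and "7.0" share a cluster.
    return str(int(value)) if value.is_integer() else str(value)


def answer_entropy(answers: list[str]) -> float:
    """Entropy (nats) of the empirical distribution over answer clusters."""
    counts = Counter(normalize_answer(a) for a in answers)
    total = sum(counts.values())
    entropy = 0.0
    for c in counts.values():
        p = c / total
        entropy -= p * math.log(p)
    return entropy


def select_demonstrations(samples: list[PseudoCoT], threshold: float) -> list[PseudoCoT]:
    """Keep one majority-answer chain for each question whose entropy <= threshold."""
    by_question: dict[str, list[PseudoCoT]] = {}
    for s in samples:
        by_question.setdefault(s.question, []).append(s)

    selected = []
    for group in by_question.values():
        if answer_entropy([s.answer for s in group]) > threshold:
            continue  # sampled chains disagree too much: treat as unreliable
        majority = Counter(normalize_answer(s.answer) for s in group).most_common(1)[0][0]
        # Use the first sampled chain that reaches the majority answer.
        selected.append(next(s for s in group if normalize_answer(s.answer) == majority))
    return selected


if __name__ == "__main__":
    samples = [
        PseudoCoT("q1", "3 + 4 = 7", "7"),
        PseudoCoT("q1", "4 + 3 = 7", "7.0"),
        PseudoCoT("q1", "3 * 4 = 12", "12"),
        PseudoCoT("q2", "5 - 1 = 4", "4"),
        PseudoCoT("q2", "5 + 1 = 6", "6"),
        PseudoCoT("q2", "5 * 2 = 10", "10"),
    ]
    for demo in select_demonstrations(samples, threshold=0.7):
        print(demo.question, "|", demo.chain)  # prints: q1 | 3 + 4 = 7
```

In this toy run, two of q1's three chains agree on 7 (entropy about 0.64 nats), so one of them is kept as a demonstration; q2's three distinct answers give entropy ln 3, about 1.10 nats, and the question is skipped. The threshold of 0.7 is illustrative only; the summary does not say how the authors set it.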

Why it matters

This research offers a method to improve LLM reasoning capabilities by leveraging abundant unlabeled data, potentially reducing the need for extensive human annotation and making CoT more scalable and accessible for various applications.

How to implement this in your domain

1. Evaluate current LLM training strategies for their reliance on fully supervised CoT data.
2. Explore integrating semi-supervised learning techniques to leverage unlabeled datasets for reasoning.
3. Implement entropy-based methods for selecting high-quality pseudo-CoT demonstrations.
4. Experiment with different demonstration selection strategies to optimize pseudo-supervision.
5. Assess the impact on model performance and annotation costs for reasoning-intensive tasks.
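For steps 4 and 5, one way to compare selection strategies is to sweep the entropy threshold on a small labeled development set and record, for each threshold, the share of questions kept (coverage) and the share of kept majority answers that match the gold label. Treating that second ratio as "pseudo-answer precision" is an assumption about how such figures are computed, not a description of the paper's evaluation. The function names, the exact-string-match equivalence rule and the toy data below are all hypothetical.

```python
import math
from collections import Counter


def entropy_and_majority(answers: list[str]) -> tuple[float, str]:
    """Answer-level entropy (nats) and majority answer for one question.
    Exact string match stands in for semantic equivalence (an assumption)."""
    counts = Counter(a.strip() for a in answers)
    total = len(answers)
    entropy = 0.0
    for c in counts.values():
        p = c / total
        entropy -= p * math.log(p)
    # Ties go to the answer seen first (Counter.most_common ordering).
    return entropy, counts.most_common(1)[0][0]


def sweep_thresholds(
    samples: dict[str, list[str]],
    gold: dict[str, str],
    thresholds: list[float],
) -> list[tuple[float, float, float]]:
    """Return (threshold, coverage, precision) rows for a labeled dev set.
    coverage = share of questions kept; precision = share of kept
    majority answers that match the gold label (NaN if none kept)."""
    scored = {q: entropy_and_majority(ans) for q, ans in samples.items()}
    rows = []
    for t in thresholds:
        kept = [q for q, (h, _) in scored.items() if h <= t]
        correct = sum(1 for q in kept if scored[q][1] == gold[q].strip())
        coverage = len(kept) / len(samples)
        precision = correct / len(kept) if kept else float("nan")
        rows.append((t, coverage, precision))
    return rows


if __name__ == "__main__":
    samples = {"q1": ["7", "7", "7"], "q2": ["5", "5", "6"], "q3": ["2", "3", "4"]}
    gold = {"q1": "7", "q2": "6", "q3": "3"}
    for t, cov, prec in sweep_thresholds(samples, gold, [0.0, 0.7, 1.2]):
        print(f"threshold={t:.1f} coverage={cov:.2f} precision={prec:.2f}")
```

With this toy data the three thresholds give coverage 0.33, 0.67 and 1.00 against precision 1.00, 0.50 and 0.33: tightening the threshold raises precision at the cost of coverage, which is the trade-off a demonstration selection strategy has to balance.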

Who benefits

AI Development, Education, Customer Service, Data Annotation, Research & Academia

Key takeaways

Semi-CoT uses unlabeled data to generate pseudo reasoning supervision for LLMs.
It samples pseudo-CoTs and selects reliable ones based on semantic entropy.
The approach extends CoT self-training to a semi-supervised context.
It shows potential for improving reasoning with less reliance on human labels.

Original post by Hongyang He, Jiuming Liu, Victor Sanchez

"arXiv:2607.01511v1 Announce Type: new Abstract: Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent reasoning capabilities in large language models. However, most existing CoT methods use reasoning chains mainly as inference-time prompts, w…"

