Semi-CoT Improves Chain-of-Thought Reasoning with Unlabeled Data

Hongyang He, Jiuming Liu, Victor Sanchez · July 3, 2026

Summary

Semi-CoT is a framework for semi-supervised Chain-of-Thought (CoT) learning that builds pseudo reasoning supervision from unlabeled questions. It samples multiple pseudo-CoTs per question, estimates answer-level semantic entropy, and keeps low-entropy chains as reliable demonstrations. Pilot experiments show small gains on some benchmarks.

Chain-of-Thought (CoT) reasoning is an effective way to activate latent reasoning capabilities in large language models. Most existing CoT methods, however, use reasoning chains mainly as inference-time prompts and rarely treat generated traces as semi-supervised learning signals. Semi-CoT addresses this gap by defining the task of Semi-supervised Chain-of-Thought Learning, in which pseudo reasoning supervision is constructed from unlabeled questions. For each unlabeled question, the framework samples multiple pseudo-CoTs, estimates answer-level semantic entropy across the samples, and keeps low-entropy reasoning chains as reliable pseudo-CoT demonstrations. This moves CoT self-training from inference-time refinement to semi-supervised pseudo-supervision.

Pilot experiments showed small gains on datasets such as SVAMP and GSM8K, with pseudo-answer precision ranging from 91.36% to 100%. AQuA showed negative transfer, and MultiArith reached a ceiling. The results suggest that unlabeled questions can provide reliable pseudo reasoning signals, but that turning those signals into accuracy gains may require stronger demonstration selection or student training.
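The sample-then-filter procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `SampledChain`, `normalize_answer`, `answer_entropy`, and `select_demonstration` are hypothetical, semantic equivalence is approximated by exact match on normalized answers, and the `max_entropy` threshold is an arbitrary placeholder. Sampling the chains from a model is left out.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class SampledChain:
    reasoning: str  # generated chain-of-thought text
    answer: str     # final answer extracted from the chain


def normalize_answer(answer: str) -> str:
    """Hypothetical normalizer: map surface variants to one answer class."""
    text = answer.strip().lower().replace(",", "").rstrip(".")
    try:
        return str(float(text))  # "12", "12.0" and "12." collapse together
    except ValueError:
        return text              # non-numeric answers are compared as text


def answer_entropy(chains):
    """Shannon entropy (in nats) of the empirical answer-class distribution."""
    counts = Counter(normalize_answer(c.answer) for c in chains)
    total = sum(counts.values())
    # H = sum_c p_c * log(1 / p_c), with p_c = count_c / total
    return sum((n / total) * math.log(total / n) for n in counts.values())


def select_demonstration(chains, max_entropy=0.5):
    """Return one chain that reaches the majority answer, or None.

    None means the question has no samples or its answers disagree too
    much (entropy above the assumed threshold) to be trusted.
    """
    if not chains or answer_entropy(chains) > max_entropy:
        return None
    counts = Counter(normalize_answer(c.answer) for c in chains)
    majority, _ = counts.most_common(1)[0]
    return next(c for c in chains if normalize_answer(c.answer) == majority)
```

Returning `None` for high-entropy questions means uncertain questions are dropped rather than given a noisy label. That trades coverage for precision, so the threshold would need tuning per dataset.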

Why it matters

Semi-CoT offers a way to improve LLM reasoning using abundant unlabeled questions. If the approach holds up, it could reduce the need for human-annotated reasoning chains and make CoT supervision more scalable across applications.

How to implement this in your domain

  1. Evaluate current LLM training strategies for their reliance on fully supervised CoT data.
  2. Explore integrating semi-supervised learning techniques to leverage unlabeled datasets for reasoning.
  3. Implement entropy-based methods for selecting high-quality pseudo-CoT demonstrations.
  4. Experiment with different demonstration selection strategies to optimize pseudo-supervision.
  5. Assess the impact on model performance and annotation costs for reasoning-intensive tasks.
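One way to approach the last two steps is to check, on a small labeled development split, how pseudo-answer precision and coverage change with the entropy threshold. The sketch below assumes each question already has an entropy score and a majority pseudo-answer. The `PseudoLabel` record and both function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PseudoLabel:
    entropy: float      # answer-level entropy for the question
    pseudo_answer: str  # majority answer among the sampled chains
    gold_answer: str    # reference answer, known only on a labeled dev split


def precision_and_coverage(records, max_entropy):
    """Precision of the kept pseudo-answers and the fraction of questions kept."""
    kept = [r for r in records if r.entropy <= max_entropy]
    if not kept:
        return None, 0.0  # precision is undefined when nothing is kept
    correct = sum(r.pseudo_answer == r.gold_answer for r in kept)
    return correct / len(kept), len(kept) / len(records)


def sweep(records, thresholds):
    """Tabulate the precision/coverage trade-off across entropy thresholds."""
    return {t: precision_and_coverage(records, t) for t in thresholds}
```

A stricter threshold should raise precision but shrink the pool of usable demonstrations, so a sweep like this helps pick an operating point before any student training.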

Who benefits

AI Development, Education, Customer Service, Data Annotation, Research & Academia

Key takeaways

  • Semi-CoT uses unlabeled data to generate pseudo reasoning supervision for LLMs.
  • It samples pseudo-CoTs and selects reliable ones based on semantic entropy.
  • The approach extends CoT self-training to a semi-supervised context.
  • It shows potential for improving reasoning with less reliance on human labels.
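To make the entropy criterion concrete, here is a small worked example. It assumes the entropy is the plain Shannon entropy of the empirical distribution over answer classes among K sampled chains. The paper's exact estimator may differ, and the symbols K, n_c, and p_c are introduced here only for illustration.

```latex
% p_c: fraction of the K sampled chains whose final answer falls in class c
p_c = \frac{n_c}{K}, \qquad H = -\sum_{c} p_c \ln p_c

% K = 5, all five chains agree: one class with p = 1
H = -(1 \cdot \ln 1) = 0

% K = 5, four chains agree and one differs: p = (0.8, 0.2)
H = -(0.8 \ln 0.8 + 0.2 \ln 0.2) \approx 0.179 + 0.322 = 0.500 \text{ nats}

% Upper bound for two classes (an even split): \ln 2 \approx 0.693
```

Under this assumption, a question whose samples all agree scores zero and would be kept at any threshold, while a single dissenting sample out of five already covers most of the range toward the two-class maximum.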

Original post by Hongyang He, Jiuming Liu, Victor Sanchez

"arXiv:2607.01511v1 Announce Type: new Abstract: Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent reasoning capabilities in large language models. However, most existing CoT methods use reasoning chains mainly as inference-time prompts, w…"
