Semi-CoT Improves Chain-of-Thought Reasoning with Unlabeled Data
Summary
Semi-CoT is a framework for semi-supervised Chain-of-Thought (CoT) learning that uses unlabeled questions to construct pseudo reasoning supervision. For each unlabeled question it samples multiple pseudo-CoTs, estimates the semantic entropy of the samples, and keeps the low-entropy chains as reliable demonstrations. The reported gains are small and appear on only some benchmarks.
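For reference, one common way to write semantic entropy is to group the N sampled chains for a question x into K semantic equivalence clusters C_1, ..., C_K (for example, chains that reach the same final answer) and take the entropy over the clusters. The notation below is an assumption for illustration only; the paper may define or estimate the quantity differently.

```latex
\hat{p}(C_k \mid x) = \frac{|C_k|}{N},
\qquad
H_{\mathrm{sem}}(x) = -\sum_{k=1}^{K} \hat{p}(C_k \mid x)\,\log \hat{p}(C_k \mid x)
```

Under this formulation the entropy is 0 when every sampled chain falls into one cluster and reaches its maximum, log N, when every chain lands in its own cluster. A low value therefore signals that the model's samples agree with each other.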
Why it matters
This research offers a method to improve LLM reasoning capabilities by leveraging abundant unlabeled data, potentially reducing the need for extensive human annotation and making CoT more scalable and accessible for various applications.
How to implement this in your domain
1. Evaluate current LLM training strategies for their reliance on fully supervised CoT data.
2. Explore integrating semi-supervised learning techniques to leverage unlabeled datasets for reasoning.
3. Implement entropy-based methods for selecting high-quality pseudo-CoT demonstrations.
4. Experiment with different demonstration selection strategies to optimize pseudo-supervision.
5. Assess the impact on model performance and annotation costs for reasoning-intensive tasks.
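As a starting point for the entropy-based selection step, here is a minimal Python sketch. It is not the authors' implementation. It assumes the pseudo-CoTs have already been sampled from a model, and it approximates semantic equivalence by exact match on the normalized final answer, which is a simplification. The function names, the input layout, and the `max_entropy` threshold are all hypothetical.

```python
import math
from collections import Counter


def normalize_answer(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.strip().lower().split())


def semantic_entropy(final_answers):
    """Entropy (in nats) over clusters of sampled answers.

    Assumption: two chains are 'semantically equivalent' when their
    normalized final answers are identical.
    """
    counts = Counter(normalize_answer(a) for a in final_answers)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def select_pseudo_demonstrations(samples, max_entropy=0.7):
    """Keep one chain per question whose sampled answers have low entropy.

    samples: dict mapping question -> list of (chain_text, final_answer).
    max_entropy: hypothetical cutoff; tune it on held-out labeled data.
    Returns a list of dicts sorted from most to least consistent.
    """
    selected = []
    for question, chains in samples.items():
        if not chains:
            continue  # nothing was sampled for this question
        answers = [normalize_answer(a) for _, a in chains]
        entropy = semantic_entropy(answers)
        if entropy > max_entropy:
            continue  # samples disagree too much to trust
        # Use the majority answer as the pseudo-label and keep the
        # first sampled chain that reaches it.
        majority, _ = Counter(answers).most_common(1)[0]
        chain = next(c for c, a in chains if normalize_answer(a) == majority)
        selected.append(
            {"question": question, "chain": chain,
             "answer": majority, "entropy": entropy}
        )
    selected.sort(key=lambda d: d["entropy"])
    return selected
```

Swapping the exact-match clustering for an entailment model or an embedding-based similarity check is one way to experiment with different selection strategies, since only `semantic_entropy` and the majority-answer lookup would need to change.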
Key takeaways
- Semi-CoT uses unlabeled data to generate pseudo reasoning supervision for LLMs.
- It samples pseudo-CoTs and selects reliable ones based on semantic entropy.
- The approach extends CoT self-training to a semi-supervised context.
- It shows potential for improving reasoning with less reliance on human labels.
Original post by Hongyang He, Jiuming Liu, Victor Sanchez
"arXiv:2607.01511v1. Abstract: Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent reasoning capabilities in large language models. However, most existing CoT methods use reasoning chains mainly as inference-time prompts, w…"