Gefen Optimizer Reduces AdamW Memory Footprint by 8x, Boosts Throughput
Summary
Gefen is a memory-efficient optimizer that reduces AdamW's memory footprint by approximately 8x while maintaining performance. It does this by automatically sharing second-moment estimates across parameter blocks and quantizing first moments, which frees memory for larger models or batch sizes and improves throughput in distributed training.
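To make the two ideas in the summary concrete, here is a minimal pure-Python sketch of an AdamW-style step that keeps one second-moment scalar per block and stores the first moment as 8-bit codes. This is not Gefen's algorithm: the excerpt does not say how blocks are chosen or how quantization is done, so the fixed block size `BLOCK`, the absmax int8 scheme, the per-block mean of squared gradients, and the function names are all assumptions made for illustration.

```python
import math

BLOCK = 4  # hypothetical fixed block size; the source says blocks are chosen automatically


def quantize_int8(values):
    """Absmax quantization (an assumed scheme): int8 codes plus one float scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # fall back to 1.0 if all zeros
    return [round(v / scale) for v in values], scale


def dequantize_int8(codes, scale):
    return [c * scale for c in codes]


def blockwise_step(params, grads, m_codes, m_scale, v_blocks, t,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW-style update with a quantized first moment and a shared
    second-moment scalar per block of BLOCK parameters."""
    # Recover the first moment from its 8-bit codes, then update it as in Adam.
    m = dequantize_int8(m_codes, m_scale)
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grads)]

    new_params = []
    for b in range(len(v_blocks)):
        lo, hi = b * BLOCK, min((b + 1) * BLOCK, len(params))
        # One scalar per block: EMA of the block's mean squared gradient.
        mean_sq = sum(g * g for g in grads[lo:hi]) / (hi - lo)
        v_blocks[b] = beta2 * v_blocks[b] + (1 - beta2) * mean_sq
        v_hat = v_blocks[b] / (1 - beta2 ** t)  # bias correction
        denom = math.sqrt(v_hat) + eps
        for i in range(lo, hi):
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            decayed = params[i] * (1 - lr * weight_decay)  # decoupled weight decay
            new_params.append(decayed - lr * m_hat / denom)

    # Store the first moment back as 8-bit codes.
    m_codes, m_scale = quantize_int8(m)
    return new_params, m_codes, m_scale, v_blocks


params = [0.5, -0.3, 0.8, 0.1, -0.7, 0.2, 0.4, -0.9]
grads = [0.1, -0.2, 0.05, 0.3, -0.1, 0.2, -0.4, 0.15]
m_codes, m_scale = [0] * len(params), 1.0
v_blocks = [0.0] * math.ceil(len(params) / BLOCK)
params, m_codes, m_scale, v_blocks = blockwise_step(
    params, grads, m_codes, m_scale, v_blocks, t=1
)
```

In this toy layout the persistent state is one small integer per parameter plus two floats per block (the quantization scale and the shared second moment), instead of two full-precision floats per parameter.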
Why it matters
Deep learning engineers and researchers can use Gefen to train larger models or use bigger batch sizes, especially in distributed environments, without prohibitive optimizer memory costs. In practice this means more efficient experimentation, faster training, and room to scale models further.
How to implement this in your domain
1. Replace AdamW with Gefen in deep learning training pipelines to reduce optimizer memory usage.
2. Experiment with larger batch sizes or model architectures made possible by Gefen's memory efficiency.
3. Integrate Gefen into distributed training frameworks (e.g., FSDP, DDP) to improve throughput.
4. Benchmark training performance and memory consumption when switching from AdamW to Gefen.
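For the benchmarking step in the list above, one possible starting point is to measure how many bytes each optimizer keeps in its state. The sketch below assumes the optimizer follows the PyTorch `torch.optim.Optimizer` convention: a `state` mapping from each parameter to a dict whose tensor values expose `numel()` and `element_size()`. Whether a Gefen implementation stores its state that way is an assumption, and no Gefen API is shown because the excerpt does not describe one. The helper names are hypothetical.

```python
def optimizer_state_bytes(optimizer):
    """Sum the bytes held in an optimizer's per-parameter state tensors.

    Relies only on `optimizer.state` and on state values exposing
    `numel()` and `element_size()`, as PyTorch tensors do. Non-tensor
    entries (for example a plain Python step counter) are skipped.
    """
    total = 0
    for per_param_state in optimizer.state.values():
        for value in per_param_state.values():
            if hasattr(value, "numel") and hasattr(value, "element_size"):
                total += value.numel() * value.element_size()
    return total


def reduction_factor(baseline_bytes, candidate_bytes):
    """How many times smaller the candidate state is than the baseline."""
    return baseline_bytes / candidate_bytes
```

With PyTorch's AdamW the state is created lazily, so call the helper only after at least one `optimizer.step()`. Run it once with AdamW and once with the replacement optimizer on the same model, then compare the two totals with `reduction_factor`. An optimizer that packs its state into custom buffers outside `state` would need a different accounting.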
Key takeaways
- Gefen reduces AdamW's memory footprint by ~8x while maintaining performance.
- It shares second-moment estimates and quantizes first moments for efficiency.
- Gefen enables training larger models or using larger batch sizes.
- It significantly improves throughput in distributed training environments.
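A quick back-of-the-envelope calculation shows how a reduction near 8x could arise. The layout below is an assumption, not the paper's stated design: AdamW is taken to hold two 32-bit values per parameter, and the compact variant is taken to hold one 8-bit first-moment code per parameter plus two 32-bit scalars per block (a quantization scale and a shared second moment). The block size of 256 and the function names are hypothetical.

```python
def adamw_state_bytes(num_params, bytes_per_value=4):
    """Two parameter-sized buffers (first and second moment)."""
    return 2 * num_params * bytes_per_value


def compact_state_bytes(num_params, block_size, moment_bits=8, bytes_per_scalar=4):
    """Assumed layout: quantized first moment plus two scalars per block."""
    num_blocks = -(-num_params // block_size)  # ceiling division
    first_moment = num_params * moment_bits // 8
    per_block = 2 * num_blocks * bytes_per_scalar  # scale + shared second moment
    return first_moment + per_block


n = 1_000_000_000  # a 1B-parameter model
baseline = adamw_state_bytes(n)
compact = compact_state_bytes(n, block_size=256)
ratio = baseline / compact
print(f"AdamW: {baseline / 2**30:.2f} GiB, "
      f"compact layout: {compact / 2**30:.2f} GiB, ratio: {ratio:.2f}x")
```

Under these assumptions the per-parameter cost drops from 8 bytes to a little over 1 byte, so the ratio approaches 8 as the block size grows. The actual figure for Gefen depends on details the excerpt does not give.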
Original post by Nadav Benedek, Tomer Koren, Ohad Fried
"arXiv:2606.13894v1 Announce Type: new Abstract: AdamW is a default optimizer for modern deep learning, but its first and second moment states add roughly two parameter-sized buffers to training memory. We propose Gefen, a memory-efficient optimizer that automatically shares secon…"
Originally posted by Nadav Benedek, Tomer Koren, Ohad Fried on X