New Mixture of Experts Architecture! Alibaba Open-Sources Qwen3-Next, Reducing Training Costs by 90%

The Alibaba Tongyi team has open-sourced Qwen3-Next, a next-generation large language model architecture. The model has 80B total parameters but activates only 3B per token, cutting training costs by 90% and increasing inference throughput by more than 10x. Its core innovations include: a hybrid attention mechanism combining Gated DeltaNet and Gated Attention, designed to optimize long-context processing; a highly sparse MoE structure with 512 experts, of which 10 routed experts plus 1 shared expert are activated per token (about 3.7% of parameters); training-stability measures such as zero-centered RMSNorm (root mean square layer normalization); and a native multi-token prediction (MTP) mechanism that improves inference efficiency. Qwen3-Next-80B-A3B rivals the flagship Qwen3 model in performance and outperforms state-of-the-art (SOTA) dense models on multiple benchmarks, delivering very high training and inference cost-effectiveness. The model has been open-sourced and released on platforms such as Hugging Face, offering an efficient answer to the two dominant scaling trends for large language models: longer context lengths and larger parameter counts.
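To illustrate the sparsity figures above, the sketch below implements a top-k routed MoE layer with an always-active shared expert in PyTorch: 512 experts, 10 routed experts per token, plus 1 shared expert. This is a minimal sketch of the general technique, not the actual Qwen3-Next implementation; the module and parameter names (SparseMoE, Expert, d_model, d_ff, top_k) are illustrative assumptions.

```python
# Minimal sketch of a sparse MoE layer with a shared expert -- illustrative only,
# not the actual Qwen3-Next code. It mirrors the figures in the article:
# 512 experts, top-10 routing, plus 1 shared expert that sees every token.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small feed-forward expert (hypothetical sizes)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))


class SparseMoE(nn.Module):
    """Routes each token to its top-k experts and always adds a shared expert."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 512, top_k: int = 10):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(Expert(d_model, d_ff) for _ in range(num_experts))
        self.shared_expert = Expert(d_model, d_ff)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        weights, indices = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over the selected experts
        routed = torch.zeros_like(x)
        for t in range(x.size(0)):                    # naive per-token loop, for clarity only
            for w, e in zip(weights[t], indices[t].tolist()):
                routed[t] = routed[t] + w * self.experts[e](x[t])
        # Only top_k of num_experts are touched per token, so the active
        # parameter count stays a small fraction of the total.
        return routed + self.shared_expert(x)


# Tiny hidden dimensions so the demo runs quickly; the expert counts match the article.
moe = SparseMoE(d_model=64, d_ff=128, num_experts=512, top_k=10)
tokens = torch.randn(4, 64)
print(moe(tokens).shape)  # torch.Size([4, 64])
```

The per-token Python loop is deliberately naive for readability; production MoE implementations batch tokens per expert and add load-balancing terms, both omitted here.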




