DeepSeek-V3 Officially Released
DeepSeek-V3, DeepSeek's latest self-developed MoE model, has 671B total parameters with 37B activated per token and was pre-trained on 14.8T tokens. In evaluations it significantly outperforms other open-source models such as Qwen2.5-72B and Llama-3.1-405B, and reaches performance comparable to leading closed-source models such as GPT-4o and Claude-3.5-Sonnet, with notable gains in encyclopedic knowledge, long-form text generation, code, mathematics, and Chinese language processing.

Algorithmic and engineering advances raise generation speed from 20 TPS to 60 TPS, markedly improving the user experience. API service pricing has been revised, with a 45-day discounted trial period available.

DeepSeek-V3's native FP8 weights are open-sourced, with inference support in frameworks including SGLang, LMDeploy, TensorRT-LLM, and MindIE, encouraging community contributions and broader application scenarios. DeepSeek plans to continue building on the DeepSeek-V3 base model and to share its ongoing research with the community.
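For developers who want to try the model through the API, below is a minimal sketch using the OpenAI-compatible chat endpoint. It assumes the `openai` Python SDK, a `DEEPSEEK_API_KEY` environment variable, and the `deepseek-chat` model name routing to DeepSeek-V3; verify the base URL, model name, and pricing against the official API documentation before relying on them.

```python
# Minimal sketch: calling DeepSeek-V3 via the OpenAI-compatible API.
# Assumes the `openai` Python SDK is installed and DEEPSEEK_API_KEY is set;
# base URL and model name should be confirmed against the official docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed to be served by DeepSeek-V3 after this release
    messages=[{"role": "user", "content": "Explain mixture-of-experts in one paragraph."}],
    stream=False,
)
print(response.choices[0].message.content)
```

Because the endpoint follows the OpenAI API convention, existing tooling built on that SDK should generally work by only swapping the base URL and model name.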