This episode features Dr. Yang Songlin, a core contributor to DeltaNet, who traces the evolution, application, and optimization of Linear Attention Mechanisms in Large Language Models (LLMs). The conversation begins with a review of what Attention Mechanisms do, then introduces the origins of DeltaNet, its core improvements, and its practical applications in models such as Alibaba's Qwen and Kimi Linear. Dr. Yang explains why Linear Attention is more efficient for processing long texts and why the industry tends to adopt hybrid architectures combining Linear and Full Attention to balance performance and efficiency. The episode also discusses the distinctive value of Linear Attention in settings with abundant compute but limited data, particularly its potential for improving data efficiency and state tracking. Finally, Dr. Yang shares his experience as an AI researcher in building interdisciplinary skills and starting the open-source group FLA, and looks ahead to future directions for Attention Mechanisms (including Sparse Attention) and Continual Learning. Beyond its in-depth technical insights, the episode offers reflections on research practice, community building, and product-oriented thinking.
