Articles
This article is a comprehensive, long-form (10,000+ words) exploration of the principles behind Large Language Models (LLMs). It begins by outlining the definition and evolution of LLMs, with a particular focus on how the Transformer architecture has superseded traditional RNN/LSTM models to become the dominant approach. The article then delves into the pre-training process of LLMs, covering the history of neural networks, word vectorization, tokenizer techniques, and the training mechanisms of forward propagation and backpropagation. A core section examines the matrix computations involved in forward propagation and works through the mathematical derivations of positional encoding and the self-attention mechanism in the Transformer, including the roles of the Query, Key, and Value (QKV) matrices. The article also touches on Multi-Head Attention (MHA), though this part is truncated towards the end. Through abundant illustrations and formula derivations, the article gives readers a thorough view of how LLMs work under the hood.
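For readers skimming this summary, the QKV formulation the article derives can be condensed into a few lines. The following is a minimal NumPy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V; it is not taken from the article, and the dimensions and weight matrices are illustrative assumptions only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of value vectors

# Toy example: 4 tokens, model dimension 8 (sizes chosen arbitrarily for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                            # token embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)                                       # (4, 8): one output vector per token
```

Multi-head attention, which the article introduces before it cuts off, simply runs several such attention computations in parallel over lower-dimensional projections and concatenates the results.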
This article distills 22 practical lessons on Supervised Fine-Tuning (SFT) of large language models. It begins with the definition and purpose of SFT, then compares SFT in detail with pre-training, RLHF, RAG, incremental pre-training, and in-context learning, clearly delineating the application scenarios and core characteristics of each. The article then covers the classification of SFT, its prerequisites, base model selection, training dataset construction (including data format, acquisition methods, quality evaluation, and data volume), and hardware requirements. It also walks through the SFT training process, parameter tuning strategies, and performance evaluation methods, and points out the potential adverse effects of SFT and how to avoid them. The content is highly practical for engineers working on large language model development and research.
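On the data-format point mentioned above, instruction-tuning records are commonly stored as JSONL and rendered into a single training string. The sketch below shows one such record; the field names and prompt template are assumptions for illustration, not the article's own specification.

```python
import json

# Hypothetical SFT record in the common instruction/input/output JSONL style
# (field names are an assumption for illustration, not the article's specification).
record = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Large language models are trained on massive text corpora ...",
    "output": "LLMs learn language patterns from large-scale text data.",
}

def build_training_text(rec: dict) -> str:
    """Render one record into the prompt + response string used for SFT;
    in practice the loss is usually computed only on the response portion."""
    prompt = (
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Input:\n{rec['input']}\n\n"
        f"### Response:\n"
    )
    return prompt + rec["output"]

print(json.dumps(record, ensure_ascii=False))  # one JSONL line of the dataset
print(build_training_text(record))             # the string actually fed to the model
```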