The article describes the ModelScope Community's exploration and practice of "Image-to-LoRA" technology, culminating in the Qwen-Image-i2L model series. The model generates LoRA weights directly from a single input image, compressing the traditionally hours-long LoRA training process into a single forward pass. The article first analyzes the feasibility of the approach, then discusses the challenges encountered in architecture design and their solutions, such as pairing a two-layer fully connected head with several powerful image encoders. It then details the iteration through the Qwen-Image-i2L-Style, Qwen-Image-i2L-Coarse, and Qwen-Image-i2L-Fine versions, and each version's behavior with respect to style preservation, detail reproduction, and "semantic intrusion". Finally, the introduction of Qwen-Image-i2L-Bias together with differential training mitigates data-distribution bias, allowing the generated weights to serve as high-quality initialization for conventional LoRA training and to significantly accelerate its convergence. The article closes with an outlook on the technology's considerable potential for accelerating LoRA training.
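To make the core idea concrete, the following is a minimal sketch of an image-to-LoRA mapping as the article describes it: an image encoder's embedding fed through a two-layer fully connected head whose output is reshaped into LoRA down/up-projection matrices for one target layer. All dimensions, names, and the ReLU nonlinearity are illustrative assumptions, not details from Qwen-Image-i2L.

```python
import numpy as np

# Hypothetical dimensions -- illustrative only, not from the article.
EMB_DIM = 768   # image-encoder embedding size (e.g., a CLIP-like encoder)
HIDDEN = 1024   # hidden width of the two-layer head
D_MODEL = 512   # width of one target layer in the diffusion backbone
RANK = 4        # LoRA rank

rng = np.random.default_rng(0)

# Two-layer fully connected head: embedding -> hidden -> flattened LoRA weights.
W1 = rng.standard_normal((EMB_DIM, HIDDEN)) * 0.02
b1 = np.zeros(HIDDEN)
out_dim = RANK * D_MODEL * 2  # one down-projection plus one up-projection
W2 = rng.standard_normal((HIDDEN, out_dim)) * 0.02
b2 = np.zeros(out_dim)

def image_to_lora(embedding: np.ndarray):
    """Map a single image embedding to LoRA (A, B) matrices for one layer."""
    h = np.maximum(embedding @ W1 + b1, 0.0)           # ReLU hidden layer
    flat = h @ W2 + b2
    A = flat[: RANK * D_MODEL].reshape(RANK, D_MODEL)  # down-projection
    B = flat[RANK * D_MODEL :].reshape(D_MODEL, RANK)  # up-projection
    return A, B

# One forward pass replaces an entire LoRA training run.
emb = rng.standard_normal(EMB_DIM)  # stand-in for an encoder output
A, B = image_to_lora(emb)
delta_W = B @ A  # low-rank update added to the frozen base weight
print(A.shape, B.shape, delta_W.shape)
```

In practice the head would emit weights for every LoRA-adapted layer at once, and (per the article's later sections) its output can also serve as the initialization for a conventional LoRA training run rather than as the final adapter.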


