Articles
This article details the Tencent BAC team's latest open-source compact multimodal large model, TBAC-UniImage-3B. The model achieves state-of-the-art results on the TIIF-Bench benchmark, with text-to-image long-instruction understanding and instruction-following scores far exceeding comparable models. Through extensive scenario testing, covering magazine illustrations, scientific scenes, IP design, picture-book illustrations, and artistic creation, the article comprehensively demonstrates UniImage-3B's image generation capabilities. It also tests the model's performance in style transfer, image-text understanding (foreground/background element recognition, fine-detail comprehension), and image editing (artistic style, portrait editing), verifying its strong multimodal image-text understanding. The article then examines the model's highlights, especially its innovative 'Ladder Side Diffusion Tuning' mechanism, which unifies 'understanding' and 'generation' in a compact model, and explains its advantages in cost-effectiveness and performance. Finally, the article provides a detailed local deployment tutorial, covering environment preparation, model download, and inference startup, lowering the barrier to AI image generation.
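As a minimal sketch of the deployment flow described in that tutorial (environment preparation, model download, inference startup), the snippet below pulls the weights with huggingface_hub; the repo id and the inference entry point are assumptions, not details confirmed by the article.

```python
# Minimal local-deployment sketch, assuming the model is hosted on Hugging Face under
# a repo id like "TencentBAC/TBAC-UniImage-3B" (the exact id and the inference entry
# point are assumptions; follow the article's tutorial for the actual commands).
from huggingface_hub import snapshot_download

# Step 1: download the model weights into a local directory.
local_dir = snapshot_download(
    repo_id="TencentBAC/TBAC-UniImage-3B",  # hypothetical repo id
    local_dir="./TBAC-UniImage-3B",
)
print(f"Model downloaded to {local_dir}")

# Step 2: launch inference with the repository's own script (name is an assumption),
# e.g. `python inference.py --model_path ./TBAC-UniImage-3B --prompt "..."`.
```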
This article details the Qwen team's latest open-source model, Qwen-Image-Edit, which is further trained from the 20B Qwen-Image model. It extends Qwen-Image's text-rendering capability to image editing, enabling high-quality text editing. A key feature is its dual semantic and appearance editing, achieved by simultaneously feeding the input image to Qwen2.5-VL (for visual semantics) and the VAE encoder (for visual appearance). The model excels at advanced editing such as IP creation that maintains semantic consistency, view transformation, and style transfer. It also handles local appearance editing, such as adding, deleting, modifying, and repairing objects. The article showcases its capabilities in original IP creation, MBTI emoji generation, view transformation, virtual avatar generation, object manipulation, text restoration, and poster editing, with rich examples. Additionally, it provides Python code for model inference and detailed LoRA fine-tuning steps and datasets via DiffSynth-Studio, lowering the barrier for developers to use and customize the model.
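For the inference step, a minimal sketch along the lines of the article's Python example is shown below; it assumes the diffusers QwenImageEditPipeline integration and the "Qwen/Qwen-Image-Edit" checkpoint name, so the exact class and arguments should be checked against the article's listing or the official model card.

```python
# Minimal inference sketch, assuming the diffusers QwenImageEditPipeline integration
# and the "Qwen/Qwen-Image-Edit" checkpoint id (verify against the official model card).
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

# Load the editing pipeline in bfloat16 and move it to the GPU.
pipeline = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
)
pipeline.to("cuda")

# Inside the pipeline the input image is routed to both Qwen2.5-VL (visual semantics)
# and the VAE encoder (visual appearance); the caller only supplies the image and an
# edit instruction.
image = Image.open("input.png").convert("RGB")
prompt = "Replace the text on the signboard with 'Grand Opening'."

result = pipeline(
    image=image,
    prompt=prompt,
    num_inference_steps=50,
    generator=torch.Generator(device="cuda").manual_seed(0),
)
result.images[0].save("output.png")
```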
This article announces the official release of the DeepSeek-V3.1 large language model, highlighting three core upgrades: an innovative hybrid reasoning architecture that switches between thinking and non-thinking modes, significantly improved reasoning efficiency, and enhanced agent capabilities via post-training optimization. The new model shows markedly improved performance across evaluations, including programming-agent tasks (SWE, Terminal-Bench) and search-agent tasks (BrowseComp, HLE). DeepSeek-V3.1 has been simultaneously upgraded on the official app, web interface, and API, with the API supporting 128K context and strict-mode function calling, and adding support for the Anthropic API format. Furthermore, the V3.1 base model and the post-trained model have been open-sourced on the Hugging Face and ModelScope platforms. The article also details API price adjustments and resource expansion plans, aimed at lowering barriers for developers and accelerating the model's integration and innovation across diverse application scenarios.
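As an illustration of switching between the two modes through the OpenAI-compatible API, here is a minimal sketch; the model identifiers ("deepseek-chat" for non-thinking, "deepseek-reasoner" for thinking) and the base URL follow DeepSeek's public documentation but should be treated as assumptions and checked against the current API reference.

```python
# Minimal sketch of calling DeepSeek-V3.1 through its OpenAI-compatible API.
# Model names and base URL are assumptions drawn from DeepSeek's public docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # replace with your own key
    base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint
)

# Non-thinking mode: fast, direct answers.
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize the V3.1 release in one sentence."}],
)
print(resp.choices[0].message.content)

# Thinking mode: the hybrid architecture's reasoning path, exposed as a separate model id.
resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Plan the steps for a multi-file code refactor."}],
)
print(resp.choices[0].message.content)
```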