Hello and welcome to Issue #60 of BestBlogs.dev AI Highlights.
This week, the practical evolution of open-source models accelerated once again. New releases from DeepSeek and ByteDance introduced innovative features like switchable reasoning modes and native 512K ultra-long context windows. In the developer ecosystem, discussions around context engineering are moving from theory to practice, with JSON prompting and systematic evaluation becoming the cornerstones of building reliable AI applications. On the product front, a growing number of applications that address real-world scenarios are emerging, from universal mobile agents to AI companion hardware, while industry leaders are offering direction for the future of AI-era entrepreneurship and organizational change.
We hope this week's highlights have been insightful. See you next week!
DeepSeek officially released the V3.1 model, highlighting an innovative hybrid reasoning architecture that lets a single model support, and freely switch between, a 'reasoning mode' and a 'non-reasoning mode'. Through post-training optimization, the new model shows markedly stronger performance on programming-agent (SWE, Terminal-Bench) and search-agent (BrowseComp, HLE) tasks. On thinking efficiency, V3.1-Think reduces output tokens by 20%-50% while maintaining performance, improving response speed and offering potential cost and resource savings. The API service has been upgraded in tandem, extending the context window to 128K and adding support for strict-mode Function Calling as well as the Anthropic API format. The Base and post-trained models of DeepSeek-V3.1 are now open-sourced on Hugging Face and ModelScope. The article also notes that API prices will be adjusted on September 6, 2025, which may affect users' long-term costs and strategies.
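As a hedged sketch of what the strict-mode function calling mentioned above might look like, here is a request body in the OpenAI-compatible convention (the `strict` and `additionalProperties` fields follow OpenAI's structured-output style; check DeepSeek's own documentation for the authoritative schema):

```json
{
  "model": "deepseek-reasoner",
  "messages": [
    {"role": "user", "content": "What is the weather in Hangzhou?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "strict": true,
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": false
      }
    }
  }]
}
```

In strict mode, the schema is enforced on the model's tool-call arguments, so the returned JSON is guaranteed to validate against `parameters`.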
ByteDance released its 36-billion-parameter Large Language Model Seed-OSS-36B under the Apache-2.0 license, making it freely available for academic and commercial use. The model's most notable feature is native support for a 512K ultra-long context, 4x the length of mainstream models, built in during pre-training rather than added afterward by interpolating positional encodings. Seed-OSS also incorporates a unique 'Thinking Budget' mechanism that lets users manage the model's reasoning depth by adjusting a token count. The architecture is robust, leveraging technologies like RoPE and GQA. On benchmarks such as MMLU-Pro, BBH, GSM8K, MATH, and HumanEval, Seed-OSS-36B demonstrates excellent knowledge understanding, reasoning, and coding abilities, setting a new open-source record on BBH reasoning. The article further highlights the Seed team's other open-source initiatives, including Seed-Coder, BAGEL, and Seed Diffusion, showcasing ByteDance's capabilities in foundation models and AI infrastructure. Seed-OSS's release strengthens the Chinese Large Language Model ecosystem.
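The real thinking-budget mechanism lives inside the model's chat template, but the idea of capping reasoning depth at a token count can be sketched in a few lines (everything here, including the `<budget_exhausted>` control token, is illustrative, not Seed-OSS's actual implementation):

```python
def apply_thinking_budget(reasoning_tokens, budget, answer_tokens):
    """Toy illustration of a 'thinking budget': cap the reasoning trace
    at `budget` tokens, then force the model on to its final answer."""
    kept = reasoning_tokens[:budget]
    if len(reasoning_tokens) > budget:
        kept.append("<budget_exhausted>")  # hypothetical control token
    return kept + answer_tokens

out = apply_thinking_budget(["step1", "step2", "step3"], 2, ["answer"])
```

A larger budget buys deeper reasoning at higher latency and cost; a budget of zero approximates a non-thinking mode.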
This article details Qwen team's latest open-source model, Qwen-Image-Edit, further trained on the 20B Qwen-Image model. It extends Qwen-Image's text rendering to image editing, enabling high-quality text editing. A key feature is its dual semantic and appearance editing, achieved by simultaneously feeding the input image to Qwen2.5-VL (for visual semantics) and VAE Encoder (for visual appearance). The model excels in advanced editing like IP creation that maintains semantic consistency, view transformation, and style transfer. It is also capable of local appearance editing, such as adding, deleting, modifying, and repairing objects. The article showcases its capabilities in original IP creation, MBTI emoji generation, view transformation, virtual avatar generation, object manipulation, text restoration, and poster editing, accompanied by rich examples. Additionally, it provides Python code for model inference and detailed LoRA fine-tuning steps and datasets via DiffSynth-Studio, lowering the barrier for developers to use and customize the model.
The article uses a classic PyTorch handwritten-digit-recognition code example to systematically walk through the five core steps of Deep Neural Network (DNN) training. It begins with the key concepts and roles of linear transformations and non-linear activation functions (such as ReLU). It then covers Dropout for combating overfitting, normalization (BatchNorm, LayerNorm) for stabilizing training, and residual connections for resolving degradation. Next, it explains how loss functions (such as cross-entropy and mean squared error) and regularization (L1, L2) measure model error and prevent overfitting, before delving into the mathematics of backpropagation (the chain rule and gradients) and PyTorch's autograd mechanism. The article further introduces gradient descent and its limitations, leading to improved optimizers like Adam, while briefly touching on the vanishing/exploding gradient problem. Finally, it summarizes the iterative training loop (epochs and batches). Enriched with code examples and illustrations, the article offers an accessible yet comprehensive guide to the full DNN training process.
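The loop the article describes, forward pass, loss, gradient via the chain rule, parameter update, repeated over epochs, can be sketched framework-free on the simplest possible model, fitting y = 2x with mean squared error (this is a pedagogical sketch, not the article's PyTorch code):

```python
# Minimal training loop: forward -> loss gradient -> update, per epoch.
data = [(x, 2.0 * x) for x in range(1, 5)]
w = 0.0    # single learnable weight
lr = 0.01  # learning rate

for epoch in range(200):            # iterative training (epochs)
    for x, y in data:               # one sample per "batch" here
        pred = w * x                # forward: linear transformation
        grad = 2 * (pred - y) * x   # d(MSE)/dw via the chain rule
        w -= lr * grad              # gradient descent update

print(round(w, 3))  # converges to 2.0
```

PyTorch's autograd automates exactly the `grad` line: it records the forward computation and applies the chain rule for you.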
The article provides a detailed interpretation of the gpt-oss-20b and gpt-oss-120b open-weight models released by OpenAI, tracing their architectural evolution since GPT-2. Key changes include removing Dropout, adopting Rotary Position Embedding (RoPE), using Swish/SwiGLU activation functions, introducing Mixture of Experts (MoE), Grouped-Query Attention (GQA), and Sliding Window Attention, and replacing LayerNorm with RMSNorm. The article also closely compares the design of gpt-oss with Qwen3, a leading open model, highlighting differences in model width and depth, expert configuration, attention bias, and attention sinks.
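Of the changes listed above, RMSNorm is the easiest to show in full: unlike LayerNorm it subtracts no mean and adds no bias, only rescaling by the root-mean-square of the activations (a plain-Python sketch for clarity, not the models' actual kernel):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale activations by their root-mean-square.
    No mean subtraction and no bias term, unlike LayerNorm."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

out = rms_norm([3.0, 4.0], [1.0, 1.0])
```

With unit weights the output has unit RMS, so its sum of squares equals the vector length, which is one reason RMSNorm is both cheaper and numerically simpler than LayerNorm.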
The article provides an in-depth interpretation of Google DeepMind Chief Scientist Denny Zhou's views on the reasoning capabilities of Large Language Models, presented in Stanford's CS25 course. He proposes that the key to LLM reasoning lies in generating a series of intermediate tokens, rather than simply scaling up the model, a mechanism that makes Transformer models extremely powerful. The article explains that pre-trained models already possess reasoning abilities, but these must be effectively elicited, through CoT decoding, prompt-engineering techniques (such as CoT), Supervised Fine-Tuning (SFT), and, most powerfully at present, Reinforcement Learning from Human Feedback (RLHF). Denny Zhou particularly emphasizes RLHF's potential to achieve model self-improvement through machine-generated data, and points out that aggregating multiple responses (Self-Consistency) and incorporating retrieval can significantly enhance LLM reasoning. Finally, he advocates that AI research prioritize building real-world applications over excelling at isolated benchmarks, highlighting the scalable nature of learning as fundamental to AI advancement.
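The self-consistency idea mentioned above reduces to a few lines: sample several chain-of-thought completions, extract each final answer, and take the majority vote (a minimal sketch of the aggregation step only; sampling from the model is elided):

```python
from collections import Counter

def self_consistency(sampled_answers):
    """Aggregate final answers from several sampled reasoning chains
    by majority vote, as in self-consistency decoding."""
    return Counter(sampled_answers).most_common(1)[0][0]

# Five hypothetical sampled answers to the same question:
best = self_consistency(["42", "41", "42", "42", "17"])
```

The intuition is that independent reasoning paths that agree on an answer are more likely to be correct than any single greedy decode.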
The article, an interview with Jeff Huber, CEO of Chroma, introduces the provocative idea that 'RAG is dead' and 'Context Engineering is King.' Huber posits that as AI workloads evolve from simple chatbots to complex agents and context windows expand, a more sophisticated approach to managing and utilizing context becomes crucial. He emphasizes moving beyond the 'alchemy' of demo-to-production AI development to a more engineering-driven process. The discussion delves into modern search infrastructure for AI, differentiating it from classic search systems along four axes: tools, workloads, developers, and consumers. Huber offers five practical retrieval tips and outlines detailed ingest and query pipelines, including hybrid recall, re-ranking, and guarding against 'context rot.' He also touches on Chroma's journey, its focus on developer experience, and the importance of a strong company culture in a competitive AI market. The core message: disciplined, structured context management is a necessity for building reliable, performant AI applications.
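The hybrid-recall-plus-re-ranking pattern in the query pipeline can be sketched as blending a keyword-overlap score with a dense similarity score and returning the top-k (all names and the scoring formula here are illustrative, not Chroma's API):

```python
def hybrid_retrieve(query_terms, query_vec, docs, k=2, alpha=0.5):
    """Toy hybrid recall: blend keyword overlap with a dense dot-product
    score, then re-rank and return the top-k document ids.
    `docs` maps doc id -> (term set, embedding vector)."""
    def dense(a, b):
        return sum(x * y for x, y in zip(a, b))

    scored = []
    for doc_id, (terms, vec) in docs.items():
        keyword = len(query_terms & terms) / max(len(query_terms), 1)
        scored.append((alpha * keyword + (1 - alpha) * dense(query_vec, vec), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

docs = {
    "a": ({"vector", "db"}, [0.9, 0.1]),
    "b": ({"cooking"}, [0.1, 0.9]),
    "c": ({"vector"}, [0.8, 0.2]),
}
top = hybrid_retrieve({"vector", "db"}, [1.0, 0.0], docs)
```

Production systems typically replace the blend with a learned re-ranker, but the shape, broad recall first, precise ordering second, is the same.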
Drawing on compiler theory, the article traces the evolutionary path of AI programming (or AI system development) from Prompt Engineering to Context Engineering, and on to Anthropic's Think Tool. The author first reviews why languages are formalized and introduces the Chomsky Hierarchy as a yardstick for the degree of formalization, pointing out the trade-off between expressive power and predictability and drawing a parallel to the challenges AI engineers face today. The article then analyzes the informality weaknesses of Prompt Engineering and how Context Engineering improves system reliability by leveraging structured context. Finally, it examines how the Think Tool achieves verifiability and policy adherence through explicit reasoning, going beyond the traditional Chain-of-Thought (CoT) paradigm. The conclusion: AI programming will move toward more rigorous formalization and verifiability, much as a compiler's correctness can be proven, which is crucial for deploying autonomous agents in high-risk, mission-critical domains.
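Concretely, Anthropic's 'think' tool is just an ordinary tool definition whose only effect is to record an explicit reasoning step in the transcript; a sketch of its schema (the shape follows Anthropic's tool-use format, the description wording here is paraphrased):

```json
{
  "name": "think",
  "description": "Use this tool to think about something. It does not obtain new information or change any state; it only records a reasoning step.",
  "input_schema": {
    "type": "object",
    "properties": {
      "thought": {
        "type": "string",
        "description": "A thought to think about."
      }
    },
    "required": ["thought"]
  }
}
```

Because each thought arrives as a structured tool call rather than free-form CoT text, it can be logged, inspected, and checked against policy, which is the verifiability gain the article highlights.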
The article delves into the core role and significant advantages of JSON prompts in AI interaction. The author first introduces the basic concepts of JSON prompts and compares them with traditional text prompts, emphasizing the significant superiority of JSON structured input in terms of clarity, consistency, and thoroughness. Next, the article explains the scientific basis of AI's sensitivity to structured data from the perspective of AI model training, pointing out that JSON prompts can effectively reduce ambiguity and cognitive load, enhancing AI performance. The article also reviews the evolution of JSON prompts, from simple instructions to large-scale enterprise applications. It showcases their practical impact on content generation, marketing automation, and customer service through case studies. These include improved accuracy, consistent scaling, seamless system integration, and reduced error rates. Ultimately, the article emphasizes that JSON prompts have become a key technology for building reliable AI systems, providing enterprises with an important competitive advantage.
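The contrast the article draws can be made concrete: the same request as free text versus as an explicit JSON specification, where every constraint is a named field rather than a clause the model may skim past (a hedged sketch; the field names are illustrative):

```python
import json

# The same task, stated two ways.
text_prompt = "Write a short friendly product blurb for our headphones."

json_prompt = json.dumps({
    "task": "write_product_blurb",
    "product": "wireless headphones",
    "tone": "friendly",
    "length_words": {"max": 50},
    "must_include": ["battery life", "noise cancellation"],
}, indent=2)

print(json_prompt)
```

The structured form is also machine-checkable: the calling system can validate the spec before sending it and diff it across versions, which is where the consistency and integration benefits come from.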
The article explores the importance of Evaluation (Evals) in AI product development, arguing that in the 'second half' of AI products, building good Evals is as critical as model training. It likens Evals to a 'driving test' for AI systems and details three methods: manual Evals, code-based Evals, and LLM-based Evals, emphasizing the scalability of the 'LLM-as-judge' approach. The article also lays out an iterative process for building Evals, from data collection and initial evaluation through iterative optimization and production monitoring, and lists common evaluation criteria such as hallucination, toxicity/tone, and overall correctness. Finally, it covers common mistakes to avoid in Evals design and concrete steps to get started quickly, stressing that Evals are key to ensuring AI systems continue to create value.
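The code-based flavor of Evals is the easiest to bootstrap: run each test case through the system and score it with a programmatic check, tracking the pass rate per release (a minimal harness sketch; the 'system' here is a stand-in, not a real model):

```python
def run_evals(system, cases):
    """Minimal code-based eval harness: run each case through the
    system under test and score it with a programmatic check.
    Returns the pass rate, the metric tracked across releases."""
    passed = sum(1 for prompt, check in cases if check(system(prompt)))
    return passed / len(cases)

# Toy system under test and two styles of check.
system = lambda prompt: prompt.upper()
cases = [
    ("hello", lambda out: out == "HELLO"),  # exact-match eval
    ("hi", lambda out: out.isupper()),      # property-based eval
]
rate = run_evals(system, cases)
```

An LLM-as-judge setup keeps the same harness shape and simply swaps each `check` for a call to a judging model with a rubric.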
This article provides a detailed introduction to building next-generation intelligent programming assistants based on Large Language Models (LLMs). It begins by reviewing the evolution of code intelligence, from traditional autocompletion to the Agent-based approach, highlighting the significant potential of LLMs in enhancing development efficiency, reducing memory burden, and bridging knowledge gaps. Subsequently, the article elaborates on the technical architecture of Agents, including the user interface, core functionalities (plan execution, tool invocation), and foundational capabilities (Code Knowledge Graph, LLM Adapter). It focuses on Prompt structure design, context-aware mechanisms (such as the construction and consumption of Code Knowledge Graphs, model side effects, and user operation information tracking), and memory management strategies in multi-turn dialogues (truncation, compression summarization, and engineering trade-offs). To address cost issues, the article also introduces the practice of Prompt caching. Furthermore, through practical examples like developing the Snake game, adding features, and fixing bugs, the article vividly demonstrates the powerful capabilities of Agent deep integration with IDEs. Finally, the article summarizes the engineering challenges of model uncertainty, service stability, and Prompt debugging, and envisions future development directions such as cognitive enhancement, tool integration, collective intelligence and multi-Agent collaboration, and autonomy improvement.
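Of the memory-management strategies mentioned, truncation is the simplest: keep the most recent messages whose combined token cost fits the budget and drop the rest (a sketch under the assumption of a crude word-count tokenizer; real assistants use the model's actual tokenizer and often summarize the dropped prefix instead):

```python
def truncate_history(messages, budget,
                     count_tokens=lambda m: len(m.split())):
    """Truncation-based dialogue memory: keep the newest messages
    whose total token count fits within `budget`."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest first
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = ["one two three", "four five", "six"]
kept = truncate_history(history, budget=3)
```

The engineering trade-off the article describes is exactly here: truncation is cheap but forgets early context, while compression-by-summarization preserves it at the cost of an extra model call.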
The article explores Claude Code, an AI-assisted programming command-line tool launched by Anthropic, which combines the powerful Claude AI model with the terminal environment familiar to developers, greatly enhancing development convenience. The article details the five core advantages of Claude Code: native terminal integration, custom slash commands, Sub-Agents multi-role collaboration, powerful project control and personalized configuration, and SDK and system integration. For users in China, it offers two practical solutions: using the Kimi platform, which is compatible with the Claude API, or deploying the open-source claude-code-proxy project to connect to an OpenAI-compatible API. The article also explains Claude Code's advanced features in detail, including permission configuration, memory management, custom slash commands (short commands starting with '/'), the creation and use of Subagents, the Hooks event mechanism, and MCP tool integration, with rich configuration examples and security warnings, making it a comprehensive, practical guide for developers.
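As a hedged illustration of the custom slash commands mentioned above: project-level commands are Markdown files under `.claude/commands/`, with `$ARGUMENTS` standing in for whatever follows the command (the file location and placeholder follow Claude Code's documented convention; verify against the official docs for your version):

```markdown
<!-- .claude/commands/review.md  — invoked in a session as: /review src/app.ts -->
Review $ARGUMENTS for bugs, style issues, and missing tests.
Summarize findings as a bullet list ordered by severity.
```

Checking such files into the repository shares the command with every collaborator, which is one of the project-control advantages the article highlights.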
The article provides an in-depth review of the mysterious AI drawing model Nano Banana, which has not yet been officially released. The model currently appears randomly in LMArena blind tests but is widely considered a Google product by the author and the community. Its core highlight is its impressive character consistency, which can accurately preserve the facial features and expressions of the reference image, far surpassing existing mainstream models such as GPT-4o, Flux, and Seedream. Through multiple real-world examples, including single-subject action transfer, multi-subject character replacement, background replacement, subject-background combination, character emotion expression, detail modification, and style transfer, the article compares Nano Banana's performance with other models in detail. The results show that Nano Banana outperformed other models in most tests. The author emphasizes Nano Banana's practical value in generating video covers and other scenarios that require a high degree of character consistency and provides instructions on how to experience the model on LMArena. The article concludes that Nano Banana demonstrates superior character consistency in AI drawing, highlighting Google's leading position in the AI field.
The article details Zhipu's release of AutoGLM, billed as the world's first general-purpose mobile agent. Its core innovation is a cloud-execution model that gives users a 'cloud phone' or 'cloud computer' environment, sidestepping the compute limits and resource contention of traditional on-device agents and enabling cross-application automation of complex tasks such as ordering takeout, comparing prices across platforms, and generating reports and PPTs. The product is built on the domestically developed GLM-4.5 and GLM-4.5V models, is free to the public, and provides API support for the developer ecosystem. AutoGLM is a key step in Zhipu's path toward AGI (an L3 autonomous-learning agent). It also aligns with the industry trend of agents 'executing in the cloud,' signaling that AI agents will evolve from 'telling you how to do it' to 'directly doing it for you,' greatly enhancing AI's practicality and user experience.
Through an in-depth interview with Sun Zhaozhi, founder of Luobo Intelligence, the article explores the design philosophy, market positioning, and commercialization strategies of the AI companion hardware product 'Fu Zai'. Sun Zhaozhi shifted from embodied AI to AI companion, emphasizing that focusing on emotional value based on real user needs is key in the context of AI hardware exploration setbacks. As a 399 RMB AI nurturing trendy toy, Fu Zai aims to become Generation Z's 'digital pet' and alleviate their loneliness through its plush appearance, blinking screen, touch and voice interaction, and 'shared memories' system. The article highlights the 'subtraction' philosophy in product design, the importance of appearance, and the role of Large Language Models (such as DeepSeek) in driving the growth of AI companion products. It also elaborates on how AI simulates a sense of life through intent understanding, emotion extraction, personality development, and the 'Echo Chain' memory system, proposing that 'AI companion' will become an important discrete market.
The article, through an interview with YouMind founder Yubo, deeply analyzes the positioning, core concepts, and future vision of his AI creation tool, YouMind. YouMind is defined as an AI tool designed to provide creators with efficient research and writing services. Its core concept shifts from traditional 'Knowledge Management' to 'Project-Based Creation,' emphasizing high-quality deliverables. The article elaborates on how YouMind empowers professional creators and enthusiasts through in-depth research, high editability, and user control, achieving an end-to-end AIGC workflow of 'Everything to Draft, Draft to Everything'. Yubo proposes the unique perspective of clipping as a form of AI preference signaling, pointing out that user clipping behavior provides AI with valuable personalized preference data, enabling AI tools to understand and respond to user needs more accurately. Additionally, the interview shares Yubo's entrepreneurial rhythm of 'Fast but Not Hasty' and the entrepreneurial principle of 'Context is Everything,' emphasizing the importance of self-awareness and situational judgment in a rapidly changing era. Finally, the article envisions YouMind becoming the 'GitHub for Creators,' aiming to stimulate creative motivation and further lower the barrier to creation through community, building a positive creative ecosystem.
This article features an in-depth interview with Perplexity co-founder and CEO Aravind Srinivas, primarily discussing the positioning and future of Comet, their Agent Browser. Aravind proposes that Comet aims to become an AI Operating System capable of automating repetitive tasks by deeply integrating 'intelligence' and 'context,' emphasizing the browser as the ultimate carrier for acquiring a holistic understanding of users' work and life. He believes this is key to the success of AI Agents. Perplexity is taking a 'disruptor' approach by launching products early to pioneer the 'Agent Browser' category. They believe a subscription model can support a business worth hundreds of billions of dollars. Additionally, Aravind elucidates his views on AI Hardware, arguing that mobile browsers are more critical because they can acquire context in a safer, more user-friendly manner. The article also mentions Perplexity's infrastructure construction, business model considerations, and unique distribution strategy in competition with Google, and envisions the future of AI Agents as the 'autopilot' of the digital workforce.
This article details the launch of the Google Pixel 10 series, emphasizing its centerpiece: the inaugural, fully custom-designed Tensor G5 chip. Fabricated using TSMC's 3nm process, this chip significantly enhances CPU and TPU performance, establishing a robust hardware foundation for the Gemini on-device AI experience. The article further explores Gemini's innovative features, including 'Magic Prompt,' 'Camera Coach,' and 'Best Take,' showcasing the evolution of the smartphone from a passive tool to a proactive assistant. Additionally, it covers hardware enhancements in the Pixel 10 series such as advancements in the imaging system, eSIM implementation, and the Pixelsnap magnetic ecosystem. The article also highlights the Pixel 10 Pro Fold's durability as the first IP68-rated foldable phone and the integration of Gemini-powered personal health coaching and intelligent assistant capabilities within the concurrently released Pixel Watch 4 and Pixel Buds 2a. The author concludes that the Pixel 10 series represents Google's coherent and well-executed response in the AI-driven smartphone market, underscoring the pivotal role of deep vertical integration between software and hardware in realizing genuinely intelligent functionality.
The article provides an in-depth overview of Andrew Ng's latest thoughts on the current AI wave. He first clarifies the definition of "Agentic AI". He argues that the biggest obstacle to its realization is not technology, but the lack of talent and processes for rigorous system iteration. Andrew Ng emphasizes that AI-assisted coding is significantly improving development efficiency, shifting the core bottleneck of startups from engineering implementation to product decision-making. This requires founders to have stronger user empathy and technical intuition to make rapid product judgments. He further points out that in the rapidly evolving AI era, "technology-oriented product leaders" who master generative AI technology will be more likely to succeed than those with a purely business orientation. Finally, Andrew Ng predicts that the future belongs to "small and lean" teams empowered by top talent and powerful AI tools. This efficient organizational model will reshape talent recruitment and the nature of work, giving individuals unprecedented power.
The article revolves around the controversy surrounding the release of OpenAI's latest model, GPT-5, pointing out its excellent performance in enterprise-level complex tasks (such as coding and long-form reasoning), despite limited perceived gains in consumer applications due to task saturation. In an interview, OpenAI co-founder Greg Brockman elaborated on the company's evolution from 'next token prediction' to 'reasoning paradigm,' highlighting reinforcement learning's role in enhancing reliability and generalization. He pointed out that computing power is an eternal bottleneck for AI development, but model costs have decreased dramatically, and he envisions AI models leaving the 'ivory tower' to become human intellectual partners. The article also discusses agent robustness and the profound impact of AI on software engineering and the entire socio-economic landscape.
This article provides a detailed interpretation of Bessemer Venture Partners' annual report, 'The State of AI 2025.' The report first analyzes the two current AI startup models: 'Supernova' and 'Meteor,' and updates growth benchmarks for startups in the AI era. It also points out challenges such as deceptive growth indicators, fierce competition, and the unpredictable nature of the industry. Next, the article delves into the evolution roadmap of AI in five major directions: infrastructure (such as the 'second chapter' of AI infrastructure), developer platforms (such as the Model Context Protocol MCP), enterprise applications, vertical fields, and consumer applications. It particularly emphasizes the importance of 'memory' and 'context' in building competitive advantages for AI applications. Finally, the report proposes five key predictions, including AI browser competition, the popularization of generative video, the necessity of evaluation and data traceability for development, the rise of AI-native social media, and industry mergers and acquisitions. The article provides AI professionals with in-depth insights into future development trends and entrepreneurial opportunities.
This issue of the 'Global LLM Quarterly Report' focuses on two key keywords in the current AI LLM field: divergence and product. First, the podcast analyzes how leading model companies (such as OpenAI and Google) are developing towards general capabilities, while Anthropic, Thinking Machines Lab, and others are choosing to differentiate deeply in specific fields such as coding, Agent technology, and multimodal interaction. Second, the program emphasizes the importance of products in the AI era, pointing out that the past model of over-focusing on intelligent exploration is shifting towards an emphasis on productization and user experience. Guests believe that the key to successful AI products lies in providing L4-level 'wow moment' experiences, such as ChatGPT's Deep Research and Claude Code, which can effectively transform model dividends into brand and commercial value, building non-technical barriers. Facing the all-in-one suite strategy and vertical integration of leading companies (such as OpenAI and Google), AI startups face huge challenges and need to find unconventional opportunities, deeply cultivate vertical fields or innovative product forms to avoid head-on competition. Finally, the podcast also discusses AI investment strategies, pointing out that technology is changing rapidly, the value of leading companies is converging, and investors need to support the most promising entrepreneurs. It also shares an optimistic outlook on Chinese AI entrepreneurs, as well as views on the AGI bubble and the future trend of technological integration, such as the integration of search, short video and social functions.
This podcast features a four-hour interview between Luo Yonghao and Li Xiang, the founder of Li Auto. Li Xiang shares for the first time his story of growing up in the countryside, how his family instilled optimism and self-discipline, and how he achieved financial independence in high school by writing, assembling computers, and building websites, thus beginning his entrepreneurial journey. He details his experiences from PCPOP and Autohome to Li Auto, including navigating the Internet bubble, cash flow challenges, production bottlenecks, and online smear campaigns, demonstrating his resilience and problem-solving skills. The interview explores Li Auto's strategy of using extended-range technology, building a core team, managing supply chain challenges, and product design and user positioning. Additionally, Li Xiang discusses his views on the future of artificial intelligence and how family values shape his entrepreneurial and product thinking. The program is not just Li Xiang's personal story but also offers profound and unconventional insights on business models, talent management, learning and iteration, and public relations strategies, providing valuable insights for tech professionals, entrepreneurs, and managers.
As a monthly tech observation report, this article comprehensively reviews the latest developments in global AI for July 2025. The 'Trend Observation' section emphasizes that Chinese LLMs like K2, GLM-4.5, and others have surpassed leading international counterparts in programming, AI Agents, and multi-modal capabilities. Released largely as open-source, these models leverage the open-source ecosystem and cost-effectiveness, solidifying China's central position in the AI competition, suggesting that China and the US are now on par in the language model arena. Simultaneously, the article notes the evolution of image, video, and audio fields towards 'generation-by-understanding,' with 3D generation technology overcoming single-object limitations to enable the creation of combinable parts and complete scenes. AI Coding is advancing towards L4 full automation, while vertical AI Agent applications in finance and imaging are rapidly expanding. The increasing number of mergers and acquisitions suggests a shift in the AI landscape. The industry is transitioning from a period of emerging players (akin to the Spring and Autumn period) to one of intense competition and consolidation (similar to the Warring States period). The 'Time Machine' section meticulously lists key events of the month, including model open-sourcing, application releases, financing, and M&A activities, highlighting the active involvement of Chinese tech giants like Zhipu, Alibaba, and Moonshot AI in open-source AI, alongside updates from international firms such as Hugging Face, Google, and OpenAI, providing readers with a holistic industry overview.