BestBlogs Issue #84: Orchestration

Hey there! Welcome to BestBlogs.dev Issue #84.

Happy Chinese New Year! We took a two-week break for the holiday, so this issue is packed with extra content—take your time with it.

The most significant shift over the past two weeks isn't any single model topping a new benchmark. It's the accelerating transformation of the engineer's role, from writing code to orchestrating AI agents that write code. Boris Cherny, creator of Claude Code, says the programming problem has largely been solved. Engineers at OpenAI are already managing 10 to 20 agents simultaneously on hour-long tasks. Anthropic's trend report calls it a systematic shift from humans writing code to humans orchestrating agents. Meanwhile, Claude Sonnet 4.6, Gemini 3.1 Pro, GLM-5, and MiniMax M2.5 all dropped within weeks of each other. The stronger the models get, the more valuable orchestration and judgment become.

On my end, I've been deep in building BestBlogs 2.0's core features—orchestrating multiple AI coding tools and agents through Spec documents for requirement discussions, architecture design, demo development, and interaction reviews. Almost no hand-written code involved. Aiming for a late March launch, and I'll share more details then.

Here are 10 highlights worth your attention this week:

🏆 The model arms race is heating up fast. Claude Sonnet 4.6 brings a million-token context window and upgraded agentic capabilities. Users preferred it over the previous flagship, Opus 4.5, in 59% of preference tests, and it costs the same as Sonnet 4.5. Gemini 3.1 Pro jumped from 31% to 77% on the ARC-AGI-2 reasoning benchmark, introduced a three-level thinking mode for flexible compute allocation, and costs less than half as much as Claude Opus 4.6. More capability at the same price is the new normal.

🤖 GLM-5 and MiniMax M2.5 tackle the same question from different angles: how to make agents actually work in production. GLM-5 is designed around agent engineering from the ground up, achieving state-of-the-art open-source performance through asynchronous RL and sparse attention. MiniMax M2.5 pushes continuous agent operation costs below $1 per hour, making unconstrained complex agent deployment a practical reality.

🎨 Seedance 2.0 and Nano Banana 2 push boundaries in video and image generation respectively. Seedance 2.0 goes beyond generating visuals—it understands directorial thinking, autonomously handling storyboard design and emotional pacing. Nano Banana 2 slashes API pricing significantly, and while hands-on testing shows results aren't quite as impressive as the marketing suggests, it genuinely makes high-quality image generation accessible to everyone.

🛠️ Two interviews with Boris Cherny, creator of Claude Code, are the must-reads of this issue. He traces Claude Code's journey from an internal project that drew two upvotes to powering 4% of GitHub commits. The core philosophy: build for the model six months from now, not today's model. He hasn't written a single line of code by hand since Opus 4.5, and he believes the next frontier is AI evolving from executor to a colleague that proactively suggests ideas.

⚡ OpenAI engineering lead Sherwin Wu reveals how AI tools are reshaping engineering teams: 95% of engineers use Codex daily, the gap in PR output between high and low performers reaches 70%, and engineers who can manage 10 to 20 agents simultaneously are pulling far ahead. He also candidly notes that many enterprise AI deployments have negative ROI, and that the second- and third-order effects of one-person billion-dollar companies are severely underestimated.

📁 The next frontier in LLM engineering is shifting from scaling parameters to memory. An InfoQ talk systematically covers memory layering, proactive scheduling, and mind-map-style information organization. The key insight: instead of handling retrieval reactively at query time, do the memory work ahead of time, in the idle gaps between interactions, so relevant memories are ready before the query arrives. Datawhale's breakdown of Skill design reveals a critical dividing line: lock down fragile operations with scripts, and guide creative tasks with natural language.

💡 Vibe Coding is moving from concept to large-scale production. Alibaba's internal practice exposes real challenges (inconsistent code quality, slow debugging, and security vulnerabilities) while offering battle-tested solutions like templating successful paths and abstracting agents as reusable tools. Meanwhile, a product manager with no coding background built a personal AI agent on their own server in one afternoon using Claude Code, a strong case that product sense is now scarcer than coding ability.

🧩 Anthropic's agentic coding trend report maps out a systematic transformation across eight dimensions: multi-agent collaboration, long-running autonomous tasks, and programming democratization among them. The core thesis: AI amplifies the judgment engineers already have rather than replacing it. System design, task decomposition, and quality assurance—the old fundamentals—are worth more than ever in the agent era.

🔬 Google's Chief AI Scientist Jeff Dean walks through the full arc from loading Google's entire index into memory in 2001 to TPU co-design, offering two key predictions: personalized models that attend to all of a user's data, and specialized hardware enabling ultra-low latency that will fundamentally reshape human-AI collaboration.

👨‍💼 The debate over whether AI will end software engineering continues. UML creator Grady Booch pushes back on Anthropic CEO Dario Amodei's claims that AI will fully automate software engineering, pointing out that the field has survived multiple existential crises, each time emerging into a new golden age. Naval offers a different angle: agency is humanity's real moat against AI replacement, because AI has no desires, no survival pressure, and can't make autonomous decisions in truly unknown territory. The only way to overcome AI anxiety is to open the hood, understand it, and then act.

Hope this issue sparks some new ideas. Stay curious, and see you next week!

1. Claude's Strongest Sonnet Model 4.6 is Here, Featuring a Million-Token Context Window
2. Google to Reclaim the Throne? Gemini 3.1 Pro Reasoning Scores Double, Hallucination Rates Decline, Prices Unchanged
3. Just Released: Nano Banana 2! Affordable and Powerful—Here Are the Details After My Hands-on Experience
4. GLM-5 Technical Report: Full Disclosure of Technical Details
5. MiniMax M2.5 Released: $1/Hour, the King of Real-World Work
6. China Now Has a World-Leading Model, and Its Name Is Seedance 2.0.
7. Boris Cherny: How We Built Claude Code
8. How to Write Great Skills? Deconstructing the Design Behind skill-creator!
9. OpenAI Frontline Development Observations: Those Who Can Manage 10-20 Agents Simultaneously and Run Hour-Long Tasks Are Leaving Other Engineers Far Behind
10. From Context to Long-Term Memory: Architectural Design and Practice of LLM Memory Engineering
11. Practices and Reflections on Vibe Coding in Code Generation and Collaboration
12. From Clawdbot to the 2026 AI Coding Explosion | A Conversation with PingCAP CTO Dongxu
13. OpenAI Applied CTO and Codex Lead: AI is Reshaping How Software is Built
14. The Era of Vibe Coding: Why 'Product Sense' is Scarcer Than 'Coding Skills'?
15. The Great Shift in Programming 2026: Anthropic Report Reveals Eight Major Trends in Agentic Programming
16. Owning the AI Pareto Frontier — Jeff Dean
17. Head of Claude Code: What happens after coding is solved | Boris Cherny
18. #434. Survival Rules in the AI Era: Naval on Vibe Coding, Personal Leverage, and the Future of Creativity
19. Father of UML: Dario is Dead Wrong and Doesn't Understand Software Engineering! Software Engineering Will Not Die! Software Has Entered Its Third Golden Age! Industry Response: With AI, SaaS Will Only Prosper Further!
20. 150: Year-End AI Review: From Models to Applications, Technology to Business Wars, Grasping the Thread of Meaning in the Torrent

Claude's Strongest Sonnet Model 4.6 is Here, Featuring a Million-Token Context Window

机器之心

mp.weixin.qq.com

02-18

1537 words · 7 min

Anthropic released Claude Sonnet 4.6 during the Spring Festival, with broad upgrades across coding, computer use, long-context reasoning, and agentic planning, along with a 1-million-token context window. Notably, users preferred it over the previous flagship, Opus 4.5, 59% of the time in preference tests, while it keeps the same price as Sonnet 4.5. That makes frontier-level agentic capability genuinely accessible at scale.

Google to Reclaim the Throne? Gemini 3.1 Pro Reasoning Scores Double, Hallucination Rates Decline, Prices Unchanged

腾讯科技

mp.weixin.qq.com

02-20

3322 words · 14 min

Google's Gemini 3.1 Pro jumps from 31% to 77% on the ARC-AGI-2 reasoning benchmark and introduces a three-tier thinking mode that lets developers tune compute intensity per task without juggling multiple models. What makes this more than a benchmark story is the price: performance surged while pricing held steady, with API costs coming in at less than half those of Claude Opus 4.6. It is a calculated counterattack that put Google back at the top of the intelligence index rankings.

Just Released: Nano Banana 2! Affordable and Powerful—Here Are the Details After My Hands-on Experience

爱范儿

ifanr.com

02-27

3031 words · 13 min

This hands-on review tests Nano Banana 2 across Chinese text rendering, complex UI generation, manga storyboarding, and subject consistency, offering a more grounded take than the official blog: speed and quality improvements are less dramatic than advertised, and some edge cases actually regress from the previous model. But with API pricing cut in half and a genuinely lower barrier to entry, the model makes high-quality image generation practically accessible in a way its predecessors didn't quite manage.

GLM-5 Technical Report: Full Disclosure of Technical Details

智谱

mp.weixin.qq.com

02-22

24347 words · 98 min

GLM-5, the latest open-source flagship from Zhipu AI, is built around agentic engineering as a first-class design goal, combining an asynchronous RL framework with sparse attention to dramatically cut inference costs while boosting performance. It achieves state-of-the-art results among open models on benchmarks like SWE-bench and BrowseComp, and demonstrates end-to-end software engineering capabilities that rival top closed-source systems in real-world tasks.

MiniMax M2.5 Released: $1/Hour, the King of Real-World Work

MiniMax 稀宇科技

mp.weixin.qq.com

02-12

3205 words · 13 min

MiniMax M2.5 reaches top-tier performance across coding, tool use, and office productivity, but the more significant story is two structural bets: first, baking complex task decomposition directly into the model through a native Agent RL framework, achieving 37% faster completion while reducing token consumption; second, driving continuous Agent operation costs below one dollar per hour, turning the vision of economically unconstrained Agent deployment from aspiration into practical reality.

China Now Has a World-Leading Model, and Its Name Is Seedance 2.0.

数字生命卡兹克

mp.weixin.qq.com

02-11

5162 words · 21 min

Seedance 2.0 represents a genuine leap in AI video generation — it doesn't just produce footage, it understands cinematic thinking, autonomously handling shot composition, emotional pacing, and scene transitions. The author walks through a range of creative applications, from dramatic short films and fan recreations to product ads and real-world video editing, while candidly reflecting on what it feels like to watch 18 months of hard-built workflows become obsolete almost overnight.

Boris Cherny: How We Built Claude Code

Y Combinator

youtube.com

02-17

9395 words · 38 min

Claude Code creator Boris Cherny traces the full arc from accidental prototype to paradigm-shifting tool, anchored by one core philosophy: build for the model six months from now, not today's. He shares that he hasn't written a single line of code by hand since Opus 4.5, and digs into multi-agent system design, the right way to use CLAUDE.md, and why the most valuable skill for engineers in a rapidly evolving model landscape is no longer technical expertise but beginner's mind and first-principles thinking.

How to Write Great Skills? Deconstructing the Design Behind skill-creator!

Datawhale

mp.weixin.qq.com

02-22

9159 words · 37 min

Writing an AI Skill is fundamentally different from writing documentation for humans. This article uses skill-creator as a case study to reveal the core principle: every line must earn its place in the context window. Defining what the AI should not do proves more precise than describing what it should, and the key to quality lies in knowing when to lock down behavior with scripts versus when to guide with natural language.
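The script-versus-prose split is easiest to see on a concrete fragile step. Below is a minimal Python sketch, assuming a skill's SKILL.md opens with a `---`-delimited frontmatter block carrying `name` and `description` fields; the function `validate_skill_md` and its naming rule are hypothetical illustrations, not skill-creator's actual code. A check like this has exactly one right answer, so it belongs in a script rather than in natural-language instructions the model might follow loosely.

```python
import re

# Hypothetical rules for illustration only; not skill-creator's real validator.
REQUIRED_FIELDS = ("name", "description")
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # e.g. "pdf-tools"
FRONTMATTER = re.compile(r"^---\n(.*?)\n---(?:\n|$)", re.DOTALL)


def validate_skill_md(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file passed."""
    match = FRONTMATTER.match(text)
    if not match:
        return ["missing YAML frontmatter delimited by '---' lines"]

    # Naive "key: value" parsing; a real tool would use a YAML parser.
    fields: dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    errors: list[str] = []
    for field_name in REQUIRED_FIELDS:
        if not fields.get(field_name):
            errors.append(f"missing required field: {field_name}")
    if fields.get("name") and not NAME_PATTERN.match(fields["name"]):
        errors.append("name must be lowercase words joined by hyphens")
    return errors


good = "---\nname: pdf-tools\ndescription: Fill and merge PDF forms.\n---\n# Body\n"
bad = "---\nname: PDF Tools\n---\n"
print(validate_skill_md(good))  # []
print(validate_skill_md(bad))
```

The creative part of a skill (how to phrase a summary, what tone to use) stays in prose; only steps like this one, where a deviation silently breaks loading, get pinned down in code.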

OpenAI Frontline Development Observations: Those Who Can Manage 10-20 Agents Simultaneously and Run Hour-Long Tasks Are Leaving Other Engineers Far Behind

InfoQ 中文

mp.weixin.qq.com

02-18

21952 words · 88 min

Sherwin Wu, OpenAI's engineering lead, offers a rare inside look at how AI tools are reshaping software teams: 95% of engineers use Codex daily, top performers ship 70% more PRs, and the engineer's role is shifting from writing code to orchestrating AI agents. He also breaks down why so many enterprise AI deployments fail to deliver ROI, and explores the underappreciated second- and third-order effects of the emerging one-person billion-dollar company era.

From Context to Long-Term Memory: Architectural Design and Practice of LLM Memory Engineering

InfoQ 中文

mp.weixin.qq.com

02-25

9347 words · 38 min

As LLM scaling yields diminishing returns, memory engineering is emerging as the next core infrastructure challenge. This talk systematically covers three key mechanisms — hierarchical memory modeling, proactive scheduling, and mind-map-style information organization — with a central insight that cuts through current RAG limitations: instead of reactive retrieval that blocks inference, shift memory preparation into the idle gaps between user interactions, so the right context is ready before the query arrives.
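The proactive-scheduling idea can be shown in miniature. Below is a minimal Python sketch, not the talk's actual architecture: a hypothetical `ProactiveMemory` class keeps a long-term store plus a small "ready" cache, fills that cache in an `on_idle` hook that runs between user turns, and has `on_query` read the cache first. Tag/keyword overlap stands in for a real relevance model, and every name here is an assumption for illustration.

```python
from dataclasses import dataclass, field


def _words(text: str) -> set[str]:
    # Crude tokenizer: lowercase each word and strip trailing punctuation.
    return {w.lower().strip(".,?!") for w in text.split()}


@dataclass
class MemoryItem:
    text: str
    tags: set[str]


@dataclass
class ProactiveMemory:
    """Hypothetical two-tier memory: a long-term store plus a small
    'ready' cache that is filled during idle gaps between user turns."""
    long_term: list[MemoryItem] = field(default_factory=list)
    ready: list[MemoryItem] = field(default_factory=list)
    cache_size: int = 3

    def remember(self, text: str, tags: set[str]) -> None:
        self.long_term.append(MemoryItem(text, tags))

    def on_idle(self, recent_turns: list[str]) -> None:
        # Runs between interactions, off the query's critical path:
        # rank memories by overlap with recent topics and stage the top ones.
        topics: set[str] = set()
        for turn in recent_turns:
            topics |= _words(turn)
        ranked = sorted(self.long_term, key=lambda m: len(m.tags & topics), reverse=True)
        self.ready = [m for m in ranked[: self.cache_size] if m.tags & topics]

    def on_query(self, query: str) -> list[str]:
        # Check the pre-staged cache first; scan the full store only on a miss.
        words = _words(query)
        hits = [m.text for m in self.ready if m.tags & words]
        if hits:
            return hits
        return [m.text for m in self.long_term if m.tags & words]


mem = ProactiveMemory()
mem.remember("User prefers Python for scripts", {"python", "scripts"})
mem.remember("User's cat is named Miso", {"cat", "pet"})
mem.remember("Project deadline is late March", {"deadline", "project"})
mem.on_idle(["Can you help with my project?", "It is a Python tool."])
print(mem.on_query("What is my project deadline?"))  # ['Project deadline is late March']
```

The design choice that matters is where the ranking work happens: `on_idle` does it while the user is not waiting, so `on_query` only scans a handful of pre-staged items and falls back to the full store on a miss.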

Practices and Reflections on Vibe Coding in Code Generation and Collaboration

InfoQ 中文

mp.weixin.qq.com

02-10

8682 words · 35 min

Real-world deployment of Vibe Coding tools at Alibaba's scale surfaces challenges that benchmarks rarely capture: inconsistent code quality, longer debugging cycles, security vulnerabilities in AI-generated code, and runaway token costs. The team's practical responses (templating successful task patterns to lift completion rates, replacing closed-source models with domestically developed alternatives, and treating Agents themselves as composable tools) offer a grounded blueprint for anyone building or evaluating the next generation of AI coding products.

From Clawdbot to the 2026 AI Coding Explosion | A Conversation with PingCAP CTO Dongxu

42章经

xiaoyuzhoufm.com

02-07

1121 words · 5 min

PingCAP CTO Huang Dongxu draws on hands-on experience to trace AI Coding's leap from assistant tool to autonomous agent, arguing that everything is converging toward a "Coding Agent" paradigm. He explains why context engineering has become the key performance differentiator, introduces his concept of "Box" isolated environments for safer multi-agent collaboration, and reflects on how engineers can stay relevant as the technical barrier to coding disappears.

OpenAI Applied CTO and Codex Lead: AI is Reshaping How Software is Built

宝玉的分享

baoyu.io

02-21

5968 words · 24 min

OpenAI's CTO of Applications and Codex engineering lead offer a rare inside look at how AI coding has evolved from assistance to true delegation inside OpenAI — engineers now shut their laptops, go to meetings, and return to find the work done. What's equally striking is the chain of shifting bottlenecks: solving code generation surfaces code review, then deployment, then understanding user needs. The emerging competitive edge for engineers is no longer writing code, but product intuition and the ability to move fluidly across layers of abstraction.

The Era of Vibe Coding: Why 'Product Sense' is Scarcer Than 'Coding Skills'?

少数派

mp.weixin.qq.com

02-20

12073 words · 49 min

A product manager with no coding background built a fully functional personal AI Agent in a single afternoon using Claude Code. The article shares six practical Vibe Coding techniques, with a central insight that cuts through the hype: AI has eliminated the technical barrier to building, but the ability to define what to build and why remains irreplaceable — and for non-technical builders, that product intuition is now their greatest competitive advantage.

The Great Shift in Programming 2026: Anthropic Report Reveals Eight Major Trends in Agentic Programming

宝玉的分享

baoyu.io

02-10

4960 words · 20 min

Anthropic's 2026 Agentic Coding Trends Report maps out a fundamental shift in software development — from writing code to orchestrating agents that write code. Spanning eight trends from multi-agent collaboration to programming democratization, the report's sharpest insight is that AI amplifies existing engineering judgment rather than replacing it. Skills like system design, task decomposition, and quality evaluation become more valuable as implementation gets delegated to agents.

Owning the AI Pareto Frontier — Jeff Dean

Latent Space

latent.space

02-12

15864 words · 64 min

Google Chief AI Scientist Jeff Dean traces the full arc of AI infrastructure evolution through his own experience: from the 2001 decision to load Google's entire search index into memory, to the co-design philosophy behind TPUs, the logic of distillation and sparse models, and his two forward-looking bets — personalized models that can attend to everything you've ever seen, and specialized hardware driving ultra-low latency that will fundamentally reshape how humans and AI systems collaborate.

Head of Claude Code: What happens after coding is solved | Boris Cherny

Lenny's Podcast

youtube.com

02-19

19102 words · 77 min

Claude Code's lead Boris Cherny traces the product's journey from an internal post that got two likes to driving 4% of GitHub's global code commits. His sharpest take: coding as a problem is largely solved, and the next frontier is AI that proactively surfaces ideas rather than just executing them. In this transition, the rarest skill will not be writing code but the generalist capacity to define what is worth building across multiple domains.

#434. Survival Rules in the AI Era: Naval on Vibe Coding, Personal Leverage, and the Future of Creativity

跨国串门儿计划

xiaoyuzhoufm.com

02-24

1642 words · 7 min

Naval reframes AI-era competitiveness through several sharp lenses: vibe coding turns taste and judgment directly into productivity, but only those who understand the underlying architecture can patch the leaks when AI makes mistakes; human agency — driven by desire and survival instinct — remains the one moat AI cannot replicate; and the only antidote to AI anxiety is opening the hood to understand how it works, then acting on that understanding.

Father of UML: Dario is Dead Wrong and Doesn't Understand Software Engineering! Software Engineering Will Not Die! Software Has Entered Its Third Golden Age! Industry Response: With AI, SaaS Will Only Prosper Further!

51CTO技术栈

mp.weixin.qq.com

02-10

14961 words · 60 min

In this interview, "father of UML" Grady Booch offers a historical antidote to today's AI anxiety: software engineering has survived multiple existential crises, and each one gave way to a new golden age. He pushes back sharply on claims that AI will fully automate software engineering, arguing that the real work of software engineering — navigating tradeoffs across competing forces — cannot be automated. What we are witnessing is simply another rise in abstraction level, and the engineers who thrive will be those who develop systems thinking and the judgment to manage complexity at scale.

150: Year-End AI Review: From Models to Applications, Technology to Business Wars, Grasping the Thread of Meaning in the Torrent

晚点聊 LateTalk

xiaoyuzhoufm.com

02-09

1414 words · 6 min

LateTalk's year-end review covers the 2025 AI landscape across seven dimensions, including the rise of reasoning models sparked by DeepSeek R1, the emergence of agents as a paradigm, the talent and organizational battles among ByteDance, Alibaba, and Tencent, the investment surge and real-world limits of embodied intelligence, and, finally, the human dimension: how people navigate skill devaluation and search for meaning in an era of accelerating automation.