
BestBlogs Issue #84: Orchestration

Hey there! Welcome to BestBlogs.dev Issue #84.

Happy Chinese New Year! We took a two-week break for the holiday, so this issue is packed with extra content—take your time with it.

The most significant shift over the past two weeks isn't any single model topping a new benchmark. It's the accelerating transformation of the engineer's role, from writing code to orchestrating AI agents that write code. Boris Cherny, creator of Claude Code, says the programming problem has largely been solved. Engineers at OpenAI are already managing 10 to 20 agents simultaneously on hour-long tasks. Anthropic's trend report describes a systematic shift from humans writing code to humans orchestrating agents. Meanwhile, Claude Sonnet 4.6, Gemini 3.1 Pro, GLM-5, and MiniMax M2.5 all dropped within weeks of each other. The stronger the models get, the more valuable orchestration and judgment become.

On my end, I've been deep in building the core features of BestBlogs 2.0, orchestrating multiple AI coding tools and agents through Spec documents for requirement discussions, architecture design, demo development, and interaction reviews, with almost no hand-written code. I'm aiming for a late-March launch and will share more details then.

Here are 10 highlights worth your attention this week:

🏆 The model arms race is heating up fast. Claude Sonnet 4.6 brings a million-token context window and upgraded agentic capabilities; in user preference tests it was chosen over the previous flagship, Opus 4.5, 59% of the time, at the same price as Sonnet 4.5. Gemini 3.1 Pro lifted the ARC-AGI-2 reasoning score from 31% to 77%, introduced a three-tier thinking mode for flexible compute allocation, and costs less than half as much as Claude Opus 4.6. More capability at the same price is the new normal.

🤖 GLM-5 and MiniMax M2.5 tackle the same question from different angles: how to make agents actually work in production. GLM-5 is designed around agentic engineering from the ground up and combines asynchronous RL with sparse attention to reach state-of-the-art performance among open-source models. MiniMax M2.5 pushes the cost of continuous agent operation below $1 per hour, making it practical to run complex agents without cost constraints.

🎨 Seedance 2.0 and Nano Banana 2 push boundaries in video and image generation, respectively. Seedance 2.0 goes beyond generating visuals: it understands directorial thinking, autonomously handling storyboarding and emotional pacing. Nano Banana 2 cuts API pricing in half, and while hands-on testing shows the results are less impressive than the marketing suggests, it makes high-quality image generation genuinely accessible.

🛠️ Two interviews with Boris Cherny, creator of Claude Code, are the must-reads of this issue. He traces Claude Code's journey from an internal post with two upvotes to powering 4% of GitHub commits. The core philosophy: build for the model six months from now, not today's model. He hasn't written a single line of code by hand since Opus 4.5, and he believes the next frontier is AI evolving from an executor into a colleague that proactively suggests ideas.

⚡ OpenAI engineering lead Sherwin Wu reveals how AI tools are reshaping engineering teams: 95% of engineers use Codex daily, top performers ship 70% more PRs than their peers, and engineers who can manage 10 to 20 agents simultaneously are pulling far ahead. He also candidly notes that many enterprise AI deployments have negative ROI, and that the second- and third-order effects of one-person billion-dollar companies are severely underestimated.

📁 The next frontier in LLM engineering is shifting from scaling parameters to memory. An InfoQ talk systematically covers hierarchical memory, proactive scheduling, and mind-map-style information organization. The key insight: instead of retrieving reactively at query time, front-load memory management into the gaps between interactions so relevant memories are ready before the query arrives. Separately, Datawhale's breakdown of Skill design draws a critical dividing line: lock down fragile operations with scripts, and guide creative tasks with natural language.

💡 Vibe Coding is moving from concept to large-scale production. Alibaba's internal practice exposes real challenges (inconsistent code quality, slow debugging, and security vulnerabilities) while offering battle-tested solutions such as templatizing successful paths and abstracting agents as reusable tools. Meanwhile, a product manager with no coding background built a personal AI agent on their own server in one afternoon using Claude Code, arguing that product sense is now scarcer than coding ability.

🧩 Anthropic's agentic coding trend report maps a systematic transformation across eight trends, including multi-agent collaboration, long-running autonomous tasks, and the democratization of programming. The core thesis: AI amplifies the judgment engineers already have rather than replacing it. The old fundamentals of system design, task decomposition, and quality assurance are worth more than ever in the agent era.

🔬 Google's Chief AI Scientist Jeff Dean walks through the full arc from loading Google's entire index into memory in 2001 to TPU co-design, offering two key predictions: personalized models that attend to all of a user's data, and specialized hardware enabling ultra-low latency that will fundamentally reshape human-AI collaboration.

👨‍💼 The debate over whether AI will end software engineering continues. UML co-creator Grady Booch pushes back on Anthropic CEO Dario Amodei's claims, pointing out that software engineering has survived multiple existential crises and each time emerged into a new golden age. Naval offers a different angle: agency is humanity's real moat against AI replacement, because AI has no desires, no survival pressure, and can't make autonomous decisions in truly unknown territory. In his view, the only way to overcome AI anxiety is to open the hood, understand how AI works, and then act.

Hope this issue sparks some new ideas. Stay curious, and see you next week!

机器之心
mp.weixin.qq.com
02-18
1537 words · 7 min
93
Claude Sonnet 4.6, the Strongest Sonnet Model Yet, Is Here with a Million-Token Context Window

Anthropic released Claude Sonnet 4.6 during the Spring Festival, with broad upgrades across coding, computer use, long-context reasoning, and agentic planning, along with a 1-million-token context window. Notably, users preferred it over the previous flagship, Opus 4.5, 59% of the time in preference tests, while it keeps the same price as Sonnet 4.5. That combination makes frontier-level agentic capability genuinely accessible at scale.

腾讯科技
mp.weixin.qq.com
02-20
3322 words · 14 min
95
Google to Reclaim the Throne? Gemini 3.1 Pro Reasoning Scores Double, Hallucination Rates Decline, Prices Unchanged

Google's Gemini 3.1 Pro jumps from 31% to 77% on the ARC-AGI-2 reasoning benchmark and introduces a three-tier thinking mode that lets developers tune compute intensity per task without juggling multiple models. What makes this more than a benchmark story is the price: performance surged while costs held steady, with API spend at less than half that of Claude Opus 4.6. It is a calculated counterattack that put Google back at the top of the intelligence index rankings.

爱范儿
ifanr.com
02-27
3031 words · 13 min
93
Just Released: Nano Banana 2! Affordable and Powerful—Here Are the Details After My Hands-on Experience

This hands-on review tests Nano Banana 2 across Chinese text rendering, complex UI generation, manga storyboarding, and subject consistency, offering a more grounded take than the official blog: speed and quality improvements are less dramatic than advertised, and some edge cases actually regress from the previous model. But with API pricing cut in half and a genuinely lower barrier to entry, the model makes high-quality image generation practically accessible in a way its predecessors didn't quite manage.

智谱
mp.weixin.qq.com
02-22
24347 words · 98 min
93
GLM-5 Technical Report: Full Disclosure of Technical Details

GLM-5, the latest open-source flagship from Zhipu AI, is built around agentic engineering as a first-class design goal, combining an asynchronous RL framework with sparse attention to dramatically cut inference costs while boosting performance. It achieves state-of-the-art results among open models on benchmarks like SWE-bench and BrowseComp, and demonstrates end-to-end software engineering capabilities that rival top closed-source systems in real-world tasks.

MiniMax 稀宇科技
mp.weixin.qq.com
02-12
3205 words · 13 min
94
MiniMax M2.5 Released: $1/Hour, the King of Real-World Work

MiniMax M2.5 reaches top-tier performance across coding, tool use, and office productivity, but the bigger story is two structural bets. First, it bakes complex task decomposition directly into the model through a native Agent RL framework, completing tasks 37% faster while consuming fewer tokens. Second, it drives the cost of continuous Agent operation below one dollar per hour, turning economically unconstrained Agent deployment from aspiration into practical reality.

数字生命卡兹克
mp.weixin.qq.com
02-11
5162 words · 21 min
93
China Now Has a World-Leading Model, and Its Name Is Seedance 2.0

Seedance 2.0 represents a genuine leap in AI video generation — it doesn't just produce footage, it understands cinematic thinking, autonomously handling shot composition, emotional pacing, and scene transitions. The author walks through a range of creative applications, from dramatic short films and fan recreations to product ads and real-world video editing, while candidly reflecting on what it feels like to watch 18 months of hard-built workflows become obsolete almost overnight.

Y Combinator
youtube.com
02-17
9395 words · 38 min
94
Boris Cherny: How We Built Claude Code

Claude Code creator Boris Cherny traces the full arc from accidental prototype to paradigm-shifting tool, anchored by one core philosophy: build for the model six months from now, not today's. He shares that he has not written a single line of code by hand since Opus 4.5, and digs into multi-agent system design, the right way to use CLAUDE.md, and why the most valuable skill for engineers in a rapidly evolving model landscape is no longer technical expertise but beginner's mind and first-principles thinking.

Datawhale
mp.weixin.qq.com
02-22
9159 words · 37 min
93
How to Write Great Skills? Deconstructing the Design Behind skill-creator!

Writing an AI Skill is fundamentally different from writing documentation for humans. This article uses skill-creator as a case study to reveal the core principle: every line must earn its place in the context window. Defining what the AI should not do proves more precise than describing what it should, and the key to quality lies in knowing when to lock down behavior with scripts versus when to guide with natural language.
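To make the scripts-versus-natural-language split concrete, here is a hypothetical helper a Skill might bundle for one fragile, mechanical step: checking a skill file's header. It assumes the file opens with a `---`-delimited block of `key: value` lines containing `name` and `description`, and that names use only lowercase letters, digits, and hyphens. These rules, and the names `parse_header` and `validate_skill`, are assumptions for illustration, not the actual skill-creator code.

```python
import re

# Assumed naming rule for illustration: lowercase letters, digits, and hyphens.
NAME_PATTERN = re.compile(r"^[a-z0-9-]+$")


def parse_header(text: str) -> dict[str, str]:
    """Parse a minimal '---'-delimited header made of 'key: value' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        fields[key.strip()] = value.strip()
    raise ValueError("missing closing '---'")


def validate_skill(text: str) -> list[str]:
    """Return a list of problems; an empty list means the header passed."""
    try:
        fields = parse_header(text)
    except ValueError as exc:
        return [str(exc)]
    problems = [f"missing field: {key}" for key in ("name", "description") if not fields.get(key)]
    name = fields.get("name", "")
    if name and not NAME_PATTERN.match(name):
        problems.append(f"invalid name: {name!r}")
    return problems


good = "---\nname: pdf-tools\ndescription: Fill and merge PDF forms\n---\n# Instructions"
bad = "---\nname: PDF Tools\n---\n"
print(validate_skill(good))  # -> []
print(validate_skill(bad))   # -> ['missing field: description', "invalid name: 'PDF Tools'"]
```

In this sketch the check is deterministic, so the model only needs a short instruction to run the script and fix what it reports, while judgment-heavy work such as wording the description stays in natural-language guidance.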

InfoQ 中文
mp.weixin.qq.com
02-18
21952 words · 88 min
93
OpenAI Frontline Development Observations: Those Who Can Manage 10-20 Agents Simultaneously and Run Hour-Long Tasks Are Leaving Other Engineers Far Behind

Sherwin Wu, OpenAI's engineering lead, offers a rare inside look at how AI tools are reshaping software teams: 95% of engineers use Codex daily, top performers ship 70% more PRs, and the engineer's role is shifting from writing code to orchestrating AI agents. He also breaks down why so many enterprise AI deployments fail to deliver ROI, and explores the underappreciated second- and third-order effects of the emerging era of one-person billion-dollar companies.

InfoQ 中文
mp.weixin.qq.com
02-25
9347 words · 38 min
93
From Context to Long-Term Memory: Architectural Design and Practice of LLM Memory Engineering

As LLM scaling yields diminishing returns, memory engineering is emerging as the next core infrastructure challenge. This talk systematically covers three key mechanisms — hierarchical memory modeling, proactive scheduling, and mind-map-style information organization — with a central insight that cuts through current RAG limitations: instead of reactive retrieval that blocks inference, shift memory preparation into the idle gaps between user interactions, so the right context is ready before the query arrives.
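The proactive pattern in this summary can be sketched in Python under loud assumptions: plain word overlap stands in for embedding similarity, and `MemoryStore`, `ProactiveMemory`, `on_idle`, and `context_for` are hypothetical names, not taken from the talk. The sketch only shows the control flow: retrieval runs in the idle gap after a turn, and the query path reuses that prepared context when it matches, falling back to ordinary reactive retrieval when it does not.

```python
from dataclasses import dataclass, field

# Assumption: plain word overlap stands in for embedding similarity.
STOPWORDS = {"a", "an", "the", "is", "i", "my", "to", "of", "and", "for", "do", "in"}


def tokenize(text: str) -> set[str]:
    """Lowercased content words with surrounding punctuation stripped."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return words - STOPWORDS - {""}


@dataclass
class MemoryStore:
    memories: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.memories.append(text)

    def search(self, query: str, k: int = 2) -> list[str]:
        """Reactive retrieval: rank memories by word overlap with the query."""
        q = tokenize(query)
        hits = [(len(q & tokenize(m)), m) for m in self.memories]
        hits = [(score, m) for score, m in hits if score > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep insertion order
        return [m for _, m in hits[:k]]


@dataclass
class ProactiveMemory:
    """Hypothetical wrapper that prepares context in the idle gap between turns."""
    store: MemoryStore
    prepared: list[str] = field(default_factory=list)

    def on_idle(self, last_turn: str) -> None:
        # Runs after a turn ends, before the next query arrives
        # (a real system would schedule this in the background).
        self.prepared = self.store.search(last_turn, k=3)

    def context_for(self, query: str) -> tuple[list[str], str]:
        q = tokenize(query)
        # Fast path: reuse prepared memories if any of them matches the new query.
        if any(q & tokenize(m) for m in self.prepared):
            return self.prepared, "prepared"
        # Fallback: ordinary retrieval on the query path.
        return self.store.search(query), "reactive"


store = MemoryStore()
store.add("User prefers Python for data scripts")
store.add("User's project deadline is late March")
store.add("User dislikes verbose logging")

mem = ProactiveMemory(store)
mem.on_idle("Let's plan the project timeline")
print(mem.context_for("When is the project deadline?"))
# -> (["User's project deadline is late March"], 'prepared')
print(mem.context_for("Which language do I prefer for scripts?"))
# -> (['User prefers Python for data scripts'], 'reactive')
```

The first query hits the memory prepared during the idle gap, so no retrieval runs on the query path; the second query falls outside the predicted topic and pays the usual retrieval cost.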

InfoQ 中文
mp.weixin.qq.com
02-10
8682 words · 35 min
93
Practices and Reflections on Vibe Coding in Code Generation and Collaboration

Real-world deployment of Vibe Coding tools at Alibaba's scale surfaces challenges that benchmarks rarely capture: inconsistent code quality, longer debugging cycles, security vulnerabilities in AI-generated code, and runaway token costs. The team's practical responses (templating successful task patterns to lift completion rates, replacing closed-source models with domestic Chinese alternatives, and treating Agents themselves as composable tools) offer a grounded blueprint for anyone building or evaluating the next generation of AI coding products.

42章经
xiaoyuzhoufm.com
02-07
1121 words · 5 min
94
From Clawdbot to the 2026 AI Coding Explosion | A Conversation with PingCAP CTO Dongxu

PingCAP CTO Huang Dongxu draws on hands-on experience to trace AI Coding's leap from assistant tool to autonomous agent, arguing that everything is converging toward a "Coding Agent" paradigm. He explains why context engineering has become the key performance differentiator, introduces his concept of "Box" isolated environments for safer multi-agent collaboration, and reflects on how engineers can stay relevant as the technical barrier to coding disappears.

宝玉的分享
baoyu.io
02-21
5968 words · 24 min
91
OpenAI Applied CTO and Codex Lead: AI is Reshaping How Software is Built

OpenAI's CTO of Applications and its Codex engineering lead offer a rare inside look at how AI coding inside OpenAI has evolved from assistance to true delegation: engineers now shut their laptops, go to meetings, and return to find the work done. Equally striking is the chain of shifting bottlenecks: once code generation is solved, the bottleneck moves to code review, then to deployment, then to understanding user needs. The emerging competitive edge for engineers is no longer writing code, but product intuition and the ability to move fluidly across layers of abstraction.

少数派
mp.weixin.qq.com
02-20
12073 words · 49 min
92
The Era of Vibe Coding: Why Is 'Product Sense' Scarcer Than 'Coding Skills'?

A product manager with no coding background built a fully functional personal AI Agent in a single afternoon using Claude Code. The article shares six practical Vibe Coding techniques, with a central insight that cuts through the hype: AI has eliminated the technical barrier to building, but the ability to define what to build and why remains irreplaceable — and for non-technical builders, that product intuition is now their greatest competitive advantage.

宝玉的分享
baoyu.io
02-10
4960 words · 20 min
93
The Great Shift in Programming 2026: Anthropic Report Reveals Eight Major Trends in Agentic Programming

Anthropic's 2026 Agentic Coding Trends Report maps out a fundamental shift in software development — from writing code to orchestrating agents that write code. Spanning eight trends from multi-agent collaboration to programming democratization, the report's sharpest insight is that AI amplifies existing engineering judgment rather than replacing it. Skills like system design, task decomposition, and quality evaluation become more valuable as implementation gets delegated to agents.

Latent Space
latent.space
02-12
15864 words · 64 min
93
Owning the AI Pareto Frontier — Jeff Dean

Google Chief AI Scientist Jeff Dean traces the full arc of AI infrastructure evolution through his own experience: from the 2001 decision to load Google's entire search index into memory, to the co-design philosophy behind TPUs, the logic of distillation and sparse models, and his two forward-looking bets — personalized models that can attend to everything you've ever seen, and specialized hardware driving ultra-low latency that will fundamentally reshape how humans and AI systems collaborate.

Lenny's Podcast
youtube.com
02-19
19102 words · 77 min
94
Head of Claude Code: What happens after coding is solved | Boris Cherny

Claude Code's lead Boris Cherny traces the product's journey from an internal post that got two likes to driving 4% of GitHub's global code commits. His sharpest take: coding as a problem is largely solved, and the next frontier is AI that proactively surfaces ideas rather than just executing them. In this transition, the rarest skill will not be writing code but the generalist capacity to define what is worth building across multiple domains.

跨国串门儿计划
xiaoyuzhoufm.com
02-24
1642 words · 7 min
93
#434. Survival Rules in the AI Era: Naval on Vibe Coding, Personal Leverage, and the Future of Creativity

Naval reframes AI-era competitiveness through several sharp lenses: vibe coding turns taste and judgment directly into productivity, but only those who understand the underlying architecture can patch the leaks when AI makes mistakes; human agency — driven by desire and survival instinct — remains the one moat AI cannot replicate; and the only antidote to AI anxiety is opening the hood to understand how it works, then acting on that understanding.

51CTO技术栈
mp.weixin.qq.com
02-10
14961 words · 60 min
92
Father of UML: Dario is Dead Wrong and Doesn't Understand Software Engineering! Software Engineering Will Not Die! Software Has Entered Its Third Golden Age! Industry Response: With AI, SaaS Will Only Prosper Further!

In this interview, "father of UML" Grady Booch offers a historical antidote to today's AI anxiety: software engineering has survived multiple existential crises, and each one gave way to a new golden age. He pushes back sharply on claims that AI will fully automate software engineering, arguing that the real work of software engineering — navigating tradeoffs across competing forces — cannot be automated. What we are witnessing is simply another rise in abstraction level, and the engineers who thrive will be those who develop systems thinking and the judgment to manage complexity at scale.

晚点聊 LateTalk
xiaoyuzhoufm.com
02-09
1414 words · 6 min
93
150: Year-End AI Review: From Models to Applications, Technology to Business Wars, Grasping the Thread of Meaning in the Torrent

LateTalk's year-end review covers the 2025 AI landscape across seven dimensions, including the rise of reasoning models sparked by DeepSeek R1, the emergence of agents as a paradigm, the talent and organizational battles among ByteDance, Alibaba, and Tencent, the investment surge and real-world limits of embodied intelligence, and finally the human dimension: how people navigate skill devaluation and the search for meaning in an era of accelerating automation.
