
BestBlogs Issue #85: Harness Engineering

Hey there! Welcome to BestBlogs.dev Issue #85.

One keyword threads through this week's articles: harnessing. Essays published on martinfowler.com argue that developers' core work is shifting from writing code to building the harness agents depend on—specs, quality gates, and workflow guides. A Chinese podcast title puts it more bluntly: stop working, start setting up the office for your AI. OpenAI's team shipped a million lines of Codex-generated code over five months, not by using a stronger model, but by enforcing structured knowledge bases and rigid architectural constraints. As agents grow more capable, the real competitive edge isn't whether you use AI, but whether you can harness it.

On the BestBlogs.dev front, we've been going deep on AI coding to build out version 2.0. The focus is custom subscription sources and personalized feeds, so everyone can shape their reading experience around their own interests. I'm also developing Skills on top of open APIs for content search, deep reading, and daily operations—all aimed at truly harnessing the future of reading.

Here are 10 highlights worth your attention this week:

🤖 GPT-5.4 lands as OpenAI's first model to unify reasoning, coding, native computer use, deep search, and million-token context in a single package. The standout is native computer use: the model reads screenshots, moves the mouse, and types on the keyboard, surpassing average human performance on OSWorld desktop tasks. A tool-search mechanism cuts agent token consumption by 47%, achieving high capability and low cost simultaneously. Meanwhile, GPT-5.3 Instant optimizes for feel over benchmarks, reducing web hallucination rates by 26.8%—a meaningful step toward making ChatGPT a reliable daily tool.

🏗️ Two essays on martinfowler.com form a cohesive argument this week. The first positions developers "on the loop": the core job becomes building and maintaining the harness that agents run on, with an agentic flywheel where agents not only execute tasks but continuously improve the harness itself. The second introduces a Design-First collaboration framework, aligning on capabilities, components, interactions, interfaces, and implementation before any code is generated, preventing architectural decisions from being silently embedded by AI.

🎬 Pragmatic Engineer sat down with Boris Cherny, the creator of Claude Code, tracing its journey from an Anthropic side project to one of the fastest-growing developer tools. Boris ships 20–30 PRs daily, all 100% AI-generated, without editing a single line by hand. The conversation also reveals the internal debate at Anthropic over whether to release it publicly, how code review is evolving in the AI era, and the layered security architecture behind Claude Code.

🔧 Alibaba's Tmall engineering team identifies the real bottleneck in enterprise AI coding: not agent execution capability, but accurately conveying complex task goals to AI. Their solution is a layered, unified expert knowledge base for systematic entropy reduction, driving a shift from tool-based efficiency to knowledge-driven intelligent development. OpenAI's Codex practice confirms the same insight: 1,500 PRs over five months with zero human coding, scaled through structured knowledge management, rigid architectural constraints, and periodic code entropy cleanup.

📁 Tencent Cloud published what may be the most thorough Chinese-language teardown of OpenClaw's context management, covering a three-tier defense system: preemptive pruning, LLM-based summarization, and post-overflow recovery, plus a cost analysis of each operation's impact on provider KV cache. Essential reading for anyone building long-session agents.

⚡ Small models are rewriting performance expectations. The Qwen team releases four Qwen3.5 models from 0.8B to 9B parameters, all Apache 2.0 licensed and fine-tunable on consumer GPUs. The 4B stands out for multimodal and agent capabilities, while the 9B punches above its weight against much larger models. Xiaohongshu's open-source FireRed-OCR takes a different angle, turning Qwen3-VL-2B into a dedicated document-parsing model through three-stage progressive training, scoring 92.94% on OmniDocBench v1.5 and ranking first among end-to-end solutions, with support for formulas, tables, and handwriting. Both projects prove the same point: targeted training strategies beat brute-force parameter scaling.

🎨 Anthropic's head of design Jenny Wen shares a striking observation: the traditional design process is dead, not because designers chose to change, but because engineers shipping at AI speed forced the shift. Her time on polished mockups dropped from 60–70% to 30–40%, replaced by direct pairing with engineers and even editing code herself. Design work is splitting into two tracks: real-time collaboration supporting engineering execution, and vision design that sets direction 3 to 6 months out.

💡 A three-hour conversation between Meng Yan and Li Jigang starts from one powerful premise: the Industrial Revolution took away physical labor, AI is taking away mental labor, and what remains for humans is "heart force." The dialogue extends into the nature of vector spaces, business models shifting from weaving nets to digging wells, and education transforming from pouring water to lighting fire. Two insights worth unpacking on their own: "Your feed is your fate" and "prompts have shape."

📈 Zapier's VP of Product shares first-hand lessons from running 800 AI agents internally, emphasizing that technology adoption and business transformation must be treated as separate efforts, and that leadership must personally use AI tools for transformation to stick. Insight Partners' co-founder goes further: autonomous agents are the real core of this wave, SaaS per-seat pricing will give way to consumption-based models, and white-collar job displacement will become an election issue within two years.

🌐 A thought experiment written from a 2028 vantage point deserves attention: white-collar job losses trigger consumer spending contraction, which triggers private credit defaults, which pressures mortgage markets, forming a negative feedback loop with no natural brake. Not a prediction, but a systematic framework for reasoning about left-tail risks. Worth a careful read for anyone thinking about AI's economic impact.

Hope this issue sparks some new ideas. Stay curious, and see you next week!

量子位
qbitai.com
03-06
4124 words · 17 min
94
GPT-5.4 Released: OpenAI's First Unified Model, Truly Native

OpenAI's GPT-5.4 marks the first time reasoning, coding, native computer use, deep web search, and million-token context have been unified in a single model — without sacrificing performance in any individual area. The standout capability is native computer operation: the model interprets screen state and executes mouse and keyboard actions, surpassing the average human score on OSWorld desktop benchmarks. A new tool-search mechanism also cuts token usage by 47% for agent tasks, making this a rare case where capability gains and cost efficiency arrive together.
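The token savings from tool search come from not injecting every tool schema into the model's context up front. As a rough illustration — the article doesn't describe OpenAI's actual interface, so every name and the keyword-overlap ranking below are hypothetical — an agent can keep tools in a searchable registry and load only the schemas relevant to the current request:

```python
# Hypothetical sketch of a "tool search" pattern: rather than injecting every
# tool schema into the model's context, the agent keeps a searchable registry
# and loads only the schemas relevant to the current request.
from dataclasses import dataclass


@dataclass
class ToolSpec:
    name: str
    description: str
    schema: str  # JSON schema text; this is what consumes context tokens


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: list[ToolSpec] = []

    def register(self, tool: ToolSpec) -> None:
        self._tools.append(tool)

    def search(self, query: str, top_k: int = 3) -> list[ToolSpec]:
        """Rank tools by naive keyword overlap with the query (illustrative;
        a real system would use embeddings or a proper retriever)."""
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(f"{t.name} {t.description}".lower().split())), t)
            for t in self._tools
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for score, t in scored[:top_k] if score > 0]

    def context_cost(self, tools: list[ToolSpec]) -> int:
        """Rough token proxy: characters of schema text injected."""
        return sum(len(t.schema) for t in tools)
```

With hundreds of registered tools, only a handful of schemas reach the prompt on any given turn, which is where savings of the reported magnitude would come from.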

51CTO技术栈
mp.weixin.qq.com
03-04
3461 words · 14 min
93
OpenAI Drops New Default GPT-5.3 Model Overnight! Focus on 'De-cringing'! Hands-on: Blazing Fast Instant Gratification! Enhanced Search! OpenAI Staff Reveal Model Switching Strategy

GPT-5.3 Instant skips benchmark chasing and focuses on the user experience: fewer preachy disclaimers, sharper intent detection, and better search integration. With hallucination rates down 26.8% in connected mode, it's a meaningful step toward making ChatGPT a reliable daily tool.

通义大模型
mp.weixin.qq.com
03-03
174 words · 1 min
93
Qwen3.5 Small-Size Models Are Here!

Qwen3.5 releases four compact models from 0.8B to 9B under Apache 2.0, fine-tunable on consumer GPUs. The 4B impresses on multimodal and agent tasks; the 9B rivals larger models — both well-suited for cost-efficient vertical deployment.

The Architecture Behind Open-Source LLMs

A concise comparative breakdown of six leading open-weight LLMs, covering MoE design, attention mechanism tradeoffs, and post-training strategies — with practical guidance on what to actually evaluate when choosing a model. Essential reading for engineers navigating the current open-source landscape.

The Pragmatic Engineer
youtube.com
03-04
21313 words · 86 min
94
Building Claude Code with Boris Cherny

Pragmatic Engineer sits down with Boris Cherny, the creator of Claude Code, tracing its journey from an internal Anthropic side project to one of the fastest-growing developer tools available. Boris walks through his daily workflow — 20-30 PRs per day, 100% AI-generated, not a single line written by hand — and shares the internal debate over whether to release it at all. The conversation covers how code review is evolving in an AI-first environment, the layered security design behind Claude Code's architecture, and Boris's take on which engineering skills will matter most going forward. The printing press analogy running through the episode makes for a thought-provoking listen.

Martin Fowler
martinfowler.com
03-04
1603 words · 7 min
93
Humans and Agents in Software Engineering Loops

This article offers a clear framework for thinking about the developer's role in an AI-assisted workflow, identifying three positions: humans outside the loop (vibe coding), humans inside the loop (manually reviewing every artifact), and humans on the loop. The author argues the third is the right place to be — where developers shift from writing code to building and maintaining the "harness" that guides agents: the specs, quality checks, and workflow instructions that shape what agents produce. The piece goes further to describe an "agentic flywheel," where agents not only execute tasks but continuously evaluate and improve the harness itself.

Martin Fowler
martinfowler.com
03-03
2226 words · 9 min
93

The author identifies a core problem with AI-assisted coding: AI skips the design phase entirely, silently embedding every architectural decision into generated code and turning code review into an exhausting exercise in reverse-engineering. The proposed solution, "Design-First," structures the collaboration into five sequential levels — capabilities, components, interactions, contracts, implementation — with no code until the design is agreed at each step. This isn't process ceremony; it's cognitive load management that forces decisions to happen at the right level of abstraction. For developers who regularly use AI coding assistants, this article offers a practical and well-reasoned collaboration framework worth adopting.

大淘宝技术
mp.weixin.qq.com
03-02
7601 words · 31 min
92
Reflections on AI Coding: From Tool Efficiency to Paradigm Shift—What Are We Still Missing?

A thoughtful piece from Alibaba's Tmall tech team: the core bottleneck in enterprise AI Coding isn't agent execution, but accurately conveying complex task goals to AI. The solution is a layered, unified expert knowledge base that reduces information entropy and drives a shift from tool-level efficiency gains to a knowledge-driven intelligent development paradigm.

腾讯云开发者
mp.weixin.qq.com
03-04
9892 words · 40 min
92
Deep Dive into OpenClaw's Context Window Compression: All for Performance and Cost Savings

A deep dive into OpenClaw's context management source code, covering its three-layer defense — preventive pruning, LLM-based compaction, and overflow recovery — plus a detailed analysis of how each operation affects provider KV-cache costs. One of the most thorough technical breakdowns of long-session context management for AI agents available in Chinese.
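The three layers can be sketched generically — this is not OpenClaw's actual code; the function names, thresholds, and stub summarizer are illustrative assumptions about the pattern the article describes:

```python
# Generic sketch of a three-layer context-window defense:
# preventive pruning -> LLM compaction -> overflow recovery.
# All names and thresholds are illustrative, not OpenClaw's real implementation.

def count_tokens(text: str) -> int:
    # Crude proxy: one token per whitespace-separated word.
    return len(text.split())


def total_tokens(msgs: list[str]) -> int:
    return sum(count_tokens(m) for m in msgs)


def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM call that folds older turns into one summary.
    return f"[summary of {len(messages)} earlier messages]"


def fit_context(messages: list[str], budget: int,
                per_msg_cap: int = 50, keep_recent: int = 2) -> list[str]:
    # Layer 1: preventive pruning — clip oversized individual messages
    # (e.g. huge tool outputs) before they ever threaten the window.
    msgs = [" ".join(m.split()[:per_msg_cap]) for m in messages]

    # Layer 2: LLM-based compaction — if still over budget, replace all but
    # the most recent turns with a single summary message.
    if total_tokens(msgs) > budget and len(msgs) > keep_recent:
        head, tail = msgs[:-keep_recent], msgs[-keep_recent:]
        msgs = [summarize(head)] + tail

    # Layer 3: overflow recovery — last resort, hard-drop oldest turns
    # until the request fits.
    while total_tokens(msgs) > budget and len(msgs) > 1:
        msgs.pop(0)
    return msgs
```

Note that each of these mutations rewrites the prompt prefix and therefore invalidates the provider's KV cache — which is exactly why the article weighs the cost of every operation rather than compacting eagerly.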

InfoQ 中文
mp.weixin.qq.com
03-01
3384 words · 14 min
92
Stop Talking About '10x Developers': AI Agents Aren't Accelerating the SDLC, They're Ending It

This piece makes a sharp argument: AI agents aren't accelerating the software development lifecycle — they're ending it. The author works through each SDLC phase systematically: requirements become a byproduct of iteration, design emerges through collaboration, tests are generated alongside code, PR review becomes a legacy ritual, and observability evolves from a passive dashboard into the feedback loop driving the entire system. The conclusion is that only two core capabilities survive: context engineering and observability. A clearly argued piece worth engaging with seriously.

AI炼金术
xiaoyuzhoufm.com
03-03
28034 words · 113 min
93
Stop Working! Go Set Up an Office for AI!

Two AI founders share their engineering workflow in the Agent era: the job has shifted from writing code to building environments for AI, a three-step flow (plan → run → validate) is now the daily norm, and judgment bandwidth — not execution speed — is the new productivity ceiling. Worth listening to for anyone curious about what AI-first engineering actually looks like in practice.

Lenny's Podcast
youtube.com
03-01
6060 words · 25 min
93
The design process is dead. Here’s what’s replacing it. | Jenny Wen (head of design at Claude)

Anthropic's head of design Jenny Wen shares her firsthand observations on how the design role is transforming in the AI era. Her core argument: the traditional "discover-diverge-converge" design process is dead — not because designers chose to abandon it, but because engineers shipping at AI-assisted speed forced the change. Design work is now splitting into two modes: real-time execution support alongside engineers, and shorter-horizon vision work spanning 3 to 6 months rather than years. She also details her own workflow shift — time spent on polished mockups has dropped from 60-70% to 30-40%, with far more time now spent on paired collaboration with engineers and direct code-level work. A valuable firsthand account for any designer navigating this transition.

Product School
youtube.com
03-03
11522 words · 47 min
92
Zapier VP of Product on Orchestrating 800+ AI Agents to Manage Everything

Zapier's VP of Product shares firsthand enterprise AI transformation practices: 800 internal AI agents, a clear distinction between adoption and transformation, and a strong argument that leadership must personally use AI tools for change to take hold. The core difference between traditional and agentic workflows? The ability to reason and dynamically reroute.

AI炼金术
xiaoyuzhoufm.com
03-02
2573 words · 11 min
92

The biggest mistake in AI product development is adding AI to features instead of helping users complete real jobs. This podcast presents a three-step framework — deconstruct, redesign, disrupt — and maps out four AI-native startup paths: unlocking new markets, wrapping mature tech, selling infrastructure, and transforming traditional industries. Practical and grounded with real-world examples throughout.

无人知晓
xiaoyuzhoufm.com
03-03
3805 words · 16 min
94
E45 Meng Yan in Conversation with Li Jigang: How Humans Find Their Place

This three-hour conversation between Meng Yan and Li Jigang centers on a deceptively simple premise: the Industrial Revolution took physical labor, AI is taking cognitive labor, and what remains for humans is what they call "heart force" — will, intuition, and aesthetic sensibility. From there, the conversation moves through the nature of the vector world, the shift in business models from "weaving networks" to "drilling deep wells," the fork between amplifying human distinctiveness through AI versus withdrawing from thinking altogether, and the evolution of education from knowledge-pouring to spark-finding. Li Jigang's two observations — "Your feed is your fate" and "prompts have shape" — are each worth sitting with independently. Worth listening to for anyone thinking seriously about what it means to be human in the AI era, beyond the technical discussion.

Datawhale
mp.weixin.qq.com
02-27
12468 words · 50 min
92
The 2028 Global Intelligence Crisis: Who Pays the Price?

A thought experiment written from a 2028 vantage point, tracing AI's economic left-tail risks: white-collar displacement → consumer spending contraction → private credit defaults → mortgage market stress — a feedback loop with no natural brake. Not a prediction, but a rigorous risk scenario worth reading carefully for anyone thinking about AI's macroeconomic implications.
