
BestBlogs Issue #85: Harness Engineering

Hey there! Welcome to BestBlogs.dev Issue #85.

One keyword threads through this week's articles: harnessing. Essays published on martinfowler.com argue that developers' core work is shifting from writing code to building the harness agents depend on—specs, quality gates, and workflow guides. A Chinese podcast title puts it more bluntly: stop working, start setting up the office for your AI. OpenAI's team shipped a million lines of Codex-generated code over five months, not by using a stronger model, but by enforcing structured knowledge bases and rigid architectural constraints. As agents grow more capable, the real competitive edge isn't whether you use AI, but whether you can harness it.

On the BestBlogs.dev front, we've been going deep on AI coding to build out version 2.0. The focus is custom subscription sources and personalized feeds, so everyone can shape their reading experience around their own interests. I'm also developing Skills on top of open APIs for content search, deep reading, and daily operations—all aimed at truly harnessing the future of reading.

Here are 10 highlights worth your attention this week:

🤖 GPT-5.4 lands as OpenAI's first model to unify reasoning, coding, native computer use, deep search, and million-token context in a single package. The standout is native computer use: the model reads screenshots, moves the mouse, and types on the keyboard, surpassing average human performance on OSWorld desktop tasks. A tool-search mechanism cuts agent token consumption by 47%, achieving high capability and low cost simultaneously. Meanwhile, GPT-5.3 Instant optimizes for feel over benchmarks, reducing web hallucination rates by 26.8%—a meaningful step toward making ChatGPT a reliable daily tool.

🏗️ Two essays on martinfowler.com form a cohesive argument this week. The first positions developers "on the loop": the core job becomes building and maintaining the harness that agents run on, with an agentic flywheel where agents not only execute tasks but continuously improve the harness itself. The second introduces a Design-First collaboration framework, aligning on capabilities, components, interactions, interfaces, and implementation before any code is generated, preventing architectural decisions from being silently embedded by AI.

🎬 Pragmatic Engineer sat down with Boris Cherny, the creator of Claude Code, tracing its journey from an Anthropic side project to one of the fastest-growing developer tools. Boris ships 20–30 PRs daily, all 100% AI-generated, without editing a single line by hand. The conversation also reveals the internal debate at Anthropic over whether to release it publicly, how code review is evolving in the AI era, and the layered security architecture behind Claude Code.

🔧 Alibaba's Tmall engineering team identifies the real bottleneck in enterprise AI coding: not agent execution capability, but accurately conveying complex task goals to AI. Their solution is a layered, unified expert knowledge base for systematic entropy reduction, driving a shift from tool-based efficiency to knowledge-driven intelligent development. OpenAI's Codex practice confirms the same insight: 1,500 PRs over five months with zero human coding, scaled through structured knowledge management, rigid architectural constraints, and periodic code entropy cleanup.

📁 Tencent Cloud published what may be the most thorough Chinese-language teardown of OpenClaw's context management, covering a three-tier defense system: preemptive pruning, LLM-based summarization, and post-overflow recovery, plus a cost analysis of each operation's impact on provider KV cache. Essential reading for anyone building long-session agents.

⚡ Small models are rewriting performance expectations. The Qwen team releases four Qwen3.5 models from 0.8B to 9B parameters, all Apache 2.0 licensed and fine-tunable on consumer GPUs. The 4B stands out for multimodal and agent capabilities, while the 9B punches above its weight against much larger models. Xiaohongshu's open-source FireRed-OCR takes a different angle, turning Qwen3-VL-2B into a dedicated document-parsing model through three-stage progressive training, scoring 92.94% on OmniDocBench v1.5 and ranking first among end-to-end solutions, with support for formulas, tables, and handwriting. Both projects prove the same point: targeted training strategies beat brute-force parameter scaling.

🎨 Anthropic's head of design Jenny Wen shares a striking observation: the traditional design process is dead, not because designers chose to change, but because engineers shipping at AI speed forced the shift. Her time on polished mockups dropped from 60–70% to 30–40%, replaced by direct pairing with engineers and even editing code herself. Design work is splitting into two tracks: real-time collaboration supporting engineering execution, and vision design that sets direction 3 to 6 months out.

💡 A three-hour conversation between Meng Yan and Li Jigang starts from one powerful premise: the Industrial Revolution took away physical labor, AI is taking away mental labor, and what remains for humans is "heart force." The dialogue extends into the nature of vector spaces, business models shifting from weaving nets to digging wells, and education transforming from pouring water to lighting fire. Two insights worth unpacking on their own: "Your feed is your fate" and "prompts have shape."

📈 Zapier's VP of Product shares first-hand lessons from running 800 AI agents internally, emphasizing that technology adoption and business transformation must be treated as separate efforts, and that leadership must personally use AI tools for transformation to stick. Insight Partners' co-founder goes further: autonomous agents are the real core of this wave, SaaS per-seat pricing will give way to consumption-based models, and white-collar job displacement will become an election issue within two years.

🌐 A thought experiment written from a 2028 vantage point deserves attention: white-collar job losses trigger consumer spending contraction, which triggers private credit defaults, which pressures mortgage markets, forming a negative feedback loop with no natural brake. Not a prediction, but a systematic framework for reasoning about left-tail risks. Worth a careful read for anyone thinking about AI's economic impact.

Hope this issue sparks some new ideas. Stay curious, and see you next week!

量子位
qbitai.com
03-06
4124 words · 17 min
94
GPT-5.4 Released: OpenAI's First Unified Model, Truly Native

OpenAI's GPT-5.4 marks the first time reasoning, coding, native computer use, deep web search, and million-token context have been unified in a single model — without sacrificing performance in any individual area. The standout capability is native computer operation: the model interprets screen state and executes mouse and keyboard actions, surpassing the average human score on OSWorld desktop benchmarks. A new tool-search mechanism also cuts token usage by 47% for agent tasks, making this a rare case where capability gains and cost efficiency arrive together.
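The token savings from tool search come from not injecting every tool schema into the model's context up front. As a rough illustration — the article doesn't describe OpenAI's actual interface, so every name and the keyword-overlap ranking below are hypothetical — an agent can keep tools in a searchable registry and load only the schemas relevant to the current request:

```python
# Hypothetical sketch of a "tool search" pattern: rather than injecting every
# tool schema into the model's context, the agent keeps a searchable registry
# and loads only the schemas relevant to the current request.
from dataclasses import dataclass


@dataclass
class ToolSpec:
    name: str
    description: str
    schema: str  # JSON schema text; this is what consumes context tokens


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: list[ToolSpec] = []

    def register(self, tool: ToolSpec) -> None:
        self._tools.append(tool)

    def search(self, query: str, top_k: int = 3) -> list[ToolSpec]:
        """Rank tools by naive keyword overlap with the query (illustrative;
        a real system would use embeddings or a proper retriever)."""
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(f"{t.name} {t.description}".lower().split())), t)
            for t in self._tools
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for score, t in scored[:top_k] if score > 0]

    def context_cost(self, tools: list[ToolSpec]) -> int:
        """Rough token proxy: characters of schema text injected."""
        return sum(len(t.schema) for t in tools)
```

With hundreds of registered tools, only a handful of schemas reach the prompt on any given turn, which is where savings of the reported magnitude would come from.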

51CTO技术栈
mp.weixin.qq.com
03-04
3461 words · 14 min
93
OpenAI Drops New Default GPT-5.3 Model Overnight! Focus on 'De-cringing'! Hands-on: Blazing Fast Instant Gratification! Enhanced Search! OpenAI Staff Reveal Model Switching Strategy

GPT-5.3 Instant skips benchmark chasing and focuses on the user experience: fewer preachy disclaimers, sharper intent detection, and better search integration. With hallucination rates down 26.8% in connected mode, it's a meaningful step toward making ChatGPT a reliable daily tool.

通义大模型
mp.weixin.qq.com
03-03
174 words · 1 min
93
Qwen3.5 Small-Size Models Are Here!

Qwen3.5 releases four compact models from 0.8B to 9B under Apache 2.0, fine-tunable on consumer GPUs. The 4B impresses on multimodal and agent tasks; the 9B rivals larger models — both well-suited for cost-efficient vertical deployment.

The Architecture Behind Open-Source LLMs

A concise comparative breakdown of six leading open-weight LLMs, covering MoE design, attention mechanism tradeoffs, and post-training strategies — with practical guidance on what to actually evaluate when choosing a model. Essential reading for engineers navigating the current open-source landscape.

The Pragmatic Engineer
youtube.com
03-04
21313 words · 86 min
94
Building Claude Code with Boris Cherny

Pragmatic Engineer sits down with Boris Cherny, the creator of Claude Code, tracing its journey from an internal Anthropic side project to one of the fastest-growing developer tools available. Boris walks through his daily workflow — 20-30 PRs per day, 100% AI-generated, not a single line written by hand — and shares the internal debate over whether to release it at all. The conversation covers how code review is evolving in an AI-first environment, the layered security design behind Claude Code's architecture, and Boris's take on which engineering skills will matter most going forward. The printing press analogy running through the episode makes for a thought-provoking listen.

Martin Fowler
martinfowler.com
03-04
1603 words · 7 min
93
Humans and Agents in Software Engineering Loops

This article offers a clear framework for thinking about the developer's role in an AI-assisted workflow, identifying three positions: humans outside the loop (vibe coding), humans inside the loop (manually reviewing every artifact), and humans on the loop. The author argues the third is the right place to be — where developers shift from writing code to building and maintaining the "harness" that guides agents: the specs, quality checks, and workflow instructions that shape what agents produce. The piece goes further to describe an "agentic flywheel," where agents not only execute tasks but continuously evaluate and improve the harness itself.

Martin Fowler
martinfowler.com
03-03
2226 words · 9 min
93

The author identifies a core problem with AI-assisted coding: AI skips the design phase entirely, silently embedding every architectural decision into generated code and turning code review into an exhausting exercise in reverse-engineering. The proposed solution, "Design-First," structures the collaboration into five sequential levels — capabilities, components, interactions, contracts, implementation — with no code until the design is agreed at each step. This isn't process ceremony; it's cognitive load management that forces decisions to happen at the right level of abstraction. For developers who regularly use AI coding assistants, this article offers a practical and well-reasoned collaboration framework worth adopting.

大淘宝技术
mp.weixin.qq.com
03-02
7601 words · 31 min
92
Reflections on AI Coding: From Tool Efficiency to Paradigm Shift—What Are We Still Missing?

A thoughtful piece from Alibaba's Tmall tech team: the core bottleneck in enterprise AI Coding isn't agent execution, but accurately conveying complex task goals to AI. The solution is a layered, unified expert knowledge base that reduces information entropy and drives a shift from tool-level efficiency gains to a knowledge-driven intelligent development paradigm.

腾讯云开发者
mp.weixin.qq.com
03-04
9892 words · 40 min
92
Deep Dive into OpenClaw's Context Window Compression: All for Performance and Cost Savings

A deep dive into OpenClaw's context management source code, covering its three-layer defense — preventive pruning, LLM-based compaction, and overflow recovery — plus a detailed analysis of how each operation affects provider KV-cache costs. One of the most thorough technical breakdowns of long-session context management for AI agents available in Chinese.
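The three layers can be sketched generically — this is not OpenClaw's actual code; the function names, thresholds, and stub summarizer are illustrative assumptions about the pattern the article describes:

```python
# Generic sketch of a three-layer context-window defense:
# preventive pruning -> LLM compaction -> overflow recovery.
# All names and thresholds are illustrative, not OpenClaw's real implementation.

def count_tokens(text: str) -> int:
    # Crude proxy: one token per whitespace-separated word.
    return len(text.split())


def total_tokens(msgs: list[str]) -> int:
    return sum(count_tokens(m) for m in msgs)


def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM call that folds older turns into one summary.
    return f"[summary of {len(messages)} earlier messages]"


def fit_context(messages: list[str], budget: int,
                per_msg_cap: int = 50, keep_recent: int = 2) -> list[str]:
    # Layer 1: preventive pruning — clip oversized individual messages
    # (e.g. huge tool outputs) before they ever threaten the window.
    msgs = [" ".join(m.split()[:per_msg_cap]) for m in messages]

    # Layer 2: LLM-based compaction — if still over budget, replace all but
    # the most recent turns with a single summary message.
    if total_tokens(msgs) > budget and len(msgs) > keep_recent:
        head, tail = msgs[:-keep_recent], msgs[-keep_recent:]
        msgs = [summarize(head)] + tail

    # Layer 3: overflow recovery — last resort, hard-drop oldest turns
    # until the request fits.
    while total_tokens(msgs) > budget and len(msgs) > 1:
        msgs.pop(0)
    return msgs
```

Note that each of these mutations rewrites the prompt prefix and therefore invalidates the provider's KV cache — which is exactly why the article weighs the cost of every operation rather than compacting eagerly.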

InfoQ 中文
mp.weixin.qq.com
03-01
3384 words · 14 min
92
Stop Talking About '10x Developers': AI Agents Aren't Accelerating the SDLC, They're Ending It

This piece makes a sharp argument: AI agents aren't accelerating the software development lifecycle — they're ending it. The author works through each SDLC phase systematically: requirements become a byproduct of iteration, design emerges through collaboration, tests are generated alongside code, PR review becomes a legacy ritual, and observability evolves from a passive dashboard into the feedback loop driving the entire system. The conclusion is that only two core capabilities survive: context engineering and observability. A clearly argued piece worth engaging with seriously.

AI炼金术
xiaoyuzhoufm.com
03-03
28034 words · 113 min
93
Stop Working! Go Set Up an Office for AI!

Two AI founders share their engineering workflow in the Agent era: the job has shifted from writing code to building environments for AI, a three-step flow (plan → run → validate) is now the daily norm, and judgment bandwidth — not execution speed — is the new productivity ceiling. Worth listening to for anyone curious about what AI-first engineering actually looks like in practice.

Lenny's Podcast
youtube.com
03-01
6060 words · 25 min
93
The design process is dead. Here’s what’s replacing it. | Jenny Wen (head of design at Claude)

Anthropic's head of design Jenny Wen shares her firsthand observations on how the design role is transforming in the AI era. Her core argument: the traditional "discover-diverge-converge" design process is dead — not because designers chose to abandon it, but because engineers shipping at AI-assisted speed forced the change. Design work is now splitting into two modes: real-time execution support alongside engineers, and shorter-horizon vision work spanning 3 to 6 months rather than years. She also details her own workflow shift — time spent on polished mockups has dropped from 60-70% to 30-40%, with far more time now spent on paired collaboration with engineers and direct code-level work. A valuable firsthand account for any designer navigating this transition.

Product School
youtube.com
03-03
11522 words · 47 min
92
Zapier VP of Product on Orchestrating 800+ AI Agents to Manage Everything

Zapier's VP of Product shares firsthand enterprise AI transformation practices: 800 internal AI agents, a clear distinction between adoption and transformation, and a strong argument that leadership must personally use AI tools for change to take hold. The core difference between traditional and agentic workflows? The ability to reason and dynamically reroute.

AI炼金术
xiaoyuzhoufm.com
03-02
2573 words · 11 min
92

The biggest mistake in AI product development is adding AI to features instead of helping users complete real jobs. This podcast presents a three-step framework — deconstruct, redesign, disrupt — and maps out four AI-native startup paths: unlocking new markets, wrapping mature tech, selling infrastructure, and transforming traditional industries. Practical and grounded with real-world examples throughout.

无人知晓
xiaoyuzhoufm.com
03-03
3805 words · 16 min
94
E45 Meng Yan in Conversation with Li Jigang: How Humans Find Their Place

This three-hour conversation between Meng Yan and Li Jigang centers on a deceptively simple premise: the Industrial Revolution took physical labor, AI is taking cognitive labor, and what remains for humans is what they call "heart force" — will, intuition, and aesthetic sensibility. From there, the conversation moves through the nature of the vector world, the shift in business models from "weaving networks" to "drilling deep wells," the fork between amplifying human distinctiveness through AI versus withdrawing from thinking altogether, and the evolution of education from knowledge-pouring to spark-finding. Li Jigang's two observations — "Your feed is your fate" and "prompts have shape" — are each worth sitting with independently. Worth listening to for anyone thinking seriously about what it means to be human in the AI era, beyond the technical discussion.

Datawhale
mp.weixin.qq.com
02-27
12468 words · 50 min
92
The 2028 Global Intelligence Crisis: Who Pays the Price?

A thought experiment written from a 2028 vantage point, tracing AI's economic left-tail risks: white-collar displacement → consumer spending contraction → private credit defaults → mortgage market stress — a feedback loop with no natural brake. Not a prediction, but a rigorous risk scenario worth reading carefully for anyone thinking about AI's macroeconomic implications.
