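Addy Osmani, "The Factory Model: How Coding Agents Changed Software Engineering" (full text)
Original: https://addyosmani.com/blog/factory-model/
Author: Addy Osmani (software engineer at Google working on Google Cloud / Gemini; author of Learning JavaScript Design Patterns)
Original publication date: February 25, 2026
About this edition: a complete paragraph-by-paragraph translation with translator's notes. Companion close reading: Addy Osmani trilogy: Factory Model / Comprehension Debt / Harness Engineering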
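Translator's Preface
Of everything published in Q1 2026, this is the article most worth reading sentence by sentence for the Chinese-language frontend and engineering community. Addy Osmani published it on February 25 and later followed up with Comprehension Debt and Harness Engineering. Together the three pieces reshaped the methodology narrative around software engineering in the AI era.
Why read the full text rather than only the close reading? Three reasons:
First, it resolves the binary anxiety around AI coding. Chinese-language discussion of whether AI will replace programmers has long been stuck at two extremes: either "AI lets anyone write code, so programmers are finished" or "everything AI writes is garbage and unusable". Addy offers a third path: coding has changed dramatically, software engineering, at its core, has not. That distinction underlies the whole current round of discussion, and it is best read in the original.
Second, the "factory model" is engineering practice, not rhetoric. The article states it plainly: a factory has quality control, has process documentation, needs precisely specified inputs, and stalls when its environment is unreliable. All four points map onto agentic software development. After reading it you will see why agent orchestration became a hard skill for senior engineers in 2026, and what specifically to invest in (spec quality, test infrastructure, architectural understanding).
Third, it breaks down very concretely how an engineer's value moves up the stack. The end of the article lists six key capabilities (systems thinking / problem decomposition / architectural judgment / specification clarity / output evaluation / orchestration skill). This is a concrete capability map for engineers in the AI era. Unlike generic career advice, the list can be checked directly against your own weak spots.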
The Factory Model: How Coding Agents Changed Software Engineering
February 25, 2026
原文:Something shifted recently with agentic engineering that “feels” like the level of abstraction changed again. Not the usual kind of shift where tools get marginally better and workflows gradually evolve. A step change. Developers who have been writing software for decades are describing it the same way: the center of gravity of the craft moved.
最近,在 agentic engineering(智能体工程)领域有些变化,“感觉”上像是抽象层级又上了一个台阶。这不是那种”工具略微变好、工作流缓慢演进”的常规变动,而是阶跃式的变化(step change)。已经写了几十年软件的开发者,正用同一种说法来描述它:这个手艺的重心,移动了。
原文:The most useful thing you can do right now is hold two ideas in tension simultaneously. Coding has changed dramatically. Software engineering, at its core, has not. That gap between the two is where the interesting story lives, and understanding it clearly is the difference between engineers who thrive in this era and engineers who get left behind by it.
你现在能做的最有用的事情,是同时把两个想法放在张力里:**编码(coding)发生了剧烈的改变;软件工程(software engineering)的内核,没变。**两者之间的那道缝,就是这个故事最有意思的地方。把它看清楚,是”在这个时代蓬勃发展的工程师”和”被这个时代甩下的工程师”之间的差别。
原文:I was reading Michael Truell (of Cursor’s) thoughts on this and wanted to expand on them.
我读了 Cursor 的 Michael Truell 关于这件事的一些想法,想在他的基础上展开聊一聊。
The Arc of Abstraction
原文:The history of software engineering is the history of raising abstraction. We moved from bits to instructions, from instructions to functions, from functions to objects, from objects to services, from services to distributed systems. Every jump in the stack made individual developers more productive and expanded the total population of people who could participate in building software.
**软件工程的历史,就是抽象层级不断上升的历史。**我们从比特走到指令,从指令走到函数,从函数走到对象,从对象走到服务,从服务走到分布式系统。技术栈上的每一次跃升,既让单个开发者更高效,也扩大了”能参与造软件的人”的总数。
原文:Assembly gave way to C. C gave way to managed languages and garbage collection. Managed languages gave way to frameworks, package ecosystems, and cloud infrastructure. Each transition felt disruptive at the time. Each one, in hindsight, was simply the next step in a long and consistent arc.
汇编让位给 C;C 让位给托管语言(managed languages)和垃圾回收;托管语言让位给框架、包生态、云基础设施。每一次切换在当时都让人觉得颠覆,但回头看,每一次都不过是同一条长长的、连贯的演进弧上的下一步。
原文:What we are living through right now is another step in that same arc. We are moving from writing code to orchestrating systems that write code.
我们当下正在经历的,是同一条弧线上的又一步。我们正在从”写代码”,走向”编排出能写代码的系统”。
原文:This framing was articulated by Grady Booch, who refers to it as software’s third age — a new golden era defined by rising abstraction, where the job of the developer shifts from writing instructions to defining intent.
这个框架由 Grady Booch 提出,他称之为软件的第三个黄金时代——一个由抽象上升所定义的新黄金时代,在这个时代里,开发者的工作从写指令变成定义意图(defining intent)。
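🟢 Translator's note: Grady Booch is one of the three principal creators of UML (the Unified Modeling Language), an IBM Fellow, and a veteran who has worked on software architecture for decades. When he calls this a new golden era, the claim carries weight.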
原文:That framing matters because it tells you what to hold onto and what to let go of.
这个框架很重要,因为它告诉你什么该握紧,什么该放手。
Three Generations of AI Coding Tools
原文:It helps to be precise about the progression, because conflating the generations leads to underestimating how much has actually changed.
把这条演进路径讲准一点很有必要,因为把几代混为一谈,会让人低估到底发生了多大的变化。
原文:The first generation was accelerated autocomplete. Tools that predicted the next line, filled in boilerplate, and saved keystrokes on repetitive patterns. Useful. Genuinely time-saving. But the workflow stayed identical to what it had always been: you drove, the tool assisted. The feedback loop was: write code, run it, debug, repeat. The AI just reduced friction inside that loop.
第一代是加速版的自动补全(accelerated autocomplete)。工具预测下一行、填充模板代码、在重复模式上替你省按键。有用,实打实节省时间。但工作流和过去完全一样:你开车,工具协助。反馈循环还是那个老循环——写代码、跑代码、debug、重复。AI 只是降低了这个循环内部的摩擦力。
🟢 Translator's note: Representative tools of this generation are early versions of GitHub Copilot (2021-2022), Tabnine, and similar.
原文:The second generation introduced synchronous agents. You described a task in natural language. The model generated code. You reviewed it, corrected it, iterated toward a working result. This moved further up the stack. Less typing, more describing intent. But you were still present for every step. The agent was a collaborator, not an autonomous worker. You held the context, directed the next move, and caught mistakes in real time.
第二代引入了同步式 agent(synchronous agents)。你用自然语言描述一个任务,模型生成代码,你审一遍,纠正它,迭代到一个可用结果。这往栈上又移了一格——少打字,多描述意图。但你仍然在每一步都在场。Agent 是一个协作者,不是一个自主工作者。你把握上下文、指挥下一步、实时抓住错误。
🟢 Translator's note: Representative of this generation are Cursor, Windsurf, Claude Code in synchronous mode, and ChatGPT pair-programming workflows (2023-2024).
原文:The third generation introduced autonomous agents. These agents can take a specification and run with it for thirty minutes, an hour, several hours and increasingly days. They set up environments, install dependencies, write tests, hit failures, research solutions online, fix the failures, write the implementation, test it again, set up services, and produce artifacts you can review. You hand them a task, move on to something else, and come back to logs, previews, and pull requests. You are no longer interacting line by line. You are defining outcomes and reviewing results. This is where swarms of agents and even self-improving agents come into play.
第三代引入了自主型 agent(autonomous agents)。这些 agent 拿到一份规格说明书(spec),就能自己跑 30 分钟、1 小时、几个小时,甚至好几天。它们配置环境、装依赖、写测试、撞到失败、上网查解决方案、修好失败、写出实现、再次测试、起服务,最后产出你可以审阅的工件(artifacts)。你交给它一个任务,转头去做别的事,回来时面对的是日志、预览页面和 pull request。你不再是逐行交互了,你是在定义结果、审阅产出。这就是 agent 集群(swarms of agents)甚至自我改进型 agent登场的地方。
🟢 Translator's note: Representative of this generation are Claude Code subagents, Cursor /multitask, Cognition Devin, and Anthropic skill.md long-running agents (2025-2026).
原文:That changes the cadence of work in ways that are hard to fully communicate until you experience it. Tasks that were weekend projects three months ago are now something you kick off and check on thirty minutes later.
这件事对工作节奏的改变,你不亲身体验过,就很难完全感受到。三个月前还是”周末项目”规模的任务,现在你启动一下,30 分钟后回来检查就行了。
The Factory Mental Model
原文:The most useful mental model for this new paradigm is that you are no longer just writing code. You are building the factory that builds your software.
这个新范式最有用的心智模型是:你不再仅仅是在写代码。你是在搭建那个能造出你软件的工厂。
原文:That factory consists of fleets of agents. Each agent has a task, a toolbelt (repositories, test runners, deployment scripts, documentation), context (specs, architecture decisions, prior constraints), and a feedback loop. Instead of hand-holding a single agent through a single task, you spin up many agents in parallel. One handles backend refactors. Another implements a feature. Another writes integration tests. Another updates documentation. You review outputs, give feedback, refine specs, and redeploy.
这个工厂由一队队 agent组成。每个 agent 有它的任务、它的工具腰带(代码仓库、测试运行器、部署脚本、文档)、它的上下文(specs、架构决策、既有约束)、还有它的反馈循环。你不再一手把一个 agent 牵过一个任务,而是并行起一大批 agent:一个搞后端重构,一个实现新功能,一个写集成测试,一个更新文档。你审输出、给反馈、精修 spec、重新部署。
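The fleet described above can be sketched as a small data structure plus a parallel dispatcher. This is only an illustration under stated assumptions: `AgentJob`, `run_agent`, and `run_fleet` are hypothetical names, the file names are placeholders, and `run_agent` is a stub standing in for a real agent invocation. The `needs_review` status represents the human review step in the feedback loop.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class AgentJob:
    """One unit of work on the factory floor (hypothetical structure)."""
    task: str                                         # what the agent should do
    tools: list[str] = field(default_factory=list)    # toolbelt: repos, test runners, ...
    context: list[str] = field(default_factory=list)  # specs, architecture decisions, constraints


def run_agent(job: AgentJob) -> dict:
    """Stub for a real agent run; every result is queued for human review."""
    return {"task": job.task, "status": "needs_review", "tools_used": len(job.tools)}


def run_fleet(jobs: list[AgentJob], max_parallel: int = 4) -> list[dict]:
    """Run jobs in parallel and return results in submission order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_agent, jobs))


jobs = [
    AgentJob("refactor backend module", ["repo", "test-runner"], ["spec.md"]),
    AgentJob("implement feature", ["repo", "test-runner"], ["spec.md", "adr-001.md"]),
    AgentJob("write integration tests", ["repo", "test-runner"]),
    AgentJob("update documentation", ["repo"]),
]
results = run_fleet(jobs)
for result in results:
    print(result["task"], "->", result["status"])
```

Keeping the task, toolbelt, and context as explicit fields makes it visible when a job is dispatched with thin context, which is where the article expects output quality to suffer.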
原文:The analogy runs deeper than it might first appear. A factory has quality control. A factory has process documentation. A factory has inputs that need to be precisely specified or the outputs come out wrong. A factory stalls when the environment is unreliable. All of these properties map directly onto agentic software development, and taking the analogy seriously points you toward the investments that actually matter.
这个类比比乍看上去要深得多。工厂有质量管控;工厂有流程文档;工厂的输入必须被精确规定,否则输出就是错的;工厂会在环境不可靠时停工。所有这些性质,都直接映射到 agentic 软件开发上。把这个类比认真对待,就会指向那些”真正值得投资”的方向。
原文:Inside teams that have adopted this model aggressively, a substantial portion of merged pull requests now originate from agents running autonomously in cloud environments. That is not theoretical anymore. It is production reality for a growing number of engineering organizations.
在已经激进地采用这套模型的团队里,目前合入的 pull request 中相当一部分,来自在云环境里自主运行的 agent。这已经不是理论,而是越来越多工程组织的生产现实。
原文:The sentiments from Cursor around “The developer’s job is becoming building the system that builds the software, the factory, not just the product” and “reviewing ideas is a lot more fun than reviewing code” (video) resonate with these points.
Cursor 那边的一些表述——“开发者的工作正在变成搭建那个能造出软件的系统、那个工厂,而不只是产品本身”以及”审阅想法比审阅代码有趣得多”(视频)——和上面这些观点呼应得很好。
The Parallel with Onboarding a New Engineer
原文:One of the most striking patterns in how agents actually behave is how closely their work loop mirrors onboarding a new engineer.
agent 实际行为中最显眼的一个模式是:它们的工作循环,和新工程师 onboarding 的循环,惊人地相似。
原文:You hand them a spec. They break it into subtasks. They explore the codebase to understand the lay of the land. When they get stuck, they search commit history. They run git blame to figure out who last touched a subsystem. They escalate to the appropriate human for domain knowledge, via Slack or similar communication channels, and then continue. They iterate until the output meets the acceptance criteria.
你给它一份 spec。它把它拆成子任务。它探索代码库以了解地形。卡住的时候,它去搜提交历史。它跑 git blame 搞清楚某个子系统上一次是谁动的。它通过 Slack 之类的渠道,升级请求给合适的人去要领域知识,然后继续。它迭代到输出满足验收标准为止。
原文:That loop is familiar because it is how people work. The implication is significant. Slack and email are becoming interfaces between humans and agents, not just between humans and humans. Git history is evolving into a knowledge graph that agents navigate to understand architectural decisions. Documentation is becoming training material for autonomous execution.
这个循环之所以让你觉得熟悉,是因为它就是人是怎么工作的。这个对应关系的意义非常大:Slack 和 email 正在变成”人 ↔ agent”之间的接口,不再只是”人 ↔ 人”之间。Git history 正在演化为一张知识图谱,agent 在上面导航以理解架构决策。文档正在变成自主执行的训练材料。
原文:If you want to think clearly about what investment to make in your codebase right now, ask yourself: could a new engineer, given only the documentation and commit history available, understand why the code is structured this way? If the answer is no, agents will struggle there too, and the leverage you could be getting will be limited.
如果你想清晰地思考”我现在应该往代码库里投资什么”,问自己一个问题:**一个只有文档和提交历史可用的新工程师,能不能理解代码为什么是这样组织的?**如果答案是 no,那么 agent 在这块同样会挣扎,你能拿到的杠杆就被卡住了。
Your Spec Is the Real Leverage
原文:Here is the insight that reshapes how you think about your own value as an engineer.
下面这个洞见,会重新塑造你怎么看待”自己作为工程师的价值”。
原文:If you can orchestrate twenty, thirty, fifty agents running in parallel, the difference between mediocre output and exceptional output comes down almost entirely to the quality of your specification. At that scale, vague thinking does not just slow you down. It multiplies. Ambiguous requirements propagate through dozens of parallel autonomous runs, each one going slightly wrong in a slightly different direction. Poor architectural decisions made upfront do not affect one implementation. They propagate across the entire fleet.
如果你能并行编排 20、30、50 个 agent 一起跑,平庸输出和卓越输出的区别,几乎全部归结到 spec 的质量。在这个规模下,模糊的思考不是把你拖慢——它会被放大几十倍。模糊的需求会在几十次并行自主运行中扩散开,每一次都朝着略微不同的方向偏一点点。前期糟糕的架构决策不再只影响一份实现,它会扩散到整支舰队上。
原文:You cannot write a spec that survives that environment unless you deeply understand the architecture, the integration boundaries, the edge cases, the failure modes, and the invariants that must never break. The spec is not a prompt anymore. The spec is the product thinking made explicit.
**你写不出能在这种环境里存活下来的 spec——除非你深刻理解架构、集成边界、边界 case、失败模式,以及那些绝对不能破的不变量。**Spec 不再是一个 prompt 了,spec 是把产品思考显性化的产物。
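One way to make "the spec is the product thinking made explicit" concrete is to treat the spec as structured data and check it for gaps before any agent sees it. The sketch below is an assumption for illustration only: the `Spec` fields mirror the items the paragraph lists (edge cases, failure modes, invariants), but the shape and the `missing_sections` helper are hypothetical, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical spec shape; field names are illustrative, not a standard."""
    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)
    invariants: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)


def missing_sections(spec: Spec) -> list[str]:
    """Return the names of the list sections that were left empty."""
    sections = ["acceptance_criteria", "invariants", "edge_cases", "failure_modes"]
    return [name for name in sections if not getattr(spec, name)]


vague = Spec(goal="Make checkout faster")
precise = Spec(
    goal="Reduce checkout p95 latency",
    acceptance_criteria=["p95 latency stays under the agreed budget in the load test"],
    invariants=["an order is never charged twice"],
    edge_cases=["empty cart", "expired session"],
    failure_modes=["payment provider timeout"],
)
print(missing_sections(vague))
print(missing_sections(precise))
```

A check like this cannot judge whether the listed invariants are the right ones. It only catches specs where nobody wrote them down at all.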
原文:This is why strong software engineers get more leverage from these tools than weak ones, not less. The mechanical work of typing code is being automated. The cognitive work of understanding systems is being amplified. Every hour you spend developing genuine architectural understanding and systems thinking now pays dividends across an entire fleet of autonomous workers rather than just your own output.
这就是为什么强工程师从这些工具里拿到的杠杆,比弱工程师更多,而不是更少。打字这种机械工作正在被自动化,而理解系统这种认知工作正在被放大。你花在”建立真正的架构理解和系统思维”上的每一个小时,现在都不再只回报你自己的输出,而是回报整支自主工作者的舰队。
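🟢 Translator's note: This is the passage most easily misread in the Chinese-language community. "Strong engineers get more leverage" does not mean "senior engineers will not be replaced". It means leverage is measured along two dimensions, system understanding and spec expression, and your ability on both is what gets multiplied across the output of dozens of agents. That is what "value moving up the stack" concretely looks like in the AI era.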
What Has Not Actually Changed
原文:It is worth being precise here, because the hype around AI coding can create the impression that traditional software engineering skills have been deprecated.
这一段值得说精确,因为 AI 编程的热潮容易给人一种印象:传统软件工程技能已经被废弃了。
原文:They have not. Consider what agentic development still requires from you:
它们没有。看看 agentic 开发仍然要求你做的事:
原文:Clear requirements. If you cannot articulate what success looks like in a way that can be evaluated, no amount of autonomous execution will produce it. Agents cannot clarify requirements they are never given. They will fill the gaps with assumptions, and those assumptions compound.
清晰的需求。**如果你无法以一种可被评估的方式说清楚”成功长什么样”,再多的自主执行都造不出它。Agent 没办法澄清你从来没给过的需求。**它们会用假设来填补缺口,而那些假设会层层叠加。
原文:Strong abstractions. An agent given a well-designed system with clear module boundaries, coherent interfaces, and good separation of concerns will produce better results than an agent given a tangled codebase where everything depends on everything else. Clean architecture does not become less valuable when agents are doing the implementation. It becomes more valuable, because agents amplify the properties of the system they are working in.
优良的抽象。一个被丢进”模块边界清晰、接口连贯、职责分离做得好”的系统的 agent,产出的结果远好于一个被丢进”任何东西都依赖任何东西”的 agent。当 agent 来做实现的时候,干净架构不是变得没那么重要,而是变得更重要——因为 agent 会放大它所工作系统的属性。
原文:Reliable tests. This deserves its own section.
可靠的测试。这一项值得单独一节。
原文:Careful tradeoffs. Agents optimize for the stated objective. They do not naturally balance competing concerns, anticipate second-order effects, or flag when a technically correct solution is the wrong product decision. That judgment still lives with you.
谨慎的权衡。Agent 围绕你陈述的目标做优化。它们不会天然地权衡相互冲突的关切、预判二阶效应,也不会主动指出”这个方案技术上对、但作为产品决策是错的”——这种判断,仍然属于你。
原文:Human oversight. Agents do impressive work. They also make confident mistakes. The output quality is high enough to get past casual review, which means the bar for your review skills actually increases, not decreases.
人类监督。Agent 能干出令人惊艳的活,它们也会犯非常自信的错。它们的产出质量足以骗过随便看一眼的审阅,这意味着你审阅能力的门槛,不是降低了,而是提高了。
Why Testing Matters More Than Ever
原文:Good tests and Test-driven development (TDD) were already good practice. In an agentic workflow, they become something close to mandatory.
好的测试和测试驱动开发(TDD)本来就是好实践。在 agentic 工作流里,它们近乎成为必需。
原文:The idea is precise enough to be worth stating clearly. Red/green TDD means you write the tests before you write the implementation. You confirm the tests fail (red phase). Then you iterate on the implementation until the tests pass (green phase). That sequence is not optional ceremony. It is the mechanism that gives you confidence the implementation is actually doing what you think it is.
这个想法精确到值得明确陈述。Red/green TDD 是这样:你先写测试再写实现;先确认测试失败(red 阶段),再迭代实现直到测试通过(green 阶段)。这个顺序不是可选的仪式——它是你对”实现真的在做你以为它在做的事”建立信心的机制。
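🟢 Translator's note: Red/green/refactor is the classic three-step TDD cycle from Kent Beck's Test-Driven Development: By Example: first write a failing test (red), then write the minimal implementation that makes it pass (green), and finally refactor. In Chinese it is sometimes rendered as "traffic-light TDD".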
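The red/green sequence can be shown in a few lines. In this minimal sketch the function `slugify` and its two test cases are hypothetical examples, not taken from the article. The same tests are run first against an unimplemented stub (red) and then against the smallest implementation that satisfies them (green).

```python
import unittest


def slugify_stub(title: str) -> str:
    """Red phase: the function exists but is not implemented yet."""
    raise NotImplementedError


def slugify(title: str) -> str:
    """Green phase: the smallest implementation that satisfies the tests."""
    return "-".join(title.lower().split())


def make_suite(func) -> unittest.TestSuite:
    """Build the same tests against whichever implementation is passed in."""

    class SlugifyTests(unittest.TestCase):
        def test_lowercases_and_joins_words(self):
            self.assertEqual(func("Hello World"), "hello-world")

        def test_collapses_extra_whitespace(self):
            self.assertEqual(func("  Factory   Model "), "factory-model")

    return unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)


def passes(func) -> bool:
    """Run the suite quietly and report whether every test passed."""
    result = unittest.TestResult()
    make_suite(func).run(result)
    return result.wasSuccessful()


print("red phase passes:", passes(slugify_stub))
print("green phase passes:", passes(slugify))
```

Seeing the suite fail against the stub first is the point of the red phase: it shows the tests are able to fail, so a later pass means something.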
原文:With a single developer writing code, the downside of skipping test-first development is that you might write a test that passes regardless of whether your implementation is correct, or miss edge cases that get caught later as regressions. Those are real costs. They are manageable.
在单个开发者写代码的场景下,跳过 test-first 的代价是:你可能写出一个不管实现对不对都会通过的测试,或漏掉一些后来作为 regression 被发现的边界 case。这是真实成本,但是可控的。
原文:With a fleet of agents generating code across dozens of parallel tasks, the costs compound severely. An agent optimizing for passing tests will find ways to pass them. If the tests were written after the implementation, they are likely testing what the implementation happens to do rather than what it should do. You now have a large surface area of code with a test suite that confirms the wrong thing. A comprehensive, test-first suite is by far the most effective lever you have for ensuring that autonomous output is actually correct and for protecting existing functionality as the codebase grows.
但当你有一队 agent 在数十个并行任务里生成代码,这个代价会严重复利。一个以”让测试通过”为优化目标的 agent,会想尽办法让测试通过。如果测试是在实现之后写的,它们大概率测的是”实现碰巧在做什么”,而不是”实现应该在做什么”。你现在面对的就是一大片代码,以及一套确认错误事情的测试。“先测试、覆盖全面”的测试套件,是你能拿到的最有效的杠杆——保证自主产出是真的正确,也保护代码库扩张时既有功能不被破坏。
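The difference between a test that records what the code happens to do and a test that states what it should do can be sketched as follows. The discount function, the bug, and the requirement that a price never drops below zero are all hypothetical and chosen only for illustration.

```python
def apply_discount(price: float, percent: float) -> float:
    """Buggy on purpose: a discount above 100% produces a negative price."""
    return price - price * percent / 100


def apply_discount_fixed(price: float, percent: float) -> float:
    """Clamp the result so the requirement (never below zero) holds."""
    return max(0.0, price - price * percent / 100)


def check_written_after(func) -> bool:
    """Derived by running the code and recording whatever it returned."""
    return func(100, 150) == -50


def check_written_first(func) -> bool:
    """Derived from the requirement: a price must never go below zero."""
    return func(100, 150) == 0


print("after-the-fact check on buggy code:", check_written_after(apply_discount))
print("requirement-first check on buggy code:", check_written_first(apply_discount))
print("requirement-first check on fixed code:", check_written_first(apply_discount_fixed))
```

The after-the-fact check passes on the buggy code and would fail on the fix, so it actively protects the defect. That is the "test suite that confirms the wrong thing" the paragraph warns about.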
原文:“Red/green TDD” is a shorthand every good model understands. It captures a specific discipline: write tests first, confirm they fail before implementing, make them pass through correct implementation rather than by gaming the test. Telling an agent to use red/green TDD is one of the highest-leverage instructions you can give at the start of a task.
“Red/green TDD” 是一个每个好模型都理解的简写。它精确地表达了一种纪律:先写测试,确认失败再做实现,通过正确的实现让它们通过——而不是靠 game 测试。在一个任务开头,告诉 agent 用 red/green TDD,是你能给的杠杆率最高的指令之一。
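🟢 Translator's note: For readers interested in practice, it is worth remembering that current agents such as Claude Code and Cursor correctly interpret the prompt "red/green TDD". Addy's practical point is that you do not need to explain TDD in detail: the term itself is a high-density instruction.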
The Unsolved Problem Is Verification, Not Generation
原文:Generation is not the bottleneck anymore. Verification is.
生成不再是瓶颈了,验证才是。
原文:Agents can produce impressive output. The challenge is knowing with confidence whether that output is correct. Several factors make this harder than it first appears.
agent 能产出令人印象深刻的输出。挑战是,你怎么有信心地判断这个输出是否正确。有几个因素让这件事比看上去更难。
原文:Tests that pass before a change does not mean they will catch regressions introduced by the change. Agents can write tests that are technically valid but miss the cases that matter. UI verification remains brittle, with visual and behavioral regressions slipping through because automated tools are not yet reliable enough to catch them all. Context window limitations mean that agents working on large codebases may miss important constraints or patterns that exist outside the window they are currently reasoning over. Flaky environments, which a single developer encounters as an annoying edge case and works around, become systemic blockers when you have forty agents hitting the same flaky test simultaneously. The factory stalls.
改动前测试通过,不代表它们会抓住改动引入的 regression。agent 能写出技术上合法、但漏掉关键 case 的测试。UI 验证依然脆弱——视觉和行为上的 regression 会溜过去,因为自动化工具还不够可靠到能全部抓住。上下文窗口限制意味着:在大代码库上工作的 agent,可能错过那些落在它当前推理窗口之外的重要约束或模式。Flaky 环境(不稳定的环境)对单个开发者来说,只是一个让人烦的边界 case,绕开就行;但是当你有四十个 agent 同时撞同一个 flaky 测试时,它就变成了系统性堵点。工厂停工。
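Simple arithmetic shows why a flaky test that is a minor annoyance for one developer becomes a systemic blocker for a fleet. The sketch assumes each agent run hits the flaky failure independently with the same probability. The 2% rate is a made-up figure for illustration, not a number from the article.

```python
def chance_any_agent_blocked(flake_rate: float, agents: int) -> float:
    """P(at least one of `agents` independent runs hits the flaky failure)."""
    return 1 - (1 - flake_rate) ** agents


for agents in (1, 10, 40):
    probability = chance_any_agent_blocked(0.02, agents)
    print(f"{agents:>2} agents: {probability:.1%}")
```

Under these assumptions a test that fails 2% of the time blocks at least one of forty parallel runs a little more than half the time, which is why the article treats environment reliability as factory infrastructure.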
原文:The infrastructure that needs to exist to support this model at scale includes better automated regression detection, artifact-level validation that goes beyond diffing changed lines, reliable and fast environment provisioning, and guardrails that hold up under parallel workloads. These are active areas of investment. They are not solved.
要在规模上支撑这个模型,所需的基础设施包括:更好的自动化 regression 检测、超越”diff 改动行”层级的工件级验证、可靠且快速的环境置备(environment provisioning)、以及能在并行负载下扛住的 guardrails(护栏)。这些都是当前正在被投资的活跃领域,但都还没解决。
原文:Until verification catches up with generation, human review is not optional overhead. It is the safety system. The appropriate response to impressive agent output is not to trust it because it looks good. It is to have the architectural understanding and testing discipline to evaluate it rigorously.
在验证(verification)追上生成(generation)之前,人类审阅不是可选的 overhead,它是安全系统。面对令人惊艳的 agent 输出的正确反应,不是”看上去不错所以信它”,而是用你的架构理解和测试纪律,严格评估它。
The New Shape of High-Leverage Engineering
原文:The engineers who will have the most impact in this era will not be distinguished by how fast they type or how well they remember syntax. They will be distinguished by a different set of capabilities.
这个时代影响力最大的工程师,不会以打字速度或语法记忆量来定义。他们会以另一组能力来区分。
原文:Systems thinking. The ability to hold a complex architecture in mind, understand how components interact, and anticipate how a change in one place affects behavior elsewhere. This is harder to develop than typing speed and far more valuable when you are managing a fleet of agents whose outputs you have to integrate.
系统思维(Systems thinking)。能在脑中容纳一个复杂架构、理解组件如何交互、预判一处改动如何影响别处的行为。这比打字速度难培养得多,而且当你要管理一队 agent、并要把它们的产出整合起来的时候,它远比打字快重要。
原文:Problem decomposition. Knowing how to break a large, ambiguous goal into well-scoped subtasks that an agent can execute reliably. Tasks that are too large tend to go off-track. Tasks that are poorly scoped get interpreted incorrectly. The skill of decomposing problems well, and then verifying that the decomposition was right, is a genuine craft.
问题分解(Problem decomposition)。知道怎么把一个大而模糊的目标,切成一组”边界清晰、agent 能可靠执行”的子任务。任务太大会跑偏,任务边界不清会被理解错。把问题分解好、然后验证分解是否正确——这是一门真正的手艺。
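A decomposition can be written down as a dependency graph and then checked mechanically for which subtasks can run in parallel. The sketch below uses Python's standard `graphlib` module. The subtask names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks; each maps to the set of subtasks it depends on.
subtasks = {
    "write API spec": set(),
    "write failing tests": {"write API spec"},
    "implement endpoint": {"write failing tests"},
    "update docs": {"write API spec"},
    "integration test": {"implement endpoint", "update docs"},
}


def parallel_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; everything in one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = parallel_waves(subtasks)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

This only verifies the ordering of a plan. Whether each subtask is scoped well enough for an agent to execute reliably remains the judgment call the paragraph describes.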
原文:Architectural judgment. Understanding why a system is designed the way it is, what properties it is optimizing for, and what tradeoffs were made. Agents can implement. They cannot judge whether what they are implementing is the right design.
架构判断(Architectural judgment)。理解一个系统为什么这么设计、它在为哪些属性优化、做了哪些权衡。Agent 能实现,但它们没法判断自己在实现的东西是不是正确的设计。
原文:Specification clarity. The ability to write requirements that are unambiguous, complete with respect to the important edge cases, and structured in a way that makes evaluation straightforward. Vague specs produce vague results. Precise specs multiply into precise implementations.
规格清晰度(Specification clarity)。能写出无歧义、对重要边界 case 完备、结构上让评估变简单的需求的能力。模糊的 spec 产出模糊的结果。精确的 spec 会被乘出精确的实现。
原文:Output evaluation. The taste to recognize when something looks correct but is not, when an implementation solves the stated problem but creates a new one, when the architecture of the solution does not fit the architecture of the rest of the system. This judgment is not automatable.
产出评估(Output evaluation)。这是一种品味——能识别”看起来对其实不对”、“实现解决了陈述的问题但创造了新问题”、“方案架构不匹配系统其他部分架构”的能力。这种判断无法自动化。
原文:Orchestration skill. The practical ability to manage multiple parallel workstreams, give effective feedback on agent outputs, recognize when an agent needs to be redirected versus retasked, and maintain coherence across a fleet of autonomous workers.
编排能力(Orchestration skill)。实际地管理多个并行工作流、对 agent 产出给出有效反馈、判断”该让一个 agent 改方向”还是”该重新派任务”、并在一队自主工作者之间保持整体一致性的能力。
原文:None of these are new skills, exactly. Good engineers have always needed them. What has changed is their relative importance. The mechanical parts of software development are being increasingly handled by machines. The cognitive parts are being amplified.
严格说,这些都不是新技能,好工程师从来都需要它们。变了的是它们的相对重要性。软件开发的机械部分越来越多被机器处理,认知部分则被放大。
What Is the Bigger Picture?
原文:New website creation is up 40% year over year. New iOS apps are up nearly 50%. GitHub code pushes jumped 35% in the US. All of these metrics were flat for years before late 2024. The graphs look like hockey sticks. People who have never written a line of code are building and launching software.
新网站创建数同比上涨 40%;新 iOS app 上涨接近 50%;GitHub 在美国的代码 push 量跃升 35%。这些指标在 2024 年下半年之前已经平了好几年。图表看起来像曲棍球棒。从未写过一行代码的人,正在构建并发布软件。
原文:Keep in mind, we can and should note that more quantity does not necessarily mean better quality. But the fact remains that the barrier to creating software has dropped dramatically, and that is a fundamental shift in the landscape of software engineering.
请记住,我们可以也应该指出:数量更多,不一定等于质量更好。但事实仍然是,创造软件的门槛大幅下降了,这是软件工程版图上一次根本性的转变。
原文:The barrier to creating software has genuinely dropped. That is not hype. What it means for professional engineers is not that their skills are less valuable, but that the skills that matter have shifted up the stack, as they have in every previous transition.
创造软件的门槛真正降了。这不是炒作。这对专业工程师意味着的不是技能贬值,而是”重要技能向上移了一格”,和过去每一次转变一样。
原文:The developers who thrived after the move from assembly to C were not the ones who could write the most clever assembly. They were the ones who understood what the machine needed to do and could express that intent clearly in a higher-level language. The developers who thrived after the move to managed languages and frameworks were not the ones most resistant to garbage collection. They were the ones who saw the freed-up cognitive capacity as an opportunity to solve harder problems.
从汇编到 C 的转变中,蓬勃发展的开发者,不是那些能写出最巧妙汇编的人,而是那些理解”机器要做什么”并能用更高层语言清晰表达意图的人。从原生语言到托管语言 + 框架的转变中,蓬勃发展的开发者,不是最抗拒垃圾回收的那批人,而是把腾出来的认知容量当作”解决更难问题”的机会的那批人。
原文:The developers who will thrive in the agentic era are the ones who understand this as another step in the same arc and invest accordingly. Not in resisting the tools. Not in deferring to them uncritically. In developing the judgment, clarity, and systems thinking that make the tools maximally effective.
在 agentic 时代蓬勃发展的开发者,会是那些把它理解成”同一弧线上的下一步”、并据此投资的人。不是抵抗工具,不是不加批判地服从工具,而是培养能让工具最大限度有效的那种判断力、清晰度、系统思维。
原文:That means writing better specs. Investing in test infrastructure. Developing genuine architectural understanding rather than surface familiarity. Building the taste to evaluate output rigorously. Practicing problem decomposition until it becomes second nature.
具体说来,这意味着:写更好的 spec;投资测试基础设施;发展真正的架构理解,而不是表面熟悉;培养严格评估产出的品味;反复练习问题分解,直到它成为第二天性。
原文:The era of programming as primarily a keystroke activity is over. The era of programming as primarily a thinking and judgment activity has been accelerating for decades and just shifted into a higher gear.
“以敲键盘为主”的编程时代结束了。“以思考和判断为主”的编程时代,已经加速了几十年,刚刚又升上了一档。
原文:The factory model is not a metaphor about losing control of software. It is a metaphor about building leverage. The engineers who understand that will build the most interesting things of the next decade.
工厂模型不是一个”工程师失去对软件控制权”的比喻,它是一个”建立杠杆”的比喻。理解这一点的工程师,会在下一个十年造出最有意思的东西。
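Translator's Summary
Five takeaways that Chinese-speaking engineers should carry into 2026:
- "Coding changed, software engineering did not" is the baseline for this whole round of AI coding discussion. Addy's distinction matters a great deal: it cuts through the binary anxiety about AI replacing programmers and pulls the debate back to software engineering itself. At a time when Chinese-language discussion of AI coding slides to extremes so easily, this one sentence is worth printing and pinning to the wall.
- The "factory model" is not rhetoric but a concrete list of four engineering practices: quality control, process documentation, precise inputs, and a stable environment. If you do engineering-systems design on your team, these four items can go straight onto an investment list. After reading the article it is clear that CI/CD, test infrastructure, environment provisioning, guardrails, and context engineering are not nice-to-haves. They are the power supply that decides whether the factory can run at all.
- "Spec is the new code, TDD is the new guardrail." With agents running in parallel, one vague requirement written up front gets multiplied by 30 into 30 different wrong implementations, and tests written after the fact become a tool for confirming the wrong thing. Red/green TDD is a shorthand every good model understands, which makes it a very concrete, immediately usable prompt-engineering technique.
- The six core capabilities form a capability map for engineers in the AI era: systems thinking / problem decomposition / architectural judgment / specification clarity / output evaluation / orchestration skill. It is worth building your own comparison table. You may find your weak spot is not some new framework but a badly underrated fundamental such as problem decomposition or output evaluation.
- "Verification is the bottleneck" is the most important engineering direction to watch in 2026. The view lines up closely with the engineering practice of leading companies such as Anthropic, Cursor, and Cognition: agent orchestration, artifact-level validation, guardrails, and context engineering. The whole infrastructure layer is likely to enter a period of heavy investment in 2026-2027. If you are building tools, platforms, or an infra startup, this is where the trend points.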
🔗 Sources
- Original: https://addyosmani.com/blog/factory-model/ (2026-02-25)
- Companion close reading: Addy Osmani trilogy
- Related reading: Comprehension Debt (full text)
- Related reading: Grady Booch, "The Third Golden Age of Software"
- Related reading: Michael Truell (Cursor), posts