为什么我不看好 2025 年人工智能代理的出现

https://utkarshkanwat.com/writing/betting-against-agents/

Everyone says 2025 is the year of AI agents. The headlines are everywhere: "Autonomous AI will transform work," "Agents are the next frontier," "The future is agentic." Meanwhile, I've spent the last year building many different agent systems that actually work in production. And that's exactly why I'm betting against the current hype.
人人都说2025年是人工智能代理之年。各种头条新闻随处可见：“自主人工智能将改变工作”、“代理是下一个前沿”、“未来属于代理”。与此同时，我去年花了一年时间构建了许多实际投入生产的代理系统。这正是我反对当前炒作的原因。

I'm not some AI skeptic writing from the sidelines. Over the past year, I've built more than a dozen production agent systems across the entire software development lifecycle:
我并非旁观者笔下的人工智能怀疑论者。在过去的一年里，我构建了十多个贯穿整个软件开发生命周期的生产代理系统：

Development agents: UI generators that create functional React components from natural language, code refactoring agents that modernize legacy codebases, documentation generators that maintain API docs automatically, and function generators that convert specifications into working implementations.
开发代理 ：使用自然语言创建功能性 React 组件的 UI 生成器、使遗留代码库现代化的代码重构代理、自动维护 API 文档的文档生成器以及将规范转换为工作实现的函数生成器。

Data & Infrastructure agents: Database operation agents that handle complex queries and migrations, DevOps automation AI systems managing infrastructure-as-code across multiple cloud providers.
数据和基础设施代理 ：处理复杂查询和迁移的数据库操作代理，跨多个云提供商管理基础设施即代码的 DevOps 自动化 AI 系统。

Quality & Process agents: AI-powered CI/CD pipelines that fix lint issues, generate comprehensive test suites, perform automated code reviews, and create detailed pull requests with proper descriptions.
质量和流程代理 ：由 AI 驱动的 CI/CD 管道，可修复 lint 问题、生成全面的测试套件、执行自动代码审查并创建具有适当描述的详细拉取请求。

These systems work. They ship real value. They save hours of manual work every day. And that's precisely why I think much of what you're hearing about 2025 being "the year of agents" misses key realities.
这些系统确实有效，它们创造了真正的价值，每天节省了数小时的人工工作。正因如此，我认为大家听到的关于2025年是“代理商之年”的说法，很多都忽略了关键的现实。

**TL;DR: Three Hard Truths About AI Agents

关于人工智能代理的三个残酷事实**

After building AI systems, here's what I've learned:
在构建人工智能系统之后，我学到了以下几点：

Error rates compound exponentially in multi-step workflows. 95% reliability per step = 36% success over 20 steps. Production needs 99.9%+.
在多步骤工作流程中，错误率会呈指数级增长。每步95%的可靠性=20步内36%的成功率。生产需要99.9%以上的可靠性。
Context windows create quadratic token costs. Long conversations become prohibitively expensive at scale.
上下文窗口会产生二次方的令牌成本。长时间对话的成本在规模化之后会变得极其高昂。
The real challenge isn't AI capabilities, it's designing tools and feedback systems that agents can actually use effectively.
真正的挑战不是人工智能的能力，而是设计代理可以真正有效使用的工具和反馈系统。

The Mathematical Reality No One Talks About

无人谈论的数学现实

Here's the uncomfortable truth that every AI agent company is dancing around: error compounding makes autonomous multi-step workflows mathematically impossible at production scale.
这是每个人工智能代理公司都在回避的一个令人不安的事实：错误复合使得自主的多步骤工作流程在生产规模上在数学上不可能实现。

Error Compounding in AI Agent Workflows
AI 代理工作流中的错误复合

Let's do the math. If each step in an agent workflow has 95% reliability, which is optimistic for current LLMs,then:
让我们来算一下。如果代理工作流程中每个步骤的可靠性达到 95%（这对于目前的 LLM 来说是一个乐观的估计），那么：

5 steps = 77% success rate
5个步骤=77%的成功率
10 steps = 59% success rate
10步=59%成功率
20 steps = 36% success rate
20步=36%成功率

Production systems need 99.9%+ reliability. Even if you magically achieve 99% per-step reliability (which no one has), you still only get 82% success over 20 steps. This isn't a prompt engineering problem. This isn't a model capability problem. This is mathematical reality.
生产系统需要 99.9% 以上的可靠性。即使你神奇地实现了每步 99% 的可靠性（目前还没有人做到），你在 20 步中的成功率仍然只有 82%。这不是一个即时工程问题。这不是一个模型能力问题。这是数学现实。

My DevOps agent works precisely because it's not actually a 20-step autonomous workflow. It's 3-5 discrete, independently verifiable operations with explicit rollback points and human confirmation gates. The "agent" handles the complexity of generating infrastructure code, but the system is architected around the mathematical constraints of reliability.
我的 DevOps 代理之所以能够有效运作，正是因为它实际上并非一个包含 20 步的自主工作流程。它由 3-5 个独立的、可独立验证的操作组成，并带有明确的回滚点和人工确认门。“代理”负责处理生成基础设施代码的复杂性，但整个系统的架构设计却围绕着可靠性的数学约束展开。

Every successful agent system I've built follows the same pattern: bounded contexts, verifiable operations, and human decision points (sometimes) at critical junctions. The moment you try to chain more than a handful of operations autonomously, the math kills you.
我构建的每个成功的代理系统都遵循相同的模式：有界上下文、可验证操作，以及（有时）在关键节点处的人为决策点。一旦你尝试自主地串联多个操作，数学计算就会把你逼上绝路。

The Token Economics That Don't Add Up

不合理的代币经济学

There's another mathematical reality that agent evangelists conveniently ignore: context windows create quadratic cost scaling that makes conversational agents economically impossible.
代理布道者很容易忽略另一个数学现实：上下文窗口会产生二次成本缩放，从而让对话代理在经济上变得不可能。

Here's what actually happens when you build a "conversational" agent:
当你构建一个“对话”代理时，实际上会发生以下情况：

Each new interaction requires processing ALL previous context
每次新的互动都需要处理所有先前的上下文
Token costs scale quadratically with conversation length
代币成本与对话长度呈二次方关系
A 100-turn conversation costs $50-100 in tokens alone
一次 100 轮对话仅代币成本就高达 50-100 美元
Multiply by thousands of users and you're looking at unsustainable economics
乘以数千名用户，你就会面临不可持续的经济效益

I learned this the hard way when prototyping a conversational database agent. The first few interactions were cheap. By the 50th query in a session, each response was costing multiple dollars - more than the value it provided. The economics simply don't work for most scenarios.
我在设计一个对话数据库代理的原型时，吃过不少苦头才明白这一点。最初的几次交互成本很低。但到了会话中的第 50 个查询时，每次响应的成本就高达数美元——远远超过了它所提供的价值。在大多数情况下，这种经济效益根本行不通。

Token Cost Scaling in Conversational Agents
对话代理中的令牌成本缩放

My function generation agent succeeds because it's completely stateless: description → function → done. No context to maintain, no conversation to track, no quadratic cost explosion. It's not a "chat with your code" experience, it's a focused tool that solves a specific problem efficiently.
我的函数生成代理之所以成功，是因为它完全无状态：描述 → 函数 → 完成。无需维护上下文，无需跟踪对话，也不会出现二次成本爆炸。它并非“与代码对话”的体验，而是一个能够高效解决特定问题的专注工具。

The most successful "agents" in production aren't conversational at all. They're smart, bounded tools that do one thing well and get out of the way.
生产中最成功的“代理”根本不是对话型的。它们是智能的、有界限的工具，能把一件事做好，然后就不再干扰别人。

The Tool Engineering Reality Wall

工具工程现实墙

Even if you solve the math problems, you hit a different kind of wall: building production-grade tools for agents is an entirely different engineering discipline that most teams underestimate.
即使你解决了数学问题，你也会遇到另一种障碍：为代理构建生产级工具是一门完全不同的工程学科，大多数团队都低估了这一点。

Tool calls themselves are actually quite precise now. The real challenge is tool design. Every tool needs to be carefully crafted to provide the right feedback without overwhelming the context window. You need to think about:
现在，工具调用本身实际上已经相当精确了。真正的挑战在于工具设计。每个工具都需要精心设计，才能提供正确的反馈，而不会让上下文窗口变得过于复杂。你需要考虑：

How does the agent know if an operation partially succeeded? How do you communicate complex state changes without burning tokens?
代理如何知道操作是否部分成功？如何在不销毁代币的情况下传达复杂的状态变化？
A database query might return 10,000 rows, but the agent only needs to know "query succeeded, 10k results, here are the first 5." Designing these abstractions is an art.
数据库查询可能会返回 10,000 行，但代理只需要知道“查询成功，10,000 个结果，这里是前 5 个”。设计这些抽象是一门艺术。
When a tool fails, what information does the agent need to recover? Too little and it's stuck; too much and you waste context.
当工具出现故障时，代理需要哪些信息来恢复？信息太少，系统就会卡住；信息太多，就会浪费上下文。
How do you handle operations that affect each other? Database transactions, file locks, resource dependencies.
如何处理相互影响的操作？数据库事务、文件锁、资源依赖关系。

My database agent works not because the tool calls are unreliable, but because I spent weeks designing tools that communicate effectively with the AI. Each tool returns structured feedback that the agent can actually use to make decisions, not just raw API responses.
我的数据库代理之所以能正常工作，并非因为工具调用不可靠，而是因为我花了数周时间设计能够与人工智能有效通信的工具。每个工具都会返回结构化的反馈，代理可以实际使用这些反馈来做出决策，而不仅仅是原始的 API 响应。

The companies promising "just connect your APIs and our agent will figure it out" haven't done this engineering work. They're treating tools like human interfaces, not AI interfaces. The result is agents that technically make successful API calls but can't actually accomplish complex workflows because they don't understand what happened.
那些承诺“只需连接你的 API，我们的代理就能搞定一切”的公司并没有完成这项工程工作。他们把工具当成了人机界面，而不是 AI 界面。结果就是，代理虽然技术上能够成功调用 API，但实际上却无法完成复杂的工作流程，因为它们根本不了解发生了什么。

The dirty secret of every production agent system is that the AI is doing maybe 30% of the work. The other 70% is tool engineering: designing feedback interfaces, managing context efficiently, handling partial failures, and building recovery mechanisms that the AI can actually understand and use.
每个生产代理系统都有一个不可告人的秘密：AI 可能只完成了 30% 的工作。剩下的 70% 则属于工具工程：设计反馈接口、高效管理上下文、处理部分故障，以及构建 AI 能够真正理解和使用的恢复机制。

The Integration Reality Check

整合现实检验

But let's say you solve the reliability problems and the economics. You still have to integrate with the real world, and the real world is a mess.
但即使你解决了可靠性问题和经济性问题，你仍然需要与现实世界融合，而现实世界是一团糟。

Enterprise systems aren't clean APIs waiting for AI agents to orchestrate them. They're legacy systems with quirks, partial failure modes, authentication flows that change without notice, rate limits that vary by time of day, and compliance requirements that don't fit neatly into prompt templates.
企业系统并非等待 AI 代理进行编排的简洁 API。它们是遗留系统，存在各种怪癖、部分故障模式、未经通知而更改的身份验证流程、随时间变化的速率限制以及无法完全符合提示模板的合规性要求。

My database agent doesn't just "autonomously execute queries." It navigates connection pooling, handles transaction rollbacks, respects read-only replicas, manages query timeouts, and logs everything for audit trails. The AI handles query generation; everything else is traditional systems programming.
我的数据库代理不仅仅是“自主执行查询”。它还能导航连接池、处理事务回滚、尊重只读副本、管理查询超时，并记录所有内容以供审计跟踪。AI 负责处理查询生成；其余一切都由传统的系统编程完成。

The companies promising "autonomous agents that integrate with your entire tech stack" are either overly optimistic or haven't actually tried to build production systems at scale. Integration is where AI agents go to die.
那些承诺“自主代理能与你的整个技术栈集成”的公司，要么过于乐观，要么实际上并没有尝试构建大规模生产系统。集成才是 AI 代理的死穴。

What Actually Works (And Why)

什么真正有效（以及为什么）

After building more than a dozen different agent systems across the entire software development lifecycle, I've learned that the successful ones share a pattern:
在整个软件开发生命周期中构建了十几个不同的代理系统之后，我了解到成功的系统都有一个共同的模式：

My UI generation agent works because humans review every generated interface before deployment. The AI handles the complexity of translating natural language into functional React components, but humans make the final decisions about user experience.
我的 UI 生成代理之所以有效，是因为在部署之前，每个生成的界面都会经过人工审核。AI 负责处理将自然语言转化为 React 功能组件的复杂工作，而最终用户体验的决定权则掌握在人类手中。
My database agent works because it confirms every destructive operation before execution. The AI handles the complexity of translating business requirements into SQL, but humans maintain control over data integrity.
我的数据库代理之所以有效，是因为它在执行之前会确认每个破坏性操作。AI 负责处理将业务需求转换为 SQL 的复杂性，而人类则掌控着数据完整性。
My function generation agent works because it operates within clearly defined boundaries. Give it a specification, get back a function. No side effects, no state management, no integration complexity.
我的函数生成代理之所以有效，是因为它在明确定义的边界内运行。输入一个规范，它就会返回一个函数。没有副作用，无需状态管理，也没有任何集成复杂性。
My DevOps automation works because it generates infrastructure-as-code that can be reviewed, versioned, and rolled back. The AI handles the complexity of translating requirements into Terraform, but the deployment pipeline maintains all the safety mechanisms we've learned to rely on.
我的 DevOps 自动化之所以有效，是因为它能够生成可审查、可版本控制、可回滚的基础设施即代码。AI 处理了将需求转换为 Terraform 的复杂性，而部署流水线则保留了我们习以为常的所有安全机制。
My CI/CD agent works because each stage has clear success criteria and rollback mechanisms. The AI handles the complexity of analyzing code quality and generating fixes, but the pipeline maintains control over what actually gets merged.
我的 CI/CD 代理之所以有效，是因为每个阶段都有明确的成功标准和回滚机制。AI 负责处理代码质量分析和生成修复程序的复杂性，而流水线则负责控制实际合并的内容。

The pattern is clear: AI handles complexity, humans maintain control, and traditional software engineering handles reliability.
模式很清晰：人工智能处理复杂性，人类保持控制，传统软件工程处理可靠性。

My Predictions 我的预测

Here's my specific prediction about who will struggle in 2025:
以下是我对 2025 年哪些人将陷入困境的具体预测：

Venture-funded "fully autonomous agent" startups will hit the economics wall first. Their demos work great with 5-step workflows, but customers will demand 20+ step processes that break down mathematically. Burn rates will spike as they try to solve unsolvable reliability problems.
获得风险投资的“完全自主代理”初创公司将首先遭遇经济瓶颈。他们的演示在 5 步工作流程下运行良好，但客户会要求 20 多个步骤的流程，并以数学方式分解。当他们试图解决无法解决的可靠性问题时，资金消耗率将飙升。

Enterprise software companies that bolted "AI agents" onto existing products will see adoption stagnate. Their agents can't integrate deeply enough to handle real workflows.
将“AI 代理”嵌入现有产品的企业软件公司将面临采用率停滞的局面。他们的代理无法进行足够深入的集成，无法处理实际的工作流程。

Meanwhile, the winners will be teams building constrained, domain-specific tools that use AI for the hard parts while maintaining human control or strict boundaries over critical decisions. Think less "autonomous everything" and more "extremely capable assistants with clear boundaries."
与此同时，获胜者将是那些能够构建受限的、特定领域工具的团队，这些工具利用人工智能处理复杂部分，同时在关键决策上保持人为控制或严格界限。与其说是“万物自主”，不如说是“拥有明确界限的超强能力助手”。

The market will learn the difference between AI that demos well and AI that ships reliably. That education will be expensive for many companies.
市场终将学会区分演示效果好的人工智能和交付可靠的人工智能。对许多公司来说，这种教育成本高昂。

I'm not betting against AI. I'm betting against the current approach to agent architecture. But I believe future is going to be far more valuable than the hype suggests.
我并非在押注人工智能会失败。我押注的是当前的代理架构方法会失败。但我相信未来人工智能的价值将远超炒作所言。

Building the Right Way

以正确的方式建设

If you're thinking about building with AI agents, start with these principles:
如果您正在考虑使用 AI 代理进行构建，请遵循以下原则：

Define clear boundaries. What exactly can your agent do, and what does it hand off to humans or deterministic systems?
明确界限。 你的代理到底能做什么？它会把什么交给人类或确定性系统？

Design for failure. How do you handle the 20-40% of cases where the AI makes mistakes? What are your rollback mechanisms?
为故障而设计。 你如何处理 20-40% 的 AI 错误情况？你的回滚机制是什么？

Solve the economics. How much does each interaction cost, and how does that scale with usage? Stateless often beats stateful.
解决经济问题。 每次交互的成本是多少？成本如何随使用量变化？无状态通常胜过有状态。

Prioritize reliability over autonomy. Users trust tools that work consistently more than they value systems that occasionally do magic.
可靠性优先于自主性。 用户更信任那些稳定运行的工具，而不是那些偶尔能带来神奇效果的系统。

Build on solid foundations. Use AI for the hard parts (understanding intent, generating content), but rely on traditional software engineering for the critical parts (execution, error handling, state management).
建立在坚实的基础之上。 使用人工智能处理困难部分（理解意图、生成内容），但依赖传统软件工程处理关键部分（执行、错误处理、状态管理）。

The agent revolution is coming. It just won't look anything like what everyone's promising in 2025. And that's exactly why it will succeed.
代理革命即将到来。它与2025年大家所期待的完全不同。而这正是它成功的关键。

The Real Lessons from the Trenches

来自战壕的真正教训

The gap between "works in demo" and "works at scale" is enormous, and most of the industry is still figuring this out.
“演示工作”和“大规模工作”之间的差距是巨大的，大多数行业仍在解决这个问题。

If you're working on similar problems, I'd love to continue this conversation. The challenges around agent reliability, cost optimization, and integration complexity are fascinating engineering problems that don't have obvious solutions yet.
如果您正在处理类似的问题，我很乐意继续讨论。围绕代理可靠性、成本优化和集成复杂性的挑战是令人着迷的工程问题，目前尚无明显的解决方案。

I regularly advise teams and companies navigating these exact challenges - from architecture decisions to avoiding the pitfalls I've learned about firsthand. Whether you're evaluating build vs buy decisions, debugging why your agents aren't working in production, or just want to implement them, feel free to reach out.
我经常为团队和公司提供建议，帮助他们应对这些挑战——从架构决策到避免我亲身经历过的陷阱。无论您是在评估自建还是购买的决策，调试代理无法在生产环境中运行的原因，还是只想实现它们，欢迎随时联系我们。

The more people building real systems and sharing honest experiences, the faster we'll all figure out what actually works. You can find me at utkarshkanwat@gmail.com or X if you want to dive deeper into any of these topics.
越多的人构建真实的系统并分享真实的经验，我们就能越快地找到真正有效的方法。你可以通过电子邮件 utkarshkanwat@gmail.com 或 X 联系我。如果您想深入了解这些主题中的任何一个。

# **为什么我不看好 2025 年人工智能代理的出现**