Anthropic Claude 4 - Next-generation models Claude Opus 4 and Claude Sonnet 4

Claude 4 latest updates

Today, we’re introducing the next generation of Claude models: Claude Opus 4 and Claude Sonnet 4, setting new standards for coding, advanced reasoning, and AI agents.

Claude Opus 4 is the world’s best coding model, with sustained performance on complex, long-running tasks and agent workflows. Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.

Alongside the models, we’re also announcing:

Extended thinking with tool use (beta): Both models can use tools—like web search—during extended thinking, allowing Claude to alternate between reasoning and tool use to improve responses.
New model capabilities: Both models can use tools in parallel, follow instructions more precisely, and—when given access to local files by developers—demonstrate significantly improved memory capabilities, extracting and saving key facts to maintain continuity and build tacit knowledge over time.
Claude Code is now generally available: After receiving extensive positive feedback during our research preview, we’re expanding how developers can collaborate with Claude. Claude Code now supports background tasks via GitHub Actions and native integrations with VS Code and JetBrains, displaying edits directly in your files for seamless pair programming.
New API capabilities: We’re releasing four new capabilities on the Anthropic API that enable developers to build more powerful AI agents: the code execution tool, MCP connector, Files API, and the ability to cache prompts for up to one hour.
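The parallel tool use mentioned in the list above has an application-side counterpart: when a model requests several tools in one turn, the application can run them concurrently instead of one at a time. The following is a minimal sketch under stated assumptions, not Anthropic's implementation. The tool functions, the `TOOLS` registry, and the shape of each call dictionary (`id`, `name`, `input`) are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local tools; a real application would call search APIs, databases, etc.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def get_time(city: str) -> str:
    return f"12:00 in {city}"

# Hypothetical registry mapping tool names to callables.
TOOLS = {"get_weather": get_weather, "get_time": get_time}

def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run every requested tool concurrently, returning results in request order."""
    def run_one(call: dict) -> dict:
        try:
            output = TOOLS[call["name"]](**call["input"])
            return {"id": call["id"], "output": output, "is_error": False}
        except Exception as exc:
            # Report the failure as a result instead of crashing the whole batch.
            return {"id": call["id"], "output": str(exc), "is_error": True}

    # pool.map preserves input order even though the calls run in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        return list(pool.map(run_one, calls))
```

Threads suit this sketch because tool calls are usually I/O-bound (network or disk). Returning errors as results lets the application pass every outcome back to the model in a single turn.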

Claude Opus 4 and Sonnet 4 are hybrid models offering two modes: near-instant responses and extended thinking for deeper reasoning. The Pro, Max, Team, and Enterprise Claude plans include both models and extended thinking, with Sonnet 4 also available to free users. Both models are available on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Pricing remains consistent with previous Opus and Sonnet models: Opus 4 at $15/$75 per million tokens (input/output) and Sonnet 4 at $3/$15.
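To make the per-million-token pricing concrete, here is a small cost estimator. It is a sketch that uses only the list prices quoted above. The dictionary keys `opus-4` and `sonnet-4` are illustrative labels rather than API model identifiers, and the workload figures are hypothetical. It ignores any discounts, such as those that prompt caching may provide.

```python
# Per-million-token list prices quoted in the announcement (USD).
# The keys are illustrative labels, not official API model IDs.
PRICES = {
    "opus-4": {"input": 15.00, "output": 75.00},
    "sonnet-4": {"input": 3.00, "output": 15.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000

if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 20k output tokens.
    for model in PRICES:
        print(model, round(estimate_cost(model, 200_000, 20_000), 2))
```

For this hypothetical workload the estimate is $4.50 on Opus 4 and $0.90 on Sonnet 4, which reflects the 5x price ratio between the two models on both input and output.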

Claude 4

Claude Opus 4 is our most powerful model yet and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%). It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours—dramatically outperforming all Sonnet models and significantly expanding what AI agents can accomplish.

Claude Opus 4 excels at coding and complex problem-solving, powering frontier agent products. Cursor calls it state-of-the-art for coding and a leap forward in complex codebase understanding. Replit reports improved precision and dramatic advancements for complex changes across multiple files. Block calls it the first model to boost code quality during editing and debugging in its agent, codename goose, while maintaining full performance and reliability. Rakuten validated its capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance. Cognition notes Opus 4 excels at solving complex challenges that other models can’t, successfully handling critical actions that previous models have missed.

Claude Sonnet 4 significantly improves on Sonnet 3.7’s industry-leading capabilities, excelling in coding with a state-of-the-art 72.7% on SWE-bench. The model balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.

GitHub says Claude Sonnet 4 soars in agentic scenarios and will introduce it as the model powering the new coding agent in GitHub Copilot. Manus highlights its improvements in following complex instructions, clear reasoning, and aesthetic outputs. iGent reports Sonnet 4 excels at autonomous multi-feature app development, as well as substantially improved problem-solving and codebase navigation—reducing navigation errors from 20% to near zero. Sourcegraph says the model shows promise as a substantial leap in software development—staying on track longer, understanding problems more deeply, and providing more elegant code quality. Augment Code reports higher success rates, more surgical code edits, and more careful work through complex tasks, making it the top choice for their primary model.

These models advance our customers’ AI strategies across the board: Opus 4 pushes boundaries in coding, research, writing, and scientific discovery, while Sonnet 4 brings frontier performance to everyday use cases as an instant upgrade from Sonnet 3.7.

Claude 4 models lead on SWE-bench Verified, a benchmark for performance on real software engineering tasks. See appendix for more on methodology.

Claude 4 models deliver strong performance across coding, reasoning, multimodal capabilities, and agentic tasks. See appendix for more on methodology.

Model improvements

In addition to extended thinking with tool use, parallel tool execution, and memory improvements, we’ve significantly reduced behavior where the models use shortcuts or loopholes to complete tasks. Both models are 65% less likely to engage in this behavior than Sonnet 3.7 on agentic tasks that are particularly susceptible to shortcuts and loopholes.

Claude Opus 4 also dramatically outperforms all previous models on memory capabilities. When developers build applications that provide Claude local file access, Opus 4 becomes skilled at creating and maintaining ‘memory files’ to store key information. This unlocks better long-term task awareness, coherence, and performance on agent tasks—like Opus 4 creating a ‘Navigation Guide’ while playing Pokémon.
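The announcement does not describe how applications should provide local file access for these memory files. As one possible illustration, the sketch below shows a tiny JSON-backed note store that an application could expose to a model as tools. The class name `MemoryFile`, its methods, and the file layout are all hypothetical and are not part of any Anthropic API.

```python
import json
from pathlib import Path

class MemoryFile:
    """A tiny JSON-backed note store (hypothetical helper, for illustration only)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def load(self) -> dict[str, str]:
        """Return all saved notes, or an empty dict if nothing has been saved yet."""
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save_fact(self, key: str, fact: str) -> None:
        """Add or overwrite one note, then persist the whole store to disk."""
        notes = self.load()
        notes[key] = fact
        self.path.write_text(
            json.dumps(notes, indent=2, ensure_ascii=False), encoding="utf-8"
        )
```

Because the notes live on disk, a later session can call `load()` and pick up where an earlier one left off, which is the continuity the paragraph above describes.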

Memory: When given access to local files, Claude Opus 4 records key information to help improve its game play. The notes depicted above are real notes taken by Opus 4 while playing Pokémon.

Finally, we’ve introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full. Users requiring raw chains of thought for advanced prompt engineering can contact sales about our new Developer Mode to retain full access.
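The display rule described above (show short thought processes in full, condense only the long ones) can be sketched as a simple length check. This is an illustration under assumptions, not the actual mechanism: the announcement does not say how length is measured or what the cutoff is, so `max_chars`, the function name, and the `summarize` callback (standing in for the smaller model) are all hypothetical.

```python
from typing import Callable

def render_thinking(
    thought: str,
    summarize: Callable[[str], str],
    max_chars: int = 4000,  # hypothetical cutoff; the real threshold is not published
) -> str:
    """Return the thought in full when it is short enough, otherwise a summary."""
    if len(thought) <= max_chars:
        return thought
    # Only long thoughts are passed to the (stand-in) summarizer.
    return summarize(thought)
```

Passing the summarizer in as a callback keeps the sketch independent of any particular model or API.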

Claude Code

Claude Code, now generally available, brings the power of Claude to more of your development workflow—in the terminal, your favorite IDEs, and running in the background with the Claude Code SDK.

New beta extensions for VS Code and JetBrains integrate Claude Code directly into your IDE. Claude’s proposed edits appear inline in your files, streamlining review and tracking within the familiar editor interface. Simply run Claude Code in your IDE terminal to install.

Beyond the IDE, we’re releasing an extensible Claude Code SDK, so you can build your own agents and applications using the same core agent as Claude Code. We’re also releasing an example of what’s possible with the SDK: Claude Code on GitHub, now in beta. Tag Claude Code on PRs to respond to reviewer feedback, fix CI errors, or modify code. To install, run /install-github-app from within Claude Code.

Getting started

These models are a large step toward the virtual collaborator—maintaining full context, sustaining focus on longer projects, and driving transformational impact. They come with extensive testing and evaluation to minimize risk and maximize safety, including implementing measures for higher AI Safety Levels like ASL-3.

We’re excited to see what you’ll create. Get started today on Claude, Claude Code, or the platform of your choice.