Anthropic and OpenAI Just Shipped the Same Answer to AI Agents, Seven Days Apart

TL;DR

April 8, 2026: Anthropic launched Claude Managed Agents in public beta. Three REST endpoints, sandboxed execution, checkpointing, credential scoping, tracing. Notion, Rakuten, and Sentry in production.
April 15, 2026: OpenAI updated the Agents SDK with a model-native harness, nine sandbox providers (Unix-local, Docker, Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, Vercel), and Codex-style filesystem tools.
Same answer, different packaging: both ship an agent, an environment, and a session. Both separate harness-side state from sandbox-side state. Both bake in shell, filesystem edits, and credential scoping.
The convergence is the story, not either release. The industry has agreed on what a production agent needs: a clean split between the control plane (harness) and the execution plane (sandbox).
What it means for you: the floor for harness engineering just rose sharply. Sandboxing is a shipped primitive, not a DIY project. Stop rebuilding below the waterline.
Google ADK was arguing an adjacent question (multi-agent orchestration at enterprise deployment scale), not the same one. Still the right pick for Vertex AI shops and polyglot (Python/TS/Go/Java) teams.
Picker: OpenAI SDK if you're on GPT-5.x and want pluggable sandbox. Claude Agent SDK for the library harness. Claude Managed Agents for the hosted version. Deep Agents for model-agnostic LangGraph. ADK for Vertex and multi-agent hierarchies. Microsoft Agent Framework and CrewAI also valid, not covered in depth here.
Where the leverage moves: prompt design, skill libraries, evals, domain integration. Not the harness.

On April 15, 2026, OpenAI shipped an update to the Agents SDK that most of the industry read as "new sandbox feature, nice." That framing misses what happened, and missing it is worse once you notice what Anthropic shipped a week earlier.

On April 8, Anthropic launched Claude Managed Agents in public beta: sandboxed code execution, checkpointing, credential scoping, end-to-end tracing, and a managed agent harness, all exposed through three REST endpoints. On April 15, OpenAI updated the Agents SDK with a model-native harness, nine sandbox providers, Codex-style filesystem tools, and an explicit split between the control plane and the compute plane.

Two of the largest frontier labs, independently, shipped the same answer to the same question within one week. Same core architecture, different packaging. That is the story. Neither release is a concession to the other. Both are evidence that the industry has converged on what a production agent actually needs.

If you have been building AI agents for a while, you have almost certainly built parts of this yourself, before the primitives existed. That was not your fault. The ecosystem was still figuring out what the hard part was.

It is clearer now. Let me walk you through what changed, why it matters more than it looks, and what it means for what you are building next.

The three-way argument the labs just ended

For about a year, three camps had been quietly disagreeing about what an AI agent is, at the level of architecture.

Anthropic said: give the model a computer. They built Claude Code as an agentic coding tool, then realized the harness they wrote for it was powering research, video production, and note-taking internally. In late 2025 they renamed the Claude Code SDK to the Claude Agent SDK and shipped it as the library form of that same harness. Their bet was that a frontier model with a real terminal and a real filesystem is the primitive. Everything else is a skin on that.

LangChain said: an agent is a stateful graph with a planning discipline. Their Deep Agents library distilled the pattern behind Claude Code, Deep Research, and Manus into four pillars: a detailed system prompt, a planning tool (like a todo list), subagents for context isolation, and a file system for persistent memory. They built it on LangGraph so you got durable execution, checkpointing, streaming, and human-in-the-loop for free. Model-agnostic, provider-agnostic, middleware-first.

OpenAI said: an agent is a configurable LLM with handoffs, guardrails, and tracing. The original Agents SDK, launched in early 2025 as the production successor to the experimental Swarm project, was deliberately minimal. It had four primitives (agents, handoffs, guardrails, tracing) and assumed you would build the rest.

◆ Background · the OpenAI Agents SDK and Swarm

Where the OpenAI primitives came from

The Agents SDK is OpenAI's production-grade successor to Swarm, an experimental, education-focused multi-agent library OpenAI open-sourced in late 2024. The SDK kept Swarm's lightweight philosophy but hardened it for production around a very small set of primitives: agents (an LLM with instructions and tools), handoffs (one agent explicitly delegating to another and carrying context across the transition), guardrails (input and output validation that runs in parallel and fails fast), and built-in tracing. The Python repo, created in March 2025, sits at roughly 27,000 GitHub stars by mid-2026, with an official JS/TS port. The April 2026 update layers the sandbox and model-native harness on top of that same minimal core.

Those are three different answers to the same question, and teams building production agents kept running into the same wall: the minimal orchestration layer was not enough, the bundled-CLI-harness was powerful but opinionated, and the middleware-on-a-graph approach required buying into an ecosystem. Everyone ended up writing custom plumbing for sandboxing, memory, resume, approvals, and credential isolation.

(Fair pushback before we go further. What about Google? Google's Agent Development Kit (ADK), launched at Cloud NEXT in April 2025, is real and widely deployed. But ADK was arguing an adjacent question rather than the same one. I'll come back to where it fits. What about Microsoft Agent Framework, CrewAI, AutoGen, Strands and the dozen other tools teams actually use in production? They exist and they matter, but most of them inherit or extend the three positions above. For the specific architectural fight that just resolved, it was the three camps I'm naming.)

In April 2026, that fight ended. Both Anthropic and OpenAI closed the gap within one week of each other, and they closed it the same way.

What Anthropic shipped on April 8

Claude Managed Agents is a hosted agent harness with a sandbox baked in. You define an agent (model, system prompt, toolset), configure an environment (cloud container, pre-installed packages, networking rules), and create a session that ties them together. The service handles the agent loop, tool execution, sandboxing, credential management, checkpointing, and a server-sent event stream for observability.

The toolset that matters most is agent_toolset_20260401: bash execution, file read and write, web search, web fetch, grep, and glob, all sandboxed. Pricing is standard Claude token rates plus $0.08 per session-hour. Notion, Rakuten, and Sentry are already using it in production.

The shape to notice: three REST endpoints (/v1/agents, /v1/environments, /v1/sessions), a beta header (managed-agents-2026-04-01), and a sharp separation between what the agent is (the agent resource), where it runs (the environment resource), and when it runs (the session resource). That three-way split is the same split OpenAI would formalize a week later.

What OpenAI shipped on April 15

Two things, and they only make sense as a pair.

A model-native harness

The Agents SDK harness now owns the things you used to write yourself: configurable memory (separate from conversation history), sandbox-aware orchestration, approvals with resume bookkeeping, tracing, and filesystem tools tuned for how GPT-5.x models actually produce patches and commands. The Codex-style tools (apply_patch, shell, view_image) are the same primitives that have been powering OpenAI's coding agent for a year, now factored out into the general Agents SDK.

Read the OpenAI docs carefully and you will see the new framing: the harness is pitched as the thing that unlocks more of a frontier model's capability by aligning execution with how those models perform best. That is not a feature pitch. That is a philosophical statement. The harness is now part of the product surface.

Native sandbox execution with pluggable providers

The bigger architectural move is SandboxAgent. Your agent now gets a live Unix-like execution environment with a filesystem, a shell, installed packages, mounted storage, exposed ports, snapshots, and resumable state. The pieces factor cleanly:

Manifest describes what the workspace starts with: files, directories, repos, mounts, environment variables, OS users.
Capabilities attach sandbox-native behavior: Shell, Filesystem, Skills, Memory, Compaction.
Sandbox clients plug in providers: Unix-local, Docker, Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, Vercel.
SandboxRunConfig decides per-run whether to inject a live session, resume from serialized state, or create a fresh workspace.
Saved state splits into three surfaces: RunState (harness-side), session_state (sandbox-side), and snapshots (workspace contents).

The detail that matters most here is the provider list. OpenAI, the company whose last managed-agent primitive (the Assistants API, being retired on August 26, 2026) was hosted-only and deeply proprietary, shipped this new thing with nine providers on day one. That is a meaningful change of posture.

◆ Background · why sandboxing is the hard part

The trust boundary, and what happens without it

An agent that can run shell commands and edit files is far more useful, and far more dangerous, than a chatbot. As one security write-up puts it, in a chatbot a prompt injection produces a wrong answer; in an agentic system it produces a wrong action. The textbook failure happened in April 2026, when a Cursor coding agent reportedly deleted a startup's production database and its backups in a single run. A sandbox is the containment answer: agent-generated code runs in an isolated, ephemeral container with narrow credentials and no path back to production, so a hijacked prompt cannot reach orchestration, auth, or recovery state. This is exactly the control-plane-versus-execution-plane boundary both labs just shipped as a primitive.

The two shipments, side by side

Anthropic picked the managed-service route: you call their API, they run the sandbox. OpenAI picked the library route: you call their SDK, you pick the sandbox provider. Those look like different choices until you line up what's actually in the boxes. Both ship an agent, an environment (or manifest), and a session. Both separate harness-side state from sandbox-side state. Both expose shell, filesystem edits, and file views as the base toolset. Both bake in credential scoping outside the model's prompt. Both handle checkpointing and resume.

The packaging is different. The answer is the same.

The single idea that organizes all of this

Squint at both shipments plus the existing frameworks and you see one structural idea that everyone has now landed on: the boundary between harness and compute matters, and you should keep it clean. (ADK gets there too, but via a managed runtime rather than as a primitive you wire up yourself.)

Keep the boundary clean. Collapsing the harness into the sandbox is convenient for prototypes and wrong for production.

The harness is the control plane. It owns the agent loop, the model calls, tool routing, handoffs, approvals, tracing, guardrails, audit logs, and recovery state. It wants to live in your trusted infrastructure, version-controlled with your app, long-lived, observable.

The compute is the execution plane. It owns filesystem writes, shell commands, installed packages, mounted data, exposed ports, process state. It wants to be isolated, ephemeral by default, runnable with narrow credentials, easy to tear down.

The OpenAI docs are explicit that running the harness inside the sandbox is convenient for prototypes and wrong for production. It collapses the trust boundary. Anything the model does to the filesystem can reach orchestration, auth, or recovery state.

Everyone arrived at this from a different direction. Anthropic arrived at it incrementally: subagents with isolated contexts, then agent teams, then (in April 2026) Claude Managed Agents making the whole boundary an explicit product surface. LangChain arrived at it by putting the agent loop in a LangGraph state machine with middleware interceptors and keeping the filesystem virtual. OpenAI arrived at it by making the boundary the central design primitive of the updated Agents SDK.

Four frameworks. Four paths. One destination.

Where Google fits in: a different argument entirely

Now the Google question.

The Agent Development Kit, launched at Google Cloud NEXT in April 2025, is in the same league as the other three but was never really in the same fight. While Anthropic, OpenAI, and LangChain were arguing about the shape of a single agent's harness, Google was arguing about something adjacent: how you orchestrate a fleet of specialized agents in an enterprise deployment.

ADK's primitives reflect that different starting point. It ships a taxonomy of agent types: LLM agents for reasoning, workflow agents for explicit process control (sequential, parallel, loop), and custom agents for specialized integrations. It treats multi-agent systems as the primary unit of analysis, not the single agent. It is the only major framework that ships first-class in Python, TypeScript, Go, and Java (the others are Python-first, some with partial TypeScript). Its runtime is event-driven rather than request-response, which gives it a different debugging and observability story. And the canonical deployment path is Vertex AI Agent Engine Runtime, Google Cloud's managed service for exactly this.

ADK is model-agnostic in principle (it ships adapters for 100+ LLMs, wrappers for LangChain and CrewAI tools, and MCP support) and Gemini-optimized in practice. The features that shine brightest are the Google-ecosystem ones: bidirectional audio and video streaming via the Gemini Live API, enterprise connectors to BigQuery and AlloyDB, and the "deploy it to Vertex" happy path.

◆ Background · MCP, the shared protocol underneath

Why "MCP support" keeps showing up in every framework

The Model Context Protocol is an open standard Anthropic introduced in November 2024 for connecting models to external tools and data through a uniform interface. It solves the "N x M" integration problem: instead of writing a custom connector for every model-to-tool pairing, you expose a tool once as an MCP server and any MCP-aware client can use it. It is transported over JSON-RPC and borrows its message-flow design from the Language Server Protocol. OpenAI officially adopted MCP in March 2025, and Google ADK ships MCP support too, which is why the protocol is one of the few things every camp in this story already agrees on.

Here is the interesting observation: ADK never had to participate in the harness-vs-compute argument because it assumed a managed deployment environment from day one. Vertex AI Agent Engine Runtime handles the control-plane concerns (scaling, tracing, session persistence, safety policies) that OpenAI's April update now bakes into the open SDK. The tradeoff is obvious and works both ways. You give up portability and you gain a deployment story that is ready the moment you ship.

ADK's contribution to the broader convergence is different from the other three. It was one of the earliest enterprise-deployment-ready frameworks to treat multi-agent systems as the primary unit of analysis, alongside Microsoft's AutoGen (now merged into the Microsoft Agent Framework) and CrewAI. OpenAI's coming subagents, Anthropic's agent teams feature, and Deep Agents' subagent spawning all converge on a similar assumption: that a real production agent system is usually a hierarchy of specialized agents, not a single agent with a big toolbox.

So the clean story is this. Anthropic and OpenAI, between them, just ended the architecture argument. Google has been answering the orchestration argument. Both questions are real. Which one dominates your use case determines which framework you reach for first.

What this means for what you are building

Three concrete consequences. They stack.

The floor for production agents just rose

Before these April releases, shipping a production agent meant building a harness. You were writing session persistence, credential isolation, approvals with resume, memory compaction, and tracing yourself. Most teams got some of it right and some of it wrong. I have reviewed enough agent codebases to say this with confidence: the homegrown harness was always the weakest layer.

That work is now done for you, across the major SDKs and managed services. ADK gets most of it through Vertex, Claude Managed Agents gets it through Anthropic infrastructure, the OpenAI Agents SDK gets it through the new harness, and Deep Agents gets it through LangGraph. The time-to-production for a file-heavy agent (data-room Q&A, repo code review, report generation, long-running research with artifacts) just collapsed from weeks to days.

This expands the set of companies that can ship agents safely. It also compresses the market for "we built the agent harness" as a standalone product. If your startup's pitch is orchestration plus tracing on top of these SDKs, your pitch got much harder this month.

Compute became a commodity layer, on purpose

Nine sandbox providers is not an integration roadmap. It is a positioning statement. OpenAI is telling you that compute is something you pick per workload. GPU-heavy task? Modal. Edge latency? Cloudflare. Generic Python? E2B. Persistent dev environments? Daytona. Web artifacts? Vercel. Local iteration? Docker or Unix-local. Your SandboxAgent, Manifest, and capabilities stay stable. Only the client changes.

This has two knock-on effects. Sandbox providers now compete on the narrow thing they are good at (cold start speed, persistence semantics, compliance scope, mount performance) because the switching cost is genuinely low. And teams that want to run compute in their own VPC for compliance reasons can do that with Docker while keeping the rest of the architecture intact.

The subtle win: you no longer need to pick your compute provider based on who has the best SDK. They all plug in.

Coding agents are no longer a separate category

This is the consequence I think is most underappreciated.

For two years, "coding agent" meant something specific: a tool that clones a repo, reads files, edits files, runs tests. The canonical examples were Claude Code, Codex, Aider, and Cline. General-purpose agents were something else: research, customer support, ops automation.

That distinction is dying. The April releases put Codex-style tools (apply_patch, shell) into the general Agents SDK and wrapped the same primitive set into Claude Managed Agents. Anthropic did the equivalent move in reverse last year when they renamed the Claude Code SDK to the Claude Agent SDK and positioned the same harness as a general runtime. LangChain's Deep Agents was explicitly derived from the coding-agent pattern.

The implication: when your business agent can write a Python script, run it in a sandbox, inspect the output, and edit its own script, the question of whether it is a "coding agent" stops meaning anything. A finance agent that needs to transform a CSV does not need a bespoke CSV tool. It writes three lines of pandas, runs them, and reads the result. A legal agent that needs to diff two contracts does not need a diff tool. It runs diff in its sandbox.

This changes what you build. If you have been designing custom tools for every task your agent needs to do, you have been doing too much work. The tool is a Python REPL with a filesystem. The task is "figure out what to write."

The framework picker, honestly

Let me be direct about the tradeoffs, because the marketing pages will not.

Pick the OpenAI Agents SDK if you are committed to GPT-5.x, want a pluggable sandbox layer, and want the harness to be a library rather than a bundled CLI binary. You get the most explicit harness-compute boundary and the most provider choice. You give up model flexibility and subagent parity (still coming). TypeScript support for sandbox agents is not here yet.

Pick the Claude Agent SDK if you are committed to Claude, want the exact harness behavior that powers Claude Code, and value the filesystem-native model (CLAUDE.md, AGENTS.md, .claude/agents/, skills, plugins). You get best-in-class subagents with context isolation today. You give up model flexibility and accept that the harness is a bundled Node.js process, not a pure library. If you want sandboxing without the container work, see the next option.

Pick Claude Managed Agents if you want the sandbox, state persistence, credential scoping, and tracing handled by Anthropic as a managed service, addressed through three REST endpoints. Notion, Rakuten, and Sentry are already on it in production. You give up portability (it is Anthropic infrastructure only) and pay $0.08 per session-hour on top of token costs. Public beta as of April 2026, so treat the production readiness as "good enough for early adopters, probably not for workloads that can't tolerate beta semantics yet."

Pick Deep Agents (LangChain/LangGraph) if you need model-agnostic agents, want LangGraph's durable execution as core infrastructure, or are already on the LangChain stack. You get the most flexibility in model choice and graph structure. You give up some ergonomics and accept middleware-based extension instead of capability-based. The sandbox primitive is not first-class; the virtual filesystem is, which is different.

Pick Google ADK if your workload is a multi-agent system rather than a single agent with tools, you are deploying on Vertex AI (or want that happy path open), you need bidirectional audio or video via Gemini Live, or you are in a polyglot org (it is the only one with Python, TypeScript, Go, and Java as first-class). You get hierarchical multi-agent composition and enterprise connectors baked in. You give up some of the "harness as a clean primitive" story the other three are converging on, and you accept a gravitational pull toward the Google Cloud ecosystem even though the framework is technically model-agnostic.

Use more than one if you are running evals across model families or building a product that needs to offer multiple providers. The harnesses are similar enough that you can factor out common agent descriptions, skills, and instructions, and keep them portable.

What I would actually do this week

If you are building a new agent, start with the SDK that matches your model choice, and use a real sandbox from day one. Docker or Unix-local for development, a hosted provider when you need isolation or persistence. Do not skip the sandbox thinking you will add it later.

If you have an existing agent held together by custom code, audit where your harness ends and your sandbox begins. If those are the same process, that is your highest-leverage refactor. Not for elegance. For the moment when your agent is given a prompt it should not be allowed to act on and it has access to your production credentials.

If you are evaluating frameworks, stop picking based on feature lists. They all have roughly the same features now, or are about to. Pick based on which team's abstractions you trust, which ecosystem you want to be locked into for the next two years, and where your model bets lie.

If you are investing in skills and custom instructions, do it now. Skills and AGENTS.md files are increasingly portable across harnesses. One investment pays off across models. The best teams I know are treating their skill library as a strategic asset, versioned alongside their product code.

The uncomfortable part

Here is the thing nobody in the framework camps will say out loud.

The floor for harness engineering has risen sharply. Sandboxing is a shipped primitive, not a DIY project. MCP is a shared protocol. Single-agent orchestration is mostly a solved problem. Tracing, approvals, and resume semantics are first-class. The work that used to take teams three to six months now takes days.

That does not mean the harness is a commodity. Multi-agent orchestration, domain-specific guardrails, and truly long-horizon workflows still need real design. But the baseline just moved up, and a lot of the infrastructure code written in 2024 and 2025 is now either redundant or subtly wrong compared to what the SDKs default to.

The model is not your problem to solve. Your leverage is in what you do with it inside a now-standard harness.

The leverage moves elsewhere. The model is not your problem to solve. OpenAI, Anthropic, and others will keep making them better whether or not you do anything. Your leverage is in what you do with the model inside a now-standard harness. Prompt design. Skill libraries. Evals. Domain integration. The context you inject. The workflows you compose. The taste you bring to when to use a subagent and when to keep it in the main loop.

That sounds obvious and it is not. A lot of agent projects are still burning the majority of their engineering budget on infrastructure the SDKs now provide. The teams that will do the best work in the next phase are the ones who notice the floor has risen, stop rebuilding it below the waterline, and spend the freed-up cycles on the parts that compound.

Your agent framework was improvising the hard part because the hard part was unclear. It is clearer now. Spend your time higher up the stack.

If you build agents, I am curious which framework you are on and what the next 90 days look like for you. The comments are the right place for that.