Agents & Automation

Anthropic updates Claude Managed Agents: self-hosted sandboxes, mid-session tool swapping, and cache diagnostics

Anthropic ships three Claude Managed Agents platform updates giving enterprise developers more infrastructure control, runtime flexibility, and prompt-cache transparency.

Claude Managed Agents launched only in April, and Anthropic is already shipping meaningful infrastructure updates to it. Three landed this week, and while they won’t make headlines the way a new model would, they matter if you’re building production agents.

Here’s what changed and why it’s worth paying attention to.

Self-hosted sandboxes: your infrastructure, Anthropic’s orchestration

By default, when Claude Managed Agents runs tool calls, that execution happens inside Anthropic-managed cloud infrastructure. That’s convenient, but it’s a non-starter for a lot of enterprise teams, particularly in regulated industries where code, file system access, and network egress can’t leave the organisation’s environment.

Self-hosted sandboxes, now in public beta, split the responsibilities cleanly. Anthropic’s orchestration layer stays on Anthropic’s side. Tool execution moves to yours.

The way it works: you run an environment worker, a process on your own infrastructure, that connects to Anthropic’s work queue. When a session needs to execute a tool, Anthropic enqueues the request. Your worker picks it up, runs it locally, and posts the result back. The agent loop continues normally, but the actual execution never leaves your environment.
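To make the pick-up, run, post-back cycle concrete, here is a minimal sketch of a worker loop. It is not Anthropic’s worker or SDK: the queue is an in-memory stand-in, and ToolRequest, ToolResult, TOOLS, and run_worker are all hypothetical names invented for illustration.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolRequest:
    """Hypothetical shape of a queued tool call."""
    request_id: str
    tool: str
    args: dict


@dataclass
class ToolResult:
    """Hypothetical shape of the result posted back."""
    request_id: str
    ok: bool
    output: str


# Hypothetical registry of tools that run inside your own environment.
TOOLS: Dict[str, Callable[[dict], str]] = {
    "echo": lambda args: str(args.get("text", "")),
    "add": lambda args: str(args["a"] + args["b"]),
}


def run_worker(work_queue: "queue.Queue[ToolRequest]", results: List[ToolResult]) -> None:
    """Drain the queue: pick up each request, run it locally, record the result."""
    while True:
        try:
            req = work_queue.get_nowait()
        except queue.Empty:
            break  # a real worker would keep waiting for work instead of exiting
        handler = TOOLS.get(req.tool)
        if handler is None:
            results.append(ToolResult(req.request_id, False, f"unknown tool: {req.tool}"))
            continue
        try:
            results.append(ToolResult(req.request_id, True, handler(req.args)))
        except Exception as exc:  # report failures instead of crashing the worker
            results.append(ToolResult(req.request_id, False, f"error: {exc}"))
```

In this sketch the results list stands in for posting back to the orchestrator. The point is the shape of the loop: the tool handler runs in the worker’s own process, and only the request and the result cross the boundary.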

If you want the benefits of self-hosting without standing up bare infrastructure, supported managed providers include Cloudflare, Daytona, Modal, and Vercel. If you’re on AWS, authentication is handled through IAM with SigV4 rather than an environment key: you attach the AnthropicSelfHostedEnvironmentAccess managed policy to the IAM principal your worker runs under.

A couple of current limitations worth knowing: self-hosted sandboxes don’t yet support the Memory feature, and session creation from within Claude Platform on AWS is not yet available (though the workers themselves can run on AWS).

What this means for you: If your organisation has been interested in Claude agents but couldn’t deploy them because tool execution happens in Anthropic’s cloud, this removes that blocker. Moving between Anthropic-managed and self-hosted execution is a configuration change, not an integration rewrite.

Hot-swapping MCP server and tool configs mid-session

The second update is smaller but practically useful. You can now update an active session’s MCP server and tool configurations without tearing down and recreating the session.

For long-running agents, this matters. Agents that run over extended periods might need access to different tools at different stages of a task. Previously, changing what tools were available meant ending the session and starting a new one, losing conversation state in the process. Now you can update the configuration while the session is live.
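A toy model shows why this matters for state. The sketch below is not the Managed Agents API: Session and update_tools are hypothetical names, and the only thing it demonstrates is that swapping the tool configuration in place leaves the conversation history untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    """Hypothetical in-memory stand-in for a live agent session."""
    tools: Dict[str, dict] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)

    def update_tools(self, new_tools: Dict[str, dict]) -> Dict[str, List[str]]:
        """Replace the tool configuration in place and report what changed."""
        added = sorted(set(new_tools) - set(self.tools))
        removed = sorted(set(self.tools) - set(new_tools))
        self.tools = dict(new_tools)  # history is deliberately left alone
        return {"added": added, "removed": removed}


session = Session(tools={"search": {}, "fetch": {}})
session.history.append("user: research the topic")
session.history.append("assistant: done, moving on to drafting")

# Later stage of the task: drop research tools, add a writing tool.
change = session.update_tools({"fetch": {}, "write_file": {}})
```

After the update, change lists "write_file" as added and "search" as removed, while session.history still holds both earlier turns. Under the old restart-based approach, that history is what you would have had to rebuild.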

What this means for you: If you’re building agents that handle multi-step workflows where tool requirements shift as the task progresses, you no longer need to architect around session restarts. This simplifies the logic considerably for anything that runs longer than a single interaction.

Cache diagnostics: finally understanding why your cache missed

This one is for anyone who has spent time debugging prompt caching behaviour and come away frustrated.

Prompt caching can meaningfully cut costs and latency on long conversations, but when a cache miss happens unexpectedly, the API has historically given you no information about why. You just see the miss in your usage data and start guessing.

Cache diagnostics, now in public beta, adds a diagnostics.previous_message_id field to Messages API requests. Pass it, and the API returns a cache_miss_reason that tells you exactly where the prompt cache prefix diverged from the previous turn.
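To see what “where the prefix diverged” means in practice, here is a small local sketch. It does not call the API and makes no claim about the format of cache_miss_reason; first_divergence is a hypothetical helper that compares two turns, each represented as an ordered list of prompt blocks, and reports the first position where they stop matching.

```python
from typing import List, Optional


def first_divergence(previous: List[str], current: List[str]) -> Optional[int]:
    """Return the index of the first block where `current` stops matching
    `previous`, or None if `previous` is a complete prefix of `current`."""
    for i, (old, new) in enumerate(zip(previous, current)):
        if old != new:
            return i
    if len(current) < len(previous):
        return len(current)  # current turn dropped blocks the previous turn had
    return None


previous_turn = ["system prompt", "tool definitions v1", "user: hi"]
current_turn = ["system prompt", "tool definitions v2", "user: hi", "assistant: hello"]

# The tool definitions changed, so everything from index 1 onward is uncached.
diverged_at = first_divergence(previous_turn, current_turn)
```

Because prompt caching works on prefixes, a change early in the prompt invalidates everything after it. That is why knowing the divergence point, rather than just seeing a miss, narrows the search so quickly.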

There is recent context for why this matters. In April, Anthropic had a bug in which an optimisation designed to clear old thinking sections from idle sessions fired on every turn instead of just once, causing unexpected cache misses across the board. Without tooling like this, teams were left correlating usage spikes against guesses. With cache_miss_reason, you can diagnose this kind of problem programmatically.

What this means for you: If you’re using prompt caching and your cost or latency numbers are behaving unexpectedly, this gives you a concrete starting point for debugging. No more reading tea leaves from token counts.

One more thing worth noting: large output handling

Alongside the three headline updates, there’s a quiet quality-of-life improvement for agents that generate large outputs. When agent_toolset or MCP tools produce results exceeding 100,000 tokens, the output is now automatically spilled to a file in the sandbox. The model receives a truncated preview and the file path, and can read the full content from there if needed. This prevents large tool outputs from blowing up context windows or hitting response size limits mid-run.
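The spill-to-file pattern is easy to picture with a rough sketch. This is an illustration under stated assumptions, not the platform’s implementation: handle_tool_output is a hypothetical name, tokens are approximated by splitting on whitespace, and the preview length is an arbitrary choice. Only the 100,000-token threshold comes from the announcement.

```python
import os
import tempfile

TOKEN_LIMIT = 100_000   # threshold described in the announcement
PREVIEW_TOKENS = 200    # hypothetical preview size, chosen for illustration


def handle_tool_output(output: str, spill_dir: str,
                       limit: int = TOKEN_LIMIT,
                       preview_tokens: int = PREVIEW_TOKENS) -> dict:
    """Return small outputs inline; write large ones to a file and
    return a truncated preview plus the file path."""
    tokens = output.split()  # crude stand-in for a real tokenizer
    if len(tokens) <= limit:
        return {"content": output, "spilled": False, "path": None}
    fd, path = tempfile.mkstemp(suffix=".txt", dir=spill_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(output)  # the full output stays available on disk
    preview = " ".join(tokens[:preview_tokens])
    return {"content": preview, "spilled": True, "path": path}
```

The caller gets the same dictionary shape either way, so downstream logic only needs to check the spilled flag and read from path when it wants the full content.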

The bigger picture

These updates reflect where Anthropic is focusing its platform work right now. Claude Managed Agents is still very new, and the gap between “interesting demo capability” and “production-ready enterprise deployment” comes down to exactly this kind of infrastructure work: where does execution happen, how do you reconfigure at runtime, and how do you observe what’s actually going on.

None of these updates change what Claude can do. They change whether Claude agents can fit into the environments where serious production deployments actually live.

If you’re evaluating Claude Managed Agents for an enterprise context, the self-hosted sandboxes documentation is the place to start. The cache diagnostics beta is worth enabling even if caching is working fine for you today. It’s much easier to understand normal behaviour before you’re debugging a problem under pressure.