Anthropic updates Claude Managed Agents: self-hosted sandboxes, mid-session tool swapping, and cache diagnostics

Update, 29 June 2026: Anthropic launches advanced tool use beta: on-demand tool discovery cuts token overhead by 85%

Anthropic has shipped three new beta features under the advanced-tool-use-2025-11-20 header that extend the platform infrastructure story covered below into the tool-loading layer itself.

The headline addition is the Tool Search Tool, which lets Claude discover tools on demand rather than receiving every definition upfront. Developers mark tools with defer_loading: true, keeping them out of Claude’s initial context. When Claude needs a capability, it searches and expands only the relevant definitions inline. Crucially, deferred tools are stripped before the prompt cache key is computed, so adding them to a request does not invalidate existing cache entries — a direct improvement on the cache diagnostics transparency discussed in the original article.

The second feature, programmatic tool calling, adds an allowed_callers array that guides Claude to orchestrate multiple tools through generated Python code rather than sequential LLM calls. On a 75-tool project-management benchmark, this reduced billed input tokens by roughly 38% with no accuracy change.

The third feature, input_examples, attaches concrete call examples to tool definitions, lifting parameter accuracy from 72% to 90% in internal testing.

Together, the three features reduce context consumption by up to 85% on Anthropic’s internal benchmarks for large tool libraries. Teams running 150-plus tools will see the clearest gains. Full details are in the Anthropic engineering post and the Tool Search Tool docs.

Update, 16 June 2026: Claude Platform on AWS launches with AWS billing, IAM, and native access to full Claude API

Claude Platform on AWS reached general availability on May 11, 2026, giving enterprise teams a native cloud path to the full Claude API stack billed and authenticated through AWS. This extends the Managed Agents story covered below: AWS customers can now deploy Claude Managed Agents alongside the self-hosted sandbox controls, mid-session tool swapping, and cache diagnostics described in the original post, without a separate Anthropic contract or billing relationship.

The integration covers the complete Claude API feature set, including Managed Agents, Agent Skills, code execution, web search, prompt caching, batch processing, and MCP connectors. Authentication runs through AWS IAM and SigV4, audit logging through CloudTrail, and billing through a single AWS invoice that retires against existing AWS commitments. New features and betas ship on the same day they go live on the native Claude API. Available models at launch include Claude Opus 4.7, Sonnet 4.6, and Haiku 4.5.

One distinction worth flagging for compliance-sensitive teams: Claude Platform on AWS is operated by Anthropic, so data is processed outside the AWS boundary. This differs from Amazon Bedrock, where AWS processes data without sharing it with Anthropic. Teams with strict data residency requirements should factor this into their deployment decision.

Pricing matches the direct Claude API. Managed Agents sessions carry the same per-millisecond runtime charge that applies elsewhere, a cost layer absent on Bedrock.

Full details at the Anthropic blog and API docs.

Claude Managed Agents only launched in April, and Anthropic is already shipping meaningful infrastructure updates to it. Three of them dropped this week, and while they won’t make headlines the way a new model would, they matter quite a bit if you’re building production agents.

Here’s what changed and why it’s worth paying attention to.

Self-hosted sandboxes: your infrastructure, Anthropic’s orchestration

By default, when Claude Managed Agents runs tool calls, that execution happens inside Anthropic-managed cloud infrastructure. That’s convenient, but it’s a non-starter for a lot of enterprise teams, particularly in regulated industries where code, file system access, and network egress can’t leave the organisation’s environment.

Self-hosted sandboxes, now in public beta, split the model cleanly. Anthropic’s orchestration layer stays on their side. Tool execution moves to yours.

The way it works: you run an environment worker, a process on your own infrastructure, that connects to Anthropic’s work queue. When a session needs to execute a tool, Anthropic enqueues the request. Your worker picks it up, runs it locally, and posts the result back. The agent loop continues normally, but the actual execution never left your environment.

Supported managed providers include Cloudflare, Daytona, Modal, and Vercel, if you want the benefits of self-hosting without standing up bare infrastructure. And if you’re on AWS, authentication is handled through IAM with SigV4 rather than an environment key. You attach the AnthropicSelfHostedEnvironmentAccess managed policy to the IAM principal your worker runs under.

A couple of current limitations worth knowing: self-hosted sandboxes don’t yet support the Memory feature, and session creation from within Claude Platform on AWS is not yet available (though the workers themselves can run on AWS).

What this means for you: If your organisation has been interested in Claude agents but couldn’t deploy them because tool execution happens in Anthropic’s cloud, this removes that blocker. Moving between Anthropic-managed and self-hosted execution is a configuration change, not an integration rewrite.

Hot-swapping MCP server and tool configs mid-session

The second update is smaller but practically useful. You can now update an active session’s MCP server and tool configurations without tearing down and recreating the session.

For long-running agents, this matters. Agents that run over extended periods might need access to different tools at different stages of a task. Previously, changing what tools were available meant ending the session and starting a new one, losing conversation state in the process. Now you can update the configuration while the session is live.

What this means for you: If you’re building agents that handle multi-step workflows where tool requirements shift as the task progresses, you no longer need to architect around session restarts. This simplifies the logic considerably for anything that runs longer than a single interaction.

Cache diagnostics: finally understanding why your cache missed

This one is for anyone who has spent time debugging prompt caching behaviour and come away frustrated.

Prompt caching can meaningfully cut costs and latency on long conversations, but when a cache miss happens unexpectedly, the API has historically given you no information about why. You just see the miss in your usage data and start guessing.

Cache diagnostics, now in public beta, adds a diagnostics.previous_message_id field to Messages API requests. Pass it, and the API returns a cache_miss_reason that tells you exactly where the prompt cache prefix diverged from the previous turn.

This is directly useful context: in April, Anthropic had a bug where an optimisation designed to clear old thinking sections from idle sessions fired on every turn instead of just once, causing unexpected cache misses across the board. Without tooling like this, teams were left correlating usage spikes against guesses. With cache_miss_reason, you can diagnose this kind of thing programmatically.

What this means for you: If you’re using prompt caching and your cost or latency numbers are behaving unexpectedly, this gives you a concrete starting point for debugging. No more reading tea leaves from token counts.

One more thing worth noting: large output handling

Alongside the three headline updates, there’s a quiet quality-of-life improvement for agents that generate large outputs. When agent_toolset or MCP tools produce results exceeding 100,000 tokens, the output is now automatically spilled to a file in the sandbox. The model receives a truncated preview and the file path, and can read the full content from there if needed. This prevents large tool outputs from blowing up context windows or hitting response size limits mid-run.

The bigger picture

These updates reflect where Anthropic is focusing its platform work right now. Claude Managed Agents is still very new, and the gap between “interesting demo capability” and “production-ready enterprise deployment” comes down to exactly this kind of infrastructure work: where does execution happen, how do you reconfigure at runtime, and how do you observe what’s actually going on.

None of these updates change what Claude can do. They change whether Claude agents can fit into the environments where serious production deployments actually live.

If you’re evaluating Claude Managed Agents for an enterprise context, the self-hosted sandboxes documentation is the place to start. The cache diagnostics beta is worth enabling even if caching is working fine for you today. It’s much easier to understand normal behaviour before you’re debugging a problem under pressure.