LLM Supply Chain Attacks: How Compromised Models, Plugins, and Dependencies Subvert Your AI Stack

Compromised model weights, malicious MCP servers, template injection, sandbox escapes, SSRF, and framework vulnerabilities give attackers a path into your LLM stack that no prompt filter can close. Here are the six supply chain attack classes we see in production, the CVEs and advisories behind them, and the defense architecture that stops them.

Alec Burrell · Founder, Context Guard · Published 3 June 2026 · 14 min read
Your LLM application is only as trustworthy as its weakest dependency. Compromised model weights, malicious plugins, backdoored MCP servers, and vulnerable frameworks give attackers a path into your AI stack that no amount of prompt engineering can close. OWASP LLM03 (Supply Chain) and LLM04 (Data and Model Poisoning) describe the class. The CVEs, GitHub advisories, and real-world exploits describe the reality. Here is the full attack surface, the six supply chain attack classes we see in production, and the defense architecture that stops them.

Why supply chain is the LLM attack class nobody talks about

When security teams evaluate LLM risk, they focus on what the user types into the prompt. Prompt injection, context poisoning, and jailbreaking dominate the conversation because they are visible, testable, and easy to demonstrate. But the most consequential attacks do not come from the user. They come from the components your application trusts by default: the model weights you download, the plugins you install, the MCP servers you connect, and the frameworks you build on.

The supply chain attack surface for LLM applications is larger than most teams realize. A typical production LLM stack includes:

  • The foundation model (GPT-4, Claude, Llama, Mistral) trained on billions of tokens the provider does not fully disclose.
  • Embedding models that convert text into vectors for retrieval, trained on datasets you did not curate.
  • Plugin and tool registries that expose functions to the model, often with full network and file-system access.
  • MCP servers that extend the model with external capabilities, any one of which can be compromised.
  • Frameworks and SDKs (LangChain, LlamaIndex, CrewAI, AutoGen) that orchestrate prompts, tool calls, and agent loops.
  • Template engines that interpolate user input into prompts, creating injection vectors when the input is not sanitized.
  • Vector databases that store embeddings and retrieval chunks, often with minimal access controls.

Each of these components is a trust boundary. Each one can be compromised independently. And each one creates an attack path that bypasses input-side prompt filters entirely, because the attacker controls the component before it ever reaches your detection pipeline.

The OWASP LLM Top 10 classifies this under LLM03 (Supply Chain Vulnerabilities) and LLM04 (Data and Model Poisoning). Context Guard maps its supply-chain detection rules to both categories, covering the full range from model-level poisoning to plugin-level sandbox escapes.

Six supply chain attack classes in production

These are not theoretical. Every attack class below has been demonstrated in published research, documented in CVEs, or observed in production traffic. They share a common property: the attacker does not need to send a malicious prompt. They compromise a component that your application already trusts.

1. Model and training data poisoning

Foundation models are trained on datasets too large for any human to audit. When those datasets contain poisoned samples, the model learns behaviors the trainer never intended. The poisoning can be subtle (a slight bias in financial risk assessments) or overt (a backdoor trigger that makes the model output a specific string when it sees a particular pattern).

The DeepSeek robustness study (2026) demonstrated that semantic-character dual-space mutations (combining meaning-level rewording with character-level obfuscation) significantly degrade model defenses. When training data is poisoned, the model itself becomes the attack vector. No input-side filter can stop a model that has been trained to misbehave, because the malicious behavior is embedded in the weights.

Fine-tuning compounds the risk. Organizations that fine-tune foundation models on proprietary data often assume the fine-tuning data is clean because it is internal. But fine-tuning datasets are assembled from internal wikis, customer support logs, and user-generated content, all of which can contain injected or biased data. A single poisoned document in a fine-tuning corpus can embed a persistent backdoor that survives the fine-tuning process.

Detection: Context Guard does not inspect model weights (that is a model-provenance problem, not a runtime problem). But it does detect the downstream effects of model poisoning: backdoor triggers in user input, anomalous outputs that leak training data, and model behaviors that diverge from the expected instruction hierarchy. Rules like de_rag_knowledge_leak catch data exfiltration that may originate from poisoned retrieval. The ML judge identifies behavioral anomalies in model responses that suggest backdoored behavior.

2. Compromised plugins and MCP servers

The Model Context Protocol (MCP) is rapidly becoming the standard way to extend LLM capabilities. MCP servers expose tools, resources, and prompts to the model through a standardized interface. The problem is that MCP servers are network services, and network services have vulnerabilities.

In the v2.0 threat intelligence update alone, Context Guard added three MCP-specific detection rules:

  • ta_mcp_tool_hijack (critical): Detects attempts to modify MCP tool descriptions to hijack agent tool calls. This is the MCP Function Hijacking attack, where an attacker modifies a tool description to inject instructions that the model follows when calling the tool.
  • ta_mcp_unauth_sse (high): Detects references to unauthenticated MCP SSE transport endpoints, mapped to GHSA-8jr5-6gvj-rfpf. An MCP server that exposes an SSE endpoint without authentication allows any network participant to inject messages into the model's context stream.
  • ta_mcp_sse_injection (high): Detects injected SSE event fields that could forge MCP messages, mapped to GHSA-m9g3-3g99-mhpx. Event fields like event, data, and id can be forged to create fake tool responses that the model treats as legitimate.
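
To make the SSE injection risk concrete, here is a minimal Python sketch of an encoder that refuses to emit forged fields. It is not Context Guard's detection logic and not the API of any particular library: the name encode_sse_event is hypothetical, and the only assumption is the standard server-sent events framing (one "field: value" pair per line, with a blank line ending the event).

```python
import re
from typing import Optional

# Any of these sequences ends a line in the SSE wire format.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def encode_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> str:
    """Serialize one SSE event without letting values smuggle in extra fields."""
    # "event" and "id" are single-line fields: a line break inside them would
    # let the value start a new, attacker-chosen field, so reject it outright.
    for name, value in (("event", event), ("id", event_id)):
        if value is not None and _LINE_BREAK.search(value):
            raise ValueError(f"line break in SSE {name} field")

    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Multi-line payloads are legal, but every line gets its own "data:" prefix,
    # so an embedded "event: ..." line stays payload instead of becoming a field.
    for chunk in _LINE_BREAK.split(data):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"
```

With this encoder, a payload such as "ok\nevent: tool_result" is delivered as two data lines rather than as a data line followed by a forged event field. Transport authentication is still required; safe encoding only closes the forging path.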

Beyond these, additional MCP vulnerabilities detected by Context Guard include:

  • ta_mcp_ssrf_unauthenticated_endpoint: MCP servers that expose unauthenticated endpoints vulnerable to server-side request forgery.
  • ta_mcp_dns_rebinding_cors: DNS rebinding attacks that bypass CORS policies to reach MCP server internals.
  • ta_mcp_save_path_traversal: Path traversal vulnerabilities in MCP server file operations that let attackers read or write arbitrary files.

CVE-2025-52573 in the iOS Simulator MCP Server demonstrated this concretely: a command injection vulnerability in a widely used MCP server that could be exploited by any client that connected to it. The model did not need to be prompted maliciously. The MCP server itself was the attack vector.

The supply chain risk is structural. When you install an MCP server, you are giving it access to your model's context window. If the server is compromised, the attacker controls a trusted input channel. This is the same trust boundary that makes agent tool hijacking so dangerous, but it operates at the infrastructure level rather than the prompt level.
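
One way to narrow this trust boundary is to pin tool descriptions at review time and refuse anything that later changes. The following is a rough sketch of that idea, not a description of how Context Guard implements ta_mcp_tool_hijack. It assumes each tool is a dict with name, description, and inputSchema keys, as returned by an MCP tool listing; the helper names tool_fingerprint and changed_tools are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def tool_fingerprint(tool: dict) -> str:
    """Hash the fields the model actually reads when deciding how to call a tool."""
    canonical = json.dumps(
        {key: tool.get(key) for key in ("name", "description", "inputSchema")},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(tools: List[dict], pinned: Dict[str, str]) -> List[str]:
    """Return names of tools that are new or whose definition no longer matches its pin."""
    return [
        tool["name"]
        for tool in tools
        if pinned.get(tool["name"]) != tool_fingerprint(tool)
    ]
```

The pinned mapping would be generated once, after a human has reviewed the tool definitions, and stored alongside the application's lockfiles. Any tool reported by changed_tools is held back from the model until it has been reviewed again.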

3. Template injection in LLM frameworks

LLM frameworks use template engines to assemble prompts from multiple sources: system messages, user input, retrieved context, and tool definitions. When user input becomes part of the template source itself, rather than being passed in as a variable value, the template engine evaluates it as template code. This is not a new vulnerability class. Template injection has been documented in web applications for over a decade. But in LLM applications, the blast radius is dramatically larger because the template engine has access to the application's Python runtime.

CVE-2025-65106 exposed this in LangChain: a template injection vulnerability that allowed attackers to access Python object internals through Jinja2 template syntax. The attack looks like this:

User input: "{{ config.__class__.__init__.__globals__ }}"

Template source after concatenation: "You are a helpful assistant. {{ config.__class__.__init__.__globals__ }}"
Result: When the template is rendered, the Jinja2 engine evaluates __class__.__init__.__globals__, exposing the application's global scope, including imported modules, environment variables, and potentially secrets.

The vulnerability is not limited to Jinja2. Any mechanism that evaluates attacker-controlled input as a template or format string (Django templates, Python str.format() format strings with attribute access, Mako, Tornado) is susceptible to some form of it. The specific detection rules Context Guard provides are:

  • et_template_injection (high): Detects Jinja2 and Django template syntax in any input that flows into a prompt. Catches {{ and {% delimiters, template filters, and variable access patterns.
  • et_fstring_injection (critical): Detects Python dunder attribute access via format strings. Catches __class__, __globals__, __init__, __builtins__, and other dunder attributes that expose the Python runtime.

Template injection is a supply chain vulnerability because it exists in the framework, not in the prompt. The attacker does not need to craft a malicious instruction. They need to craft a string that the template engine evaluates. The model never sees the raw input. The template engine processes it first, and the result (which may include sensitive data from the runtime) flows into the prompt.

This is the same vulnerability class that affects CI/CD pipelines, but it operates at the application layer rather than the CI/CD layer. The defense is the same: treat every input that flows into a template as untrusted, and inspect it before rendering.
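
The difference between input used as a template and input used as a value can be shown with nothing but the standard library. The sketch below uses Python's str.format() rather than Jinja2, so it illustrates the format-string variant of the problem, not CVE-2025-65106 itself. The Config class, the fake key, and both function names are made up for the example.

```python
class Config:
    """Stand-in for an application object that happens to hold a secret."""

    def __init__(self) -> None:
        self.model = "demo-model"
        self.api_key = "sk-demo-not-real"  # fake value, for illustration only


def render_unsafe(user_input: str, config: Config) -> str:
    # Deliberately vulnerable: the user controls the format string, so any
    # attribute reachable from `config` (including dunders) can be read.
    return user_input.format(config=config)


def render_safe(user_input: str, config: Config) -> str:
    # The template is fixed by the developer; user text is only ever a value,
    # and str.format() does not re-evaluate the values it substitutes.
    template = "You are a helpful assistant for {model}. User: {text}"
    return template.format(model=config.model, text=user_input)
```

Calling render_unsafe("{config.__dict__}", Config()) returns the object's attribute dictionary, fake key included, while render_safe with the same payload returns the braces as literal text. The same rule applies to any template engine: user input goes in as a variable, never as template source.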

4. Sandbox escapes and code execution

Many LLM applications give the model the ability to execute code. Code Interpreter, tool-calling agents, and sandboxed execution environments all assume that the sandbox contains the model's actions. When the sandbox is escaped, the attacker gets arbitrary code execution on the host.

Context Guard detects several sandbox escape patterns:

  • ta_sandbox_ast_bypass_builtins: Prompts that instruct the model to use Python's ast module to bypass restricted builtins, accessing eval, exec, or __import__ through abstract syntax tree manipulation.
  • ta_sandbox_shell_exec_escape: Prompts that attempt to escape sandboxed execution through shell metacharacters, subprocess calls, or OS command injection.
  • ta_pickle_deserialization_rce: Deserialization of untrusted data using Python's pickle module, which allows arbitrary code execution during deserialization. This is a known supply chain vector where a poisoned dataset includes pickle payloads that execute code when loaded.
  • ta_vllm_trust_remote_code_bypass: Exploitation of the trust_remote_code flag in vLLM and Hugging Face model loading, which allows arbitrary code execution when loading a compromised model from a remote repository.

The trust_remote_code vulnerability deserves special attention because it is a direct model supply chain attack. When you load a model from Hugging Face with trust_remote_code=True, you are executing arbitrary Python code from the model repository. A compromised model repository can include malicious code in its custom modeling files that executes during model loading, before any inference ever happens. The model weights may be perfectly legitimate. The code that runs when you load them is not.

vLLM deployments are particularly exposed because many tutorials enable trust_remote_code without explaining the risk. A production deployment that loads models from public repositories without verifying the code hashes is accepting a supply chain risk equivalent to running arbitrary code from an untrusted npm package.
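
The pickle risk listed above has a partial mitigation in the standard library. The sketch below follows the allowlist pattern from the Python pickle documentation, overriding Unpickler.find_class so that only a few harmless builtins can be resolved. The allowlist contents and the name restricted_loads are illustrative choices, and this reduces the risk without eliminating it. Avoiding pickle for untrusted data, for example by using a tensor-only format such as safetensors for weights, remains the safer default.

```python
import builtins
import io
import pickle

# Globals the unpickler may resolve. Everything else is refused, including
# eval, exec, and anything importable from os or subprocess.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module: str, name: str):
        # Called whenever the pickle stream asks for a global by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers and the allowlisted types round-trip normally. A payload whose __reduce__ returns a call to eval or os.system fails with UnpicklingError before the callable is ever invoked.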

5. SSRF and path traversal in LLM infrastructure

LLM applications make network requests. They fetch URLs, call APIs, read files from local and remote storage, and connect to databases. Every one of these capabilities is a potential SSRF (Server-Side Request Forgery) or path traversal vector.

Context Guard detects:

  • ta_ssrf_fetch_no_validation: Tool calls that fetch URLs without validating the destination, allowing the model to reach internal services, cloud metadata endpoints, or attacker-controlled servers.
  • ta_mcp_ssrf_unauthenticated_endpoint: MCP servers that expose unauthenticated endpoints vulnerable to SSRF, allowing an attacker to proxy requests through the MCP server to internal resources.
  • ta_zip_slip_path_traversal: Path traversal through archive extraction, where a crafted ZIP file contains file paths that escape the intended directory (e.g., ../../../etc/passwd).
  • ta_path_normpath_bypass: Path normalization bypasses that use encoding tricks (double encoding, null bytes, overlong UTF-8) to evade path validation.
  • ta_mcp_save_path_traversal: Path traversal in MCP server file save operations, allowing an attacker to write files to arbitrary locations on the host.
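
The archive case in this list can be handled with a containment check before anything is written. The sketch below is one hedged way to do it in Python, and safe_extract is a hypothetical helper name. Python's own zipfile module already strips ".." components during extraction, so here the check rejects a suspicious archive outright instead of silently rewriting its paths. The same pattern carries over to libraries that do no sanitizing at all.

```python
import os
import zipfile


def safe_extract(archive: zipfile.ZipFile, dest_dir: str) -> None:
    """Extract only if every member resolves to a path inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    for member in archive.infolist():
        # Resolve the final on-disk location, following ".." and symlinks.
        target = os.path.realpath(os.path.join(dest_root, member.filename))
        # commonpath equals dest_root only when target lies inside it.
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"blocked path traversal: {member.filename!r}")
    # Nothing is written until the whole archive has passed the check.
    archive.extractall(dest_root)
```

Validating every member before extracting any of them means a single crafted entry such as ../../evil.txt causes the whole archive to be refused, which is usually the behavior you want for untrusted uploads.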

SSRF is particularly dangerous in LLM applications because the model often has legitimate reasons to fetch URLs. A RAG pipeline that retrieves content from the web, a code interpreter that fetches data from S3, or an agent that calls external APIs all require network access. The boundary between a legitimate tool call and an SSRF attack is a validation check that most applications never implement.

Cloud metadata endpoints are the highest-value SSRF targets. An LLM application running on AWS can be tricked into fetching http://169.254.169.254/latest/meta-data/iam/security-credentials/ to retrieve IAM credentials. On GCP, the equivalent endpoint is http://metadata.google.internal. On Azure, it is http://169.254.169.254/metadata/instance. The model does not know it is fetching credentials. It is following a prompt that asks it to retrieve a URL. The credentials flow into the model's context and from there into the response, where output exfiltration techniques can extract them.
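
The missing validation check can be small. Below is a minimal sketch, using only the Python standard library, of a pre-fetch test that rejects any URL whose host is, or resolves to, a non-public address. That covers the 169.254.169.254 metadata address, loopback, and private ranges. The function name is_safe_fetch_url and the injectable resolve parameter are assumptions made for the example, not part of any framework.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlsplit


def _resolve(host: str) -> List[str]:
    """Return every address the host resolves to (performs a real DNS lookup)."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_safe_fetch_url(url: str,
                      resolve: Callable[[str], List[str]] = _resolve) -> bool:
    """Allow only http(s) URLs whose host maps exclusively to public addresses."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        addresses = [str(ipaddress.ip_address(host))]  # host is a literal IP
    except ValueError:
        try:
            addresses = resolve(host)  # host is a name: check what it points at
        except OSError:
            return False
    if not addresses:
        return False
    try:
        # is_global is False for loopback, private, and link-local ranges.
        return all(ipaddress.ip_address(addr).is_global for addr in addresses)
    except ValueError:
        return False
```

Two caveats apply. The fetch must connect to the same address that was validated, otherwise a DNS rebinding attacker can change the answer between check and use. Redirects must also be re-validated hop by hop, or disabled.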

6. Framework and SDK vulnerabilities

The LLM application stack depends on frameworks and SDKs that are evolving rapidly. LangChain, LlamaIndex, CrewAI, AutoGen, and the Anthropic and OpenAI SDKs all ship with features that prioritize functionality over security. The result is a growing list of vulnerabilities in the infrastructure layer.

Context Guard detects several framework-level vulnerabilities:

  • ta_anthropic_sdk_sandbox_race and ta_anthropic_sdk_sandbox_race_explicit: Race conditions in the Anthropic SDK's sandbox implementation that allow tool calls to execute before the sandbox policy is applied.
  • ta_untrusted_manifest_deserialization: Deserialization of untrusted manifest files in agent frameworks, allowing arbitrary code execution through YAML or JSON payloads that reference malicious Python classes.
  • ta_langsmith_untrusted_manifest: Untrusted manifest loading in LangSmith that can execute arbitrary code during agent initialization.
  • ta_vllm_unbounded_cache_dos: Unbounded cache growth in vLLM that can be exploited for denial of service, consuming all available memory.
  • ta_token_placeholder_dos: Token placeholder attacks that inflate token count through specially crafted input that maximizes tokenization overhead.
  • ta_video_frame_memory_exhaustion: Memory exhaustion through video frame processing in multimodal models, where a crafted video file can consume all available GPU memory.

These are not prompt-level attacks. They are infrastructure vulnerabilities that exist in the frameworks and SDKs that LLM applications depend on. No amount of prompt filtering or input validation can prevent a race condition in the Anthropic SDK or a deserialization vulnerability in LangChain. The fix has to come from the framework itself, and the detection has to happen at the infrastructure layer.

Why prompt security is not enough

The fundamental insight of supply chain security for LLMs is that input-side prompt inspection cannot catch attacks that originate in the components your application trusts. You can have the best prompt injection detection in the world, and it will not help when:

  • The MCP server your agent connects to has been compromised and is injecting instructions into the tool descriptions.
  • The template engine your application uses evaluates user input as code and leaks environment variables.
  • The model you loaded from Hugging Face contains malicious code that executes during initialization.
  • The vLLM server processing your requests has a sandbox escape that allows arbitrary command execution.
  • The framework you built on deserializes an untrusted manifest and runs attacker-controlled code.

In each case, the attack bypasses the prompt entirely. It operates at a layer below the prompt, in the infrastructure that supports the model. This is why OWASP classifies these separately as LLM03 (Supply Chain) and LLM04 (Data and Model Poisoning). They are distinct from LLM01 (Prompt Injection) because they do not require the attacker to craft a malicious input. They require the attacker to compromise a component that the application already trusts.

The defense requires a fundamentally different approach: instead of inspecting what goes into the model, you need to inspect what the model connects to, what code it runs, and what components it trusts.

The LLM supply chain defense architecture

A complete supply chain defense operates at three layers: component verification, runtime inspection, and infrastructure hardening. No single layer is sufficient.

1. Component verification

Before any component enters your stack, verify it:

  • Model provenance: Verify model hashes against published checksums. Pin model versions. Do not use trust_remote_code=True without auditing the model repository code.
  • MCP server auditing: Audit every MCP server before connecting it. Verify authentication, validate SSE transports, and restrict network access to the minimum required endpoints.
  • Dependency pinning: Pin all framework and SDK versions. Use lockfiles. Verify package integrity hashes. Do not auto-update dependencies in production.
  • Template sandboxing: Never render user input in an unescaped template context. Use sandboxed Jinja2 environments that restrict access to dunder attributes and builtins.
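
Checksum verification from the list above takes only a few lines. The sketch below streams a file through SHA-256 and compares the result with a pinned value. The function names are hypothetical, and where the expected hash comes from (a lockfile, a signed manifest, the publisher's release notes) is left to the deployment.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Return True only if the file's SHA-256 matches the pinned hex digest."""
    return hmac.compare_digest(sha256_file(path), expected_sha256.strip().lower())
```

The check belongs in the loading path itself, before any framework touches the file, so that a mismatch stops the deployment instead of producing a log line nobody reads.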

2. Runtime inspection

At runtime, inspect every input that reaches the model, including inputs from trusted components. This is where Context Guard operates. Every prompt, including tool descriptions, retrieved context, and framework-assembled inputs, flows through the detection pipeline before it reaches the model. The supply chain detection rules catch attacks that originate in the infrastructure layer:

  • Template injection detection: et_template_injection and et_fstring_injection catch template syntax and dunder access in any input, whether it comes from a user prompt or a framework-assembled context.
  • Sandbox escape detection: ta_sandbox_ast_bypass_builtins, ta_sandbox_shell_exec_escape, and ta_pickle_deserialization_rce catch escape attempts in tool-call inputs and code generation outputs.
  • MCP security detection: ta_mcp_tool_hijack, ta_mcp_unauth_sse, ta_mcp_sse_injection, ta_mcp_ssrf_unauthenticated_endpoint, ta_mcp_dns_rebinding_cors, and ta_mcp_save_path_traversal cover the full MCP attack surface.
  • Framework vulnerability detection: ta_anthropic_sdk_sandbox_race, ta_untrusted_manifest_deserialization, ta_langsmith_untrusted_manifest, ta_vllm_trust_remote_code_bypass, and ta_vllm_unbounded_cache_dos catch exploitation of known framework vulnerabilities.
  • SSRF and path traversal detection: ta_ssrf_fetch_no_validation, ta_zip_slip_path_traversal, and ta_path_normpath_bypass catch network and filesystem attacks through tool calls.

Every rule carries an OWASP reference (LLM03 for supply chain, LLM04 for data and model poisoning, LLM01 for injection through supply chain components) so your compliance team can map every event to the framework without manual work.

3. Infrastructure hardening

Beyond detection, harden the infrastructure that supports your LLM stack:

  • Network isolation: Run MCP servers, model inference endpoints, and tool execution environments on separate network segments. No LLM component should have unrestricted network access.
  • Capability scoping: Every tool the model can call should have minimum permissions. A search tool does not need filesystem access. A calculation tool does not need network access. An email tool does not need to read environment variables.
  • Sandbox enforcement: Code execution environments must be hardened containers with resource limits, network restrictions, and no access to host secrets. Treat everything running inside the sandbox as untrusted, and do not rely on the sandbox as your only security boundary.
  • Secret management: Secrets (API keys, database credentials, cloud tokens) must never be in environment variables accessible to the model, in configuration files that the model can read, or in system prompts that the model can leak. Use a secrets manager, not .env files.

Real-world impact

The supply chain attack surface is not hypothetical. In the last year alone:

  • CVE-2025-65106: LangChain template injection allowed Python dunder attribute access through Jinja2 templates, exposing application globals, environment variables, and secrets to any user input that flowed into a prompt template.
  • CVE-2025-52573: iOS Simulator MCP Server command injection allowed arbitrary OS command execution through a widely-used MCP server.
  • GHSA-8jr5-6gvj-rfpf: Unauthenticated MCP SSE transport endpoints allowed any network participant to inject messages into the model's context stream.
  • GHSA-m9g3-3g99-mhpx: SSE event injection in eventsource-encoder allowed forging MCP messages to manipulate tool responses.
  • GHSA-v6wj-c83f-v46x: MCP server OS command injection through crafted tool inputs.
  • GHSA-8g7g-hmwm-6rv2: n8n-mcp path traversal and SSRF through MCP server file operations.

Each of these vulnerabilities was in a component that LLM applications trusted by default. None of them could be caught by input-side prompt filtering. The attack surface exists below the prompt layer, in the infrastructure that supports the model.

The trust boundary problem

Supply chain attacks exploit a fundamental trust boundary mismatch. LLM applications trust components that were not designed to be trusted in the way LLM applications use them. A Python template engine was designed to render web pages, not to evaluate arbitrary user input in the context of an LLM prompt. An MCP server was designed to provide tools to a local development environment, not to run as an authenticated network service behind a production model. A model repository was designed to share research artifacts, not to execute arbitrary code on production inference servers.

The trust model that makes these components safe in their original context breaks down when they are composed into an LLM application. The template engine gains access to the model's context window. The MCP server gains access to the model's tool calls. The model repository gains access to the inference server's runtime. Each composition expands the attack surface.

The fix is not to stop using these components. It is to treat every component as untrusted and to enforce security boundaries at every composition point. Runtime inspection at the prompt layer catches attacks that originate in trusted components. Infrastructure hardening limits the blast radius when a component is compromised. Component verification reduces the probability of compromise in the first place.

How Context Guard secures the LLM supply chain

Context Guard runs as a reverse proxy in front of your LLM provider. Every prompt, including its system message, retrieved context, tool descriptions, and tool outputs, flows through the detection pipeline before it reaches the model. For supply chain attacks specifically:

  • MCP security: 6 detection rules covering tool hijacking, SSE injection, SSRF, DNS rebinding, and path traversal in MCP servers.
  • Template injection: 2 detection rules covering Jinja2/Django template syntax and Python dunder attribute access.
  • Sandbox escapes: 4 detection rules covering AST bypass, shell escape, pickle deserialization, and remote code execution through model loading.
  • SSRF and path traversal: 4 detection rules covering unvalidated fetch operations, ZIP slip, path normalization bypasses, and MCP file path traversal.
  • Framework vulnerabilities: 5 detection rules covering Anthropic SDK race conditions, manifest deserialization, vLLM remote code execution, cache DoS, and memory exhaustion.
  • Additional supply chain rules: Token placeholder DoS, video frame memory exhaustion, DLL assembly loading, stored SQL injection in thread IDs, LibreChat secret leaks, and Mautic Twig SSTI.

These 21 supply-chain rules join the existing 49-rule detection library, bringing the total to 70 rules covering the full OWASP LLM Top 10. Every rule carries an OWASP reference (LLM03, LLM04, LLM01, LLM10) so your compliance team can generate coverage reports without manual mapping.

Want to test supply chain detection against your own traffic? Paste a template injection payload, an MCP hijacking attempt, a sandbox escape string, or an SSRF URL into the live demo and see the detection result, risk score, and matched rule in real time. No signup required.

LLM supply chain security checklist

Before deploying an LLM application to production, verify every item on this list:

  • Model provenance is verified: hashes match published checksums, versions are pinned, and trust_remote_code is disabled or audited.
  • Every MCP server is authenticated, its SSE transport is encrypted, and its endpoints are not exposed to untrusted networks.
  • User input is never rendered in an unescaped template context. Jinja2 sandboxes restrict dunder access.
  • Code execution environments are hardened containers with resource limits, network restrictions, and no host secret access.
  • Tool descriptions are pinned and validated at runtime. No dynamic tool registration without verification.
  • SSRF protections are in place: URL validation, allowlists for external fetches, and blocking of cloud metadata endpoints.
  • Path traversal protections are in place: path normalization, allowlists for file access, and archive extraction in isolated directories.
  • Framework and SDK versions are pinned, lockfiles are used, and package integrity hashes are verified.
  • Secrets are managed through a secrets manager, not environment variables, .env files, or configuration accessible to the model.
  • Runtime detection covers template injection, sandbox escapes, SSRF, path traversal, and MCP-specific vulnerabilities.
  • OWASP LLM03 (Supply Chain) and LLM04 (Data and Model Poisoning) are covered by both detection rules and architectural mitigations.

If your LLM application connects to MCP servers, renders templates, or loads models from external repositories, and you are not inspecting those inputs at runtime, you have a supply chain blind spot. The security page has the full architecture. The free trial has the product.
