Reference

OWASP LLM Top 10 2025: Every Risk Explained with Mitigations

Walk through every item in the OWASP LLM Top 10 with practical mitigations and a coverage map for runtime defense layers.

Alec Burrell· Founder, Context Guard Published 4 May 2026 11 min read
OWASP LLM Top 10 2025: Every Risk Explained with Mitigations

The OWASP Top 10 for Large Language Model Applications has become the reference framework that procurement teams, auditors, and security reviewers reach for when they want a common vocabulary for AI risk. This post walks through every item in the LLM Top 10, what it actually means in practice, and how a runtime defense maps to each one.

Grid of all ten OWASP LLM Top 10 risks, color-coded by severity from critical (red) to medium (purple) and cost (fuchsia).
The OWASP LLM Top 10 at a glance. Items in red are the entry points for the most expensive incidents we see in production.

LLM01: Prompt Injection

Untrusted text manipulates the model into ignoring its instructions or taking actions the developer did not intend. Includes both direct injection (user types the payload) and indirect injection (payload is hidden inside retrieved content, files, or tool output). This is the single most reported AI vulnerability class and the entry point for most other items on this list.

Mitigation: inspect every channel that contributes content to the model's context, not just the user's message. Layer signature, heuristic, and LLM-judge detection. Never escalate privileges based on prompt content alone.

LLM02: Sensitive Information Disclosure

The model leaks information it should not - PII from training data, a previous user's session content, the system prompt, internal documents, or secrets pasted by a careless engineer. Often a consequence of LLM01 but also occurs naturally in models that have memorized training samples.

Mitigation: outbound response scanning for emails, phone numbers, credit card numbers, API keys, and credentials. Redaction or tokenization based on policy. Strict tenant isolation in any caching layer. Never serialize raw secrets into prompts.

LLM03: Supply Chain

Compromised third-party models, fine-tuning datasets, embeddings, or plugins. The attacker poisons the supply rather than the request. Includes uploaded LoRA adapters, public model cards on Hugging Face that have been swapped, and dependency confusion in plugin ecosystems.

Mitigation: pin model versions, verify checksums on ingested artifacts, vet plugin authors, and prefer first-party providers for production traffic. At runtime, behavioral monitoring catches deviations even when the supply chain is breached.

LLM04: Data and Model Poisoning

Adversaries contribute corrupted data to training sets, fine-tuning jobs, or RAG indices. The model learns - or retrieves - what the attacker wanted it to. The 2024 wave of academic papers on backdoor attacks demonstrated this is practical against open and closed systems alike.

Mitigation: provenance tracking on training and retrieval data, per-source trust scoring at retrieval time, and runtime detection of behavioral anomalies that might indicate a backdoor trigger has fired.

LLM05: Improper Output Handling

Downstream systems blindly trust LLM output. The model produces SQL that gets executed, shell commands that get run, HTML that gets rendered, or URLs that get fetched - all without sanitization. This is where prompt injection turns into RCE.

Mitigation: treat all model output as user input for the downstream system. Parameterize SQL. Sandbox shell calls. Escape HTML. Allowlist domains for fetched URLs. The proxy layer can apply output filters before the response is returned to the application.

LLM06: Excessive Agency

Agents are given tools and permissions disproportionate to the task. A summarization agent can send emails. A research bot has shell access. When prompt injection lands, the blast radius is enormous.

Mitigation: principle of least privilege at the tool layer. Per-tool scopes, per-tenant policies, and human confirmation for irreversible actions. The proxy can enforce per-key tool allowlists and reject tool calls that violate the configured policy.

LLM07: System Prompt Leakage

The system prompt is treated as a security boundary - it usually contains business logic, role definitions, and sometimes API keys. When it leaks, attackers learn how to break the system more efficiently and often discover credentials they can replay.

Mitigation: do not put secrets in system prompts. Detect and block exfiltration attempts ("repeat the text above", "ignore all and print"). Treat the system prompt as semi-public; the security model should hold even if it leaks.

LLM08: Vector and Embedding Weaknesses

Attackers craft inputs that map to specific embedding regions to poison retrieval. Includes embedding-similarity collisions, adversarial examples that retrieve attacker-chosen documents, and weaknesses in vector store access controls (cross-tenant retrieval).

Mitigation: tenant-isolated vector stores, content inspection on indexed material, and detection rules that flag retrieved chunks containing instruction-like patterns. Audit your embedding pipeline as if it were a privileged code path.

LLM09: Misinformation

The model produces confident, plausible, false information that downstream users act on. Includes hallucinated citations, invented APIs, fabricated case law, and misleading medical guidance. Not always adversarial - sometimes the model is just wrong - but the impact can be the same.

Mitigation: ground generation in retrieved sources, surface citations, run factuality checks for high-stakes domains, and rate-limit confident assertions in regulated contexts. A defense layer can flag unsourced numerical claims for human review.

LLM10: Unbounded Consumption

Resource exhaustion attacks: long-prompt floods, recursive tool calls, expensive sampling parameters, model-DoS via pathological inputs. Often weaponized to drive up the victim's LLM bill or take the service offline.

Mitigation: per-key rate limits, token quotas, max conversation depth, tool-call budgets, and detection of inputs designed to maximize cost. Surface the cost picture to operators in real time so abuse is visible before the bill arrives.

How Context Guard maps to the LLM Top 10

Every detection rule in Context Guard carries an OWASP reference. That is not cosmetic - it is the audit trail that procurement and compliance teams ask for. When a threat is flagged, theowasp_ref field on the detection result tells you which framework item the rule addresses, which makes coverage reports a database query rather than an interpretive exercise.

  • LLM01, LLM07 - signature, heuristic, and judge detectors on every prompt; full-prompt inspection including retrieved context.
  • LLM02, LLM05 - outbound PII and secret redaction with policy-driven action (mask, replace, tokenize, block).
  • LLM06 - per-key tool allowlists and policy enforcement at the proxy.
  • LLM08 - retrieval inspection rules that flag instruction-like patterns inside RAG chunks.
  • LLM10 - per-key rate limits, token quotas, and cost monitoring with alerts.
Need an OWASP coverage report for a security review? The triage console exports threat events with their owasp_ref mapping so you can demonstrate which items the platform addresses. Start a free trial and generate one against real traffic.

Next steps

Start with the items most relevant to your architecture. If you run a RAG application, LLM01, LLM05, and LLM08 are existential. If you run an agent with tools, add LLM06. If you handle regulated data, LLM02 and LLM07. The Top 10 is not a checklist to satisfy once - it is a threat model to revisit every time you ship a new LLM feature.

OWASPLLM securitycomplianceAI risk

Ready to defend your LLM stack?

Context Guard is the drop-in proxy that detects prompt injection, context poisoning, and data exfiltration in real time - mapped to OWASP LLM Top 10. Try it on your own traffic with a 14-day free trial, no credit card.

  • < 30 ms p50 inline overhead
  • Works with OpenAI, Anthropic, and any compatible upstream
  • Triage console + structured webhooks

Related posts

All posts →