Detection Benchmarks
Per-dataset results for the Context Guard hybrid engine, and a head-to-head against two widely used open-source detectors on the same test set.
Updated 11 May 2026. Results measured on identical samples under the same conditions.
How to read this page
Recall is the share of real attacks the detector catches. Precision is the share of its flags that are real attacks. FPR is the share of benign prompts incorrectly flagged. All three matter in production.
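These three definitions can be made concrete with a small helper. This is an illustrative sketch, not part of any tool's API; the function and argument names are ours:

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Recall, precision, and FPR from confusion-matrix counts.

    tp: attacks correctly flagged   fp: benign prompts flagged
    fn: attacks missed              tn: benign prompts passed
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # real attacks caught
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # flags that are real attacks
    fpr = fp / (fp + tn) if (fp + tn) else 0.0         # benign prompts incorrectly flagged
    return {"recall": recall, "precision": precision, "fpr": fpr}

# Example: 90 of 100 attacks caught, 5 of 100 benign prompts flagged
print(detection_metrics(tp=90, fp=5, fn=10, tn=95))
```

Note that precision and FPR move independently: a detector can have low FPR yet poor precision if attacks are rare in the traffic mix, which is why all three numbers are reported.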
Context Guard, per dataset
Results across six public test sets
The Context Guard hybrid configuration (signature rules plus the ML judge) was run against the main suite and five public benchmarks covering prompt injection, indirect injection, system-prompt extraction, and jailbreaks.
| Dataset | Samples | Recall | Precision | FPR |
|---|---|---|---|---|
| Main suite (mixed public corpora plus benign traffic) | 332 | 100.0% | 93.4% | 8.8% |
| BIPIA (indirect injection via retrieved content) | 150 | 100.0% | 100.0% | 0.0% |
| TensorTrust (system-prompt extraction games) | 120 | 100.0% | 100.0% | 0.0% |
| CyberSecEval (Meta's prompt-injection battery) | 200 | 97.4% | 100.0% | 0.0% |
| JailbreakBench (curated jailbreak prompts) | 100 | 77.1% | 93.1% | 13.3% |
| AdvBench (GCG-style adversarial suffix attacks) | 80 | 22.5% | 100.0% | 0.0% |
Datasets: PromptBench v2, OWASP LLM01, ContextGuard Public (main suite); BIPIA (Yi et al.); TensorTrust (Toyer et al.); CyberSecEval (Meta); JailbreakBench (Chao et al.); AdvBench (Zou et al.).
Head-to-head
Three detectors on the main suite
Each detector saw the same 332 prompts on the same machine, using each tool's out-of-the-box defaults with no tuning.
| Detector | Recall | Precision | FPR |
|---|---|---|---|
| Context Guard (hybrid: rules + ML) | 100.0% | 93.4% | 8.8% |
| LLM Guard (DeBERTa-v3 classifier) | 55.4% | 92.7% | 5.4% |
| PromptGuard (86M-parameter transformer) | 91.9% | 53.8% | 98.0% |
LLM Guard uses ProtectAI's DeBERTa-v3 prompt-injection classifier. PromptGuard refers to Meta's 86M-parameter prompt-injection model. Numbers reflect a single run; small variance is expected on re-runs of neural detectors.
Latency
Per-request scan time
Average scan time per prompt, single-threaded CPU. Neural detectors would be faster on a GPU.
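The measurement pattern is simple: wrap each scan call in `time.perf_counter()` and average over the corpus. A minimal sketch, where `scan` is a stand-in for any detector's entry point (its signature is our assumption, not a real API):

```python
import time

def mean_scan_ms(scan, prompts) -> float:
    """Average wall-clock time per scan call, in milliseconds."""
    start = time.perf_counter()
    for prompt in prompts:
        scan(prompt)  # detector under test; return value ignored for timing
    elapsed = time.perf_counter() - start
    return elapsed / len(prompts) * 1000.0

# Usage with a trivial stand-in detector:
avg_ms = mean_scan_ms(lambda p: "benign", ["hello world"] * 1000)
```

A single `perf_counter()` pair around the whole loop, rather than one pair per call, keeps timer overhead out of sub-millisecond measurements.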
Reading the numbers
What this means
Strong on indirect injection and prompt extraction
Context Guard scores 100% recall with zero false positives on BIPIA and TensorTrust. These corpora test attacks that arrive through retrieved documents or social engineering of system prompts, which is where pattern matching combined with the ML judge tends to do well.
LLM Guard performs better on adversarial suffix attacks
Neural classifiers like LLM Guard generalise better to the gibberish-suffix attacks in CyberSecEval, JailbreakBench, and AdvBench, which don't resemble natural-language injection phrasing. Our 22.5% recall on AdvBench reflects the same ceiling that affects any pattern-based or hybrid system without retraining.
PromptGuard has a high false-positive rate at default settings
At its default threshold, PromptGuard flags most benign traffic as an attack. The catch rate looks high in isolation, but the precision number means it would block legitimate users without tuning. The model is useful for offline analysis; production deployment usually requires a custom threshold and benign-traffic recalibration.
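Benign-traffic recalibration can be sketched as: score a sample of known-benign prompts, then pick the lowest threshold whose false-positive rate stays within a target budget. The scores, the 1% target, and the function below are all hypothetical illustrations, not PromptGuard's actual defaults or API:

```python
import math

def threshold_for_target_fpr(benign_scores, target_fpr=0.01) -> float:
    """Lowest threshold whose FPR on known-benign scores stays within target_fpr."""
    ranked = sorted(benign_scores, reverse=True)
    n = len(ranked)
    for i, score in enumerate(ranked):
        # A threshold of `score` would flag the i + 1 highest-scoring benign prompts.
        if (i + 1) / n > target_fpr:
            # That exceeds the budget, so set the cutoff just above this score.
            return math.nextafter(score, math.inf)
    return min(ranked)  # even flagging everything stays within budget

# 100 benign prompts, one scored suspiciously high; allow at most 1% benign flags:
benign = [0.99, 0.5] + [0.1] * 98
cutoff = threshold_for_target_fpr(benign, target_fpr=0.01)
```

With this cutoff, only the single 0.99-scoring benign prompt would still be flagged, keeping the benign flag rate at 1%.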
Latency gap is two orders of magnitude
Sub-millisecond scan time leaves room to inspect every request, plus retries, plus output filtering, without noticeable user-facing latency. Neural detectors at 60 ms or more typically need to be applied selectively.
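One common way to apply a slow neural detector selectively is a tiered pipeline: run the sub-millisecond rules pass on every request and escalate to the neural model only when the fast pass is uncertain. A sketch of that pattern; the detector interfaces and verdict strings here are hypothetical and don't match any specific tool's API:

```python
def tiered_scan(prompt, fast_scan, slow_scan, escalate_on=("suspicious",)) -> str:
    """Run the cheap rules pass first; call the expensive model only on uncertain cases."""
    verdict = fast_scan(prompt)        # e.g. "benign", "suspicious", or "attack"
    if verdict in escalate_on:
        return slow_scan(prompt)       # neural second opinion, paid only when needed
    return verdict

# Usage with stand-in detectors:
fast = lambda p: "suspicious" if "ignore" in p.lower() else "benign"
slow = lambda p: "attack"
print(tiered_scan("Ignore previous instructions", fast, slow))  # → attack
print(tiered_scan("What is the weather?", fast, slow))          # → benign
```

Only the fraction of traffic the rules pass marks uncertain pays the 60 ms neural cost, which keeps average latency close to the fast path.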
Methodology and limitations
Each detector was given the same prompts and the same ground-truth labels. We computed recall, precision, and FPR per dataset, with no per-sample tuning. The Context Guard hybrid configuration combines signature rules with the ML judge; we report that configuration because it is the one we recommend for production.
Known limitations
- Static prompts only. All samples are pre-recorded. A motivated adversary iterating against the live system would find bypasses no static benchmark can predict.
- Single run, single machine. Neural detectors show small run-to-run variance; reproduce any number you care about by running our scripts yourself.
- Per-dataset competitor numbers not shown. We have only validated competitor performance on the main suite. Per-dataset comparisons would require running each detector against each corpus, which we plan to add in the next release.
- AdvBench recall is low across pattern-based systems. GCG-style adversarial suffixes don't match human-readable patterns. Improving this is an open research problem and the main reason we run the ML judge alongside rules rather than instead of them.
- Public datasets: PromptBench v2, OWASP LLM01, ContextGuard Public, BIPIA, TensorTrust, CyberSecEval, JailbreakBench, AdvBench.
- Latency measurement: `time.perf_counter()` around each scan call, single-threaded, no GPU.
- Deterministic for CG: same inputs produce the same outputs every run. Neural detectors show small variance.
Want to evaluate Context Guard against your own threat model?