Detection Benchmarks
Per-dataset results for the Context Guard hybrid engine, and a head-to-head against two widely used open-source detectors on the same test set.
Updated 11 May 2026. Results measured on identical samples under the same conditions.
How to read this page
Recall is the share of real attacks the detector catches. Precision is the share of its flags that are real attacks. FPR is the share of benign prompts incorrectly flagged. All three matter in production.
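These three definitions can be made concrete with a small helper. This is an illustrative sketch, not part of any tool's API; the function and argument names are ours:

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Recall, precision, and FPR from confusion-matrix counts.

    tp: attacks correctly flagged   fp: benign prompts flagged
    fn: attacks missed              tn: benign prompts passed
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # real attacks caught
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # flags that are real attacks
    fpr = fp / (fp + tn) if (fp + tn) else 0.0         # benign prompts incorrectly flagged
    return {"recall": recall, "precision": precision, "fpr": fpr}

# Example: 90 of 100 attacks caught, 5 of 100 benign prompts flagged
print(detection_metrics(tp=90, fp=5, fn=10, tn=95))
```

Note that precision and FPR move independently: a detector can have low FPR yet poor precision if attacks are rare in the traffic mix, which is why all three numbers are reported.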
Context Guard, per dataset
Results across six public test sets
The Context Guard hybrid configuration (signature rules plus the ML judge) was run against the main suite and five public benchmarks covering prompt injection, indirect injection, system-prompt extraction, and jailbreaks.
| Dataset | Samples | Recall | Precision | FPR |
|---|---|---|---|---|
| Main suite (mixed public corpora plus benign traffic) | 332 | 100.0% | 93.4% | 8.8% |
| BIPIA (indirect injection via retrieved content) | 150 | 100.0% | 100.0% | 0.0% |
| TensorTrust (system-prompt extraction games) | 120 | 100.0% | 100.0% | 0.0% |
| CyberSecEval (Meta's prompt-injection battery) | 200 | 97.4% | 100.0% | 0.0% |
| JailbreakBench (curated jailbreak prompts) | 100 | 77.1% | 93.1% | 13.3% |
| AdvBench (GCG-style adversarial suffix attacks) | 80 | 22.5% | 100.0% | 0.0% |
Datasets: PromptBench v2, OWASP LLM01, ContextGuard Public (main suite); BIPIA (Yi et al.); TensorTrust (Toyer et al.); CyberSecEval (Meta); JailbreakBench (Chao et al.); AdvBench (Zou et al.).
Head-to-head
Three detectors on the main suite
Each detector saw the same 332 prompts on the same machine, using each tool's out-of-the-box defaults with no tuning.
| Detector | Recall | Precision | FPR |
|---|---|---|---|
| Context Guard (hybrid: rules + ML) | 100.0% | 93.4% | 8.8% |
| LLM Guard (DeBERTa-v3 classifier) | 55.4% | 92.7% | 5.4% |
| PromptGuard (86M-parameter transformer) | 91.9% | 53.8% | 98.0% |
LLM Guard uses ProtectAI's DeBERTa-v3 prompt-injection classifier. PromptGuard refers to Meta's 86M-parameter prompt-injection model. Numbers reflect a single run; small variance is expected on re-runs of neural detectors.
Latency
Per-request scan time
Average scan time per prompt, single-threaded CPU. Neural detectors would be faster on a GPU.
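The measurement pattern is simple: wrap each scan call in `time.perf_counter()` and average over the corpus. A minimal sketch, where `scan` is a stand-in for any detector's entry point (its signature is our assumption, not a real API):

```python
import time

def mean_scan_ms(scan, prompts) -> float:
    """Average wall-clock time per scan call, in milliseconds."""
    start = time.perf_counter()
    for prompt in prompts:
        scan(prompt)  # detector under test; return value ignored for timing
    elapsed = time.perf_counter() - start
    return elapsed / len(prompts) * 1000.0

# Usage with a trivial stand-in detector:
avg_ms = mean_scan_ms(lambda p: "benign", ["hello world"] * 1000)
```

A single `perf_counter()` pair around the whole loop, rather than one pair per call, keeps timer overhead out of sub-millisecond measurements.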
Reading the numbers
What this means
Strong on indirect injection and prompt extraction
Context Guard scores 100% recall with zero false positives on BIPIA and TensorTrust. These corpora test attacks that arrive through retrieved documents or social engineering of system prompts, which is where pattern matching combined with the ML judge tends to do well.
LLM Guard performs better on adversarial suffix attacks
Neural classifiers like LLM Guard generalise better to the gibberish-suffix attacks in CyberSecEval, JailbreakBench, and AdvBench, which don't resemble natural-language injection phrasing. Our 22.5% recall on AdvBench reflects the same ceiling that affects any pattern-based or hybrid system without retraining.
PromptGuard has a high false-positive rate at default settings
At its default threshold, PromptGuard flags most benign traffic as an attack. The catch rate looks high in isolation, but the precision number means it would block legitimate users without tuning. The model is useful for offline analysis; production deployment usually requires a custom threshold and benign-traffic recalibration.
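Benign-traffic recalibration can be sketched as: score a sample of known-benign prompts, then pick the lowest threshold whose false-positive rate stays within a target budget. The scores, the 1% target, and the function below are all hypothetical illustrations, not PromptGuard's actual defaults or API:

```python
import math

def threshold_for_target_fpr(benign_scores, target_fpr=0.01) -> float:
    """Lowest threshold whose FPR on known-benign scores stays within target_fpr."""
    ranked = sorted(benign_scores, reverse=True)
    n = len(ranked)
    for i, score in enumerate(ranked):
        # A threshold of `score` would flag the i + 1 highest-scoring benign prompts.
        if (i + 1) / n > target_fpr:
            # That exceeds the budget, so set the cutoff just above this score.
            return math.nextafter(score, math.inf)
    return min(ranked)  # even flagging everything stays within budget

# 100 benign prompts, one scored suspiciously high; allow at most 1% benign flags:
benign = [0.99, 0.5] + [0.1] * 98
cutoff = threshold_for_target_fpr(benign, target_fpr=0.01)
```

With this cutoff, only the single 0.99-scoring benign prompt would still be flagged, keeping the benign flag rate at 1%.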
Latency gap is two orders of magnitude
Sub-millisecond scan time leaves room to inspect every request, plus retries, plus output filtering, without noticeable user-facing latency. Neural detectors at 60 ms or more typically need to be applied selectively.
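One common way to apply a slow neural detector selectively is a tiered pipeline: run the sub-millisecond rules pass on every request and escalate to the neural model only when the fast pass is uncertain. A sketch of that pattern; the detector interfaces and verdict strings here are hypothetical and don't match any specific tool's API:

```python
def tiered_scan(prompt, fast_scan, slow_scan, escalate_on=("suspicious",)) -> str:
    """Run the cheap rules pass first; call the expensive model only on uncertain cases."""
    verdict = fast_scan(prompt)        # e.g. "benign", "suspicious", or "attack"
    if verdict in escalate_on:
        return slow_scan(prompt)       # neural second opinion, paid only when needed
    return verdict

# Usage with stand-in detectors:
fast = lambda p: "suspicious" if "ignore" in p.lower() else "benign"
slow = lambda p: "attack"
print(tiered_scan("Ignore previous instructions", fast, slow))  # → attack
print(tiered_scan("What is the weather?", fast, slow))          # → benign
```

Only the fraction of traffic the rules pass marks uncertain pays the 60 ms neural cost, which keeps average latency close to the fast path.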
Methodology and limitations
Each detector was given the same prompts and the same ground-truth labels. We computed recall, precision, and FPR per dataset, with no per-sample tuning. The Context Guard hybrid configuration combines signature rules with the ML judge; we report that configuration because it is the one we recommend for production.
Known limitations
- Static prompts only. All samples are pre-recorded. A motivated adversary iterating against the live system would find bypasses no static benchmark can predict.
- Single run, single machine. Neural detectors show small run-to-run variance; reproduce any number you care about by running our scripts yourself.
- Per-dataset competitor numbers not shown. We have only validated competitor performance on the main suite. Per-dataset comparisons would require running each detector against each corpus, which we plan to add in the next release.
- AdvBench recall is low across pattern-based systems. GCG-style adversarial suffixes don't match human-readable patterns. Improving this is an open research problem and the main reason we run the ML judge alongside rules rather than instead of them.
- Public datasets: PromptBench v2, OWASP LLM01, ContextGuard Public, BIPIA, TensorTrust, CyberSecEval, JailbreakBench, AdvBench.
- Latency measurement: `time.perf_counter()` around each scan call, single-threaded, no GPU.
- Deterministic for CG: same inputs produce the same outputs every run. Neural detectors show small variance.
Want to evaluate Context Guard against your own threat model?