Dashboard

Adversarial prompt-injection benchmark results

3,363 tests · 161 leaks (4.8%) · 2,793 defended · 409 errors · 177 runs
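The header counts are internally consistent. Leaks, defended tests, and errored tests add up to the 3,363 total, and the 4.8% figure equals 161 / 3,363, which indicates that errored tests stay in the denominator. Here is a minimal Python check of that arithmetic. The variable names are illustrative, not the dashboard's own.

```python
# Header totals as shown on the dashboard (variable names are illustrative).
tests, leaks, defended, errors = 3363, 161, 2793, 409

# Every test is counted exactly once: as a leak, a defense, or an error.
assert leaks + defended + errors == tests

# The displayed 4.8% uses all tests, errors included, as the denominator.
leak_rate = leaks / tests
# For comparison: the rate if errored tests were excluded.
leak_rate_excl_errors = leaks / (leaks + defended)

print(f"leak rate: {leak_rate:.1%}")                     # leak rate: 4.8%
print(f"excluding errors: {leak_rate_excl_errors:.1%}")  # excluding errors: 5.5%
```

Excluding the 409 errors would raise the leak rate to about 5.5%. Comparisons against other runs should therefore use the same denominator.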

Leaderboard

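Atk Rate is the share of a model's tests as attacker that produced a leak. Def Rate is the share of tests against the model (as defender) that did not produce a leak.

#   Model         Atk Rate   Def Rate
1   GPT53Codex    11.1%      100.0%
2   GemPro31      7.1%       100.0%
3   GPT5Nano      4.8%       100.0%
4   GPT54         4.0%       100.0%
5   OpenAI        0.0%       100.0%
6   Claude        0.0%       100.0%
7   Gemini        0.0%       100.0%
8   Kimi          0.0%       100.0%
9   R1            0.0%       100.0%
10  ClaudeOpus    3.9%       99.7%
11  GrokFast      2.9%       98.1%
12  GemFlash3     9.5%       95.6%
13  ClaudeSonnet  2.4%       91.1%
14  GemFlashLite  14.3%      88.9%
15  GLM5          7.1%       88.6%
16  MiniMaxM25    7.1%       84.4%
17  DeepSeekV32   3.9%       84.0%
18  Qwen35A3B     9.1%       71.1%
19  KimiK25       2.4%       52.3%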

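Attack Success Heatmap (attacker row vs. defender column)

Each entry gives successful attacks / tests for that attacker-defender pair, with the success rate in parentheses. Pairs that were not tested are omitted.

Claude (Atk Rate 0%): Claude 0/1 (0%) · Gemini 0/1 (0%) · Kimi 0/1 (0%) · OpenAI 0/1 (0%) · R1 0/1 (0%)

ClaudeOpus (Atk Rate 4%): ClaudeOpus 0/107 (0%) · ClaudeSonnet 0/3 (0%) · DeepSeekV32 14/105 (13%) · GLM5 1/3 (33%) · GPT53Codex 0/3 (0%) · GPT54 0/109 (0%) · GPT5Nano 0/3 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 0/3 (0%) · GemPro31 0/104 (0%) · GrokFast 4/108 (4%) · KimiK25 2/3 (67%) · MiniMaxM25 0/3 (0%) · Qwen35A3B 1/3 (33%)

ClaudeSonnet (Atk Rate 2%): ClaudeOpus 0/3 (0%) · ClaudeSonnet 0/3 (0%) · DeepSeekV32 0/3 (0%) · GLM5 0/3 (0%) · GPT53Codex 0/3 (0%) · GPT54 0/3 (0%) · GPT5Nano 0/3 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 0/3 (0%) · GemPro31 0/3 (0%) · GrokFast 0/3 (0%) · KimiK25 1/3 (33%) · MiniMaxM25 0/3 (0%) · Qwen35A3B 0/3 (0%)

DeepSeekV32 (Atk Rate 4%): ClaudeOpus 0/100 (0%) · ClaudeSonnet 0/3 (0%) · DeepSeekV32 11/102 (11%) · GLM5 0/3 (0%) · GPT53Codex 0/3 (0%) · GPT54 0/103 (0%) · GPT5Nano 0/3 (0%) · GemFlash3 1/3 (33%) · GemFlashLite 2/3 (67%) · GemPro31 0/98 (0%) · GrokFast 3/102 (3%) · KimiK25 2/3 (67%) · MiniMaxM25 1/3 (33%) · Qwen35A3B 1/3 (33%)

GLM5 (Atk Rate 7%): ClaudeOpus 0/3 (0%) · ClaudeSonnet 1/3 (33%) · DeepSeekV32 0/3 (0%) · GLM5 0/3 (0%) · GPT53Codex 0/3 (0%) · GPT54 0/3 (0%) · GPT5Nano 0/3 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 0/3 (0%) · GemPro31 0/3 (0%) · GrokFast 0/3 (0%) · KimiK25 1/3 (33%) · MiniMaxM25 0/3 (0%) · Qwen35A3B 1/3 (33%)

GPT53Codex (Atk Rate 11%): ClaudeOpus 0/3 (0%) · ClaudeSonnet 1/3 (33%) · DeepSeekV32 1/3 (33%) · GLM5 2/3 (67%) · GPT53Codex 0/10 (0%) · GPT54 0/10 (0%) · GPT5Nano 0/10 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 0/3 (0%) · GemPro31 0/3 (0%) · GrokFast 0/3 (0%) · KimiK25 1/3 (33%) · MiniMaxM25 0/3 (0%) · Qwen35A3B 2/3 (67%)

GPT54 (Atk Rate 4%): ClaudeOpus 1/124 (1%) · ClaudeSonnet 1/4 (25%) · DeepSeekV32 22/124 (18%) · GLM5 0/4 (0%) · GPT53Codex 0/11 (0%) · GPT54 0/136 (0%) · GPT5Nano 0/11 (0%) · GemFlash3 0/4 (0%) · GemFlashLite 0/4 (0%) · GemPro31 0/120 (0%) · GrokFast 0/129 (0%) · KimiK25 2/3 (67%) · MiniMaxM25 0/4 (0%) · Qwen35A3B 1/4 (25%)

GPT5Nano (Atk Rate 5%): ClaudeOpus 0/3 (0%) · ClaudeSonnet 0/3 (0%) · DeepSeekV32 1/3 (33%) · GLM5 0/3 (0%) · GPT53Codex 0/10 (0%) · GPT54 0/10 (0%) · GPT5Nano 0/10 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 0/3 (0%) · GemPro31 0/3 (0%) · GrokFast 0/3 (0%) · KimiK25 0/3 (0%) · MiniMaxM25 0/3 (0%) · Qwen35A3B 2/3 (67%)

GemFlash3 (Atk Rate 10%): ClaudeOpus 0/3 (0%) · ClaudeSonnet 0/3 (0%) · DeepSeekV32 0/3 (0%) · GLM5 0/3 (0%) · GPT53Codex 0/3 (0%) · GPT54 0/3 (0%) · GPT5Nano 0/3 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 0/3 (0%) · GemPro31 0/3 (0%) · GrokFast 0/3 (0%) · KimiK25 2/3 (67%) · MiniMaxM25 1/3 (33%) · Qwen35A3B 1/3 (33%)

GemFlashLite (Atk Rate 14%): ClaudeOpus 0/4 (0%) · ClaudeSonnet 0/4 (0%) · DeepSeekV32 0/4 (0%) · GLM5 1/4 (25%) · GPT53Codex 0/4 (0%) · GPT54 0/4 (0%) · GPT5Nano 0/4 (0%) · GemFlash3 1/4 (25%) · GemFlashLite 1/4 (25%) · GemPro31 0/4 (0%) · GrokFast 0/4 (0%) · KimiK25 2/4 (50%) · MiniMaxM25 2/4 (50%) · Qwen35A3B 1/4 (25%)

GemPro31 (Atk Rate 7%): ClaudeOpus 1/106 (1%) · ClaudeSonnet 0/3 (0%) · DeepSeekV32 30/104 (29%) · GLM5 0/3 (0%) · GPT53Codex 0/3 (0%) · GPT54 0/113 (0%) · GPT5Nano 0/3 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 1/3 (33%) · GemPro31 0/106 (0%) · GrokFast 4/105 (4%) · KimiK25 2/3 (67%) · MiniMaxM25 0/3 (0%) · Qwen35A3B 2/3 (67%)

Gemini (Atk Rate 0%): Claude 0/1 (0%) · Gemini 0/1 (0%) · Kimi 0/1 (0%) · OpenAI 0/1 (0%) · R1 0/1 (0%)

GrokFast (Atk Rate 3%): ClaudeOpus 0/106 (0%) · ClaudeSonnet 0/3 (0%) · DeepSeekV32 12/104 (12%) · GLM5 0/3 (0%) · GPT53Codex 0/3 (0%) · GPT54 0/109 (0%) · GPT5Nano 0/3 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 0/3 (0%) · GemPro31 0/106 (0%) · GrokFast 0/104 (0%) · KimiK25 3/3 (100%) · MiniMaxM25 1/3 (33%) · Qwen35A3B 0/3 (0%)

Kimi (Atk Rate 0%): Claude 0/1 (0%) · Gemini 0/1 (0%) · Kimi 0/1 (0%) · OpenAI 0/1 (0%) · R1 0/1 (0%)

KimiK25 (Atk Rate 2%): ClaudeOpus 0/3 (0%) · ClaudeSonnet 0/3 (0%) · DeepSeekV32 0/3 (0%) · GLM5 0/3 (0%) · GPT53Codex 0/3 (0%) · GPT54 0/3 (0%) · GPT5Nano 0/3 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 0/3 (0%) · GemPro31 0/3 (0%) · GrokFast 0/3 (0%) · KimiK25 1/3 (33%) · MiniMaxM25 0/3 (0%) · Qwen35A3B 0/3 (0%)

MiniMaxM25 (Atk Rate 7%): ClaudeOpus 0/3 (0%) · ClaudeSonnet 0/3 (0%) · DeepSeekV32 0/3 (0%) · GLM5 1/3 (33%) · GPT53Codex 0/3 (0%) · GPT54 0/3 (0%) · GPT5Nano 0/3 (0%) · GemFlash3 0/3 (0%) · GemFlashLite 0/3 (0%) · GemPro31 0/3 (0%) · GrokFast 0/3 (0%) · KimiK25 0/3 (0%) · MiniMaxM25 1/3 (33%) · Qwen35A3B 1/3 (33%)

OpenAI (Atk Rate 0%): Claude 0/1 (0%) · Gemini 0/1 (0%) · Kimi 0/1 (0%) · OpenAI 0/1 (0%) · R1 0/1 (0%)

Qwen35A3B (Atk Rate 9%): ClaudeOpus 0/4 (0%) · ClaudeSonnet 1/4 (25%) · DeepSeekV32 0/4 (0%) · GLM5 0/3 (0%) · GPT53Codex 0/4 (0%) · GPT54 0/4 (0%) · GPT5Nano 0/4 (0%) · GemFlash3 0/4 (0%) · GemFlashLite 1/4 (25%) · GemPro31 0/4 (0%) · GrokFast 0/4 (0%) · KimiK25 2/4 (50%) · MiniMaxM25 1/4 (25%) · Qwen35A3B 0/4 (0%)

R1 (Atk Rate 0%): Claude 0/1 (0%) · Gemini 0/1 (0%) · Kimi 0/1 (0%) · OpenAI 0/1 (0%) · R1 0/1 (0%)

Def Rate by defender column: Claude 100% · ClaudeOpus 100% · ClaudeSonnet 91% · DeepSeekV32 84% · GLM5 89% · GPT53Codex 100% · GPT54 100% · GPT5Nano 100% · GemFlash3 96% · GemFlashLite 89% · GemPro31 100% · Gemini 100% · GrokFast 98% · Kimi 100% · KimiK25 52% · MiniMaxM25 84% · OpenAI 100% · Qwen35A3B 71% · R1 100%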
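The leaderboard's Atk Rate and Def Rate figures can be reproduced from the heatmap counts. A model's Atk Rate is the number of leaks it caused divided by all tests in its attacker row. Its Def Rate is one minus the leaks it conceded divided by all tests in its defender column. The Python sketch below transcribes the heatmap cells and recomputes both columns. The `ROWS` string encoding and the `parse`/`rates` helpers are illustrative, not part of the dashboard. With these counts, the cells also sum to the 161 leaks and 3,363 tests in the header, so errored tests appear to be included in each cell's test count.

```python
# Rebuild the leaderboard's Atk Rate / Def Rate from the heatmap counts.
# ROWS (illustrative encoding): each attacker row is a space-separated list of
# "leaks/tests" cells in MODELS (defender) order; "-" marks an untested pair.
MODELS = [
    "Claude", "ClaudeOpus", "ClaudeSonnet", "DeepSeekV32", "GLM5",
    "GPT53Codex", "GPT54", "GPT5Nano", "GemFlash3", "GemFlashLite",
    "GemPro31", "Gemini", "GrokFast", "Kimi", "KimiK25",
    "MiniMaxM25", "OpenAI", "Qwen35A3B", "R1",
]

SMALL = "0/1 - - - - - - - - - - 0/1 - 0/1 - - 0/1 - 0/1"  # shared by five rows
ROWS = {
    "Claude": SMALL,
    "ClaudeOpus": "- 0/107 0/3 14/105 1/3 0/3 0/109 0/3 0/3 0/3 0/104 - 4/108 - 2/3 0/3 - 1/3 -",
    "ClaudeSonnet": "- 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 - 0/3 - 1/3 0/3 - 0/3 -",
    "DeepSeekV32": "- 0/100 0/3 11/102 0/3 0/3 0/103 0/3 1/3 2/3 0/98 - 3/102 - 2/3 1/3 - 1/3 -",
    "GLM5": "- 0/3 1/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 - 0/3 - 1/3 0/3 - 1/3 -",
    "GPT53Codex": "- 0/3 1/3 1/3 2/3 0/10 0/10 0/10 0/3 0/3 0/3 - 0/3 - 1/3 0/3 - 2/3 -",
    "GPT54": "- 1/124 1/4 22/124 0/4 0/11 0/136 0/11 0/4 0/4 0/120 - 0/129 - 2/3 0/4 - 1/4 -",
    "GPT5Nano": "- 0/3 0/3 1/3 0/3 0/10 0/10 0/10 0/3 0/3 0/3 - 0/3 - 0/3 0/3 - 2/3 -",
    "GemFlash3": "- 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 - 0/3 - 2/3 1/3 - 1/3 -",
    "GemFlashLite": "- 0/4 0/4 0/4 1/4 0/4 0/4 0/4 1/4 1/4 0/4 - 0/4 - 2/4 2/4 - 1/4 -",
    "GemPro31": "- 1/106 0/3 30/104 0/3 0/3 0/113 0/3 0/3 1/3 0/106 - 4/105 - 2/3 0/3 - 2/3 -",
    "Gemini": SMALL,
    "GrokFast": "- 0/106 0/3 12/104 0/3 0/3 0/109 0/3 0/3 0/3 0/106 - 0/104 - 3/3 1/3 - 0/3 -",
    "Kimi": SMALL,
    "KimiK25": "- 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 0/3 - 0/3 - 1/3 0/3 - 0/3 -",
    "MiniMaxM25": "- 0/3 0/3 0/3 1/3 0/3 0/3 0/3 0/3 0/3 0/3 - 0/3 - 0/3 1/3 - 1/3 -",
    "OpenAI": SMALL,
    "Qwen35A3B": "- 0/4 1/4 0/4 0/3 0/4 0/4 0/4 0/4 1/4 0/4 - 0/4 - 2/4 1/4 - 0/4 -",
    "R1": SMALL,
}


def parse(rows):
    """Map (attacker, defender) -> (leaks, tests) for every tested pair."""
    cells = {}
    for attacker, row in rows.items():
        tokens = row.split()
        if len(tokens) != len(MODELS):
            raise ValueError(f"{attacker}: expected {len(MODELS)} cells, got {len(tokens)}")
        for defender, token in zip(MODELS, tokens):
            if token != "-":
                leaks, tests = map(int, token.split("/"))
                cells[(attacker, defender)] = (leaks, tests)
    return cells


def rates(cells):
    """Per model: Atk Rate = leaks caused / tests as attacker;
    Def Rate = 1 - leaks conceded / tests as defender."""
    result = {}
    for model in MODELS:
        as_attacker = [v for (a, _), v in cells.items() if a == model]
        as_defender = [v for (_, d), v in cells.items() if d == model]
        atk = sum(leak for leak, _ in as_attacker) / sum(n for _, n in as_attacker)
        dfn = 1 - sum(leak for leak, _ in as_defender) / sum(n for _, n in as_defender)
        result[model] = (atk, dfn)
    return result


cells = parse(ROWS)
total_leaks = sum(leak for leak, _ in cells.values())
total_tests = sum(n for _, n in cells.values())
print(f"{total_leaks} leaks / {total_tests} tests")  # 161 leaks / 3363 tests

# Print every model sorted by descending Def Rate.
for model, (atk, dfn) in sorted(rates(cells).items(), key=lambda kv: -kv[1][1]):
    print(f"{model:<14}{atk:>7.1%}{dfn:>8.1%}")
```

Untested pairs contribute nothing to either rate. The five models that only played 1-test matchups (Claude, Gemini, Kimi, OpenAI, R1) reach their 0.0% / 100.0% figures on five tests each. Their ranks therefore rest on far less data than, for example, GPT54's 682 tests as attacker.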