Dashboard
Adversarial prompt-injection benchmark results
3,363 tests · 161 leaks (4.8%) · 2,793 defended · 409 errors · 177 runs
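
The summary counts can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes the 4.8% figure is leaks divided by total tests and that leaks, defended, and errors are the only outcome buckets. The dashboard does not state either assumption, but the numbers are consistent with both.

```python
# Counts copied from the summary line above.
tests, leaks, defended, errors = 3363, 161, 2793, 409

# Assumption: every test lands in exactly one of the three buckets.
assert leaks + defended + errors == tests

# Assumption: the reported percentage is leaks over all tests.
leak_rate = round(100 * leaks / tests, 1)
print(f"leak rate: {leak_rate}%")
```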
Leaderboard
| # | Model | Atk Rate | Def Rate |
|---|---|---|---|
| 1 | GPT53Codex | 11.1% | 100.0% |
| 2 | GemPro31 | 7.1% | 100.0% |
| 3 | GPT5Nano | 4.8% | 100.0% |
| 4 | GPT54 | 4.0% | 100.0% |
| 5 | OpenAI | 0.0% | 100.0% |
| 6 | Claude | 0.0% | 100.0% |
| 7 | Gemini | 0.0% | 100.0% |
| 8 | Kimi | 0.0% | 100.0% |
| 9 | R1 | 0.0% | 100.0% |
| 10 | ClaudeOpus | 3.9% | 99.7% |
| 11 | GrokFast | 2.9% | 98.1% |
| 12 | GemFlash3 | 9.5% | 95.6% |
| 13 | ClaudeSonnet | 2.4% | 91.1% |
| 14 | GemFlashLite | 14.3% | 88.9% |
| 15 | GLM5 | 7.1% | 88.6% |
| 16 | MiniMaxM25 | 7.1% | 84.4% |
| 17 | DeepSeekV32 | 3.9% | 84.0% |
| 18 | Qwen35A3B | 9.1% | 71.1% |
| 19 | KimiK25 | 2.4% | 52.3% |
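Attack Success Heatmap
attacker (row) vs defender (col); cells show successes/attempts (rate)

| Attacker | Claude | ClaudeOpus | ClaudeSonnet | DeepSeekV32 | GLM5 | GPT53Codex | GPT54 | GPT5Nano | GemFlash3 | GemFlashLite | GemPro31 | Gemini | GrokFast | Kimi | KimiK25 | MiniMaxM25 | OpenAI | Qwen35A3B | R1 | Atk Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude | 0/1 (0%) | - | - | - | - | - | - | - | - | - | - | 0/1 (0%) | - | 0/1 (0%) | - | - | 0/1 (0%) | - | 0/1 (0%) | 0% |
| ClaudeOpus | - | 0/107 (0%) | 0/3 (0%) | 14/105 (13%) | 1/3 (33%) | 0/3 (0%) | 0/109 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/104 (0%) | - | 4/108 (4%) | - | 2/3 (67%) | 0/3 (0%) | - | 1/3 (33%) | - | 4% |
| ClaudeSonnet | - | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | - | 0/3 (0%) | - | 1/3 (33%) | 0/3 (0%) | - | 0/3 (0%) | - | 2% |
| DeepSeekV32 | - | 0/100 (0%) | 0/3 (0%) | 11/102 (11%) | 0/3 (0%) | 0/3 (0%) | 0/103 (0%) | 0/3 (0%) | 1/3 (33%) | 2/3 (67%) | 0/98 (0%) | - | 3/102 (3%) | - | 2/3 (67%) | 1/3 (33%) | - | 1/3 (33%) | - | 4% |
| GLM5 | - | 0/3 (0%) | 1/3 (33%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | - | 0/3 (0%) | - | 1/3 (33%) | 0/3 (0%) | - | 1/3 (33%) | - | 7% |
| GPT53Codex | - | 0/3 (0%) | 1/3 (33%) | 1/3 (33%) | 2/3 (67%) | 0/10 (0%) | 0/10 (0%) | 0/10 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | - | 0/3 (0%) | - | 1/3 (33%) | 0/3 (0%) | - | 2/3 (67%) | - | 11% |
| GPT54 | - | 1/124 (1%) | 1/4 (25%) | 22/124 (18%) | 0/4 (0%) | 0/11 (0%) | 0/136 (0%) | 0/11 (0%) | 0/4 (0%) | 0/4 (0%) | 0/120 (0%) | - | 0/129 (0%) | - | 2/3 (67%) | 0/4 (0%) | - | 1/4 (25%) | - | 4% |
| GPT5Nano | - | 0/3 (0%) | 0/3 (0%) | 1/3 (33%) | 0/3 (0%) | 0/10 (0%) | 0/10 (0%) | 0/10 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | - | 0/3 (0%) | - | 0/3 (0%) | 0/3 (0%) | - | 2/3 (67%) | - | 5% |
| GemFlash3 | - | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | - | 0/3 (0%) | - | 2/3 (67%) | 1/3 (33%) | - | 1/3 (33%) | - | 10% |
| GemFlashLite | - | 0/4 (0%) | 0/4 (0%) | 0/4 (0%) | 1/4 (25%) | 0/4 (0%) | 0/4 (0%) | 0/4 (0%) | 1/4 (25%) | 1/4 (25%) | 0/4 (0%) | - | 0/4 (0%) | - | 2/4 (50%) | 2/4 (50%) | - | 1/4 (25%) | - | 14% |
| GemPro31 | - | 1/106 (1%) | 0/3 (0%) | 30/104 (29%) | 0/3 (0%) | 0/3 (0%) | 0/113 (0%) | 0/3 (0%) | 0/3 (0%) | 1/3 (33%) | 0/106 (0%) | - | 4/105 (4%) | - | 2/3 (67%) | 0/3 (0%) | - | 2/3 (67%) | - | 7% |
| Gemini | 0/1 (0%) | - | - | - | - | - | - | - | - | - | - | 0/1 (0%) | - | 0/1 (0%) | - | - | 0/1 (0%) | - | 0/1 (0%) | 0% |
| GrokFast | - | 0/106 (0%) | 0/3 (0%) | 12/104 (12%) | 0/3 (0%) | 0/3 (0%) | 0/109 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/106 (0%) | - | 0/104 (0%) | - | 3/3 (100%) | 1/3 (33%) | - | 0/3 (0%) | - | 3% |
| Kimi | 0/1 (0%) | - | - | - | - | - | - | - | - | - | - | 0/1 (0%) | - | 0/1 (0%) | - | - | 0/1 (0%) | - | 0/1 (0%) | 0% |
| KimiK25 | - | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | - | 0/3 (0%) | - | 1/3 (33%) | 0/3 (0%) | - | 0/3 (0%) | - | 2% |
| MiniMaxM25 | - | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 1/3 (33%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | 0/3 (0%) | - | 0/3 (0%) | - | 0/3 (0%) | 1/3 (33%) | - | 1/3 (33%) | - | 7% |
| OpenAI | 0/1 (0%) | - | - | - | - | - | - | - | - | - | - | 0/1 (0%) | - | 0/1 (0%) | - | - | 0/1 (0%) | - | 0/1 (0%) | 0% |
| Qwen35A3B | - | 0/4 (0%) | 1/4 (25%) | 0/4 (0%) | 0/3 (0%) | 0/4 (0%) | 0/4 (0%) | 0/4 (0%) | 0/4 (0%) | 1/4 (25%) | 0/4 (0%) | - | 0/4 (0%) | - | 2/4 (50%) | 1/4 (25%) | - | 0/4 (0%) | - | 9% |
| R1 | 0/1 (0%) | - | - | - | - | - | - | - | - | - | - | 0/1 (0%) | - | 0/1 (0%) | - | - | 0/1 (0%) | - | 0/1 (0%) | 0% |
| Def Rate | 100% | 100% | 91% | 84% | 89% | 100% | 100% | 100% | 96% | 89% | 100% | 100% | 98% | 100% | 52% | 84% | 100% | 71% | 100% | |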
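
A row's Atk Rate looks like a pooled rate: total successes across the row divided by total attempts, not an average of the per-cell percentages. The sketch below checks this for the GPT53Codex attacker row, reading each heatmap cell as successes/attempts and skipping cells shown as `-`. The helper name `pooled_rate` is hypothetical, and the pooling rule is an inference from the numbers, not something the dashboard documents.

```python
# (successes, attempts) per defender for the GPT53Codex attacker row.
# Cells shown as "-" in the heatmap are left out.
gpt53codex_row = [
    (0, 3), (1, 3), (1, 3), (2, 3),  # ClaudeOpus, ClaudeSonnet, DeepSeekV32, GLM5
    (0, 10), (0, 10), (0, 10),       # GPT53Codex, GPT54, GPT5Nano
    (0, 3), (0, 3), (0, 3),          # GemFlash3, GemFlashLite, GemPro31
    (0, 3), (1, 3), (0, 3), (2, 3),  # GrokFast, KimiK25, MiniMaxM25, Qwen35A3B
]


def pooled_rate(cells):
    """Return (successes, attempts, percent) pooled over all cells."""
    wins = sum(s for s, _ in cells)
    total = sum(n for _, n in cells)
    return wins, total, round(100 * wins / total, 1)


wins, total, pct = pooled_rate(gpt53codex_row)
print(f"{wins}/{total} = {pct}%")
```

Under these assumptions the pooled figure comes to 7/63, which matches the 11.1% Atk Rate listed for GPT53Codex in the leaderboard. An unweighted mean of the cell percentages would give a different number, because the cells have different attempt counts.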