Table 1: Main results on AdvBench. ASR (attack success rate, %) and StR (StrongREJECT) scores of jailbreaking methods across five target LLMs. The best and second-best scores are highlighted in bold and underline, respectively.
| Attack | Llama-3.1-8B ASR | Llama-3.1-8B StR | GPT-4o-mini ASR | GPT-4o-mini StR | GPT-4o ASR | GPT-4o StR | Haiku-3.5 ASR | Haiku-3.5 StR | Sonnet-4 ASR | Sonnet-4 StR |
|---|---|---|---|---|---|---|---|---|---|---|
| Vanilla | 30.0 | 0.15 | 4.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| PAIR | 90.0 | 0.30 | 82.0 | 0.21 | 84.0 | 0.13 | 46.0 | 0.14 | 28.0 | 0.04 |
| TAP | 98.0 | 0.35 | 90.0 | 0.33 | 74.0 | 0.13 | 46.0 | 0.13 | 22.0 | 0.07 |
| PAP | 76.0 | 0.42 | 48.0 | 0.22 | 44.0 | 0.26 | 6.0 | 0.04 | 6.0 | 0.02 |
| SeqAR | 90.0 | 0.82 | 38.0 | 0.10 | 0.0 | 0.0 | 14.0 | 0.0 | 8.0 | 0.01 |
| AutoDAN-Turbo | 84.0 | 0.61 | 54.0 | 0.31 | 38.0 | 0.16 | 42.0 | 0.05 | 38.0 | 0.04 |
| AMIS (Ours) | 100.0 | 0.84 | 98.0 | 0.87 | 100.0 | 0.87 | 88.0 | 0.42 | 100.0 | 0.70 |

Table 2: Main results on JBB Behaviors. ASR (attack success rate, %) and StR (StrongREJECT) scores of jailbreaking methods across five target LLMs. The best and second-best scores are highlighted in bold and underline, respectively.
| Attack | Llama-3.1-8B ASR | Llama-3.1-8B StR | GPT-4o-mini ASR | GPT-4o-mini StR | GPT-4o ASR | GPT-4o StR | Haiku-3.5 ASR | Haiku-3.5 StR | Sonnet-4 ASR | Sonnet-4 StR |
|---|---|---|---|---|---|---|---|---|---|---|
| Vanilla | 41.0 | 0.19 | 3.0 | 0.09 | 2.0 | 0.07 | 1.0 | 0.04 | 3.0 | 0.05 |
| PAIR | 91.0 | 0.32 | 83.0 | 0.24 | 77.0 | 0.20 | 61.0 | 0.13 | 29.0 | 0.08 |
| TAP | 91.0 | 0.39 | 80.0 | 0.24 | 72.0 | 0.17 | 53.0 | 0.21 | 37.0 | 0.07 |
| PAP | 97.0 | 0.22 | 84.0 | 0.23 | 69.0 | 0.23 | 67.0 | 0.16 | 20.0 | 0.09 |
| SeqAR | 89.0 | 0.74 | 0.0 | 0.0 | 0.0 | 0.0 | 9.0 | 0.12 | 16.0 | 0.15 |
| AutoDAN-Turbo | 85.0 | 0.61 | 60.0 | 0.38 | 45.0 | 0.28 | 33.0 | 0.12 | 31.0 | 0.15 |
| AMIS (Ours) | 100.0 | 0.95 | 100.0 | 0.85 | 97.0 | 0.85 | 78.0 | 0.48 | 88.0 | 0.67 |

AMIS achieves state-of-the-art jailbreak performance on both AdvBench and JBB Behaviors, attaining the highest ASR and StR on every target LLM and outperforming all six baselines. It reaches 100% ASR on three of the five targets on AdvBench (Llama-3.1-8B, GPT-4o, and Sonnet-4) and on two on JBB Behaviors (Llama-3.1-8B and GPT-4o-mini). The gains hold across all five open- and closed-source models, which suggests robustness and transferability across targets.
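
The per-target comparison behind this summary can be checked directly from the ASR columns of Tables 1 and 2. The following is a minimal sketch, not part of the original evaluation code: the names `TARGETS`, `ASR`, and `amis_margins` are illustrative assumptions, and the numbers are copied from the tables above. It computes, for each target, the gap in percentage points between AMIS and the strongest baseline.

```python
# ASR (%) values copied from Tables 1 and 2; each list follows the order of TARGETS.
# All names here are illustrative and do not come from the paper's code.
TARGETS = ["Llama-3.1-8B", "GPT-4o-mini", "GPT-4o", "Haiku-3.5", "Sonnet-4"]

ASR = {
    "AdvBench": {
        "Vanilla": [30.0, 4.0, 0.0, 0.0, 0.0],
        "PAIR": [90.0, 82.0, 84.0, 46.0, 28.0],
        "TAP": [98.0, 90.0, 74.0, 46.0, 22.0],
        "PAP": [76.0, 48.0, 44.0, 6.0, 6.0],
        "SeqAR": [90.0, 38.0, 0.0, 14.0, 8.0],
        "AutoDAN-Turbo": [84.0, 54.0, 38.0, 42.0, 38.0],
        "AMIS (Ours)": [100.0, 98.0, 100.0, 88.0, 100.0],
    },
    "JBB Behaviors": {
        "Vanilla": [41.0, 3.0, 2.0, 1.0, 3.0],
        "PAIR": [91.0, 83.0, 77.0, 61.0, 29.0],
        "TAP": [91.0, 80.0, 72.0, 53.0, 37.0],
        "PAP": [97.0, 84.0, 69.0, 67.0, 20.0],
        "SeqAR": [89.0, 0.0, 0.0, 9.0, 16.0],
        "AutoDAN-Turbo": [85.0, 60.0, 45.0, 33.0, 31.0],
        "AMIS (Ours)": [100.0, 100.0, 97.0, 78.0, 88.0],
    },
}


def amis_margins(table, ours="AMIS (Ours)"):
    """Return, per target, the AMIS ASR minus the best baseline ASR (percentage points)."""
    baselines = [scores for name, scores in table.items() if name != ours]
    # zip(*baselines) regroups the per-method rows into per-target columns.
    best_baseline = [max(column) for column in zip(*baselines)]
    return [a - b for a, b in zip(table[ours], best_baseline)]


for benchmark, table in ASR.items():
    margins = amis_margins(table)
    print(benchmark, dict(zip(TARGETS, margins)))
```

A positive margin for every target on both benchmarks corresponds to the claim that AMIS has the highest ASR in every column. The same check could be repeated for the StR columns.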