Scalable large language model jailbreak attack method, device, medium, and product
By updating the prompt template and adjusting the feedback parameters of the large language model, the method generates jailbreak attack prompts that meet the format requirements. This addresses the narrow scope of security boundary assessment in large language model jailbreak attacks and allows jailbreak tasks to be executed effectively and with high scalability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- HANGZHOU HIGH-TECH ZONE (BINJIANG) INSTITUTE OF BLOCKCHAIN & DATA SECURITY
- Filing Date
- 2024-12-26
- Publication Date
- 2026-06-26
AI Technical Summary
Existing techniques assess the security boundaries of large language models only within a narrow scope during jailbreak attacks, and effective assessment methods are lacking.
The method obtains a first prompt corresponding to the jailbreak task, updates the prompt template based on the character description and format requirements, generates a second prompt that meets the format requirements, and uses the target large language model to generate second response data. The prompt template is then adjusted through feedback parameters to meet the needs of different jailbreak tasks.
The method makes large language model jailbreak attacks highly scalable, avoids the narrow content security boundary assessment caused by fixed algorithms and processes, and adapts to a variety of jailbreak task scenarios.
Smart Images
Figure CN119884311B_ABST (abstract drawing)