Signed-Prompt
Robust defense against prompt injection. Signatures for prompts using rare character combos ➡️*️⃣⬇️↘️2️⃣
Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
Xuchen Suo, Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University | Link to Paper
This paper addresses the critical challenge of prompt injection attacks in Large Language Models (LLMs) integrated applications, which has become a growing concern in the AI field. Traditional defense strategies like output and input filtering, and delimiter use have been inadequate. The paper introduces the 'Signed-Prompt' method as a novel solution, involving signing sensitive instructions within command segments by authorized users, allowing the LLM to recognize trusted instruction sources. The study presents a comprehensive analysis of prompt injection attack patterns and explains the Signed-Prompt concept, its basic architecture, implementation through prompt engineering, and fine-tuning of LLMs. Experiments show the effectiveness of Signed-Prompt in resisting various types of prompt injection attacks, validating it as a robust defense strategy.
The basic concept of Signed-Prompt is to sign the instructions: replacing the original instructions with combinations of characters that rarely appear in natural language.
Why this matters
The Signed-Prompt method is significant as it addresses a major security vulnerability in LLM-integrated applications - the inability of LLMs to verify the trustworthiness of instruction sources, particularly in the context of prompt injection attacks. This novel approach enhances the security of these applications by ensuring that they can discern between authorized and malicious instructions.
Methodology
The study involves a detailed analysis of prompt injection attack patterns, followed by an explanation of the Signed-Prompt concept. The methodology includes the development of an encoder for signing user instructions and adjusting the LLM to understand these signed instructions. Two approaches were used for implementing this method: prompt engineering based on ChatGPT-4 and fine-tuning based on ChatGLM-6B. The performance of the Signed-Prompt method was tested against various types of prompt injection attacks, demonstrating its effectiveness and robustness.
Key Findings
- Signed-Prompt Methodology: Introduces a novel approach where sensitive instructions are signed by authorized users, enabling the LLM to identify trusted sources.
- Encoder Development: An encoder was developed to sign user instructions, effectively differentiating between authorized and unauthorized commands.
- Adjustment of LLMs: LLMs were adjusted to understand signed instructions, distinguishing them from unsigned ones.
- Implementation Approaches: Utilized prompt engineering based on ChatGPT-4 and fine-tuning based on ChatGLM-6B.
- Effective Defense Against Attacks: Demonstrated substantial resistance to various types of prompt injection attacks.
- Enhanced Security: Provides a robust defense strategy, significantly improving the security of LLM-integrated applications.
Applying Signed-Prompt Techniques
To apply the Signed-Prompt techniques consider the following steps:
1. Identify Sensitive Instructions: Determine which parts of your prompt could be considered sensitive or potentially harmful if manipulated.
2. Develop an Encoder: Implement or use an existing encoder to sign these sensitive instructions. This could involve replacing specific terms or commands with unique identifiers or codes.
3. Adjust Your LLM: Ensure that your LLM is capable of understanding these signed instructions. This might involve fine-tuning the model or using prompt engineering techniques.
4. Test and Validate: Before deploying, test the system to ensure that it correctly differentiates between signed and unsigned instructions and resists prompt injection attacks.