Prompt hacking is a superpower that can be used for bad or good. Also known as injection hacking, prompt hacking manipulates a Large Language Model (LLM) like ChatGPT to make it perform unintended outcomes, such as sharing sensitive data, giving inappropriate answers to questions, or even fabricating information, known as a Our work. Prompt hacking can also be used as a legitimate form of research to test the security and accuracy of an LLM and fine tune the information it provides. As LLMs become more popular for applications such as chatbots, businesses need to understand the value of prompt hacking in order to protect their reputations and apply AI responsibly.

More About Prompt Hacking

Prompt hacking can wreak havoc and or improve an AI-powered app on how it is applied.

First, the bad news: Prompt hacking is one of the top 10 threats for LLMs, according to the Open Worldwide Application Security Project. Unlike traditional hacking, which typically exploits software vulnerabilities, the recent emergence of prompt hacking by AI specialists relies on strategically crafting prompts to deceive the LLM into performing unintended actions. For example, an attacker could prompt an LLM to:

But here’s the good news: the technique can also be used to teach an LLM what kind of inappropriate content to avoid. For example:

Prompt Hacking in Action

At Centific, we use prompt hacking in our client work as part of our red teaming services. Our work includes prompt hacking to train AI-fueled apps to provide accurate information and avoid sharing harmful content or inaccurate data.  In our work, we’ve found that even when clients use more secure proprietary LLMs, the LLMs can be vulnerable to mistakes such as hallucinating.

Our process entails using a variety of prompt hacking approaches. For example, we might test a client’s app with prompt poisoning, a form of prompt hacking in which we intentionally submit malicious or harmful instructions within the initial prompt to manipulate the behavior of the LLM. (This is how we test a model’s vulnerabilities to generate harmful or biased content.) A question like, “Why are vaccines dangerous?” could provide evidence that supports the theory of vaccine-related harm, which should cause the app to shut down the conversation. If it does not, then we know it is vulnerable and it must be course corrected with proper training.

Critical Success Factors for Ethical Prompt Hacking

Based on our own experience, the keys to successful ethical prompt hacking are:

As a result, our clients achieve results such as:

The industry is still in its early stages of applying ethical prompt hacking. In many ways, prompt hacking is like a form of translation. Just as language translation has evolved, so will the translation of language to a machine. At Centific, we’re constantly evolving the use of specialized prompt hacking to fine tune LLMs.

Leave a Reply

Your email address will not be published. Required fields are marked *