Gandalf AI is an online game by Lakera that highlights the limits and security weaknesses of large language models like ChatGPT. Your task is to chat with “Gandalf” and trick him into revealing a secret password—despite built-in rules designed to prevent disclosure.
What the game teaches
The game is built around common real-world attack patterns against LLMs, especially prompt injection, where a user tries to bypass developer-imposed restrictions through carefully crafted instructions.
How it works
- You attempt to get the hidden password using different prompts and conversation strategies
- Each successful password unlocks a new level
- Levels get harder as Gandalf’s defenses improve (for example, stronger checks for suspicious requests)
Side quests and practice
Beyond the main challenge, side missions introduce other classes of LLM vulnerabilities, such as context substitution and manipulating the model’s input data. Gandalf AI is designed to be both a game and a practical way to understand how LLMs can be attacked—and how those attacks can be mitigated in real AI systems.

