AIDive
EN
Sign in

Description

Gandalf AI is an online game by Lakera that highlights the limits and security weaknesses of large language models like ChatGPT. Your task is to chat with “Gandalf” and trick him into revealing a secret password—despite built-in rules designed to prevent disclosure.

What the game teaches

The game is built around common real-world attack patterns against LLMs, especially prompt injection, where a user tries to bypass developer-imposed restrictions through carefully crafted instructions.

How it works

  • You attempt to get the hidden password using different prompts and conversation strategies
  • Each successful password unlocks a new level
  • Levels get harder as Gandalf’s defenses improve (for example, stronger checks for suspicious requests)

Side quests and practice

Beyond the main challenge, side missions introduce other classes of LLM vulnerabilities, such as context substitution and manipulating the model’s input data. Gandalf AI is designed to be both a game and a practical way to understand how LLMs can be attacked—and how those attacks can be mitigated in real AI systems.

38
0 комментариев

Newsletter

Get notified when new AI tools are added

Join the community.