AIDive

Adversarial Attacks

Ethics & Safety

Techniques for fooling a model with specially crafted input that looks normal to a human but confuses the algorithm.

Definition

Adversarial attacks show that AI systems can fail not only by chance but also under deliberate pressure from an attacker. A small, targeted change to an image, text, audio clip, or query can cause the model to produce incorrect results, reveal information it should not disclose, or break its rules.

Example

Adding almost imperceptible noise to an image of a road sign can cause a computer vision model to recognize it as a different sign.
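The road-sign example can be illustrated with the fast gradient sign method (FGSM), a well-known way to craft such noise: move every input value a small step `eps` in the direction that increases the model's loss. The sketch below is a minimal Python illustration under loudly stated assumptions: the "image" is a hypothetical 100-pixel vector and the "model" is a hypothetical linear classifier, so the gradient can be written in closed form instead of being computed by a deep-learning framework.

```python
# Hypothetical toy setup: a 100-pixel grayscale "road sign" image with
# intensities in [0, 1] and a linear classifier (score > 0 means "stop").
N = 100
WEIGHTS = [0.1 if i % 2 == 0 else -0.1 for i in range(N)]
BIAS = 0.0


def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    return "stop" if score(x) > 0 else "other"


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_linear(x, y, eps):
    """Fast gradient sign method for a linear model with logistic loss.

    y is +1 for "stop" and -1 for "other". The gradient of
    log(1 + exp(-y * score(x))) with respect to x equals
    -y * sigmoid(-y * score(x)) * WEIGHTS, so its sign is -y * sign(w)
    for each pixel. Every pixel moves by at most eps.
    """
    x_adv = []
    for w, xi in zip(WEIGHTS, x):
        step = eps * -y * sign(w)
        x_adv.append(min(1.0, max(0.0, xi + step)))  # keep a valid intensity
    return x_adv


# A clean image that the model labels "stop" with a small margin.
clean = [0.52 if i % 2 == 0 else 0.48 for i in range(N)]
adversarial = fgsm_linear(clean, y=1, eps=0.03)
print(predict(clean), predict(adversarial))  # stop other
```

Each pixel changes by only 0.03, yet the score shifts by eps times the sum of the absolute weights, here 0.03 * 10 = 0.3, which is enough to move it from about 0.2 to about -0.1. A real vision model is not linear; there the gradient would come from automatic differentiation rather than this closed form.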

Why it matters

The concept matters to anyone who builds AI into a product: a model must be tested not only on ordinary examples but also against deliberate attempts to bypass it.
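One way to make "tested not only on ordinary examples" concrete is to report two numbers side by side: accuracy on clean inputs and accuracy after an attack has modified them. A minimal Python sketch, assuming a hypothetical two-feature linear model and a hypothetical attack that nudges each feature by `eps` toward the decision boundary:

```python
# Hypothetical two-feature model: predicts 1 when the features sum to more than 1.
def predict(x):
    return 1 if x[0] + x[1] > 1.0 else 0


def boundary_attack(x, y, eps=0.1):
    """Hypothetical white-box attack on the model above: move every feature
    by eps toward the decision boundary (down for label 1, up for label 0)."""
    direction = -1 if y == 1 else 1
    return [xi + direction * eps for xi in x]


def accuracy(predict_fn, data):
    """Share of examples classified correctly as they are."""
    return sum(predict_fn(x) == y for x, y in data) / len(data)


def robust_accuracy(predict_fn, attack_fn, data):
    """Share of examples still classified correctly after the attack."""
    return sum(predict_fn(attack_fn(x, y)) == y for x, y in data) / len(data)


test_set = [
    ([0.9, 0.9], 1),    # far from the boundary
    ([0.55, 0.5], 1),   # borderline
    ([0.1, 0.2], 0),    # far from the boundary
    ([0.5, 0.45], 0),   # borderline
]
print(accuracy(predict, test_set))                          # 1.0
print(robust_accuracy(predict, boundary_attack, test_set))  # 0.5
```

The model looks perfect on ordinary data, while the attacked evaluation exposes that the two borderline examples are easy to flip.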

How it works

An attacker probes the model for weaknesses such as sensitivity to noise, unusual wording, conflicting instructions, or borderline examples, and then crafts inputs that exploit them.
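Probing for sensitivity to noise does not require access to the model's internals. The sketch below is a simple black-box random search, one of many possible strategies and not a specific published attack: it queries a hypothetical threshold model with random perturbations of at most `eps` per feature and stops as soon as the prediction changes. The model, inputs, and query budget are assumptions for illustration.

```python
import random


# Hypothetical model the attacker can only query: 1 if the features sum to more than 1.
def predict(x):
    return 1 if sum(x) > 1.0 else 0


def random_search_attack(predict_fn, x, eps, max_queries, rng):
    """Black-box attack: try random perturbations of at most eps per feature
    and return the first one that changes the prediction, or None."""
    original = predict_fn(x)
    for _ in range(max_queries):
        candidate = [xi + rng.uniform(-eps, eps) for xi in x]
        if predict_fn(candidate) != original:
            return candidate
    return None


rng = random.Random(0)
borderline = [0.5, 0.51]  # sum 1.01, just above the boundary
confident = [0.9, 0.9]    # sum 1.8, far from the boundary
print(random_search_attack(predict, borderline, 0.05, 200, rng) is not None)  # True
print(random_search_attack(predict, confident, 0.05, 200, rng))               # None
```

The borderline input is flipped quickly, while no perturbation within the budget can move the confident input across the boundary, which is why borderline examples are a natural place for an attacker to start.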

Where it is used

  • model safety testing
  • checking content moderation
  • protecting AI systems against bypass attempts
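A moderation check can be probed the same way with unusual wording. The following Python sketch assumes a hypothetical one-word blocklist and a hypothetical, deliberately incomplete map of look-alike characters; it compares a naive word filter with one that first undoes common character substitutions.

```python
# Hypothetical one-word blocklist for a toy moderation check.
BLOCKLIST = {"scam"}

# Hypothetical, deliberately incomplete map of look-alike characters.
LOOKALIKES = str.maketrans(
    {"4": "a", "@": "a", "3": "e", "1": "i", "0": "o", "$": "s", "5": "s"}
)


def naive_filter(text):
    """Flag text if a blocklisted word appears as a separate word."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return any(w in BLOCKLIST for w in words)


def normalizing_filter(text):
    """Undo common character substitutions, then apply the same check."""
    return naive_filter(text.translate(LOOKALIKES))


for probe in ["This is a scam", "This is a $c4m", "This is a s c a m"]:
    print(f"{probe!r}: naive={naive_filter(probe)}, normalized={normalizing_filter(probe)}")
```

The normalizing filter closes the look-alike bypass but still misses the spaced-out spelling, the kind of gap that regular adversarial testing is meant to surface.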

Limitations

Such attacks are difficult to eliminate completely. Defenses can reduce quality on ordinary data, and without regular testing they can create a false sense of security.
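The point that defenses can hurt quality on ordinary data shows up even in a toy text filter. In this minimal Python sketch, a hypothetical hardened filter strips spaces and punctuation before matching a hypothetical one-word blocklist. It catches spaced-out spellings, but it also flags harmless text in which the banned letters happen to run together across word boundaries, a failure mode often called the Scunthorpe problem.

```python
# Hypothetical one-word blocklist, as in a toy moderation check.
BLOCKLIST = ("scam",)


def aggressive_filter(text):
    """Hypothetical hardened filter: drop every non-letter character and
    search for a banned word anywhere in what remains."""
    letters = "".join(ch for ch in text.lower() if ch.isalpha())
    return any(word in letters for word in BLOCKLIST)


print(aggressive_filter("This is a s c a m"))      # True: spaced-out bypass caught
print(aggressive_filter("This is a s.c.a.m!"))     # True: punctuation bypass caught
print(aggressive_filter("Oscar's camera is new"))  # True: harmless text flagged
print(aggressive_filter("Nice photo"))             # False
```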
