Ouvrir le menu de navigation
AIDive
FR
Se connecter
Retour au glossaire

AdaGrad

Machine Learning

An adaptive optimization algorithm that adjusts the learning step separately for each model parameter.

Définition

AdaGrad reduces the step more often for parameters that have already been updated a lot, and preserves the influence of rare features. This is useful for sparse data, such as text where many words are rare. The algorithm shows how training history can influence future model updates.

Exemple

In a text classifier, a rare but important word can receive sufficient weight, and frequently occurring function words will not be overly dominant.

Pourquoi c'est important

Understanding AdaGrad helps you understand why different data types require different optimizers and why the learning rate cannot always be set to a single constant.

Fonctionnement

The algorithm accumulates the squares of past gradients for each parameter and divides the new update by the amount of this accumulated history.

Où c'est utilisé

  • word processing
  • training on sparse data
  • experiments with optimizers

Limites

The main disadvantage is that the training step can become too small over time, and the model almost stops learning.

FAQ

Why is “AdaGrad” useful to know?

Understanding AdaGrad helps you understand why different data types require different optimizers and why the learning rate cannot always be set to a single constant.