AIDive

What is the Bag-of-Words Model

A simple way of representing text as a collection of words and their frequencies, ignoring word order and grammar.

Definition

The bag-of-words model is one of the most basic text representations in text processing. Each text is turned into a set of features: which words appear in it and how many times each one occurs. Word order is lost, so the method is simple but limited. It is useful for classification, search and educational examples.
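
A minimal Python sketch of the idea, assuming a simple lowercase, letters-only tokenizer (the regular expression and the `bag_of_words` name are illustrative choices, not part of any standard). A `Counter` is exactly a "bag": it records each word and its count, and two sentences with the same words in a different order produce the same bag.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into word tokens and count each token."""
    # Assumed tokenizer: runs of letters and apostrophes; punctuation is dropped.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


a = bag_of_words("The cat sat on the mat.")
b = bag_of_words("On the mat the cat sat.")
print(a)       # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
print(a == b)  # True: the word order is lost
```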

Example

To classify reviews, the model can count how many times words such as “excellent,” “bad,” “delivery,” and “price” appear in each review.
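
The review example can be sketched as follows, assuming the four words above form a fixed, hypothetical keyword list (`KEYWORDS` and `keyword_features` are illustrative names). In practice the vocabulary is usually built from the training data rather than chosen by hand; the resulting count vector would then be passed to a classifier.

```python
import re

# Hypothetical keyword list taken from the example above.
KEYWORDS = ["excellent", "bad", "delivery", "price"]


def keyword_features(review: str) -> list[int]:
    """Return how many times each keyword occurs in the review."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return [tokens.count(word) for word in KEYWORDS]


review = "Excellent price, excellent quality, but the delivery was bad."
print(keyword_features(review))  # [2, 1, 1, 1]
```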

Why it matters

The concept matters as a foundation of text processing: many modern methods are more complex, but the idea of deriving features from words helps explain where NLP began.

How it works

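First, a vocabulary (dictionary) of all distinct words in the documents is built. Then each document is turned into a vector whose length equals the number of words in the vocabulary. Each value records either the presence of the corresponding word (0 or 1) or its frequency (how many times it occurs).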
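
The two steps can be sketched in plain Python as below. This is a minimal illustration, not a library implementation: the tokenizer, the alphabetical vocabulary order and the helper names (`tokenize`, `build_vocabulary`, `vectorize`) are assumptions made for the example. The `binary` flag switches between presence values and frequency values.

```python
import re


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(docs: list[str]) -> list[str]:
    """Collect every distinct word across the documents, in sorted order."""
    return sorted({word for doc in docs for word in tokenize(doc)})


def vectorize(doc: str, vocab: list[str], binary: bool = False) -> list[int]:
    """Map a document to a vector with one position per vocabulary word."""
    index = {word: i for i, word in enumerate(vocab)}
    vector = [0] * len(vocab)
    for word in tokenize(doc):
        if word in index:  # words outside the vocabulary are ignored
            if binary:
                vector[index[word]] = 1   # presence: 0 or 1
            else:
                vector[index[word]] += 1  # frequency: raw count
    return vector


docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
print(vocab)                                   # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectorize(docs[0], vocab))               # [1, 0, 1, 1, 1, 2]
print(vectorize(docs[0], vocab, binary=True))  # [1, 0, 1, 1, 1, 1]
print(vectorize(docs[1], vocab))               # [0, 1, 0, 0, 1, 1]
```

Libraries offer the same idea ready-made; for example, scikit-learn's `CountVectorizer` builds the vocabulary and the count vectors, and has a `binary` option for presence values.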

Where it is used

  • text classification
  • document search
  • educational NLP exercises
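
For document search, one common approach is to compare bag-of-words vectors with cosine similarity and rank documents by score. The sketch below assumes that approach; the helper names (`bag`, `cosine`, `search`) and the sample documents are made up for illustration, and real search engines typically add weighting such as TF-IDF on top of raw counts.

```python
import math
import re
from collections import Counter


def bag(text: str) -> Counter:
    """Bag of words with an assumed lowercase, letters-only tokenizer."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: list[str]) -> list[tuple[float, str]]:
    """Rank documents by cosine similarity to the query, best match first."""
    q = bag(query)
    scored = [(cosine(q, bag(doc)), doc) for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "fast delivery and a fair price",
    "the screen quality is excellent",
    "delivery was late",
]
for score, doc in search("delivery price", docs):
    print(f"{score:.2f}  {doc}")
# 0.58  fast delivery and a fair price
# 0.41  delivery was late
# 0.00  the screen quality is excellent
```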

Limitations

The method ignores word order, meaning, context and synonyms. For example, “not bad” and “bad” both contain the word “bad,” so the model can treat them as similar even though their meanings are nearly opposite.
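
The negation problem can be illustrated with a deliberately naive scorer that looks only at the bag of words. The sentiment lexicon below (`POSITIVE`, `NEGATIVE`) and the `naive_sentiment` function are hypothetical, made up for this sketch: “not” is just another word in the bag, so it cannot flip the meaning of “bad,” and two phrases with the same words in a different order receive the same score.

```python
import re
from collections import Counter

# Hypothetical sentiment lexicon, made up for illustration.
POSITIVE = {"good", "excellent"}
NEGATIVE = {"bad", "terrible"}


def naive_sentiment(text: str) -> int:
    """Score = positive word count minus negative word count, from the bag alone."""
    bag = Counter(re.findall(r"[a-z']+", text.lower()))
    positive = sum(bag[w] for w in POSITIVE)
    negative = sum(bag[w] for w in NEGATIVE)
    return positive - negative


print(naive_sentiment("The food was bad"))      # -1
print(naive_sentiment("The food was not bad"))  # -1: "not" does not change the score
# Same words, different order, same bag, so the same score:
print(naive_sentiment("good, not bad") == naive_sentiment("bad, not good"))  # True
```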