What is the Bag-of-Words Model
A simple way to think of text as a collection of words and their frequencies without regard to order or grammar.
Definition
The bag-of-words model is a basic representation used in text processing. A text is turned into a set of features: which words appear and how many times each one appears. Word order is discarded, which makes the method simple but limited. It is useful for classification, search, and introductory teaching examples.
Example
To classify reviews, the model can take into account how many times the words “excellent,” “bad,” “delivery,” and “price” appear.
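As a rough illustration of this example, the sketch below counts the four words in a single review. The sample review text, the `FEATURE_WORDS` list, the `count_features` helper, and the simple regex tokenizer are all assumptions made for illustration, not part of any particular library.

```python
import re
from collections import Counter

# Feature words taken from the example above.
FEATURE_WORDS = ["excellent", "bad", "delivery", "price"]

def count_features(review):
    """Count how often each feature word occurs in a review."""
    # Assumption: lowercase the text and treat runs of letters/apostrophes as words.
    tokens = re.findall(r"[a-z']+", review.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occurred.
    return {word: counts[word] for word in FEATURE_WORDS}

review = "Excellent price, but the delivery was bad. Really bad."
features = count_features(review)
print(features)
# {'excellent': 1, 'bad': 2, 'delivery': 1, 'price': 1}
```

A classifier would receive only these counts, not the original sentence.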
Why it matters
The model matters as a foundation of text processing. Many modern methods are more complex, but the idea of deriving features from words helps explain how NLP began.
How it works
First, a dictionary (vocabulary) of all distinct words is built. Then each document is turned into a vector whose length equals the number of words in the dictionary. Each value indicates the presence or the frequency of the corresponding word.
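The two steps can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the function names (`tokenize`, `build_vocabulary`, `vectorize`), the regex tokenizer, the alphabetical ordering of the dictionary, and the sample documents are all illustrative choices.

```python
import re

def tokenize(text):
    # Assumption: lowercase the text and treat runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents):
    """Step 1: map each distinct word to a fixed position in the vector."""
    words = sorted({token for doc in documents for token in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}

def vectorize(document, vocabulary, binary=False):
    """Step 2: turn a document into a vector as long as the vocabulary."""
    vector = [0] * len(vocabulary)
    for token in tokenize(document):
        index = vocabulary.get(token)
        if index is None:
            continue  # words outside the dictionary are ignored
        # binary=True records presence only; otherwise count occurrences.
        vector[index] = 1 if binary else vector[index] + 1
    return vector

documents = ["good price, good delivery", "bad delivery"]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]

print(vocabulary)  # {'bad': 0, 'delivery': 1, 'good': 2, 'price': 3}
print(vectors)     # [[0, 1, 2, 1], [1, 1, 0, 0]]
```

Passing `binary=True` gives the presence variant: the first document becomes `[0, 1, 1, 1]` instead of `[0, 1, 2, 1]`.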
Where it is used
- text classification
- document search
- introductory NLP exercises
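To make the document search use case more concrete, here is a small sketch that ranks documents against a query by comparing their word counts. Cosine similarity is an assumption here: it is one common way to compare count vectors, not something the model itself prescribes. The helper names and sample documents are also illustrative.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Assumption: lowercase the text and treat runs of letters/apostrophes as words.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)

def search(query, documents):
    """Return (score, document) pairs, best match first."""
    query_bag = bag_of_words(query)
    scored = [(cosine_similarity(query_bag, bag_of_words(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

documents = [
    "fast delivery and a fair price",
    "the delivery was late",
    "excellent sound quality",
]
results = search("delivery price", documents)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

The first document ranks highest because it shares both query words, the second shares one, and the third shares none and scores zero.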
Limitations
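The method ignores word order, meaning, context, and synonyms. For example, “not bad” and “bad” both contribute a count for “bad”, and nothing records that “not” modifies it, so a mildly positive phrase can be treated as negative.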
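The loss of word order can be shown with a short sketch. The two sample sentences and the `bag_of_words` helper are assumptions chosen for illustration: the sentences have opposite meanings but contain exactly the same words.

```python
import re
from collections import Counter

def bag_of_words(text):
    # Assumption: lowercase the text and treat runs of letters/apostrophes as words.
    return Counter(re.findall(r"[a-z']+", text.lower()))

positive = bag_of_words("The food was good, not bad")
negative = bag_of_words("The food was bad, not good")

# Opposite meanings, identical word counts: the model cannot tell them apart.
same_bag = positive == negative
print(same_bag)  # True

# "not bad" counts "bad" exactly as the plain word "bad" does.
not_bad = bag_of_words("not bad")
bad = bag_of_words("bad")
print(not_bad["bad"], bad["bad"])  # 1 1
```

A common workaround, outside the scope of the basic model, is to count short word sequences such as “not bad” as features of their own.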
