In this post I am going to talk about N-grams, a concept found in Natural Language Processing ( aka NLP). First of all, let’s see what the term ‘N-gram’ means. Turns out that is the simplest bit, an N-gram is simply a sequence of N words. For instance, let us take a look at the following examples.
- San Francisco (is a 2-gram)
- The Three Musketeers (is a 3-gram)
- She stood up slowly (is a 4-gram)
Now which of these three N-grams have you seen quite frequently? Probably, “San Francisco” and “The Three Musketeers”. On the other hand, you might not have seen “She stood up slowly” that frequently. Basically, “She stood up slowly” is an example of an N-gram that does not occur as often in sentences as Examples 1 and 2.
Now if we assign a probability to the occurrence of an N-gram or the probability of a word occurring next in a sequence of words, it can be very useful. Why? Continue reading →