About Prachi Kumar

Prachi is a graduate student in Computer Science at the University of California, Los Angeles. Her interests lie in the fields of Data Mining, Machine Learning and Natural Language Processing.

Automated Spelling Correction – The Basics of How it Works

In this post, I am going to talk about automated spelling correction. Let’s say you are writing a document on your computer, and instead of typing “morning”, you accidentally type “mornig”. If you have automated spelling correction enabled, you will probably see that “mornig” has been transformed to “morning” on its own. How does this work? How does your computer know that when you typed “mornig”, you actually meant “morning”? We are going to see how in this post.

Spelling mistakes could turn out to be real words!

Before we actually go through how spelling correction works, let’s think about the complexity of this problem. In the previous example, “mornig” was not a real word, so we knew it had to be a spelling mistake. But what if you misspelled “college” as “collage”, or you misspelled “three” as “tree”? In these cases, the word you typed incorrectly happens to be an actual word itself! Correcting these types of errors is called real-word spelling correction. On the other hand, if the error is not a real word (like “mornig” instead of “morning”), correcting it is called non-word spelling correction. You can see that real-word spelling correction is more difficult than non-word spelling correction, because every word that you type could be an error (even if it is spelled correctly). For example, the sentence “The tree threes were tail” makes no sense because every word except “the” and “were” is an error, even though they are all actual words. The intended sentence is “The three trees were tall”. In this post, I am going to talk about non-word spelling correction and a basic approach to it.
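To make the distinction concrete, here is a minimal Python sketch of one common basic approach to non-word correction. This excerpt does not say which method the full post uses, so treat the technique as an assumption: a word is flagged as an error only if it is missing from a vocabulary, and the suggested fix is the vocabulary word with the smallest Levenshtein edit distance. The names `edit_distance`, `correct` and the toy `VOCABULARY` are hypothetical and exist only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical toy vocabulary; a real system would load a full dictionary.
VOCABULARY = {"morning", "college", "collage", "three", "tree",
              "the", "were", "tall"}


def correct(word: str) -> str:
    """Return the word unchanged if it is in the vocabulary,
    otherwise the closest vocabulary word by edit distance."""
    if word in VOCABULARY:
        return word
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return min(sorted(VOCABULARY), key=lambda w: edit_distance(word, w))


print(correct("mornig"))
print(correct("tree"))
```

Because “tree” is itself in the vocabulary, the sketch leaves it untouched. That is exactly why real-word errors need a different kind of approach.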


What Are Precision and Recall and Why Are They Needed in Search Engines?

In this post, I am going to talk about precision and recall and their importance in information retrieval. First of all, let’s talk about what we mean by information retrieval. Suppose you wake up one morning and decide you want to make muffins for breakfast. You take out your laptop and search for “healthy muffin recipe” on Google. Then, you go through the search results, decide on a recipe and get started on it. This is an example of information retrieval where the search engine (Google in this case) retrieved the results for your search query “healthy muffin recipe”. 
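As a quick illustration of the two measures, here is a small Python sketch using their standard definitions: precision is the fraction of retrieved results that are relevant, and recall is the fraction of all relevant documents that were retrieved. The recipe identifiers and the function name `precision_recall` are made up for the example. They are not taken from any real search engine.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Compute (precision, recall) for one query.

    retrieved: documents the search engine returned
    relevant:  documents that actually answer the query
    """
    hits = retrieved & relevant  # relevant documents that were returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical results for the query "healthy muffin recipe".
retrieved = {"recipe1", "recipe2", "recipe3", "recipe4"}
relevant = {"recipe1", "recipe2", "recipe5", "recipe6", "recipe7"}

precision, recall = precision_recall(retrieved, relevant)
print(precision)  # 2 of the 4 returned results are relevant
print(recall)     # 2 of the 5 relevant recipes were returned
```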


An Introduction to N-grams: What Are They and Why Do We Need Them?

In this post, I am going to talk about N-grams, a concept found in Natural Language Processing (aka NLP). First of all, let’s see what the term ‘N-gram’ means. It turns out that this is the simplest bit: an N-gram is simply a sequence of N words. For instance, let us take a look at the following examples.

  1. San Francisco (is a 2-gram)
  2. The Three Musketeers (is a 3-gram)
  3. She stood up slowly (is a 4-gram)

Now which of these three N-grams have you seen quite frequently? Probably, “San Francisco” and “The Three Musketeers”. On the other hand, you might not have seen “She stood up slowly” that frequently. Basically, “She stood up slowly” is an example of an N-gram that does not occur as often in sentences as Examples 1 and 2.

Now, if we assign a probability to the occurrence of an N-gram, or to a word occurring next in a sequence of words, it can be very useful.
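Here is a minimal Python sketch of both ideas: extracting N-grams from a list of words, and estimating the probability of the next word from simple counts (the count of a 2-gram divided by the count of its first word). The tiny corpus and the names `ngrams` and `next_word_probability` are assumptions made for illustration. A real model would be trained on far more text and would need to handle unseen word pairs.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical toy corpus, already lowercased and split into words.
corpus = ("she lives in san francisco and he works in san francisco "
          "near san jose").split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(ngrams(corpus, 2))


def next_word_probability(prev, word):
    """Estimate P(word | prev) as count(prev word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0  # prev never seen, so there is no estimate
    return bigram_counts[(prev, word)] / unigram_counts[prev]


print(ngrams("The Three Musketeers".split(), 3))
print(next_word_probability("san", "francisco"))
print(next_word_probability("san", "jose"))
```

In this toy corpus, “san” is followed by “francisco” twice and by “jose” once, so “francisco” comes out as the more likely next word.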

The Power of WordNet and How to Use It in Python

In this post, I am going to talk about the relations in WordNet (https://wordnet.princeton.edu) and how you can use these in a Python project. WordNet is a database of English words with different relations between the words.

Take a look at the next four sentences.

  1. “She went home and had pasta.”
  2. “Then she cleaned the kitchen and sat on the sofa.”
  3. “A little while later, she got up from the couch.”
  4. “She walked to her bed and in a few minutes she was snoring loudly.”

In Natural Language Processing, we try to use computer programs to find the meaning of sentences. In the above four sentences, with the help of WordNet, a computer program will be able to identify the following –

  1. “pasta” is a type of dish.
  2. “kitchen” is a part of “home”.
  3. “sofa” is the same thing as “couch”.
  4. “snoring” implies “sleeping”.

Let’s get started with using WordNet in Python. It is included as one of the corpora in NLTK (http://www.nltk.org/). To use it, we need to import it first.

>>> from nltk.corpus import wordnet as wn


Build Your Own Natural Language Processing Based Intelligent Assistant Using Python: It’s Easy!

Before we begin, let us talk about how Mike (a fictional character) spends a typical morning. Mike begins his day by searching for breakfast recipes on Google Now (https://en.wikipedia.org/wiki/Google_Now). After a filling breakfast, Mike starts getting ready for work. He asks Siri (http://www.apple.com/in/ios/siri/) to tell him the weather and traffic conditions for his drive to work. Finally, as Mike gets ready to leave the house, he asks Alexa (https://en.wikipedia.org/wiki/Amazon_Alexa) to dim the lights and turn down the thermostat. It is not even 10 a.m. yet, but Mike, like many of us, has already used three intelligent personal assistant applications that rely on Natural Language Processing (NLP). In this post, we will unravel the mysteries of building intelligent personal assistants by using NLP to build a simple one ourselves.
