Before we begin, let us look at how Mike (a fictional character) spends a typical morning. Mike begins his day by searching for breakfast recipes on Google Now (https://en.wikipedia.org/wiki/Google_Now). After a filling breakfast, he starts getting ready for work and asks Siri (http://www.apple.com/in/ios/siri/) about the weather and traffic conditions for his drive. Finally, as he gets ready to leave the house, he asks Alexa (https://en.wikipedia.org/wiki/Amazon_Alexa) to dim the lights and turn down the thermostat. It is not even 10 a.m. yet, but Mike, like many of us, has already used three intelligent personal assistants that rely on Natural Language Processing (NLP). In this post we will demystify how such assistants work by walking through a simple example of building the NLP part of one.
I. Software Architecture – The Big Picture
Before we dive into the software, here is a very simplified view of how NLP is applied within these personal assistants. When you speak to an assistant, your speech is converted into text. This text is passed as input to an NLP module, and then, based on the context of the statement, the assistant calls the API of the right service, such as a banking or weather service. In this post we will focus on how to build the NLP module.
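To make this flow concrete, here is a minimal sketch of the pipeline in plain Python. It assumes speech has already been converted to text, and it uses simple keyword matching as a stand-in for the NLP module we build in the rest of this post. The routing table and the handler functions (`weather_service`, `banking_service`) are hypothetical placeholders for real service calls.

```python
import re

def weather_service(text):
    """Hypothetical stand-in for a call to a weather API."""
    return "weather request: " + text

def banking_service(text):
    """Hypothetical stand-in for a call to a banking API."""
    return "banking request: " + text

# Hypothetical keyword-to-service routing table.
ROUTES = {
    "weather": weather_service,
    "forecast": weather_service,
    "balance": banking_service,
    "account": banking_service,
}

def nlp_module(text):
    """Toy NLP step: lowercase alphabetic words used to choose a service."""
    return re.findall(r"[a-z]+", text.lower())

def handle(text):
    """Route the transcribed text to the first service whose keyword appears."""
    for word in nlp_module(text):
        if word in ROUTES:
            return ROUTES[word](text)
    return "Sorry, I did not understand that."

print(handle("What is the weather in Chicago?"))
# weather request: What is the weather in Chicago?
```

Keyword routing breaks down quickly (it cannot tell "weather" in a question from "weather" in a story, for example), which is why the steps below extract structure such as entities before deciding which API to call.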
II. Try NLP Online
You can use Stanford’s online CoreNLP demo at http://corenlp.run/ to try out English sentences. Once you are familiar with what NLP can do, come back here to learn how to program the NLP module.
III. Code using NLTK
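Python’s NLTK library comes with many built-in functions and text collections to help you get started with NLP. Before reading this tutorial, you may want to install NLTK so that you can practice with the examples; installation instructions are at http://www.nltk.org/install.html. Besides the library itself, the examples below need some of NLTK’s data packages (the tokenizer models, the stop word list, the part-of-speech tagger and the named-entity chunker), which you can fetch with nltk.download().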
Let’s go through some of the steps involved in NLP. Throughout the tutorial, we will use the sample English sentence “What is the weather in Chicago?”
1. Tokenization
To begin with, we need to tokenize the sentence, that is, split it into individual words and punctuation marks so that we can handle each one separately. Below we see how to tokenize our sample sentence in Python with NLTK.
>>> from nltk import word_tokenize
>>> sentence = "What is the weather in Chicago?"
>>> tokens = word_tokenize(sentence)
Now “tokens” is a Python list of the words and punctuation marks as seen below.
['What', 'is', 'the', 'weather', 'in', 'Chicago', '?']
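To see what tokenization does without NLTK, here is a deliberately simplified stand-in based on a regular expression: it separates runs of word characters from individual punctuation marks. This is an illustrative sketch, not how word_tokenize works; NLTK’s tokenizer also handles cases such as contractions (it splits "don't" into "do" and "n't"), which this regex does not. The names `TOKEN_PATTERN` and `simple_tokenize` are hypothetical.

```python
import re

# Hypothetical simplified tokenizer: a token is either a run of word
# characters or a single character that is neither a word character
# nor whitespace (i.e. a punctuation mark).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def simple_tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)

print(simple_tokenize("What is the weather in Chicago?"))
# ['What', 'is', 'the', 'weather', 'in', 'Chicago', '?']
```

On "don't" this sketch returns ['don', "'", 't'], which shows why a dedicated tokenizer is worth using on real text.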
2. Stop Word Removal
Now that we have the tokens ready for processing, we can move on to stop word removal. This involves removing very common words that add little to the meaning of the sentence. Some examples of stop words are “the”, “and”, “a”, “an” and “then”. NLTK provides built-in stop word lists for several languages; stopwords.fileids() lists the ones available in your installation.
Let’s go ahead and remove NLTK’s built-in English stop words from the list of tokens we created previously.
>>> from nltk.corpus import stopwords
>>> stop_words = set(stopwords.words('english'))
>>> clean_tokens = [w for w in tokens if w not in stop_words]
>>> clean_tokens
['What', 'weather', 'Chicago', '?']
As you can see, the words “is”, “the” and “in” have been removed, leaving a more concise bag of tokens.
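The filtering step itself is just a set-membership test. The sketch below uses a small hand-picked stop word set (an assumption, far shorter than NLTK’s list) and a hypothetical `case_sensitive` flag to show a subtlety in the example above: NLTK’s English list is all lowercase and contains “what”, so “What” survived only because the comparison was case-sensitive.

```python
# Hypothetical, hand-picked stop word set for illustration only;
# NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "and", "the", "then", "is", "in", "what"}

def remove_stop_words(tokens, stop_words=STOP_WORDS, case_sensitive=True):
    """Return the tokens that are not stop words, preserving their order."""
    if case_sensitive:
        return [w for w in tokens if w not in stop_words]
    # Compare lowercased tokens so "What" matches "what".
    return [w for w in tokens if w.lower() not in stop_words]

tokens = ['What', 'is', 'the', 'weather', 'in', 'Chicago', '?']
print(remove_stop_words(tokens))
# ['What', 'weather', 'Chicago', '?']
print(remove_stop_words(tokens, case_sensitive=False))
# ['weather', 'Chicago', '?']
```

Lowercasing gives more consistent filtering, but for an assistant it also throws away the question word, which can be a useful clue to what the user is asking. Which behavior you want depends on the application.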
3. Parts of Speech Tagging
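This is an important part of NLP, in which we tag each word in a sentence as a noun, verb, adjective, etc. The function nltk.pos_tag performs part-of-speech tagging, as shown below. Taggers are trained on complete sentences, so in practice you may get more reliable tags by tagging the tokens before removing stop words.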
>>> import nltk
>>> tagged = nltk.pos_tag(clean_tokens)
>>> tagged
[('What', 'WP'), ('weather', 'NN'), ('Chicago', 'NNP'), ('?', '.')]
Now let us understand what this means. The list “tagged” contains tuples of the form (word, tag). Below, I have listed the tags that have appeared in our “tagged” list.
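| Tag | Meaning |
|---|---|
| WP | Wh-pronoun |
| NN | Noun, singular or mass |
| NNP | Proper noun, singular |
| . | Sentence-final punctuation |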
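Once each token carries a tag, later steps can filter on it. As a small illustration (the helper functions are hypothetical, not part of NLTK), the sketch below takes the tagged list shown above and keeps only the nouns, relying on the Penn Treebank convention that noun tags start with "NN" (NN, NNS, NNP, NNPS).

```python
# The (word, tag) pairs produced by nltk.pos_tag in the example above.
tagged = [('What', 'WP'), ('weather', 'NN'), ('Chicago', 'NNP'), ('?', '.')]

def nouns(tagged_tokens):
    """Return the words whose Penn Treebank tag marks them as a noun."""
    return [word for word, tag in tagged_tokens if tag.startswith("NN")]

def proper_nouns(tagged_tokens):
    """Return only proper nouns (NNP singular, NNPS plural)."""
    return [word for word, tag in tagged_tokens if tag in ("NNP", "NNPS")]

print(nouns(tagged))         # ['weather', 'Chicago']
print(proper_nouns(tagged))  # ['Chicago']
```

The common noun “weather” hints at what the user wants and the proper noun “Chicago” at where, which is the information the next two steps make explicit.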
4. Named Entity Recognition (NER)
What do we mean by Named Entity Recognition (NER)? It also goes by other names, such as entity identification and entity extraction. NER involves identifying all named entities and putting them into categories such as the name of a person, an organization or a location.
In NLTK, the function nltk.ne_chunk() classifies named entities. Its input is a POS-tagged list, so we pass it our list “tagged” from the previous step of Parts of Speech tagging.
>>> print(nltk.ne_chunk(tagged))
(S What/WP weather/NN (GPE Chicago/NNP) ?/.)
As you can see, Chicago has been correctly identified as a GPE, which stands for geo-political entity: a location such as a city, state or country.
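nltk.ne_chunk relies on a trained classifier. To show the shape of the result in a self-contained way, here is a much cruder technique, gazetteer (dictionary) lookup, swapped in purely for illustration. The `GAZETTEER` table and `find_entities` function are hypothetical: the table lists known names with their labels, and the function returns each match found in the tokens, preferring the longest match so that multi-word names such as “New York” stay together.

```python
# Hypothetical gazetteer: lowercase token sequences mapped to entity labels.
GAZETTEER = {
    ("chicago",): "GPE",
    ("new", "york"): "GPE",
    ("amazon",): "ORGANIZATION",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def find_entities(tokens):
    """Return (entity_text, label) pairs found by longest-match lookup."""
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at i, then shorter ones.
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tokens[i:i + length]
            label = GAZETTEER.get(tuple(t.lower() for t in span))
            if label:
                entities.append((" ".join(span), label))
                i += length
                break
        else:
            # No gazetteer entry starts at this token; move on.
            i += 1
    return entities

print(find_entities(['What', 'weather', 'Chicago', '?']))
# [('Chicago', 'GPE')]
print(find_entities(['Weather', 'in', 'New', 'York', '?']))
# [('New York', 'GPE')]
```

A lookup table cannot recognize names it has never seen, or decide from context whether “Amazon” is the company or the river, which is why libraries use trained models instead.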
IV. Call the APIs
After named entity recognition, the meaning of the sentence is analyzed (http://www.nltk.org/book/ch10.html) and the appropriate API call can be made. For instance, after recognizing the location in our sample sentence as “Chicago” and the context as “weather”, the assistant can call a cloud-based weather service such as https://openweathermap.org/current. The current weather is then displayed or read back to the user who asked, “What is the weather in Chicago?”
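Putting the pieces together, a minimal sketch of this final step maps the detected context and location to a request URL. It assumes OpenWeatherMap’s current-weather endpoint, which takes the city name in a `q` parameter and your API key in `appid`. `YOUR_API_KEY` is a placeholder, the keyword set and the `build_request` helper are hypothetical, and the code only builds the URL rather than sending the request; check the OpenWeatherMap documentation for the parameters your plan supports.

```python
from urllib.parse import urlencode

# Assumed endpoint for OpenWeatherMap's current weather data.
WEATHER_ENDPOINT = "https://api.openweathermap.org/data/2.5/weather"
# Hypothetical words that signal a weather question.
WEATHER_KEYWORDS = {"weather", "temperature", "forecast"}

def build_request(clean_tokens, entities, api_key="YOUR_API_KEY"):
    """Return a weather API URL if the tokens ask about weather in a known place."""
    words = {t.lower() for t in clean_tokens}
    locations = [text for text, label in entities if label == "GPE"]
    if words & WEATHER_KEYWORDS and locations:
        query = urlencode({"q": locations[0], "appid": api_key})
        return WEATHER_ENDPOINT + "?" + query
    return None  # no matching intent; a real assistant would ask a follow-up

url = build_request(['What', 'weather', 'Chicago', '?'], [('Chicago', 'GPE')])
print(url)
# https://api.openweathermap.org/data/2.5/weather?q=Chicago&appid=YOUR_API_KEY
```

Sending this request (for example with urllib.request.urlopen and a real key) returns a JSON description of the current conditions by default, which the assistant can then display or speak.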
This tutorial showed how to get started with NLP using Python’s NLTK library, and how this technology is used in intelligent personal assistants such as Google Now, Siri and Amazon Alexa.