Natural Language Toolkit - Introduction


What is Natural Language Processing (NLP)?

Language is the method of communication with which humans speak, read and write. We think, make plans and make decisions in our natural language. The big question is whether, in the era of artificial intelligence, machine learning and deep learning, humans can communicate with computers in natural language. Developing NLP applications is a huge challenge because computers require structured data, whereas human speech is unstructured and often ambiguous.

Natural language processing (NLP) is the subfield of computer science, more specifically of AI, that enables computers to understand, process and manipulate human language. In simple words, NLP is a way for machines to analyze, understand and derive meaning from natural human languages such as Hindi, English, French and Dutch.

How does it work?

Before diving deep into the working of NLP, we must understand how human beings use language. Every day, we use hundreds or thousands of words, and other humans interpret them and answer accordingly. It’s simple communication for humans, isn’t it? But words run much deeper than that, and we always derive context from what we say and how we say it. That is why NLP draws on contextual patterns rather than focusing on voice modulation.

Let us understand it with an example −

Man is to woman as king is to what?
We can interpret it easily and answer as follows:
Man relates to king, so woman can relate to queen.
Hence the answer is Queen.

How do humans know what each word means? The answer is that we learn through experience. But how do machines learn the same?

Let us understand it with the following easy steps −

  • First, we need to feed the machine enough data so that it can learn from experience.

  • Then the machine creates word vectors from that data, using deep learning algorithms that look at each word together with its surrounding words.

  • Then, by performing simple algebraic operations on these word vectors, the machine is able to provide answers much as human beings do.
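The algebra in the last step can be sketched with a few toy vectors. This is only an illustration: the three-dimensional vectors and the names `VECTORS`, `cosine` and `analogy` are invented here, whereas real word vectors are learned from large text collections and have hundreds of dimensions.

```python
import math

# Toy word vectors, invented for illustration only.
# Rough reading of the axes: (royalty, maleness, femaleness).
VECTORS = {
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "child": [0.0, 0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by computing b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    # The three question words themselves are not valid answers.
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

# Man is to woman as king is to what?
print(analogy("man", "woman", "king"))
```

With these hand-picked vectors, woman − man + king lands exactly on the vector for queen, which is why the nearest remaining word is the expected answer.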

Components of NLP

The following diagram represents the components of natural language processing (NLP) −


Morphological Processing

Morphological processing is the first component of NLP. It breaks chunks of language input into sets of tokens corresponding to paragraphs, sentences and words. For example, a word like “everyday” can be broken into two sub-word tokens, “every” and “day”.
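As a rough sketch of this step, the following Python code splits text into paragraphs, sentences and word tokens using only regular expressions from the standard library. The function names are made up for this example, and the splitting rules are deliberately naive (abbreviations such as "Dr." would be split wrongly), so treat it as an illustration rather than a real tokenizer.

```python
import re

def split_paragraphs(text):
    """Split text on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

def split_words(sentence):
    """Return words and punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "NLP is fun. It breaks text into tokens!\n\nThis is a second paragraph."
for paragraph in split_paragraphs(text):
    for sentence in split_sentences(paragraph):
        print(split_words(sentence))
```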

Syntax analysis

Syntax Analysis, the second component, is one of the most important components of NLP. The purposes of this component are as follows −

  • To check whether a sentence is well formed.

  • To break it up into a structure that shows the syntactic relationships between the different words.

  • E.g. a word sequence like “The school goes to student the” would be rejected by a syntax analyzer, because its word order violates the rules of English grammar.
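To make the idea of a well-formedness check concrete, here is a minimal sketch of a recognizer for a tiny, hand-written grammar. The grammar rules, the `LEXICON` table and all function names are assumptions made up for this example; a real syntax analyzer uses a far larger grammar and also returns the sentence structure, not just a yes or no.

```python
# Hypothetical mini-lexicon: each known word maps to one part-of-speech tag.
LEXICON = {
    "the": "Det", "a": "Det",
    "school": "N", "student": "N",
    "goes": "V", "sees": "V",
    "to": "P",
}

def parse_np(tags, i):
    """NP -> Det N. Return the index after the phrase, or None."""
    if tags[i:i + 2] == ["Det", "N"]:
        return i + 2
    return None

def parse_pp(tags, i):
    """PP -> P NP."""
    if tags[i:i + 1] == ["P"]:
        return parse_np(tags, i + 1)
    return None

def parse_vp(tags, i):
    """VP -> V PP | V NP | V."""
    if tags[i:i + 1] != ["V"]:
        return None
    for parse_rest in (parse_pp, parse_np):
        end = parse_rest(tags, i + 1)
        if end is not None:
            return end
    return i + 1  # a verb on its own

def is_well_formed(sentence):
    """True if the whole sentence matches S -> NP VP."""
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON.get(word) for word in words]
    if None in tags:
        return False  # unknown word
    end = parse_np(tags, 0)
    if end is None:
        return False
    return parse_vp(tags, end) == len(tags)

print(is_well_formed("The student goes to the school."))
print(is_well_formed("Student the goes school to the"))
```

The first call prints True and the second prints False, because the scrambled word order does not start with a determiner followed by a noun.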

Semantic analysis

Semantic analysis is the third component of NLP. It checks the meaningfulness of the text by drawing the exact meaning, or dictionary meaning, from it. E.g. a sentence like “It’s a hot ice-cream.” would be discarded by the semantic analyzer.

Pragmatic analysis

Pragmatic analysis is the fourth component of NLP. It fits the actual objects or events that exist in a given context to the object references obtained by the previous component, i.e. semantic analysis. E.g. the sentence “Put the fruits in the basket on the table” has two semantic interpretations (the fruits that are in the basket go on the table, or the fruits go into the basket that is on the table), so the pragmatic analyzer chooses between these two possibilities.

Examples of NLP Applications

NLP, an emerging technology, drives many of the forms of AI we see these days. For today’s and tomorrow’s increasingly cognitive applications, the use of NLP in creating a seamless and interactive interface between humans and machines will continue to be a top priority. Following are some of the most useful applications of NLP.

Machine Translation

Machine translation (MT) is one of the most important applications of natural language processing. MT is the process of translating text or speech from one source language into another language. A machine translation system can be either bilingual or multilingual.

Fighting Spam

Due to the enormous increase in unwanted emails, spam filters have become important because they are the first line of defense against this problem. NLP can be used to develop spam filtering systems, with false positives and false negatives as the main issues to address.

N-gram modelling, word stemming and Bayesian classification are some of the existing NLP techniques that can be used for spam filtering.
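As a minimal sketch of Bayesian classification for spam, the code below trains a naive Bayes classifier with Laplace smoothing on a tiny, invented training set. The data, the `features`, `train` and `classify` names and the "spam"/"ham" labels are assumptions for illustration; a production filter would be trained on thousands of real messages.

```python
import math
from collections import Counter

# Tiny hand-made training set, invented for illustration only.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]

def features(text, n=1):
    """Return the word n-grams of a text (n=1 gives single words)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train(examples):
    """Count features per label and messages per label."""
    feature_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        feature_counts[label].update(features(text))
    return feature_counts, doc_counts

def classify(text, feature_counts, doc_counts):
    """Return the label with the highest log-probability."""
    vocab = set(feature_counts["spam"]) | set(feature_counts["ham"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in feature_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # prior
        total = sum(counts.values())
        for feat in features(text):
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[feat] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

feature_counts, doc_counts = train(TRAINING)
print(classify("free money", feature_counts, doc_counts))
print(classify("project meeting monday", feature_counts, doc_counts))
print(features("win free money", n=2))
```

The classifier here works on single words. Calling `features` with `n=2` shows how the same messages could be turned into bigrams if an N-gram model is preferred.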

Information retrieval & Web search

Most search engines, such as Google, Yahoo, Bing and WolframAlpha, rely on NLP to interpret queries and the text of web pages. They also base their machine translation (MT) technology on NLP deep learning models. Such models allow algorithms to read the text on a webpage, interpret its meaning and translate it into another language.

Automatic Text Summarization

Automatic text summarization is a technique that creates a short, accurate summary of a longer text document. Hence, it helps us get relevant information in less time. In this digital era, we are in serious need of automatic text summarization because the flood of information over the internet is not going to stop. NLP and its functionalities play an important role in developing automatic text summarization systems.
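One simple way to approach this is extractive summarization, which picks the most informative sentences instead of writing new ones. The sketch below scores each sentence by the frequency of its words in the whole text. The `STOPWORDS` set and the `summarize` function are invented for this example, and the scoring rule is just one possible choice (it tends to favor long sentences).

```python
import re
from collections import Counter

# Small illustrative stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "it", "on", "for"}

def content_words(text):
    """Lower-cased words of the text, without stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(content_words(text))

    def score(sentence):
        # A sentence scores higher when it uses words that are frequent overall.
        return sum(freq[w] for w in content_words(sentence))

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return " ".join(s for s in sentences if s in top)

text = ("NLP helps computers read text. "
        "Summarization of text saves time. "
        "Cats sleep.")
print(summarize(text, max_sentences=1))
```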

Grammar Correction

Spelling and grammar correction is a very useful feature of word processing software like Microsoft Word. Natural language processing (NLP) is widely used for this purpose.


Question-answering

Question-answering, another main application of natural language processing (NLP), focuses on building systems that automatically answer questions posed by users in their natural language.

Sentiment analysis

Sentiment analysis is another important application of natural language processing (NLP). As its name implies, sentiment analysis is used to −

  • Identify the sentiments among several posts and

  • Identify the sentiment where the emotions are not expressed explicitly.

Online e-commerce companies like Amazon and eBay use sentiment analysis to identify the opinions and sentiments of their customers online. It helps them understand what their customers think about their products and services.
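The simplest form of sentiment analysis counts positive and negative words from a lexicon. The sketch below does exactly that. The two word sets and the `sentiment` function are invented for illustration, and this approach only catches explicitly stated opinions; detecting sentiment that is not expressed explicitly needs trained models.

```python
import re

# Tiny illustrative lexicons; real sentiment lexicons hold thousands of words.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "broken"}

def sentiment(text):
    """Label a text by comparing its positive and negative word counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "I love this phone, the camera is great",
    "Terrible battery and poor screen",
    "It arrived on Monday",
]
for review in reviews:
    print(review, "->", sentiment(review))
```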

Speech engines

Speech engines like Siri, Google Voice and Alexa are built on NLP so that we can communicate with them in our natural language.

Implementing NLP

To build the above-mentioned applications, we need a specific skill set, with a strong understanding of language and of the tools that process language efficiently. Various tools are available for this. Some of them are open source, while others are developed by organizations to build their own NLP applications. Following is a list of some NLP tools −

  • Natural Language Tool Kit (NLTK)

  • Mallet

  • GATE

  • OpenNLP

  • UIMA

  • Gensim

  • Stanford toolkit

Most of these tools are written in Java.

Natural Language Tool Kit (NLTK)

Among the above-mentioned NLP tools, NLTK scores very high when it comes to ease of use and explanation of concepts. Python is quick to learn, and since NLTK is written in Python, NLTK also comes with a very good learning kit. NLTK covers most common tasks, such as tokenization, stemming, lemmatization, punctuation handling, character count and word count. It is very elegant and easy to work with.
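A few of these tasks can be tried with a short sketch like the one below. It assumes NLTK is installed (for example with `pip install nltk`) and uses only `wordpunct_tokenize` and `PorterStemmer`, which need no extra data downloads. The `describe` function and the sample sentence are made up for this example.

```python
def describe(text):
    """Tokenize, stem and count a text with NLTK."""
    # Imported inside the function so this file loads even without NLTK.
    from nltk.stem import PorterStemmer
    from nltk.tokenize import wordpunct_tokenize

    tokens = wordpunct_tokenize(text)           # words and punctuation marks
    words = [t for t in tokens if t.isalnum()]  # drop punctuation tokens
    stemmer = PorterStemmer()
    return {
        "tokens": tokens,
        "stems": [stemmer.stem(w) for w in words],
        "word_count": len(words),
        "char_count": len(text),
    }
```

Calling `describe("NLTK makes stemming words easy.")` returns a dictionary with the token list, the stem of each word, the number of words and the number of characters. Other tokenizers and the lemmatizer in NLTK need additional data packages, so they are left out of this sketch.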