Removing stop words with NLTK in Python



In NLP (Natural Language Processing), stop words are common words such as "is", "and", and "a" that are filtered out before or after processing text data. These words add little meaning to the text, so removing them can improve processing efficiency.
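The idea can be sketched in plain Python with a small hard-coded stop word set (the set below is illustrative only; NLTK ships a much larger English list):

```python
# A tiny illustrative stop-word set; NLTK's English list is far longer.
STOP_WORDS = {"is", "and", "a", "the", "to"}

sentence = "NLTK is a toolkit for text processing"
tokens = sentence.split()  # naive whitespace tokenization

# Keep only the tokens that are not stop words (case-insensitive).
filtered = [t for t in tokens if t.lower() not in STOP_WORDS]
print(filtered)  # ['NLTK', 'toolkit', 'for', 'text', 'processing']
```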

The Natural Language Toolkit (NLTK) is a Python library that provides an easy-to-use interface and tools for text processing, such as tokenization and stop word removal. In this article, we will explore how to remove stop words using NLTK.

NLTK Stop Words

Before using the NLTK stop words, make sure the nltk package is installed. If it is not, install it with the following command -

pip install nltk

After installation, import the necessary modules and download the stop words corpus.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('stopwords')

Let's dive into a few examples to get a better idea of removing stop words with NLTK.

Example 1

In this example, we use word_tokenize() to split the sentence into words and then filter out the stop words (such as "to" and "the") with a list comprehension.

Let's look at the following example, where we perform basic stop word removal.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

str1 = "Welcome to the TutorialsPoint"

# Build a set of English stop words for fast membership tests
stop_words = set(stopwords.words('english'))

# Split the sentence into individual word tokens
words = word_tokenize(str1)

# Keep only the tokens that are not stop words (case-insensitive)
result = [word for word in words if word.lower() not in stop_words]
print(result)

The output of the above program is as follows -

['Welcome', 'TutorialsPoint']
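The call to word.lower() is what makes the comparison case-insensitive: NLTK's stop word list is all lowercase, so a capitalized token like "The" would otherwise slip through the filter. A pure-Python sketch of the difference, using an illustrative subset of stop words:

```python
stop_words = {"the", "to"}  # illustrative subset of NLTK's English list
tokens = ["The", "quick", "fox", "ran", "to", "the", "den"]

# Without lowering, the capitalized "The" is not matched and survives.
without_lower = [t for t in tokens if t not in stop_words]

# Lowering each token first removes "The" as well.
with_lower = [t for t in tokens if t.lower() not in stop_words]

print(without_lower)  # ['The', 'quick', 'fox', 'ran', 'den']
print(with_lower)     # ['quick', 'fox', 'ran', 'den']
```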

Example 2

In this example, the input string contains punctuation. word_tokenize() emits punctuation marks as separate tokens, and since they are not in the stop word list, they are retained in the output.

Consider the following example, where the string includes punctuation, and observe the output.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

str1 = "Hello,! have a nice day..!"

# Build a set of English stop words for fast membership tests
stop_words = set(stopwords.words('english'))

# word_tokenize() splits punctuation marks into separate tokens
words = word_tokenize(str1)

# Punctuation tokens are not stop words, so they pass the filter
result = [word for word in words if word.lower() not in stop_words]
print(result)

The output of the above program is as follows -

['Hello', ',', '!', 'nice', 'day', '..', '!']
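If the punctuation tokens are unwanted as well, they can be filtered out in a second step. Since word_tokenize() emits punctuation as separate tokens, a str.isalpha() check is enough; the sketch below applies it to the token list from the output above:

```python
tokens = ['Hello', ',', '!', 'nice', 'day', '..', '!']  # Example 2 result

# Keep only purely alphabetic tokens, dropping the punctuation.
words_only = [t for t in tokens if t.isalpha()]
print(words_only)  # ['Hello', 'nice', 'day']
```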
Updated on: 2025-08-28T12:55:26+05:30
