PoS Tagging and Lemmatization Using spaCy in Python

Python is widely used for machine learning and deep learning, and it offers many libraries for building Natural Language Processing (NLP) applications. In this article, we will discuss one such library: spaCy.

spaCy is an open-source library used to analyze and process textual data efficiently. We will explore two key NLP concepts: Part-of-Speech (PoS) tagging and lemmatization using spaCy.

What is spaCy?

spaCy is an industrial-strength NLP library designed for production use. It provides fast and accurate text processing, including tokenization, PoS tagging, lemmatization, and named entity recognition. Its core is implemented in Cython, which keeps it efficient for large-scale text processing tasks.

Installation and Setup

Install spaCy using pip:

pip install spacy

Download the English language model:

python -m spacy download en_core_web_sm
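
spaCy models are installed as ordinary Python packages, so one way to confirm that the download worked is to check whether the package can be found before calling spacy.load. The sketch below uses only the standard library; the helper name model_is_installed is an assumption made for illustration and is not part of spaCy.

```python
import importlib.util


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    # find_spec returns None when no such top-level module exists.
    return importlib.util.find_spec(package_name) is not None


if model_is_installed("en_core_web_sm"):
    print("en_core_web_sm is available")
else:
    print("Model missing; run: python -m spacy download en_core_web_sm")
```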

The model name en_core_web_sm follows a naming convention where:

  • en: English language

  • core: General-purpose capabilities

  • web: Trained on web text

  • sm: Small model size
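
The convention above can be sketched as a small parser. This is a minimal illustration that assumes the name has exactly four underscore-separated parts; the function name parse_model_name and the dictionary keys are hypothetical and chosen only for this example.

```python
def parse_model_name(name: str) -> dict:
    """Split a model package name into its four naming components."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected lang_type_genre_size, got {name!r}")
    lang, type_, genre, size = parts
    return {"lang": lang, "type": type_, "genre": genre, "size": size}


print(parse_model_name("en_core_web_sm"))
# {'lang': 'en', 'type': 'core', 'genre': 'web', 'size': 'sm'}
```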

What is PoS Tagging?

Part-of-Speech (PoS) tagging is the process of assigning grammatical categories (noun, verb, adjective, etc.) to each word in a text. This helps machines understand the syntactic role and meaning of words within their context.

Example: Basic PoS Tagging

import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Sample text for analysis
text = "Python programming can be used to perform various operations."

# Process the text
doc = nlp(text)

# Display each word with its PoS tag
for token in doc:
    print(f"{token.text:12} {token.pos_}")

Output (exact tags may vary slightly between model versions):

Python       PROPN
programming  NOUN
can          AUX
be           AUX
used         VERB
to           PART
perform      VERB
various      ADJ
operations   NOUN
.            PUNCT

Example: Filtering Specific PoS Tags

You can extract specific grammatical categories from text:

import spacy

nlp = spacy.load("en_core_web_sm")
text = "The quick brown fox jumps over the lazy dog."

doc = nlp(text)

# Extract adjectives
adjectives = [token.text for token in doc if token.pos_ == "ADJ"]
print("Adjectives:", adjectives)

# Extract nouns
nouns = [token.text for token in doc if token.pos_ == "NOUN"]
print("Nouns:", nouns)

Output:

Adjectives: ['quick', 'brown', 'lazy']
Nouns: ['fox', 'dog']
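
The same filtering idea extends to counting how often each tag occurs. The sketch below is one possible approach: the helper pos_frequencies is a hypothetical name, and the stand-in tokens (built from the adjectives and nouns listed above) are an assumption that lets the example run without a model. With spaCy, a Doc returned by nlp(text) could be passed instead.

```python
from collections import Counter
from types import SimpleNamespace


def pos_frequencies(doc):
    """Count how often each coarse PoS tag occurs.

    `doc` can be a spaCy Doc or any iterable of objects with a `pos_` attribute.
    """
    return Counter(token.pos_ for token in doc)


# Stand-in tokens (illustrative only); a real Doc exposes the same attribute.
sample = [SimpleNamespace(text=t, pos_=p) for t, p in [
    ("quick", "ADJ"), ("brown", "ADJ"), ("fox", "NOUN"),
    ("lazy", "ADJ"), ("dog", "NOUN"),
]]
print(pos_frequencies(sample).most_common())
# [('ADJ', 3), ('NOUN', 2)]
```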

What is Lemmatization?

Lemmatization is the process of reducing inflected words to their base or dictionary form (lemma). Unlike stemming, which strips suffixes by rule and can produce non-words, lemmatization considers the word's context and part of speech to produce meaningful base forms.
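
To make the contrast with stemming concrete, here is a deliberately simplified sketch. Both helpers are toy constructions written for this illustration: naive_stem is not a real stemming algorithm, and TOY_LEMMAS is a hand-written table, not spaCy's lemmatizer. The point is only that a stemmer ignores the part of speech, while a lemmatizer can use it.

```python
def naive_stem(word: str) -> str:
    """Toy stemmer: strip the first matching suffix, ignoring context."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Tiny hand-written lemma table keyed on (word, PoS); illustrative only.
TOY_LEMMAS = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
}


def toy_lemma(word: str, pos: str) -> str:
    """Look up a lemma from the word and its PoS tag; fall back to the word."""
    return TOY_LEMMAS.get((word, pos), word)


for word, pos in [("meeting", "NOUN"), ("meeting", "VERB"), ("studies", "VERB")]:
    print(f"{word:8} {pos:5} stem={naive_stem(word):8} lemma={toy_lemma(word, pos)}")
```

In this toy setup the stem of "studies" is "studi", which is not a dictionary word, and "meeting" is always stemmed to "meet", whereas the lemma depends on whether the word is used as a noun or a verb.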

Example: Lemmatization Process

import spacy

nlp = spacy.load("en_core_web_sm")
text = "The cats are running and jumping in the gardens."

doc = nlp(text)

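# Display each word whose lemma differs from its lowercased form
print(f"{'Original':12} {'Lemma':12}")
print("-" * 24)
for token in doc:
    # Compare case-insensitively so that "The" -> "the" is not reported
    if token.text.lower() != token.lemma_:
        print(f"{token.text:12} {token.lemma_:12}")

Output:

Original     Lemma       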
------------------------
cats         cat         
are          be          
running      run         
jumping      jump        
gardens      garden      

Combined PoS Tagging and Lemmatization

import spacy

nlp = spacy.load("en_core_web_sm")
text = "The students were studying advanced algorithms."

doc = nlp(text)

print(f"{'Word':12} {'PoS':8} {'Lemma':12}")
print("-" * 32)
for token in doc:
    print(f"{token.text:12} {token.pos_:8} {token.lemma_:12}")

Output:

Word         PoS      Lemma       
--------------------------------
The          DET      the         
students     NOUN     student     
were         AUX      be          
studying     VERB     study       
advanced     ADJ      advanced    
algorithms   NOUN     algorithm   
.            PUNCT    .           
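
A common way to combine the two attributes is to keep only the lemmas of content words. The sketch below is one possible approach under stated assumptions: content_lemmas and CONTENT_TAGS are hypothetical names, the choice of tags to keep is arbitrary, and the stand-in tokens are copied from the table above so the example runs without a model. A spaCy Doc could be passed in their place.

```python
from types import SimpleNamespace

# Assumed set of "content" tags; adjust to the task at hand.
CONTENT_TAGS = frozenset({"NOUN", "PROPN", "VERB", "ADJ"})


def content_lemmas(doc, keep=CONTENT_TAGS):
    """Return lowercased lemmas of tokens whose coarse PoS tag is in `keep`."""
    return [token.lemma_.lower() for token in doc if token.pos_ in keep]


# Stand-in tokens mirroring the (word, PoS, lemma) rows shown above.
rows = [
    ("The", "DET", "the"), ("students", "NOUN", "student"),
    ("were", "AUX", "be"), ("studying", "VERB", "study"),
    ("advanced", "ADJ", "advanced"), ("algorithms", "NOUN", "algorithm"),
    (".", "PUNCT", "."),
]
tokens = [SimpleNamespace(text=t, pos_=p, lemma_=l) for t, p, l in rows]
print(content_lemmas(tokens))
# ['student', 'study', 'advanced', 'algorithm']
```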

Common PoS Tags

Tag      Description     Example
NOUN     Noun            cat, car
VERB     Verb            run, eat
ADJ      Adjective       big, red
PROPN    Proper noun     Python, John
PUNCT    Punctuation     ., !

Conclusion

spaCy provides powerful tools for PoS tagging and lemmatization, both of which are essential for text preprocessing in NLP applications. PoS tagging identifies the grammatical role of each word, while lemmatization reduces words to their base forms so that different inflections can be analyzed together.

Updated on: 2026-03-27T00:24:22+05:30
