Extracting locations from text using Python
In Python, we can extract locations from text using NLP libraries such as NLTK, spaCy, and TextBlob. Extracting locations from text is crucial for various Natural Language Processing tasks such as sentiment analysis, information retrieval, and social media analysis. In this article, we will discuss how to extract locations from text using the spaCy library.
Prerequisites
Installing the spaCy Library
Before using the spaCy library for location extraction, you need to install it using the pip command. Type the following command in your terminal or command prompt −
pip install spacy
Downloading the Pre-trained English Model
spaCy provides pre-trained models for Named Entity Recognition (NER). NER is the process of identifying and categorizing named entities in text such as persons, organizations, and locations. You can install the pre-trained English model using the following command −
python -m spacy download en_core_web_sm
Algorithm for Location Extraction
Here is a general algorithm for extracting locations from text using spaCy −
Import the spaCy library
Load the pre-trained English model using spacy.load()
Define the text string that contains location mentions
Create a spaCy Doc object by passing the text to the nlp() function
Loop over the entities in the document using the doc.ents attribute
Check if the entity label is 'GPE' (geopolitical entity)
If the entity label is 'GPE', extract the text using the entity.text attribute
Store the extracted locations in a list for further processing
Basic Location Extraction
Syntax
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(text)

for entity in doc.ents:
    if entity.label_ == 'GPE':
        print(entity.text)
Here, we import the spaCy library and load the pre-trained English model with spacy.load(). Calling nlp() on a text applies a pipeline of language processing steps, including tokenization, part-of-speech tagging, and named entity recognition.
Example
Let's extract a location from a sample text. The model identifies 'New York City' as a geopolitical entity (GPE) −
import spacy

nlp = spacy.load('en_core_web_sm')
text = "I went to New York City last summer and visited the Statue of Liberty."
doc = nlp(text)

for entity in doc.ents:
    if entity.label_ == 'GPE':
        print(entity.text)
New York City
Extracting Multiple Locations
When text contains multiple location mentions, spaCy can extract all of them in a single pass −
import spacy

nlp = spacy.load('en_core_web_sm')
text = "I love traveling to Paris and London. I also enjoy visiting Sydney."
doc = nlp(text)

locations = []
for entity in doc.ents:
    if entity.label_ == 'GPE':
        locations.append(entity.text)

print("Found locations:", locations)
Found locations: ['Paris', 'London', 'Sydney']
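If the same place is mentioned several times, the extracted list will contain duplicates. The standard library's Counter makes it easy to deduplicate and count mentions; the example below uses a hard-coded list standing in for the extraction output, so it runs without spaCy:

```python
from collections import Counter

# Locations as they might come out of the extraction loop above
locations = ['Paris', 'London', 'Paris', 'Sydney', 'Paris']

counts = Counter(locations)
print(counts.most_common())   # [('Paris', 3), ('London', 1), ('Sydney', 1)]
print(sorted(counts))         # ['London', 'Paris', 'Sydney']
```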
Enhanced Location Extraction
You can also extract additional information about each location entity, including its position in the text −
import spacy

nlp = spacy.load('en_core_web_sm')
text = "Tokyo is the capital of Japan, while Beijing is the capital of China."
doc = nlp(text)

for entity in doc.ents:
    if entity.label_ == 'GPE':
        print(f"Location: {entity.text}")
        print(f"Start: {entity.start_char}, End: {entity.end_char}")
        print(f"Label: {entity.label_}")
        print("---")
Location: Tokyo
Start: 0, End: 5
Label: GPE
---
Location: Japan
Start: 24, End: 29
Label: GPE
---
Location: Beijing
Start: 37, End: 44
Label: GPE
---
Location: China
Start: 63, End: 68
Label: GPE
---
Key Points
GPE stands for "Geopolitical Entity" and includes countries, cities, states, and regions
spaCy's NER model is pre-trained and works out-of-the-box for common locations
The accuracy depends on the training data and may not recognize very obscure location names
You can also check for other location-related labels like 'LOC' for non-geopolitical locations
Conclusion
spaCy provides an efficient way to extract locations from text using its pre-trained NER models. The library can identify single or multiple locations and provides additional metadata about each entity. This makes it valuable for location-based text analysis and information extraction tasks.
