Extracting locations from text using Python


In Python, we can extract locations from text using NLP libraries such as NLTK and spaCy. Location extraction is used in various natural language processing tasks such as sentiment analysis, information retrieval, and social media analysis. In this article, we will discuss how to extract locations from text using the spaCy library.

Prerequisites

Installing the spaCy library

Before using spaCy for extraction, you need to install it with the pip command. To install the spaCy library, type the following command in your terminal or command prompt.

pip install spacy

Download the pre-trained English Model

spaCy provides pre-trained models that include named entity recognition (NER). Named entity recognition is the process of identifying and categorizing named entities in text, such as persons, organizations, and locations. We will use the NER component of the pre-trained English model to extract locations from text.

You can install the pre-trained English model using the following command −

python -m spacy download en_core_web_sm
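Before moving on, it can help to confirm that both installs succeeded. The sketch below is one way to check, assuming the model was installed with the command above, in which case it should be importable as an ordinary Python package named en_core_web_sm. The helper name is_installed is made up for this illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the library and the English model.
for name in ("spacy", "en_core_web_sm"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports "missing", rerun the corresponding install command in the same Python environment.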

Extracting Location from Text

Algorithm

Here is a general algorithm for extracting locations from text using spaCy −

  • Import the spaCy library

  • Load the pre-trained English model using spacy.load()

  • Define the text string that contains the location mentions

  • Create a spaCy Doc object by passing the text to the nlp() function

  • Loop over the entities in the document using the doc.ents attribute

  • Check if the entity label is 'GPE' (geopolitical entity)

  • If the entity label is 'GPE', extract the text of the entity using the entity.text attribute

  • Store the extracted locations in a list or another appropriate data structure

  • Optional: manually verify the extracted locations

  • Use the extracted locations for further analysis or processing as needed.
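The steps above can be sketched as a small reusable function. This is a minimal illustration rather than a definitive implementation: the names extract_locations and locations_in_text are hypothetical, and dropping repeated mentions is an assumption added here, not part of the algorithm as listed.

```python
def extract_locations(doc, labels=("GPE",)):
    """Collect the text of entities whose label is in `labels`.

    `doc` is anything exposing `.ents`, such as a spaCy Doc. Repeated
    mentions are kept once, in order of first appearance (an assumption
    made for this sketch).
    """
    locations = []
    for entity in doc.ents:
        if entity.label_ in labels and entity.text not in locations:
            locations.append(entity.text)
    return locations


def locations_in_text(text, model_name="en_core_web_sm"):
    """Load a model, process `text`, and return the locations found."""
    import spacy  # imported here so extract_locations works on any Doc-like object

    nlp = spacy.load(model_name)
    return extract_locations(nlp(text))
```

Calling locations_in_text("I love traveling to Paris and London.") returns a list of the 'GPE' entities the model finds. When processing many texts, it is usually better to load the model once and pass each Doc to extract_locations, since loading a model is comparatively slow.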

Using the spaCy library

We first import the spaCy library and load the pre-trained English model. We then define a text string that contains a mention of a location and pass it to the nlp() function to create a spaCy Doc object. Next, we loop over the entities in the document using the doc.ents attribute. For each entity, we check whether its label is 'GPE' (geopolitical entity), and if it is, we print the text of the entity.

Syntax

import spacy

nlp = spacy.load('en_core_web_sm')

doc = nlp(text)

Here, we first import the spaCy library and load the pre-trained English model using the spacy.load() function, which returns a pipeline object that we name nlp. We then create a spaCy Doc object by passing the text string to the nlp() function. The nlp() function applies a pipeline of language processing tasks to the text, such as tokenization, part-of-speech tagging, and named entity recognition. The resulting Doc object contains the processed text and its annotations, which can be accessed using various attributes and methods.
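To see which annotations the Doc actually carries, it can be useful to list every entity together with its label and character offsets before filtering for locations. The sketch below relies on the entity attributes text, label_, start_char, and end_char; the function names entity_rows and print_entities are hypothetical helpers for this illustration.

```python
def entity_rows(doc):
    """Return (text, label, start_char, end_char) for every entity in doc.ents."""
    return [
        (entity.text, entity.label_, entity.start_char, entity.end_char)
        for entity in doc.ents
    ]


def print_entities(doc):
    """Print one aligned line per entity."""
    for text, label, start, end in entity_rows(doc):
        print(f"{text:<25} {label:<8} chars {start}-{end}")


# Usage, assuming the model from the prerequisites is installed:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print_entities(nlp("I went to New York City last summer."))
```

Looking at the full list first makes it easier to decide which labels to keep, because the model may assign some place names a label other than 'GPE'.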

Example

In the example below, we take a sample text and extract the location from it. First, the en_core_web_sm model is loaded using spacy.load(). The text "I went to New York City last summer and visited the Statue of Liberty." is processed using the loaded model, resulting in a doc object. The code then iterates over the entities identified in the document, and if the entity label is 'GPE' (geopolitical entity), it prints the corresponding text, which in this case would be "New York City". The Statue of Liberty is a landmark rather than a geopolitical entity, so the 'GPE' filter is not expected to print it.

import spacy

nlp = spacy.load('en_core_web_sm')

text = "I went to New York City last summer and visited the Statue of Liberty."

doc = nlp(text)

for entity in doc.ents:
    if entity.label_ == 'GPE':
        print(entity.text)

Output

New York City

Extracting Multiple Locations

A text may mention more than one location. Because doc.ents contains every entity that spaCy finds, the same loop extracts all of the locations in the text.

Example

In the example below, the text mentions three locations: "Paris", "London", and "Sydney". We again pass the text to the nlp() function to create a spaCy Doc object. We then loop over the entities in the document and check whether the entity label is 'GPE'. If it is, we print the text of the entity.

import spacy

nlp = spacy.load('en_core_web_sm')

text = "I love traveling to Paris and London. I also enjoy visiting Sydney."

doc = nlp(text)

for entity in doc.ents:
    if entity.label_ == 'GPE':
        print(entity.text)

Output

Paris
London
Sydney
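The 'GPE' label covers countries, cities, and states. In the label scheme used by the English models, other kinds of places fall under 'LOC' (non-GPE locations such as mountain ranges and bodies of water) and 'FAC' (buildings, airports, highways, bridges). The sketch below groups entities under all three labels; treating this set as "location-like" is an assumption for illustration, and group_locations is a hypothetical helper name.

```python
from collections import defaultdict

# Labels treated as "location-like" in this sketch (an assumption; adjust as needed).
LOCATION_LABELS = ("GPE", "LOC", "FAC")


def group_locations(doc, labels=LOCATION_LABELS):
    """Map each location-like label to the entity texts found under it."""
    grouped = defaultdict(list)
    for entity in doc.ents:
        if entity.label_ in labels:
            grouped[entity.label_].append(entity.text)
    return dict(grouped)


# Usage, assuming the model from the prerequisites is installed:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(group_locations(nlp("I love traveling to Paris and London.")))
```

Which label a given place receives depends on the model, so it is worth inspecting the grouped output on your own data. spacy.explain(label) returns a short description of any label you are unsure about.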

Conclusion

In this article, we discussed how to extract locations from text using the spaCy library in Python. spaCy provides a pre-trained English model whose NER component labels geopolitical entities as 'GPE', and filtering doc.ents on that label extracts one or many locations from a single text.

Updated on: 10-Jul-2023
