Fetching text from Wikipedia's Infobox in Python

In this article, we scrape text from a Wikipedia infobox using BeautifulSoup and requests in Python. The script is short, and the technique is useful for extracting structured information from Wikipedia pages.

Prerequisites

We need the beautifulsoup4 package (imported as bs4) and requests. Run the following commands to install them:

pip install beautifulsoup4
pip install requests

Steps to Extract Infobox Data

Follow the steps below to write the code that fetches the text we want from the infobox:

Import the bs4 and requests modules.
Send an HTTP request to the page that you want to fetch data from using the requests.get() method.
Parse the response text using bs4.BeautifulSoup class and store it in a variable.
Go to the Wikipedia page and inspect the element that you want.
Find the element using a suitable bs4 method such as find() or find_all().

Example: Extracting India's Motto

Let's extract the motto from India's Wikipedia page. We'll target the infobox table and navigate to the specific row containing the motto:

# importing the modules
import requests
import bs4

# URL of the page to scrape
URL = "https://en.wikipedia.org/wiki/India"

# identify the script to the server; this value is a placeholder, replace it with your own
HEADERS = {"User-Agent": "infobox-tutorial/0.1 (learning script)"}

# sending the request and stopping early on an HTTP error
response = requests.get(URL, headers=HEADERS, timeout=10)
response.raise_for_status()

# parsing the response
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# We now have the parsed HTML and want the motto from the Wikipedia page.
# Element structure (verify it in the browser, since the page layout can change):
# table - class="infobox"
# 3rd tr to get motto

# getting the infobox
infobox = soup.find('table', {'class': 'infobox'})

# getting the 3rd row element tr
third_tr = infobox.find_all('tr')[2]

# from third_tr we need the first 'a' element and the nested 'div' element
first_a = third_tr.div.find('a')
div = third_tr.div.div

# motto: the link text plus the div text without its last 3 characters
motto = f"{first_a.text} {div.text[:-3]}"

# printing the motto
print(motto)

Running the above program prints the following result, provided the infobox layout has not changed:

Satyameva Jayate "Truth Alone Triumphs"

How It Works

The code works by targeting the infobox table. Wikipedia renders an infobox as a table with the class infobox, and the information inside it is organized in table rows. We locate the row containing the motto by its position and extract the text from its nested elements.
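
To make the row navigation concrete without depending on the live page, here is a minimal sketch that runs against a small hand-written HTML fragment. The fragment, the country name, and the motto are made up for illustration and are far simpler than the real markup on Wikipedia; only the table with class infobox and its tr rows mirror what is described above.

```python
import bs4

# Hypothetical, simplified infobox markup (not copied from Wikipedia)
SAMPLE_HTML = """
<table class="infobox">
  <tr><th colspan="2">Exampleland</th></tr>
  <tr><th>Capital</th><td>Sample City</td></tr>
  <tr><th>Motto</th><td>Onward Together</td></tr>
</table>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# locate the infobox table by its class
infobox = soup.find("table", {"class": "infobox"})

# collect every row, then pick the third one by position
rows = infobox.find_all("tr")
third_tr = rows[2]

# read the text of the first td cell in that row
motto = third_tr.td.get_text(strip=True)
print(motto)
```

The same three moves (find the table, index into the rows, read a nested element) are what the India example performs on the real page, where the nesting inside each row is deeper.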

Key Points

Use 'html.parser' to avoid extra dependencies; it ships with Python, while alternatives such as lxml must be installed separately
Wikipedia's infobox structure may vary between pages
Inspect the HTML structure before writing extraction code
Handle potential missing elements with try-except blocks for robust scraping
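
The last point can be sketched as a small helper. This is one possible shape, not the only one: the function name third_row_text is hypothetical, and the HTML strings are made-up fragments rather than real Wikipedia markup. It wraps the positional lookup in try-except so that a missing table, a short table, or a row without a td cell yields None instead of a crash.

```python
import bs4

def third_row_text(html):
    """Return the text of the first td in the third infobox row, or None."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    try:
        infobox = soup.find("table", {"class": "infobox"})
        return infobox.find_all("tr")[2].td.get_text(strip=True)
    except (AttributeError, IndexError):
        # AttributeError: no infobox was found, or the row has no td cell
        # IndexError: the infobox has fewer than three rows
        return None

# Hypothetical fragments used only to exercise the helper
full = ('<table class="infobox"><tr><td>a</td></tr>'
        '<tr><td>b</td></tr><tr><td>c</td></tr></table>')
short = '<table class="infobox"><tr><td>only</td></tr></table>'

print(third_row_text(full))
print(third_row_text(short))
print(third_row_text("<p>no infobox here</p>"))
```

Returning None keeps the caller in control: it can skip the page, log it, or fall back to another selector.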

Conclusion

You can extract most data from Wikipedia infoboxes by inspecting the HTML structure and targeting specific elements. The same approach applies to other structured information such as population, area, capitals, and similar metadata, as long as you adjust the selectors to each page's layout.
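
For fields such as capital, area, or population, looking rows up by their label can be less brittle than counting rows. The sketch below assumes each row pairs a th label with a td value, which is a simplification; the function name infobox_to_dict and the sample markup are hypothetical and made up for illustration.

```python
import bs4

# Hypothetical, simplified infobox markup (not copied from Wikipedia)
SAMPLE_HTML = """
<table class="infobox">
  <tr><th colspan="2">Exampleland</th></tr>
  <tr><th>Capital</th><td>Sample City</td></tr>
  <tr><th>Area</th><td>1,234 km2</td></tr>
  <tr><th>Population</th><td>5,678</td></tr>
</table>
"""

def infobox_to_dict(html):
    """Map each th label to its td text for rows that contain both."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    infobox = soup.find("table", {"class": "infobox"})
    data = {}
    if infobox is None:
        return data
    for row in infobox.find_all("tr"):
        header = row.find("th")
        cell = row.find("td")
        # skip title rows and other rows that lack a label/value pair
        if header is None or cell is None:
            continue
        data[header.get_text(" ", strip=True)] = cell.get_text(" ", strip=True)
    return data

info = infobox_to_dict(SAMPLE_HTML)
print(info.get("Capital"))
print(info.get("Population"))
```

On real pages the labels often carry extra markup or footnote markers, so expect to clean the keys and values before using them.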

Hafeezul Kareem

Updated on: 2026-03-25T12:19:03+05:30
