Fetching text from Wikipedia’s Infobox in Python


In this article, we are going to scrape text from a Wikipedia infobox using BeautifulSoup and requests in Python. The process is straightforward and takes about ten minutes.

We need to install bs4 and requests. Run the commands below to install them.

pip install bs4
pip install requests

Follow the steps below to write the code that fetches the text we want from the infobox.

  • Import the bs4 and requests modules.
  • Send an HTTP request to the page that you want to fetch data from using the requests.get() method.
  • Parse the response text using the bs4.BeautifulSoup class and store it in a variable.
  • Go to the Wikipedia page and inspect the element that you want.
  • Find the element using a suitable method provided by bs4.
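
The first three steps can be sketched as a pair of small helper functions. This is a minimal sketch, not a definitive implementation: the names fetch_soup, parse_html and HEADERS are made up for illustration, the User-Agent string is a placeholder, and the 10-second timeout is an arbitrary choice. Sending a descriptive User-Agent is assumed to be advisable here because some sites reject requests that do not identify themselves.

```python
import requests
import bs4

# Hypothetical User-Agent string; replace it with one that identifies your script.
HEADERS = {"User-Agent": "infobox-tutorial/0.1 (example script)"}


def parse_html(html):
    """Parse an HTML string with Python's built-in parser."""
    return bs4.BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Download a page and return it as a BeautifulSoup object."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return parse_html(response.text)


# Usage (needs network access):
# soup = fetch_soup("https://en.wikipedia.org/wiki/India")
# print(soup.title.text)
```

Keeping the parsing step in its own function makes it possible to try it on a saved or hand-written HTML string without touching the network.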

Let's see the example code below.

Example

# importing the modules
import requests
import bs4

# URL
URL = "https://en.wikipedia.org/wiki/India"

# sending the request
response = requests.get(URL)

# parsing the response
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# Now we have the parsed HTML. We want to get the _motto_ from the Wikipedia page.
# Elements structure
# table - class="infobox"
# 3rd tr to get motto

# getting infobox
infobox = soup.find('table', {'class': 'infobox'})

# getting 3rd row element tr
third_tr = infobox.find_all('tr')[2]

# from third_tr we have to find first 'a' element and 'div' element to get required data
first_a = third_tr.div.find('a')
div = third_tr.div.div

# motto
motto = f"{first_a.text} {div.text[:len(div.text) - 3]}"

# printing the motto
print(motto)

If you run the above program, you will get the following result.

Output

Satyameva Jayate "Truth Alone Triumphs"
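
The example above depends on the motto sitting in the third row, so it will break if the infobox layout changes. For rows that pair a label with a value, looking the row up by its label is usually less fragile. The sketch below assumes the common layout where each row holds a th label and a td value; that layout is not guaranteed for every row (the motto row used above, for instance, is structured differently). The helper name infobox_value and the sample HTML are made up for illustration, so the code runs without a network connection.

```python
import bs4

# Made-up HTML that imitates a small infobox; real pages are much larger.
SAMPLE_HTML = """
<table class="infobox vcard">
  <tr><th>Capital</th><td>New Delhi</td></tr>
  <tr><th>Currency</th><td>Indian rupee</td></tr>
</table>
"""


def infobox_value(soup, label):
    """Return the text of the infobox row whose header contains `label`, or None."""
    infobox = soup.find("table", class_="infobox")
    if infobox is None:
        return None  # the page has no infobox
    for row in infobox.find_all("tr"):
        header = row.find("th")
        value = row.find("td")
        if header is None or value is None:
            continue  # skip rows that are not label/value pairs
        if label.lower() in header.get_text().lower():
            # Strip each text fragment and join the fragments with single spaces
            return value.get_text(" ", strip=True)
    return None


soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
print(infobox_value(soup, "Capital"))  # New Delhi
print(infobox_value(soup, "Motto"))    # None (no such row in the sample)
```

Returning None for a missing infobox or label lets the caller decide how to handle it, instead of failing with an AttributeError or IndexError partway through the lookup.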

Conclusion

You can get any data you want by inspecting the Wikipedia page and finding the right element. If you have any queries regarding the tutorial, mention them in the comment section.

Updated on: 13-Nov-2020
