How can the BeautifulSoup package be used to parse data from a webpage in Python?


BeautifulSoup is a third-party Python library for parsing HTML and XML documents, such as web pages. It is commonly used in web scraping, which is the process of extracting data from websites so that it can be used and manipulated elsewhere.
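As a minimal sketch of what parsing looks like, the snippet below feeds a small, made-up HTML string to BeautifulSoup and reads the title and the links from it. The HTML content and the variable names are illustrative assumptions, and no network access is involved.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document (illustrative only)
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p>First paragraph.</p>
    <a href="/about">About</a>
    <a href="/contact">Contact</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, features="html.parser")

# The text of the <title> element
title = soup.title.string
# The href attribute of every <a> element, in document order
links = [a["href"] for a in soup.find_all("a")]

print(title)
print(links)
```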

Web scraping is used to collect data for research, to understand and compare market trends, to perform SEO monitoring, and so on.

BeautifulSoup can be installed with pip. On Windows, run the following command −

pip install beautifulsoup4
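One quick way to confirm that the installation worked is to import the package and print its version. Note that the package is installed under the name beautifulsoup4 but imported as bs4.

```python
import bs4

# Prints the version string of the installed beautifulsoup4 package
print(bs4.__version__)
```

If the import fails with a ModuleNotFoundError, the package was most likely installed for a different Python interpreter than the one running the script.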

Let us see an example −


from urllib.request import urlopen

from bs4 import BeautifulSoup

url = ''  # URL of the page to scrape (left empty in the original; set it before running)
print("Reading the webpage...")
html = urlopen(url).read()
print("Parsing the webpage...")
soup = BeautifulSoup(html, features="html.parser")
# Remove <script> and <style> elements so that their contents do not end up in the text
for script in soup(["script", "style"]):
    script.extract()
print("Extracting text from the webpage...")
text = soup.get_text()
print("Data cleaning...")
# Strip leading and trailing whitespace from every line
lines = (line.strip() for line in text.splitlines())
# Split on double spaces so that each phrase lands on its own line
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
# Drop empty chunks and join the rest, one per line
text = '\n'.join(chunk for chunk in chunks if chunk)
print(text)


The output is as follows −

Reading the webpage...
Parsing the webpage...
Extracting text from the webpage...
Data cleaning...
Recursive C implementation of Euclid's algorithm from the above flowchart
A recursive algorithm is one that invokes (makes reference to) itself repeatedly until a certain condition (also known as termination condition) matches, which is a method common to functional programming….
Cookie statement
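The data-cleaning step of the example can be tried in isolation, without fetching a page. The sketch below wraps it in a hypothetical helper named clean_text and runs it on made-up raw text. It assumes that each line is split on a double space, which is what keeps multi-word phrases together on one line, as in the output above.

```python
def clean_text(text):
    """Reduce raw page text to one non-empty phrase per line."""
    # Strip leading and trailing whitespace from every line
    lines = (line.strip() for line in text.splitlines())
    # Split each line on double spaces (assumed separator between phrases)
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # Drop empty chunks and put each remaining phrase on its own line
    return "\n".join(chunk for chunk in chunks if chunk)


# Made-up raw text with blank lines, indentation, and a double-space gap
raw = "\n\n   Cookie statement  \n\n\tPrivacy policy  About us\n   \n"
print(clean_text(raw))
```

Splitting on a single space instead would put every word on its own line, so the choice of separator decides how coarse the resulting lines are.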


  • The required packages are imported.

  • The URL of the webpage is defined.

  • The URL is opened and the page is parsed. The ‘script’ and ‘style’ tags are then removed so that their contents do not appear in the text.

  • The ‘get_text’ method is used to extract the text from the parsed page.

  • Leading and trailing whitespace and empty lines are eliminated.

  • The text is printed on the console.
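The tag-removal and text-extraction steps listed above can be sketched on an inline HTML string, which makes the effect easy to check without a network connection. The HTML below is made up for illustration. This sketch uses decompose(), which discards the removed tag, whereas the example's extract() returns it; either works when the removed tags are not needed afterwards.

```python
from bs4 import BeautifulSoup

# Made-up HTML with a style and a script element (illustrative only)
html = """
<html>
  <head>
    <style>p { color: red; }</style>
    <script>console.log("not page text");</script>
  </head>
  <body><p>Visible paragraph.</p></body>
</html>
"""

soup = BeautifulSoup(html, features="html.parser")

# Calling the soup object is shorthand for soup.find_all(...)
for tag in soup(["script", "style"]):
    tag.decompose()  # remove the tag and its contents from the tree

text = soup.get_text()

# Only the paragraph text remains once surrounding whitespace is stripped
print(text.strip())
```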

Updated on 18-Jan-2021 12:52:22