How can titles from a webpage be extracted using BeautifulSoup?

BeautifulSoup is a third-party Python library used to parse HTML and XML documents, such as web pages. It is often used to collect text for Natural Language Processing applications, to analyse data, and to extract meaningful insights from it.

Natural Language Processing, or NLP, is a part of Machine Learning that deals with text data and ways of pre-processing it so that it can be supplied as input to a Machine Learning model.

Web scraping, the practice of extracting data from web pages, can also be used to gather data for research purposes, understand or compare market trends, perform SEO monitoring, and so on.

The following command can be run to install BeautifulSoup, along with the requests library used in the example below, on Windows −

pip install beautifulsoup4 requests

Following is an example −

Example

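from bs4 import BeautifulSoup
import requests

# Page whose title will be extracted
url = "https://en.wikipedia.org/wiki/Algorithm"
# Download the page and raise an error if the request failed
req = requests.get(url, timeout=10)
req.raise_for_status()
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(req.text, "html.parser")
print("The titles are :")
# soup.title is the page's <title> element
print(soup.title)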

Output

The titles are :
<title>Algorithm - Wikipedia</title>
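
The output above is the whole title element, not just its text. If only the text is needed, the tag's ‘string’ attribute can be read instead. The following is a minimal sketch of that idea. It assumes a small hard-coded HTML string (the ‘html’ variable is illustrative and stands in for ‘req.text’) so that it runs without network access −

```python
from bs4 import BeautifulSoup

# Illustrative HTML standing in for req.text; it is not fetched from a real page.
html = "<html><head><title>Algorithm - Wikipedia</title></head><body></body></html>"

soup = BeautifulSoup(html, "html.parser")

# soup.title is the <title> element, or None if the page has no title
title_tag = soup.title

# .string gives the text inside the tag, without the surrounding markup
title_text = title_tag.string if title_tag is not None else None

print(title_tag)
print(title_text)
```

Checking for None before reading ‘string’ avoids an AttributeError on pages that have no title element.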

Explanation

  • The required packages, BeautifulSoup and requests, are imported.

  • The URL of the web page is defined.

  • The URL is requested using ‘requests.get’, and the response body is read from it.

  • The ‘BeautifulSoup’ constructor is used to parse the HTML of the web page.

  • The page title is accessed using the ‘title’ attribute, which returns the title element of the page.

  • The title is printed on the console.
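
A page has only one title element, so when "titles" is meant to include section headings as well, the heading tags can be collected with ‘find_all’. The following sketch is one possible way to do this and is not part of the original example. The function name ‘extract_titles’ and the ‘sample_html’ string are hypothetical, and the choice of h1 to h3 as the heading levels is an assumption −

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the page title text and the text of all h1-h3 headings."""
    soup = BeautifulSoup(html, "html.parser")

    # Text of the <title> element, or None when the page has no title
    page_title = soup.title.get_text(strip=True) if soup.title else None

    # find_all accepts a list of tag names and returns matches in document order
    headings = [tag.get_text(strip=True)
                for tag in soup.find_all(["h1", "h2", "h3"])]
    return page_title, headings


# Hypothetical sample page used in place of a downloaded one
sample_html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Main heading</h1>
    <p>Some text.</p>
    <h2>First section</h2>
    <h2>Second section</h2>
  </body>
</html>
"""

page_title, headings = extract_titles(sample_html)
print(page_title)
print(headings)
```

To use it on a live page, the text of a downloaded response (such as ‘req.text’ in the example above) can be passed to the function in place of ‘sample_html’.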

raja
Published on 18-Jan-2021 12:52:57