How can the BeautifulSoup package be used to extract the domain name of a website in Python?

BeautifulSoup is a third-party Python library used to parse HTML and XML documents such as web pages. It helps in web scraping, which is the process of extracting and processing data from websites. The extracted text can also feed Natural Language Processing applications, where it is analysed to draw meaningful insights.

Natural Language Processing, or NLP, is a field closely tied to Machine Learning that deals with text data and ways of pre-processing it so that it can be supplied as input to a Machine Learning model.

Web scraping can also be used to extract data for research purposes, understand/compare market trends, perform SEO monitoring, and so on.
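
As a rough illustration of what parsing a page looks like, here is a minimal sketch. It assumes the beautifulsoup4 package is installed, and the HTML string is a made-up stand-in for a downloaded page, so no network access is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for a downloaded web page.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="https://en.wikipedia.org/wiki/Algorithm">Algorithm</a>
    <a href="https://www.python.org/doc/">Python docs</a>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python, so nothing else is required.
soup = BeautifulSoup(html, "html.parser")

# Text of the <title> tag.
page_title = soup.title.string

# The href value of every <a> tag that has one.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(page_title)
print(links)
```

Links collected this way are plain URL strings, which can then be reduced to their domain names.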

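The following command installs BeautifulSoup with pip (it is the same on Windows, Linux, and macOS) −

pip install beautifulsoup4

Example

# The domain is read from the URL string itself with the standard library,
# so no page has to be downloaded or parsed for this step
from urllib.parse import urlparse

url = 'https://en.wikipedia.org/wiki/Algorithm'
# Split the URL into its components (scheme, netloc, path, ...)
parsed_uri = urlparse(url)
# Rebuild only the scheme and the network location
domainName = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
print("The domain name is : ")
print(domainName)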

Output

The domain name is :
https://en.wikipedia.org/

Explanation

  • The required modules are imported.

  • The URL of the website is defined.

  • The ‘urlparse’ function is called to split the URL into its components.

  • The domain name is built from the ‘scheme’ and ‘netloc’ attributes of the parsed result.

  • The domain name is printed on the console.
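
The steps above can be sketched a little further to show what the parsed result contains. The helper name domain_of and the sample URLs are assumptions made for illustration. Relative links, which often appear in scraped pages, have no scheme or netloc, so the helper returns None for them.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return 'scheme://netloc/' for an absolute URL, or None otherwise."""
    parts = urlparse(url)
    # Relative links such as '/wiki/Algorithm' carry no scheme or netloc.
    if not parts.scheme or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}/"


# The parsed result exposes the URL components as attributes, not functions.
parts = urlparse("https://en.wikipedia.org:443/wiki/Algorithm?action=history")
print(parts.scheme)    # https
print(parts.netloc)    # en.wikipedia.org:443
print(parts.hostname)  # en.wikipedia.org
print(parts.port)      # 443
print(parts.path)      # /wiki/Algorithm

print(domain_of("https://en.wikipedia.org/wiki/Algorithm"))  # https://en.wikipedia.org/
print(domain_of("/wiki/Algorithm"))                          # None
```

Note that netloc includes the port when one is present, while hostname gives the host alone.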

raja
Published on 18-Jan-2021 17:18:53