Extract the title from a webpage using Python

In Python, we can extract the title from a webpage using web scraping. Web scraping is the process of extracting data from a website or webpage. In this article, we will scrape the title of a webpage using various Python libraries including Requests, BeautifulSoup, urllib, Selenium, and regular expressions.

Method 1: Using Requests and BeautifulSoup

The most common approach uses the requests library to send HTTP requests and BeautifulSoup to parse HTML content. The requests library fetches the webpage, and BeautifulSoup extracts the title tag.

Example

In the following example, we extract the title of the Wikipedia homepage. We send a GET request to the URL and parse the HTML response:

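import requests
from bs4 import BeautifulSoup

url = 'https://www.wikipedia.org/'
# A timeout stops the script from hanging on an unresponsive server
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an error for 4xx/5xx responses

soup = BeautifulSoup(response.content, 'html.parser')
# soup.title is None when the page has no <title> tag
title = soup.title.string if soup.title else None

print(title)

Output

Wikipedia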

Method 2: Using urllib and BeautifulSoup

This method uses urllib (built into Python) instead of requests. The urllib library opens the URL directly and retrieves the HTML content, which is then parsed by BeautifulSoup.

Example

Here we use urllib.request.urlopen() to fetch the webpage content:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.wikipedia.org/'
# The with block closes the connection once BeautifulSoup has read the page
with urlopen(url, timeout=10) as html_page:
    soup = BeautifulSoup(html_page, 'html.parser')

title = soup.title.string
print(title)

Output

Wikipedia
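If you want to avoid BeautifulSoup as well, the standard library can handle the parsing too. The sketch below swaps in Python's built-in html.parser.HTMLParser and records only the text of the first title element. The names TitleParser, title_from_html and fetch_title, the 10-second timeout and the User-Agent value are illustrative assumptions, not part of urllib's API.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities like &amp; arrive decoded
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> is matched too
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)


def title_from_html(html_text):
    """Return the first <title> text, or None if the page has none."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


def fetch_title(url):
    # Assumed User-Agent string; some sites reject urllib's default one
    req = Request(url, headers={"User-Agent": "title-example/0.1"})
    with urlopen(req, timeout=10) as resp:
        # Use the charset from the Content-Type header, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        html_text = resp.read().decode(charset, errors="replace")
    return title_from_html(html_text)
```

Calling fetch_title('https://www.wikipedia.org/') should return the same title as the example above, assuming the site accepts the request. Because this parser builds no document tree, it suits pulling out a single element like the title; for anything more involved, BeautifulSoup is more convenient.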

Method 3: Using Selenium and BeautifulSoup

Selenium is useful for JavaScript-heavy websites where the title might be generated dynamically. It drives a real browser, waits for the page to load, and then reads the rendered HTML source.

Example

This approach uses Chrome through WebDriver to load the page and get the rendered HTML:

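from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://www.wikipedia.org/'
driver = webdriver.Chrome()
try:
    driver.get(url)
    # page_source holds the HTML as rendered by the browser after loading
    html_page = driver.page_source
finally:
    driver.quit()  # always close the browser, even if loading fails

soup = BeautifulSoup(html_page, 'html.parser')
title = soup.title.string
print(title)

Output

Wikipedia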

Method 4: Using Regular Expressions

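Regular expressions can pull the title straight out of the raw HTML text without parsing the entire document. This can be faster, but it is fragile: a pattern may miss titles that use uppercase tags, carry attributes, or span several lines, and it can match title tags inside comments or scripts.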

Example

We use a regex pattern to capture the text between the opening and closing title tags in the HTML content:

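import re
import requests

url = 'https://www.wikipedia.org/'
response = requests.get(url, timeout=10)
response.raise_for_status()
html_content = response.content.decode('utf-8')

# IGNORECASE also matches <TITLE>; DOTALL lets the title span several lines
title_pattern = re.compile(r'<title\b[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)
match = title_pattern.search(html_content)
# search() returns None when no title tag is found
title = match.group(1).strip() if match else None

print(title)

Output

Wikipedia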
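A regular expression returns the raw source text between the tags, so HTML entities such as &amp; stay encoded and any line breaks inside the title are kept. The following standard-library sketch adds those clean-up steps; the pattern and the helper name regex_title are our own assumptions. It still cannot tell a real title tag from one inside a comment or a script, which is why a parser remains the safer choice.

```python
import html
import re

# IGNORECASE matches <TITLE> as well as <title>; [^>]* skips any attributes;
# DOTALL lets .*? run across line breaks inside the title.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def regex_title(html_text):
    """Return the first <title> text found by the pattern, or None (hypothetical helper)."""
    match = TITLE_RE.search(html_text)
    if match is None:
        return None  # no <title> element in the text
    # Collapse runs of whitespace (including newlines) into single spaces
    raw = " ".join(match.group(1).split())
    # Decode character references such as &amp; or &#39;
    return html.unescape(raw)


page = '<html><head><TITLE lang="en">\n  Tom &amp; Jerry\n</TITLE></head></html>'
print(regex_title(page))  # Tom & Jerry
```

The name html_text is used for the parameter so that it does not shadow the html module needed for unescaping.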

Comparison of Methods

| Method | Best For | Dependencies | JavaScript Support |
|---|---|---|---|
| Requests + BeautifulSoup | Static websites | requests, beautifulsoup4 | No |
| urllib + BeautifulSoup | Avoiding the requests dependency | beautifulsoup4 only | No |
| Selenium + BeautifulSoup | JavaScript-heavy sites | selenium, beautifulsoup4, a browser and its driver (e.g. Chrome) | Yes |
| Regular Expressions | Simple, predictable HTML; speed | requests only | No |

Conclusion

Use Requests + BeautifulSoup for most static websites because it is reliable and efficient. Choose Selenium when dealing with JavaScript-rendered content, and use regular expressions only for simple HTML structures where performance is critical.

Updated on: 2026-03-27T07:15:45+05:30
