Extract the title from a webpage using Python

In Python, we can extract the title from a webpage using web scraping. Web scraping is the process of extracting data from a website or webpage. In this article, we will scrape the title of a webpage using Requests, BeautifulSoup, urllib, Selenium, and regular expressions.

Method 1: Using Requests and BeautifulSoup

The most common approach uses the requests library to send HTTP requests and BeautifulSoup to parse HTML content. The requests library fetches the webpage, and BeautifulSoup extracts the title tag.

Example

In the example below, we extract the title of the Wikipedia homepage. We send a GET request to the URL and parse the HTML response:

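import requests
from bs4 import BeautifulSoup

url = 'https://www.wikipedia.org/'
# Fetch the page; the timeout keeps the call from hanging indefinitely
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx responses

# Parse the HTML and read the text inside the <title> tag
soup = BeautifulSoup(response.content, 'html.parser')
title = soup.title.string

print(title)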

Output

Wikipedia
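
The example above assumes the page has a title element. When it does not, soup.title is None and soup.title.string raises an AttributeError. A minimal sketch of a more defensive helper is shown below; the function name extract_title and the sample HTML strings are illustrative assumptions, not part of the original example.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the page title as a stripped string, or None if absent.

    `extract_title` is a hypothetical helper name used for illustration.
    """
    soup = BeautifulSoup(html, 'html.parser')
    if soup.title is None:
        return None  # the document has no <title> element
    # get_text() joins all text inside the tag; strip=True trims whitespace
    text = soup.title.get_text(strip=True)
    return text or None


# Sample documents (made up for this sketch)
print(extract_title('<html><head><title> Example Page </title></head></html>'))
print(extract_title('<html><body><p>No title here</p></body></html>'))
```

The same function can be given response.content from the requests call, so the fetching step and the parsing step stay separate.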

Method 2: Using urllib and BeautifulSoup

This method uses urllib (built into Python) instead of requests. The urllib library opens the URL directly and retrieves the HTML content, which is then parsed by BeautifulSoup.

Example

Here we use urllib.request.urlopen() to fetch the webpage content:

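from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.wikipedia.org/'
# urlopen returns a file-like response; the with block closes the connection
with urlopen(url, timeout=10) as html_page:
    soup = BeautifulSoup(html_page, 'html.parser')

# Read the text inside the <title> tag
title = soup.title.string

print(title)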

Output

Wikipedia
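
Called with a bare URL, urlopen() sends Python's default User-Agent header, and some servers reject or throttle such requests. The sketch below shows one way to set a header, apply a timeout, and decode the body with the charset the server declares. The helper name fetch_html and the User-Agent value are placeholders chosen for illustration.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Fetch a URL and return its body as text (hypothetical helper)."""
    # The User-Agent value is a placeholder; use one that identifies your script
    request = Request(url, headers={'User-Agent': 'title-fetcher-example/0.1'})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')
```

The returned string can be passed straight to BeautifulSoup, for example BeautifulSoup(fetch_html('https://www.wikipedia.org/'), 'html.parser').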

Method 3: Using Selenium and BeautifulSoup

Selenium is useful for JavaScript-heavy websites where the title might be generated dynamically. It drives a real browser, which loads the page and runs its scripts, and then exposes the rendered HTML source.

Example

This approach uses Chrome WebDriver to load the page and get the rendered HTML:

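from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://www.wikipedia.org/'
driver = webdriver.Chrome()  # requires a local Chrome installation

try:
    driver.get(url)
    # page_source holds the HTML after the browser has rendered the page
    html_page = driver.page_source
finally:
    driver.quit()  # always close the browser, even if loading fails

soup = BeautifulSoup(html_page, 'html.parser')
title = soup.title.string

print(title)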

Output

Wikipedia

Method 4: Using Regular Expressions

Regular expressions can extract the title directly from HTML text without parsing the entire document. This method is faster but less reliable: a simple pattern misses titles whose tag carries attributes, uses a different letter case, or spans several lines.

Example

We use a regex pattern to match the title tags in the HTML content:

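import requests
import re

url = 'https://www.wikipedia.org/'
response = requests.get(url, timeout=10)
# Assumes the page is UTF-8 encoded
html_content = response.content.decode('utf-8')

# Capture the text between the opening and closing title tags
title_pattern = re.compile('<title>(.+?)</title>')
match = title_pattern.search(html_content)
# search() returns None when the pattern is not found
title = match.group(1) if match else None

print(title)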

Output

Wikipedia
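
To make the reliability trade-off concrete, the sketch below runs the pattern from the example next to a more tolerant variant on a few hand-written snippets. The names STRICT, TOLERANT, and title_from_html, as well as the sample strings, are assumptions made for this illustration. The tolerant pattern is still not a full HTML parser.

```python
import html
import re

# Pattern from the example above: needs a bare <title> tag on a single line
STRICT = re.compile('<title>(.+?)</title>')
# More tolerant variant (an illustrative assumption, not a full HTML parser):
# allows attributes, any letter case, and line breaks inside the title
TOLERANT = re.compile(r'<title\b[^>]*>(.*?)</title\s*>', re.IGNORECASE | re.DOTALL)


def title_from_html(text, pattern):
    """Return the first title matched by `pattern`, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    # Collapse runs of whitespace and decode entities such as &amp;
    return html.unescape(' '.join(match.group(1).split()))


# Hand-written snippets used only for this demonstration
samples = [
    '<title>Plain</title>',
    '<TITLE>Upper case</TITLE>',
    '<title data-x="1">With attribute</title>',
    '<title>\n  Spans &amp; wraps\n</title>',
]

for sample in samples:
    print(title_from_html(sample, STRICT), '|', title_from_html(sample, TOLERANT))
```

Only the first snippet matches the strict pattern; the tolerant pattern returns a title for all four.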

Comparison of Methods

Method	Best For	Dependencies	JavaScript Support
Requests + BeautifulSoup	Static websites	requests, beautifulsoup4	No
urllib + BeautifulSoup	Avoiding the requests dependency	beautifulsoup4	No
Selenium + BeautifulSoup	JavaScript-heavy sites	selenium, beautifulsoup4, a browser driver	Yes
Regular Expressions	Simple HTML, speed	requests only	No
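
The urllib row still relies on beautifulsoup4 for parsing. Where no third-party package can be installed, the standard library's html.parser.HTMLParser can collect the title text instead. This is an alternative to the four methods above rather than one of them, and the sketch below is only a minimal illustration: the class name TitleParser and the sample HTML are assumptions.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag == 'title' and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> element

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = ''.join(self._parts).strip()
        return text or None


parser = TitleParser()
parser.feed('<html><head><title>Example Page</title></head><body></body></html>')
parser.close()
print(parser.title)
```

Combined with urllib.request.urlopen() for fetching, this keeps the whole script inside the standard library, at the cost of writing the parsing logic by hand.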

Conclusion

Use Requests + BeautifulSoup for most static websites, as it is reliable and efficient. Choose Selenium when dealing with JavaScript-rendered content, and use regular expressions only for simple HTML structures where performance is critical.

Rohan Singh

Updated on: 2026-03-27T07:15:45+05:30
