Flight-price checker using Python and Selenium


Web scraping is a useful technique for extracting data from websites, and checking airline ticket prices is one common use. In this article, we build a flight price checker using Selenium, a popular browser automation and testing tool. With Selenium, we can automate collecting and comparing flight prices across different airlines, which saves the time and effort of checking them by hand.

Setup

Firefox Executable

  • Download the Firefox browser installer from the official Mozilla website.

  • Run the installer. By default, the browser executable is placed at C:\Program Files\Mozilla Firefox\firefox.exe. We will need this path later.

Gecko Driver

  • Windows users can download the Gecko driver (geckodriver) from its releases page on GitHub, which also lists builds for other platforms.

  • Extract the zip file and place “geckodriver.exe” in the C:\ directory. We will reference this path later in our code.

Selenium Python Package

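We will work with the latest version of Selenium WebDriver, so install or upgrade the following packages with pip. The example in this article uses only selenium; webdriver-manager is an optional helper that can download a matching driver for you.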

pip3 install -U selenium
pip3 install -U webdriver-manager
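
To confirm that the packages were installed, a small check like the following can help. It is only a sketch: the helper name installed_version is made up for this article, and it relies on the standard library's importlib.metadata rather than on anything Selenium-specific.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report what pip installed (the helper name is hypothetical)
for name in ("selenium", "webdriver-manager"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line prints "not installed", rerun the pip3 commands above in the same Python environment you use to run the script.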

Algorithm

  • Import the necessary libraries - Selenium and time

  • Set up the Firefox Gecko driver path

  • Open the website to be scraped

  • Identify the necessary elements to be scraped

  • Input the departure and arrival locations and the departure and return dates

  • Click the search button

  • Wait for the search results to load

  • Scrape the prices for the different airlines

  • Store the data in a format that's easy to read and analyze

  • Compare the prices and identify the cheapest option
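
One way to handle the location and date inputs from the steps above is to encode them in the search URL, which is what the example in this article does. The sketch below builds such a URL from its parts. The function name build_search_url is hypothetical, and the meaning given to the 1/0/0/E segments (adults, children, infants, cabin class) is an assumption read off the sample URL, not something the site documents.

```python
from datetime import date

BASE_URL = "https://paytm.com/flights/flightSearch"


def build_search_url(origin, destination, travel_date,
                     adults=1, children=0, infants=0, cabin="E"):
    """Build a one-way search URL.

    The passenger and cabin segments are an assumed interpretation
    of the sample URL used in this article.
    """
    return (
        f"{BASE_URL}/{origin}/{destination}/"
        f"{adults}/{children}/{infants}/{cabin}/{travel_date.isoformat()}"
    )


url = build_search_url("BBI-Bhubaneshwar", "DEL-Delhi", date(2023, 4, 22))
print(url)
```

Using a date object instead of a hand-typed string guarantees the YYYY-MM-DD format that the sample URL uses.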

Example

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
import time

# Set Firefox options
firefox_options = Options()
firefox_options.binary_location = r'C:\Program Files\Mozilla Firefox\firefox.exe'

# Initialize webdriver with Firefox
# Selenium 4 takes the Gecko driver path through a Service object;
# recent releases no longer accept the older executable_path argument
driver = webdriver.Firefox(service=Service(r'C:\geckodriver.exe'), options=firefox_options)

# Set date of travel and build the search URL from it
date_of_travel = "2023-04-22"
url = f'https://paytm.com/flights/flightSearch/BBI-Bhubaneshwar/DEL-Delhi/1/0/0/E/{date_of_travel}'

# Print URL
print(f"URL: {url}")

# Load webpage
driver.get(url)

# Wait for 5 seconds
time.sleep(5)

# Find search button and click it
search_button = driver.find_element(By.CLASS_NAME, "_3LRd")
search_button.click()

# Find all elements with class name '_2gMo'
prices_elements = driver.find_elements(By.CLASS_NAME, "_2gMo")

# Get text of all elements
prices_text = [price.text for price in prices_elements]

# Convert text to integers
prices = [int(p.replace(',', '')) for p in prices_text]

# Display the minimum airfare price
print(f"Minimum Airfare Price: {min(prices)}")

# Display all prices
print(f"All prices:\n {prices}")

# Close the browser and end the driver session
driver.quit()

Output

Minimum Airfare Price: 4471
All prices:
 [4471, 4472, 4544, 4544, 4679, 4838, 5497, 5497, 5866, 6991, 7969, 8393, 8393, 8393, 8393, 8393, 8445, 8445, 8445, 8445, 8445, 8498, 8498, 8498, 8540, 8898, 8898, 8898, 8898, 8898, 9203, 9207, 9385, 10396, 10554, 10896, 11390, 11433, 11766, 11838, 11838, 11838, 12518, 12678, 12678, 12678, 12735, 12735, 12735, 12735, 12767, 12767, 12787, 12787, 12787, 12787, 12840, 12945, 12966, 12981, 13069, 13145, 13145, 13145, 13145, 13152, 13525, 13537, 13537, 13571, 13610, 13633, 13828, 13956, 14358, 14630, 14630, 14828, 14838, 15198, 15528, 15849, 15954, 16479, 17748, 17748, 18506, 20818, 20818, 20818, 20818, 21992, 23590, 24468, 25483, 25483, 26628, 75271]
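
The prices above converted cleanly because every element's text contained only digits and commas. If an element is empty or carries a currency symbol, int() raises a ValueError. A more defensive conversion might look like the sketch below. The helper names are hypothetical, the sample strings are made up for illustration, and the code assumes whole-number fares with no decimal part.

```python
import re


def parse_price(text):
    """Return the digits in text as an int, or None if there are none.

    Assumes whole-number fares: a value such as "4,471.50" would be misread.
    """
    digits = re.sub(r"[^0-9]", "", text)
    return int(digits) if digits else None


def parse_prices(texts):
    """Convert a list of price strings, skipping entries without digits."""
    parsed = (parse_price(text) for text in texts)
    return [price for price in parsed if price is not None]


# Made-up sample strings, not scraped data
sample_texts = ["4,471", "₹ 4,472", "", "8,393"]
print(parse_prices(sample_texts))  # [4471, 4472, 8393]
```

In the script, parse_prices(prices_text) could stand in for the list comprehension that builds prices.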

Explanation

  • First, the necessary libraries are imported: webdriver from selenium, Options from selenium.webdriver.firefox.options, By from selenium.webdriver.common.by, and time.

  • Next, Firefox options are set using Options() and the binary location for Firefox is set to C:\Program Files\Mozilla Firefox\firefox.exe.

  • A WebDriver instance for Firefox is then created by calling webdriver.Firefox() with the path to the Gecko driver executable and the Firefox options.

  • The URL and date of travel are set to https://paytm.com/flights/flightSearch/BBI-Bhubaneshwar/DEL-Delhi/1/0/0/E/2023-04-22 and "2023-04-22", respectively. The URL is then printed to the console.

  • The webpage is loaded into the browser using driver.get(url).

  • The script then waits for 5 seconds using time.sleep(5) to give the page time to load.

  • The search button is located with driver.find_element(By.CLASS_NAME, "_3LRd") and stored in the search_button variable. Calling click() on it simulates a click on the button.

  • All elements on the web page with class name _2gMo are found using driver.find_elements(By.CLASS_NAME, "_2gMo") and stored in the prices_elements list. Class names such as _3LRd and _2gMo look auto-generated, so they may change when the site is updated.

  • The text of all elements in the prices_elements list is extracted using a list comprehension and stored in the prices_text list.

  • A second list comprehension removes the commas from each string in prices_text with replace(), converts the result to an integer with int(), and stores the integers in the prices list.

  • The minimum value in prices is found using the min() function and printed to the console.

  • Finally, all values in prices are printed to the console.

Application

This code is a starting point for scraping airfare prices from Paytm's flight search page with Python and Selenium. From here, you can adapt it to your needs and add features such as storing the scraped data in a file or sending an email notification with a price.
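
As a rough illustration of those two extensions, the sketch below appends prices to a CSV file and prepares an email message when the cheapest fare falls to or below a threshold. It uses only the standard library. The function names, file name, threshold, and email addresses are all placeholders invented for this example, and the message is only built, not sent.

```python
import csv
from datetime import datetime
from email.message import EmailMessage


def save_prices(prices, path, route, travel_date):
    """Append one timestamped row per price to a CSV file."""
    checked_at = datetime.now().isoformat(timespec="seconds")
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for price in prices:
            writer.writerow([checked_at, route, travel_date, price])


def build_price_alert(prices, threshold, sender, recipient):
    """Return an EmailMessage if the cheapest fare is at or below threshold, else None."""
    if not prices or min(prices) > threshold:
        return None
    cheapest = min(prices)
    message = EmailMessage()
    message["Subject"] = f"Fare alert: {cheapest}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(f"The cheapest fare found is {cheapest}.")
    return message


if __name__ == "__main__":
    # Placeholder values for illustration only
    fares = [4471, 4472, 4544]
    save_prices(fares, "flight_prices.csv", "BBI-DEL", "2023-04-22")
    alert = build_price_alert(fares, 4500, "me@example.com", "you@example.com")
    print(alert["Subject"] if alert else "No alert")
```

To actually send the message, you could pass it to the send_message() method of an smtplib.SMTP connection, using the server address and credentials of your own mail provider.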

Conclusion

Selenium is a powerful web automation and scraping tool that can collect information from websites that do not offer an API. Python's versatility, ease of use, and rich ecosystem of libraries make it well suited to scraping. This script shows how to automate browser actions and retrieve data from a webpage with just a few lines of code.

Updated on: 21-Aug-2023
