Flight-price checker using Python and Selenium
Web scraping is a practical way to extract data from websites, and checking airline ticket prices is a common use case. In this article, we build a flight price checker with Selenium, a popular browser automation tool. Automating the browser lets us collect fares from a flight search page and compare them across airlines without checking each one by hand.
Prerequisites and Setup
Before we start building our flight price checker, we need to set up the required tools and dependencies.
Installing Required Packages
We'll use the webdriver-manager package, which downloads and manages browser drivers automatically:
pip install selenium webdriver-manager
Manual Setup (Alternative Method)
If you prefer manual setup:
Download Firefox browser from the official Mozilla website
Download GeckoDriver from GitHub releases
Extract the geckodriver executable (geckodriver.exe on Windows) and place it on your system PATH or in your project directory
Implementation Strategy
Our flight price checker follows this workflow:
1. Initialize Selenium WebDriver with Firefox
2. Navigate to the flight booking website
3. Locate and interact with search elements
4. Extract flight prices from search results
5. Process and analyze the price data
6. Display the minimum fare and all available prices
Complete Flight Price Checker
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
from webdriver_manager.firefox import GeckoDriverManager
import time


def setup_driver():
    """Initialize Firefox WebDriver with options"""
    firefox_options = Options()
    # Uncomment next line to run headless (no browser window)
    # firefox_options.add_argument('--headless')

    # Use webdriver-manager to download and locate GeckoDriver automatically
    service = Service(GeckoDriverManager().install())
    driver = webdriver.Firefox(service=service, options=firefox_options)
    return driver


def scrape_flight_prices():
    """Scrape flight prices from Paytm flights"""
    # Initialize driver
    driver = setup_driver()
    try:
        # Flight search URL (BBI to DEL on a specific date; change the date as needed)
        url = 'https://paytm.com/flights/flightSearch/BBI-Bhubaneshwar/DEL-Delhi/1/0/0/E/2023-04-22'
        print(f"Navigating to: {url}")

        # Load webpage
        driver.get(url)

        # Wait for page to load
        time.sleep(8)

        # Find and click search button
        try:
            search_button = driver.find_element(By.CLASS_NAME, "_3LRd")
            search_button.click()
            print("Search button clicked successfully")
            # Wait for results to load
            time.sleep(10)
        except Exception as e:
            print(f"Could not find search button: {e}")

        # Extract price elements
        prices_elements = driver.find_elements(By.CLASS_NAME, "_2gMo")
        if not prices_elements:
            print("No price elements found. Website structure may have changed.")
            return

        # Process prices
        prices_text = [price.text for price in prices_elements if price.text.strip()]
        prices = []
        for price_text in prices_text:
            try:
                # Remove commas and the rupee sign, then convert to integer
                clean_price = int(price_text.replace(',', '').replace('₹', '').strip())
                prices.append(clean_price)
            except ValueError:
                # Skip text that is not a plain number
                continue

        if prices:
            min_price = min(prices)
            max_price = max(prices)
            avg_price = sum(prices) // len(prices)
            print("\n--- Flight Price Analysis ---")
            print(f"Total flights found: {len(prices)}")
            print(f"Minimum fare: ₹{min_price}")
            print(f"Maximum fare: ₹{max_price}")
            print(f"Average fare: ₹{avg_price}")
            print(f"\nAll prices: {sorted(prices)[:10]}...")  # Show the 10 lowest prices
        else:
            print("No valid prices found")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Always close the browser
        driver.quit()


# Run the scraper
if __name__ == "__main__":
    scrape_flight_prices()
Key Components Explained
WebDriver Setup
The setup_driver() function initializes Firefox through webdriver-manager, so there is no need to download GeckoDriver or keep it up to date by hand.
Price Extraction Logic
The scraper locates price elements by their CSS class names and processes their text to extract numerical values. Error handling ensures the script continues even if some prices can't be parsed.
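Generated class names such as "_2gMo" tend to change when a site is redeployed, so one option is to try several locators in order and use the first that matches. The sketch below is an illustration, not part of the original scraper: find_with_fallbacks, PRICE_LOCATORS and fake_find_elements are hypothetical names, and the fake finder stands in for driver.find_elements so the example runs without a browser. The strings "class name" and "css selector" are the values behind Selenium's By.CLASS_NAME and By.CSS_SELECTOR.

```python
def find_with_fallbacks(find_elements, locators):
    """Return (locator, elements) for the first locator that matches anything.

    find_elements is any callable with the shape of Selenium's
    driver.find_elements(by, value). Returns (None, []) if nothing matches.
    """
    for locator in locators:
        elements = find_elements(*locator)
        if elements:
            return locator, elements
    return None, []


# Candidate locators, most specific first (hypothetical list for illustration).
PRICE_LOCATORS = [
    ("class name", "_2gMo"),               # same as By.CLASS_NAME
    ("css selector", "[class*='price']"),  # same as By.CSS_SELECTOR
]


def fake_find_elements(by, value):
    """Stand-in for driver.find_elements: only the CSS selector matches."""
    page = {("css selector", "[class*='price']"): ["₹5,432", "₹6,100"]}
    return page.get((by, value), [])


locator, elements = find_with_fallbacks(fake_find_elements, PRICE_LOCATORS)
print(f"Matched {len(elements)} elements using {locator}")
```

With a real browser session, passing driver.find_elements as the first argument should work the same way, since it accepts the same (by, value) pair.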
Data Processing
Extracted prices are cleaned (removing currency symbols and commas) and converted to integers for numerical analysis.
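As a minimal sketch of that cleaning step, the helper below pulls the first run of digits and commas out of a string and converts it to an integer. The name parse_price and the sample strings are assumptions made for illustration. Any fractional part after a decimal point is ignored.

```python
import re


def parse_price(text):
    """Return the first number found in text as an int, or None if there is none."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return int(match.group().replace(",", ""))


# Hypothetical strings of the kind a results page might contain.
raw_texts = ["₹5,432", "₹ 12,345", "Sold out", ""]
prices = [p for p in (parse_price(t) for t in raw_texts) if p is not None]
print(prices)  # [5432, 12345]
```

Returning None instead of raising keeps the calling loop simple: entries that hold no number are filtered out rather than handled with try/except.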
Enhanced Features
Here's an improved version that wraps the scraper in a class and saves the results to a JSON file:
import json
import time
from datetime import datetime

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager


class FlightPriceChecker:
    def __init__(self):
        self.driver = None
        self.results = []

    def setup_driver(self):
        """Initialize WebDriver"""
        firefox_options = Options()
        service = Service(GeckoDriverManager().install())
        self.driver = webdriver.Firefox(service=service, options=firefox_options)

    def scrape_prices(self, route_url):
        """Scrape prices for a specific route"""
        if not self.driver:
            self.setup_driver()
        try:
            self.driver.get(route_url)
            time.sleep(8)

            # Find price elements (adapt selector as needed)
            price_elements = self.driver.find_elements(By.CSS_SELECTOR, "[class*='price']")
            prices = []
            for element in price_elements:
                try:
                    price_text = element.text.strip()
                    if price_text and any(char.isdigit() for char in price_text):
                        clean_price = int(''.join(filter(str.isdigit, price_text)))
                        if clean_price > 1000:  # Skip small numbers unlikely to be fares
                            prices.append(clean_price)
                except Exception:
                    # Skip elements that went stale or could not be parsed
                    continue

            # Record the prices so that save_results() has something to write
            self.results.append({'url': route_url, 'prices': prices})
            return prices
        except Exception as e:
            print(f"Error scraping prices: {e}")
            return []

    def save_results(self, filename="flight_prices.json"):
        """Save results to JSON file"""
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump({
                'timestamp': datetime.now().isoformat(),
                'results': self.results
            }, f, indent=2)

    def close(self):
        """Clean up resources"""
        if self.driver:
            self.driver.quit()


# Usage example
checker = FlightPriceChecker()
try:
    prices = checker.scrape_prices("https://example-flight-site.com/search")
    if prices:
        print(f"Found {len(prices)} flights")
        print(f"Best price: ₹{min(prices)}")
        checker.save_results()
finally:
    checker.close()
Important Considerations
Website Changes: Flight booking sites frequently update their HTML structure, which may break selectors
Rate Limiting: Add delays between requests to avoid being blocked
Legal Compliance: Ensure your scraping activities comply with the website's robots.txt and terms of service
Data Accuracy: Fares change quickly, so treat scraped prices as a snapshot and re-check a fare on the booking page before acting on it
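The rate-limiting and robots.txt points above can be sketched with the standard library alone. In the example below, the robots.txt rules, the "flight-price-checker" user agent, and the helper names build_robot_rules and polite_pause are all assumptions made for illustration. The rules are inlined so the code runs offline. For a live site, RobotFileParser's set_url() and read() methods fetch the real file instead.

```python
import random
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
SAMPLE_ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /checkout/",
]


def build_robot_rules(lines):
    """Parse robots.txt lines into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


def polite_pause(min_seconds=2.0, max_seconds=5.0):
    """Sleep for a random interval between requests and return the delay used."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


rules = build_robot_rules(SAMPLE_ROBOTS_TXT)
urls = [
    "https://example-flight-site.com/search",
    "https://example-flight-site.com/checkout/pay",
]
for url in urls:
    if rules.can_fetch("flight-price-checker", url):
        print(f"Allowed: {url}")
        polite_pause(0.1, 0.2)  # short pause for the demo; use longer delays in practice
    else:
        print(f"Skipping (disallowed by robots.txt): {url}")
```

A robots.txt check covers only part of compliance. The site's terms of service still need to be read separately.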
Troubleshooting
Common issues and solutions:
Element Not Found: Use browser developer tools to identify current CSS selectors
Timeout Errors: Increase wait times or implement explicit waits
Driver Issues: Ensure webdriver-manager has network access and permission to download and store drivers
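An explicit wait polls for a condition until it holds or a timeout expires, rather than sleeping for a fixed time. Selenium provides WebDriverWait (in selenium.webdriver.support.ui) for this. The sketch below shows the same polling idea in plain Python so that it runs without a browser. wait_until and the simulated results_ready condition are hypothetical names, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: results "appear" 0.2 seconds after the start.
start = time.monotonic()


def results_ready():
    """Stand-in for a call such as driver.find_elements(By.CLASS_NAME, "_2gMo")."""
    return ["₹5,432"] if time.monotonic() - start >= 0.2 else []


elements = wait_until(results_ready, timeout=5.0, poll_interval=0.05)
print(f"Found {len(elements)} element(s) after waiting")
```

Compared with time.sleep(8), this returns as soon as the results are present and fails with a clear error when they never arrive.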
Conclusion
A flight price checker built with Python and Selenium automates the repetitive work of collecting and comparing fares. The basic implementation here demonstrates the core concepts, but a production system also needs robust error handling, selectors that can be updated as the site changes, and compliance with website policies. Using webdriver-manager simplifies setup and maintenance compared to managing drivers manually.
