Extracting HTML tables from web pages is a common task in web scraping and data analysis. Python libraries such as requests, BeautifulSoup, and pandas make this process straightforward.

Required Libraries

First, install the necessary packages if they're not already available:

    pip install requests beautifulsoup4 pandas tabulate

Basic Setup

Import the required libraries and set up the target URL:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    from tabulate import tabulate

    # Set the target URL
    site_url = "https://www.tutorialspoint.com/python/python_basic_operators.htm"

Making HTTP Request

...
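The excerpt above stops before the parsing step, so here is a minimal sketch of the underlying idea: walking the table, row, and cell tags and collecting the cell text. It deliberately swaps in the standard-library `html.parser` module instead of requests and BeautifulSoup so that it runs offline; the `TableParser` class and the `SAMPLE_HTML` string are illustrative assumptions, not part of the original article, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <table> in an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables, each a list of rows
        self._row = None   # row currently being filled, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text that sits inside a cell is kept
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Made-up sample markup standing in for a downloaded page
SAMPLE_HTML = """
<table>
  <tr><th>Operator</th><th>Description</th></tr>
  <tr><td>+</td><td>Addition</td></tr>
  <tr><td>-</td><td>Subtraction</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)

header, *rows = parser.tables[0]
print(header)
for row in rows:
    print(row)
```

With a real page, the string passed to `feed()` would be the response body fetched over HTTP rather than an inline sample.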
Sometimes during data analysis, we need to examine duplicate rows to understand patterns in our data rather than dropping them immediately. Pandas provides several methods to find, filter, and handle duplicate rows effectively.

The duplicated() Method

The duplicated() method identifies duplicate rows in a DataFrame. Let's work with an HR dataset to demonstrate this functionality:

    import pandas as pd
    import numpy as np

    # Import HR Dataset with certain columns
    df = pd.read_csv("https://raw.githubusercontent.com/sasankac/TestDataSet/master/HRDataset.csv",
                     usecols=["Employee_Name", "PerformanceScore", "Position", "CitizenDesc"])

...
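To make the behavior of `duplicated()` concrete without downloading the HR dataset, the following sketch uses a small made-up DataFrame (the names and positions are invented for illustration). It contrasts the default `keep="first"`, which flags only later repeats, with `keep=False`, which flags every row in a duplicated group and is usually what you want when inspecting duplicates rather than dropping them.

```python
import pandas as pd

# Invented sample data; rows 0/2 and 1/4 are exact duplicates
df = pd.DataFrame(
    {
        "Employee_Name": ["Ann", "Bob", "Ann", "Cid", "Bob"],
        "Position": ["Analyst", "Engineer", "Analyst", "Manager", "Engineer"],
    }
)

# Default keep="first": the first occurrence is not flagged, later repeats are
later_repeats = df.duplicated()

# keep=False: every member of a duplicated group is flagged
all_copies = df.duplicated(keep=False)

# Boolean indexing keeps the flagged rows; sorting puts the copies side by side
duplicate_rows = df[all_copies].sort_values("Employee_Name")
print(duplicate_rows)
```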
Pandas provides powerful indexing capabilities to select subsets of data. Lexicographical slicing allows you to select data based on alphabetical ordering of string indexes, similar to how words are arranged in a dictionary.

Loading and Exploring the Dataset

Let's start by importing a movies dataset and examining its structure:

    import pandas as pd
    import numpy as np

    movies = pd.read_csv("https://raw.githubusercontent.com/sasankac/TestDataSet/master/movies_data.csv",
                         index_col="title", ...
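As a rough illustration of lexicographical slicing, the sketch below builds a tiny DataFrame with a string index (the titles and scores are placeholders, not rows from the movies dataset). The index is sorted first: with a sorted index, `.loc` accepts slice bounds that are not actual labels and returns everything that falls alphabetically between them.

```python
import pandas as pd

# Placeholder data with a string index named "title"
movies = pd.DataFrame(
    {"score": [7, 5, 9, 6, 8]},
    index=pd.Index(["Delta", "Alpha", "Echo", "Charlie", "Bravo"], name="title"),
)

# Sort the index so that slicing by arbitrary string bounds is well defined
movies = movies.sort_index()

# Select titles that sort between "B" and "D".
# "Delta" is excluded because it sorts after the bare string "D".
subset = movies.loc["B":"D"]
print(subset)
```

On an unsorted index, the same slice with bounds that are not existing labels raises a `KeyError`, which is why the `sort_index()` call matters here.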
Pandas provides powerful selection capabilities to extract subsets of data using either index positions or index labels. This article demonstrates how to select data using index labels with the .loc accessor.

The .loc accessor works similarly to a Python dictionary, selecting data by index label rather than by position. This differs from .iloc, which selects by integer position like a Python list.

Setting Up the Dataset

Let's start by importing a movies dataset with the title as the index:

    import pandas as pd

    movies = pd.read_csv("https://raw.githubusercontent.com/sasankac/TestDataSet/master/movies_data.csv", ...
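The difference between label-based and position-based selection is easiest to see side by side. This is a small sketch on invented data (the titles, `score`, and `votes` columns are assumptions for illustration only); note in particular that a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position.

```python
import pandas as pd

# Invented sample data indexed by title
movies = pd.DataFrame(
    {"score": [7, 5, 9], "votes": [120, 80, 200]},
    index=pd.Index(["Alpha", "Bravo", "Charlie"], name="title"),
)

# Same cell, addressed by label and by integer position
by_label = movies.loc["Bravo", "score"]
by_position = movies.iloc[1, 0]

# Label slices include the end label; position slices exclude the end position
label_slice = movies.loc["Alpha":"Bravo"]   # rows Alpha and Bravo
position_slice = movies.iloc[0:1]           # row Alpha only

print(by_label, by_position)
print(label_slice)
print(position_slice)
```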
Finding the largest or smallest items in a collection is a common task in Python. This article explores different methods to find single or multiple largest/smallest values efficiently.

Method 1: Using min() and max() for Single Items

For finding a single smallest or largest item (N=1), min() and max() are the most efficient functions:

    import random

    # Create a random list of integers
    random_list = random.sample(range(1, 10), 9)
    print("List:", random_list)

    # Find the smallest number
    smallest = min(random_list)
    print("Smallest:", smallest)

    # Find the largest number
    largest = max(random_list)
    print("Largest:", largest)

...
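For more than one item, one common standard-library option is the `heapq` module. The excerpt is cut off before any later method is shown, so treat this as a sketch of one reasonable approach rather than the article's own; the sample numbers and the `records` list are made up for illustration.

```python
import heapq

numbers = [4, 9, 1, 7, 3, 8]

# N largest and N smallest values, returned in sorted order
three_largest = heapq.nlargest(3, numbers)
three_smallest = heapq.nsmallest(3, numbers)
print(three_largest)   # [9, 8, 7]
print(three_smallest)  # [1, 3, 4]

# A key function ranks structured records by one field
records = [
    {"name": "pen", "price": 3},
    {"name": "book", "price": 12},
    {"name": "lamp", "price": 25},
    {"name": "mug", "price": 8},
]
two_cheapest = heapq.nsmallest(2, records, key=lambda item: item["price"])
print([item["name"] for item in two_cheapest])  # ['pen', 'mug']
```

As a rule of thumb, `min()` and `max()` suit N=1, `heapq` suits a small N relative to the collection size, and sorting followed by slicing is usually simpler when N approaches the length of the collection.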
When analyzing sequences of data, identifying the most frequently occurring items is a common task. Python's Counter from the collections module provides an elegant solution for counting and finding the most frequent elements in any sequence.

What is a Counter?

Counter is a subclass of dict that stores elements as keys and their counts as values. Unlike a regular dictionary, which raises a KeyError for a missing key, a Counter returns zero for items it has not seen.

    from collections import Counter

    # Regular dictionary raises KeyError
    regular_dict = {}
    try:
        print(regular_dict['missing_key'])
    except KeyError as e:
...
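A short sketch of the two behaviors described above, counting a made-up list of words: a missing key yields zero instead of raising, and `most_common()` returns the highest counts first.

```python
from collections import Counter

# Invented sample sequence
words = ["red", "blue", "red", "green", "blue", "red"]
counts = Counter(words)

# Missing keys return 0 instead of raising KeyError
print(counts["red"])     # 3
print(counts["purple"])  # 0

# The two most frequent items as (element, count) pairs
top_two = counts.most_common(2)
print(top_two)  # [('red', 3), ('blue', 2)]
```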
We can find elements and extract their text with Selenium webdriver. First, identify the element using any locator like id, class name, CSS selector, or XPath. Then use the text property to obtain the text content.

Syntax

    element_text = driver.find_element(By.CSS_SELECTOR, "h4").text

Here driver is the webdriver object. The find_element() method identifies the element using the specified locator, and the text property extracts the text content.

Modern Selenium Approach

Recent Selenium versions use the By class for locators instead of the deprecated find_element_by_* methods:

    from selenium import webdriver
    from selenium.webdriver.common.by import ...
Setting a default timeout in Selenium Python WebDriver helps prevent tests from hanging indefinitely. Selenium provides two main approaches: set_page_load_timeout() for page loading and implicitly_wait() for element location.

Page Load Timeout

The set_page_load_timeout() method sets a timeout for page loading. If the page doesn't load within the specified time, a TimeoutException is raised.

Syntax

    driver.set_page_load_timeout(timeout_in_seconds)

Example

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.common.exceptions import TimeoutException

    # Setup WebDriver
    service = Service()
    driver = webdriver.Chrome(service=service)

    try:
        # Set page load timeout to 10 seconds
...
We can wait until the page is loaded with Selenium WebDriver using synchronization concepts. Selenium provides implicit and explicit wait mechanisms. To wait until the page is loaded, we use the explicit wait approach.

The explicit wait depends on expected conditions for particular element behaviors. For waiting until the page loads, we use expected conditions like presence_of_element_located for a specific element. If the wait time elapses without the condition being met, a TimeoutException is raised.

Required Imports

To implement explicit wait conditions, we need the WebDriverWait class and the expected_conditions module:

    from selenium import webdriver ...
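The polling idea behind an explicit wait can be sketched without a browser. The helper below is a hypothetical, pure-Python stand-in (`wait_until`, `WaitTimeout`, and `element_is_present` are invented names, and this is not Selenium's implementation): it calls a condition repeatedly until it returns something truthy or the timeout elapses.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is not met within the timeout."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly; return its first truthy result or raise WaitTimeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page element that only "appears" on the third check
attempts = {"count": 0}


def element_is_present():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


found = wait_until(element_is_present, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

In Selenium itself, the condition would be an expected condition such as `presence_of_element_located`, passed to `WebDriverWait(driver, timeout).until(...)`.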
Sometimes we need to check if a list can be completely partitioned into valid groups. This problem involves grouping numbers using specific rules to determine if the entire list is in a "valid state".

Problem Definition

Given a list of numbers, check if every number can be grouped using one of these rules:

Contiguous pairs: (a, a), two identical numbers
Identical triplets: (a, a, a), three identical numbers
Consecutive triplets: (a, a+1, a+2), three consecutive numbers

Example

For nums = [7, 7, 3, 4, 5], we can group [7, 7] ...
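The excerpt is cut off before any solution, so the following is only one possible approach, sketched with dynamic programming. It assumes that each group must be a contiguous run of the list, in the original order, which is how the word "contiguous" and the [7, 7] example read; the function name `is_valid_partition` is hypothetical.

```python
def is_valid_partition(nums):
    """Return True if nums splits into contiguous groups that each match a rule."""
    n = len(nums)
    # valid[i] is True when the first i numbers can be fully grouped
    valid = [False] * (n + 1)
    valid[0] = True  # an empty prefix needs no groups

    for i in range(2, n + 1):
        a, b = nums[i - 2], nums[i - 1]

        # Rule 1: the prefix ends with a pair (a, a)
        if valid[i - 2] and a == b:
            valid[i] = True

        # Rules 2 and 3: the prefix ends with a triplet
        if i >= 3 and valid[i - 3]:
            c = nums[i - 3]
            identical = c == a == b
            consecutive = c + 1 == a and a + 1 == b
            if identical or consecutive:
                valid[i] = True

    return valid[n]


print(is_valid_partition([7, 7, 3, 4, 5]))  # True: [7, 7] and [3, 4, 5]
print(is_valid_partition([1, 1, 2]))        # False: the 2 cannot be grouped
```

Each position looks back at most three elements, so the check runs in linear time with linear extra space.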