Devesh Chauhan

47 Articles Published

Articles by Devesh Chauhan

Python program to divide dictionary and its keys into K equal dictionaries

Devesh Chauhan
Updated on 27-Mar-2026 558 Views

A dictionary is a unique data structure in Python that stores data as key-value pairs, where each key is a unique identifier used to access its corresponding value. We can perform various operations on dictionaries to manipulate the stored data. This article explains how to divide a dictionary into K equal dictionaries where each value is divided by K, and K represents the number of keys in the original dictionary. Understanding the Problem Given a dictionary, we need to create K copies where each value is divided by K (the total number of keys). Let's understand this ...
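As a rough illustration of the problem stated in this excerpt, the sketch below builds K dictionaries whose values are the original values divided by K, with K taken as the number of keys. The function name `divide_dictionary` and the sample data are assumptions made for this example and are not taken from the article.

```python
def divide_dictionary(data):
    """Return K dictionaries, each holding every value divided by K,
    where K is the number of keys in `data` (hypothetical helper)."""
    k = len(data)
    if k == 0:
        return []
    # Scale every value once, then make K independent copies.
    scaled = {key: value / k for key, value in data.items()}
    return [dict(scaled) for _ in range(k)]


# Illustrative input with four keys, so K = 4.
sample = {"Score": 100, "Age": 40, "Salary": 25000, "Cutoff": 44}
parts = divide_dictionary(sample)
for part in parts:
    print(part)
```

Summing any key across the resulting dictionaries gives back the original value, which is a quick way to check the split.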

Drop rows from the dataframe based on certain condition applied on a column

Devesh Chauhan
Updated on 27-Mar-2026 1K+ Views

In this article, we will discuss different methods to drop rows from a DataFrame based on conditions applied to columns. We will use pandas to create and manipulate DataFrames, demonstrating various filtering techniques. Pandas is a powerful library that supports multiple file types including CSV, JSON, HTML, SQL, and Excel, making it an essential tool for data manipulation. Creating a Pandas DataFrame We will create a DataFrame consisting of player profiles with their ratings and salaries arranged in rows and columns: import pandas as pd dataset = { "Player ...
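A minimal sketch of condition-based row dropping, assuming pandas is installed. The column names, values, and the rating threshold are hypothetical and do not reproduce the article's truncated dataset.

```python
import pandas as pd

# Hypothetical player data for illustration only.
players = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Rating": [90, 72, 85, 60],
    "Salary": [5000, 3000, 4500, 2000],
})

# Option 1: keep only the rows that satisfy the condition.
kept = players[players["Rating"] >= 80]

# Option 2: pass the index labels of the unwanted rows to drop().
dropped = players.drop(players[players["Rating"] < 80].index)

print(kept)
```

Both options return a new DataFrame and leave `players` unchanged.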

Drop rows containing a specific value in a PySpark DataFrame

Devesh Chauhan
Updated on 27-Mar-2026 1K+ Views

When dealing with large datasets, PySpark provides powerful tools for data processing and manipulation. PySpark is Apache Spark's Python API that allows you to work with distributed data processing in your local Python environment. In this tutorial, we'll learn how to drop rows containing specific values from a PySpark DataFrame using different methods. This selective data elimination is essential for data cleaning and maintaining data relevance. Creating a Sample PySpark DataFrame First, let's create a sample DataFrame to demonstrate the row dropping techniques: from pyspark.sql import SparkSession # Create SparkSession spark = SparkSession.builder.appName("DropRowsDemo").getOrCreate() ...

Drop One or Multiple Columns From PySpark DataFrame

Devesh Chauhan
Updated on 27-Mar-2026 1K+ Views

A PySpark DataFrame is a distributed data structure built on Apache Spark that provides powerful data processing capabilities. Sometimes you need to remove unnecessary columns to optimize performance or focus on specific data. PySpark offers several methods to drop one or multiple columns from a DataFrame. Creating a PySpark DataFrame First, let's create a sample DataFrame to demonstrate column dropping operations: from pyspark.sql import SparkSession import pandas as pd # Create SparkSession spark = SparkSession.builder.appName("DropColumns").getOrCreate() # Sample dataset dataset = { "Device name": ["Laptop", "Mobile phone", "TV", "Radio"], ...

Drop Empty Columns in Pandas

Devesh Chauhan
Updated on 27-Mar-2026 11K+ Views

Pandas DataFrames often contain empty columns filled with NaN values that can clutter your data analysis. Pandas provides several efficient methods to identify and remove these empty columns to create cleaner, more relevant datasets. What Are Empty Columns? In pandas, a column is considered empty when it contains only NaN (Not a Number) values. Note that columns with empty strings, zeros, or spaces are not considered empty since these values may carry meaningful information about your dataset. Creating a DataFrame with Empty Columns Let's start by creating a sample DataFrame that includes an empty column filled ...
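The definition of an empty column given above (only NaN values) maps directly onto `DataFrame.dropna` with `axis=1` and `how="all"`. The sketch below assumes pandas and NumPy are installed; the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "empty" holds only NaN, "score" holds a single NaN.
df = pd.DataFrame({
    "name": ["x", "y", "z"],
    "score": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# axis=1 targets columns; how="all" drops a column only if every value is NaN.
cleaned = df.dropna(axis=1, how="all")

print(cleaned.columns.tolist())
```

The partially filled `score` column survives because `how="all"` requires every value in the column to be NaN.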

Drop duplicate rows in PySpark DataFrame

Devesh Chauhan
Updated on 27-Mar-2026 592 Views

PySpark is a Python API for Apache Spark, designed to process large-scale data in real-time with distributed computing capabilities. Unlike regular DataFrames, PySpark DataFrames distribute data across clusters and follow a strict schema for optimized processing. In this article, we'll explore different methods to drop duplicate rows from PySpark DataFrames using distinct() and dropDuplicates() functions. Installation Install PySpark using pip: pip install pyspark Creating a PySpark DataFrame First, let's create a sample DataFrame with duplicate rows to demonstrate the deduplication methods: from pyspark.sql import SparkSession import pandas as ...

Drop columns in DataFrame by label Names or by Index Positions

Devesh Chauhan
Updated on 27-Mar-2026 279 Views

A pandas DataFrame is a 2D data structure for storing tabular data. When working with DataFrames, you often need to remove unwanted columns. This can be done by specifying column names or their index positions using the drop() method. In this tutorial, we'll explore different methods to drop columns from a pandas DataFrame including dropping by names, index positions, and ranges. Creating the Sample DataFrame Let's start by creating a sample DataFrame to work with: import pandas as pd dataset = { "Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"], ...
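The three approaches named in this excerpt (by name, by index position, by range) can be sketched with `drop()` as follows, assuming pandas is installed. The frame below is hypothetical and does not reproduce the article's truncated dataset.

```python
import pandas as pd

# Hypothetical four-column frame for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "age": [30, 41, 25],
    "city": ["X", "Y", "Z"],
})

# By label: pass the column names directly.
by_label = df.drop(columns=["age"])

# By index position: translate positions to labels through df.columns.
by_position = df.drop(columns=df.columns[[0, 3]])

# By range: slice df.columns (positions 1 and 2 here).
by_range = df.drop(columns=df.columns[1:3])

print(by_label.columns.tolist())
print(by_position.columns.tolist())
print(by_range.columns.tolist())
```

`drop()` works on labels, so positional removal goes through `df.columns` to look up the corresponding names first.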

Drop a list of rows from a Pandas DataFrame

Devesh Chauhan
Updated on 27-Mar-2026 606 Views

The pandas library in Python is widely popular for representing data in tabular structures called DataFrames. When working with data analysis, you often need to remove specific rows from your DataFrame. This article demonstrates three effective methods for dropping multiple rows from a Pandas DataFrame. Creating a Sample DataFrame Let's start by creating a DataFrame with student marks data: import pandas as pd dataset = { "Aman": [98, 92, 88, 90, 91], "Raj": [78, 62, 90, 71, 45], "Saloni": [82, ...
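The excerpt is cut off before it names its three methods, so the sketch below shows three common ways to drop a list of rows in pandas. These are not necessarily the ones the article uses, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical marks table with the default integer index 0..4.
marks = pd.DataFrame({
    "maths": [98, 92, 88, 90, 91],
    "physics": [78, 62, 90, 71, 45],
})

rows_to_drop = [1, 3]

# 1. Drop by index label.
by_label = marks.drop(index=rows_to_drop)

# 2. Drop by position, translating positions to labels via marks.index.
by_position = marks.drop(marks.index[rows_to_drop])

# 3. Filter with a boolean mask that excludes the listed labels.
by_mask = marks[~marks.index.isin(rows_to_drop)]

print(by_label)
```

With the default integer index, labels and positions coincide, so all three calls give the same result. On a custom index, the first and third take labels while the second takes positions.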

Plotting stock charts in an Excel sheet using the xlsxwriter module in Python

Devesh Chauhan
Updated on 27-Mar-2026 315 Views

Factors such as data analysis and growth-rate monitoring are very important when plotting stock charts. For any business to flourish and expand, the right strategy is needed, and such strategies are built on deep fundamental research. Python helps us create and compare data, which in turn can be used to study a business model. It offers several methods and functions for plotting graphs, analyzing growth, and inspecting sudden changes. In this article, we will discuss one such operation, where we will plot a stock ...

POS tagging and lemmatization using spaCy in Python

Devesh Chauhan
Updated on 27-Mar-2026 1K+ Views

Python is an integral tool for understanding the concepts and applications of machine learning and deep learning. It offers numerous libraries and modules that provide a solid foundation for building Natural Language Processing (NLP) applications. In this article, we will discuss one such powerful library known as spaCy. spaCy is an open-source library used to analyze and process textual data efficiently. We will explore two key NLP concepts: Part-of-Speech (PoS) tagging and lemmatization using spaCy. What is spaCy? spaCy is an industrial-strength NLP library designed for production use. It provides fast and accurate text processing ...
