Articles by Devesh Chauhan
Python program to divide dictionary and its keys into K equal dictionaries
A dictionary is a Python data structure that stores data as key-value pairs, where each key is a unique identifier used to access its corresponding value. We can perform various operations on dictionaries to manipulate the stored data. This article explains how to divide a dictionary into K equal dictionaries where each value is divided by K, and K is the number of keys in the original dictionary.
Understanding the Problem
Given a dictionary, we need to create K copies where each value is divided by K (the total number of keys). Let's understand this ...
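The problem statement above can be sketched in a few lines. This is a minimal illustration, not the article's own solution: the function name `divide_dictionary` and the sample data are hypothetical, and it assumes all values are numeric.

```python
def divide_dictionary(data):
    """Return K dictionaries, each holding every original value divided by K."""
    k = len(data)  # K is the number of keys in the original dictionary
    if k == 0:
        return []  # nothing to divide; also avoids division by zero
    part = {key: value / k for key, value in data.items()}
    # Copy the part K times so the results are independent objects
    return [dict(part) for _ in range(k)]


# Hypothetical sample data with three keys, so K = 3
original = {"a": 6, "b": 9, "c": 12}
parts = divide_dictionary(original)
print(parts)
```

Summing the values for any key across the K parts gives back the original value, up to floating-point rounding.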
Drop rows from the dataframe based on certain condition applied on a column
In this article, we will discuss different methods to drop rows from a DataFrame based on conditions applied to columns. We will use pandas to create and manipulate DataFrames, demonstrating various filtering techniques. Pandas is a powerful library that can read and write multiple file formats, including CSV, JSON, HTML, SQL, and Excel, making it an essential tool for data manipulation.
Creating a Pandas DataFrame
We will create a DataFrame consisting of player profiles with their ratings and salaries arranged in rows and columns:
import pandas as pd
dataset = { "Player ...
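As a rough sketch of condition-based row dropping, the example below shows two common pandas approaches. The column names and values are hypothetical stand-ins for the player data, since the excerpt is truncated, and these are not necessarily the methods the full article covers.

```python
import pandas as pd

# Hypothetical player data; names and numbers are illustrative only
players = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Rating": [7, 9, 6, 8],
    "Salary": [1200, 2500, 900, 1800],
})

# Approach 1: boolean mask that keeps only the rows we want
kept_by_mask = players[players["Rating"] >= 8]

# Approach 2: collect the index labels of unwanted rows, then drop them
low_rated = players[players["Rating"] < 8].index
kept_by_drop = players.drop(low_rated)

print(kept_by_mask)
```

Both approaches return a new DataFrame and leave `players` unchanged.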
Drop rows containing specific value in pyspark dataframe
When dealing with large datasets, PySpark provides powerful tools for data processing and manipulation. PySpark is the Python API for Apache Spark, which lets you run distributed data processing from a Python environment. In this tutorial, we'll learn how to drop rows containing specific values from a PySpark DataFrame using different methods. This selective removal is essential for data cleaning and for keeping the data relevant.
Creating a Sample PySpark DataFrame
First, let's create a sample DataFrame to demonstrate the row dropping techniques:
from pyspark.sql import SparkSession
# Create SparkSession
spark = SparkSession.builder.appName("DropRowsDemo").getOrCreate() ...
Drop One or Multiple Columns From PySpark DataFrame
A PySpark DataFrame is a distributed data structure built on Apache Spark that provides powerful data processing capabilities. Sometimes you need to remove unnecessary columns to optimize performance or focus on specific data. PySpark offers several methods to drop one or multiple columns from a DataFrame.
Creating a PySpark DataFrame
First, let's create a sample DataFrame to demonstrate column dropping operations:
from pyspark.sql import SparkSession
import pandas as pd
# Create SparkSession
spark = SparkSession.builder.appName("DropColumns").getOrCreate()
# Sample dataset
dataset = { "Device name": ["Laptop", "Mobile phone", "TV", "Radio"], ...
Drop Empty Columns in Pandas
Pandas DataFrames often contain empty columns filled with NaN values that can clutter your data analysis. Pandas provides several efficient methods to identify and remove these empty columns to create cleaner, more relevant datasets.
What Are Empty Columns?
In pandas, a column is considered empty when it contains only NaN (Not a Number) values. Note that columns with empty strings, zeros, or spaces are not considered empty since these values may carry meaningful information about your dataset.
Creating a DataFrame with Empty Columns
Let's start by creating a sample DataFrame that includes an empty column filled ...
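The definition of an empty column given above can be illustrated with `DataFrame.dropna`. This is a minimal sketch with hypothetical column names; the full article may use other methods as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "notes" holds only NaN, "tag" holds only empty strings
df = pd.DataFrame({
    "name": ["x", "y", "z"],
    "score": [1.0, np.nan, 3.0],
    "notes": [np.nan, np.nan, np.nan],
    "tag": ["", "", ""],
})

# axis=1 targets columns; how="all" drops a column only if every value is NaN
cleaned = df.dropna(axis=1, how="all")

print(list(cleaned.columns))
```

The `score` column survives because it is only partly NaN, and `tag` survives because empty strings are not NaN.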
Drop duplicate rows in PySpark DataFrame
PySpark is the Python API for Apache Spark, designed to process large-scale data in real time with distributed computing capabilities. Unlike pandas DataFrames, PySpark DataFrames distribute data across clusters and follow a strict schema for optimized processing. In this article, we'll explore different methods to drop duplicate rows from PySpark DataFrames using the distinct() and dropDuplicates() methods.
Installation
Install PySpark using pip:
pip install pyspark
Creating a PySpark DataFrame
First, let's create a sample DataFrame with duplicate rows to demonstrate the deduplication methods:
from pyspark.sql import SparkSession
import pandas as ...
Drop columns in DataFrame by label Names or by Index Positions
A pandas DataFrame is a 2D data structure for storing tabular data. When working with DataFrames, you often need to remove unwanted columns. This can be done by specifying column names or their index positions using the drop() method. In this tutorial, we'll explore different methods to drop columns from a pandas DataFrame, including dropping by names, index positions, and ranges.
Creating the Sample DataFrame
Let's start by creating a sample DataFrame to work with:
import pandas as pd
dataset = { "Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"], ...
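The three variants named above (by name, by index position, and by range) might look like the following. This is a sketch under assumptions: only the "Employee ID" column name comes from the excerpt, and every other column and value is hypothetical.

```python
import pandas as pd

# Hypothetical employee table; only "Employee ID" appears in the excerpt
df = pd.DataFrame({
    "Employee ID": ["E1", "E2"],
    "Name": ["Asha", "Ravi"],
    "Age": [31, 28],
    "City": ["Pune", "Delhi"],
})

# By label: name the columns to remove
by_label = df.drop(columns=["Age"])

# By position: translate positions 0 and 3 into labels via df.columns
by_position = df.drop(columns=df.columns[[0, 3]])

# By range: slice df.columns to cover positions 1 and 2
by_range = df.drop(columns=df.columns[1:3])

print(list(by_label.columns))
```

Because `drop()` works on labels, position-based removal goes through `df.columns` to look up the labels first.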
Drop a list of rows from a Pandas DataFrame
The pandas library in Python is widely used for representing data in tabular structures called DataFrames. When working with data analysis, you often need to remove specific rows from your DataFrame. This article demonstrates three effective methods for dropping multiple rows from a Pandas DataFrame.
Creating a Sample DataFrame
Let's start by creating a DataFrame with student marks data:
import pandas as pd
dataset = { "Aman": [98, 92, 88, 90, 91], "Raj": [78, 62, 90, 71, 45], "Saloni": [82, ...
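The excerpt does not show which three methods the article uses, so the sketch below shows three common ways to drop a list of rows, offered as an assumption. It reuses the two complete columns from the excerpt and relies on the default integer index.

```python
import pandas as pd

# The two columns shown in full in the excerpt; default index is 0..4
marks = pd.DataFrame({
    "Aman": [98, 92, 88, 90, 91],
    "Raj": [78, 62, 90, 71, 45],
})

rows_to_drop = [1, 3]

# Way 1: pass the index labels straight to drop()
by_label = marks.drop(index=rows_to_drop)

# Way 2: convert positions to labels via marks.index, then drop
by_position = marks.drop(index=marks.index[rows_to_drop])

# Way 3: boolean mask that keeps rows whose label is not in the list
by_mask = marks[~marks.index.isin(rows_to_drop)]

print(by_label)
```

With a default integer index, labels and positions coincide, so all three give the same result; with a custom index, only the second variant is position-based.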
Plotting stock charts in excel sheet using xlsxwriter module in python
Factors such as data analysis and growth rate monitoring are very important when it comes to plotting stock charts. For any business to flourish and expand, the right strategy is needed. These strategies are built on deep fundamental research. Python helps us create and compare data, which in turn can be used to study a business model. Python offers several methods and functions through which we can plot graphs, analyze growth, and inspect sudden changes. In this article, we will discuss one such operation, where we will plot a stock ...
PoS tagging and lemmatization using spaCy in Python
Python is an integral tool for understanding the concepts and applications of machine learning and deep learning. It offers numerous libraries and modules that provide a strong platform for building useful Natural Language Processing (NLP) applications. In this article, we will discuss one such powerful library, spaCy. spaCy is an open-source library used to analyze and process textual data efficiently. We will explore two key NLP concepts: Part-of-Speech (PoS) tagging and lemmatization using spaCy.
What is spaCy?
spaCy is an industrial-strength NLP library designed for production use. It provides fast and accurate text processing ...