Devesh Chauhan has Published 54 Articles

Drop rows in PySpark DataFrame with condition

Devesh Chauhan

Updated on 05-May-2023 13:27:10

833 Views

Applying conditions to a data frame can be very useful for a programmer. We can validate data to make sure that it fits our model. We can also manipulate the data frame by applying conditions and filtering out irrelevant data, which improves data visualization. In this article, ...

Drop rows from the dataframe based on certain condition applied on a column

Devesh Chauhan

Updated on 05-May-2023 13:21:11

762 Views

In this article, we will discuss the different methods to drop rows from a data frame based on one or multiple conditions. These conditions will be applied to the columns and the rows will be dropped accordingly. We will use pandas to create a data frame as it offers ...

Drop rows from Pandas dataframe with missing values or NaN in columns

Devesh Chauhan

Updated on 05-May-2023 13:19:35

4K+ Views

A dataset consists of a wide variety of values. These values can be a “string”, “integer”, “decimal”, “Boolean” or even a “data structure”. These datasets are extremely valuable and can be used for various purposes. We can train models, interpret results, produce a hypothesis and build applications with the help ...

Drop rows containing specific value in pyspark dataframe

Devesh Chauhan

Updated on 05-May-2023 13:15:20

536 Views

When we are dealing with complex datasets, we require frameworks that can process data quickly and provide results. This is where PySpark comes into the picture. PySpark is a tool developed by the Apache community to process data in real time. It is an API which is used ...

Drop One or Multiple Columns From PySpark DataFrame

Devesh Chauhan

Updated on 05-May-2023 13:11:28

538 Views

The PySpark data frame is a powerful, real-time data processing framework developed by the Apache Spark developers. Spark was originally written in the Scala programming language, and several APIs were built to increase its reach and flexibility. These APIs provided an interface which can be used ...

Drop Empty Columns in Pandas

Devesh Chauhan

Updated on 05-May-2023 13:08:22

5K+ Views

A Pandas data frame is a very powerful data manipulation tool. It is a tabular data structure consisting of rows and columns. The size of this 2-D matrix varies with the complexity of the dataset. We can use different types of sources to create a data frame ranging ...

Drop duplicate rows in PySpark DataFrame

Devesh Chauhan

Updated on 05-May-2023 13:04:34

239 Views

PySpark is a tool designed by the Apache Spark community to process data in real time and analyse the results in a local Python environment. Spark data frames differ from other data frames in that they distribute the information and follow a schema. Spark can handle stream processing as well ...

Drop columns in DataFrame by label Names or by Index Positions

Devesh Chauhan

Updated on 05-May-2023 13:01:50

85 Views

A Pandas data frame is a 2-D data structure consisting of a series of entities. It is very useful in the analysis of mathematical data. The data is arranged in a tabular manner with each row behaving as an instance of the data. A Pandas data frame is special ...

Drop Collection if already exists in MongoDB using Python

Devesh Chauhan

Updated on 05-May-2023 12:58:01

319 Views

MongoDB is a widely popular open-source database that stores data in a flexible, JSON-like format. It does not use the orthodox technique of storing data in rows and columns. Instead, it uses a more flexible approach which increases its scalability. This database is designed to handle large volumes ...

Drop a list of rows from a Pandas DataFrame

Devesh Chauhan

Updated on 05-May-2023 12:55:13

234 Views

The Pandas library in Python is widely popular for representing data in the form of tabular data structures. The dataset is arranged into a 2-D matrix consisting of rows and columns. The Pandas library offers numerous functions that can help the programmer analyze the dataset by providing valuable mathematical insights. ...
