Devesh Chauhan

47 Articles Published

Articles by Devesh Chauhan

Drop rows from the dataframe based on certain condition applied on a column

Devesh Chauhan
Updated on 05-May-2023 1K+ Views

In this article, we will discuss different methods to drop rows from a data frame based on one or more conditions. The conditions are applied to the columns, and the matching rows are dropped accordingly. We will use pandas to create the data frame, as it offers many functions for manipulating data frames. We will also create a dataset to serve as the source for the data frame; this is not mandatory, and a CSV file or another document can be used instead. Pandas supports multiple file types, including “CSV”, ...
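The full article is truncated here, so the following is only a minimal sketch, not the author's code. It assumes a small, made-up dataset (the names, ages, and cities are hypothetical) and shows two common pandas approaches: boolean indexing, and `DataFrame.drop()` with the index of the matching rows.

```python
import pandas as pd

# Hypothetical sample data; the article builds its own dataset
data = {
    "name": ["Asha", "Ravi", "Meera", "Karan"],
    "age": [25, 17, 32, 15],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
}
df = pd.DataFrame(data)

# Method 1: boolean indexing keeps only the rows that should survive
adults = df[df["age"] >= 18]

# Method 2: collect the index labels of the rows to remove, then drop them
minors_index = df[df["age"] < 18].index
adults_dropped = df.drop(minors_index)

# Multiple conditions: combine with & (and) or | (or), wrapping each in parentheses;
# ~ negates the mask, so rows that match BOTH conditions are dropped
filtered = df[~((df["city"] == "Delhi") & (df["age"] > 30))]
```

Both methods return a new data frame and leave `df` unchanged; boolean indexing is usually the shorter option when the condition is easy to express as "rows to keep".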

Drop rows containing specific value in pyspark dataframe

Devesh Chauhan
Updated on 05-May-2023 1K+ Views

When we deal with complex datasets, we need frameworks that can process data quickly and deliver results. This is where PySpark comes into the picture. PySpark is a tool developed by the Apache community to process data in real time. It is an API used to create data frames and interpret results in our local Python environment. A data frame can contain a huge amount of data, and to keep that data relevant for analysis, we make the required changes to it. In this article, we will manipulate a PySpark data frame ...

Drop One or Multiple Columns From PySpark DataFrame

Devesh Chauhan
Updated on 05-May-2023 1K+ Views

PySpark is a powerful, real-time data processing framework developed by the Apache Spark developers. Spark was originally written in the Scala programming language, and to increase its reach and flexibility, several APIs were built for it. These APIs provide an interface for running Spark applications in our local environment. One such API is PySpark, which was developed for the Python environment. A PySpark data frame also consists of rows and columns, but its processing differs because it uses in-memory (RAM) computation to process the data. ...

Drop Empty Columns in Pandas

Devesh Chauhan
Updated on 05-May-2023 11K+ Views

A pandas data frame is a very powerful data manipulation tool. It is a tabular data structure consisting of rows and columns, and the size of this 2-D matrix varies with the complexity of the dataset. A data frame can be created from many kinds of sources, ranging from databases to files. Each column in a pandas data frame represents a series of information, whose values can be integers, floats, or strings. We can perform numerous operations on these columns, including deletion, indexing, and filtering. In this article, we will perform one such basic operation of ...
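Because the article is cut off here, the following is a minimal sketch rather than the author's method. It assumes a hypothetical data frame in which one column is entirely `NaN` and another holds only empty strings, and uses `DataFrame.dropna(axis=1, how="all")` to remove columns with no values at all.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with two kinds of "empty" column
df = pd.DataFrame({
    "id": [1, 2, 3],
    "score": [88.5, np.nan, 91.0],       # partly missing, should be kept
    "notes": [np.nan, np.nan, np.nan],   # entirely NaN
    "comment": ["", "", ""],             # empty strings, which pandas does NOT treat as missing
})

# axis=1 works on columns; how="all" drops a column only if every value is missing
cleaned = df.dropna(axis=1, how="all")

# Convert empty strings to NaN first if they should also count as empty
cleaned_strict = df.replace("", np.nan).dropna(axis=1, how="all")
```

Using `how="any"` instead would also drop `score`, because it contains a single missing value; `how="all"` is the stricter notion of an "empty" column.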

Drop duplicate rows in PySpark DataFrame

Devesh Chauhan
Updated on 05-May-2023 575 Views

PySpark is a tool designed by the Apache Spark community to process data in real time and analyse the results in a local Python environment. Spark data frames differ from other data frames because they distribute the data and follow a schema. Spark can handle both stream processing and batch processing, which is one reason for its popularity. A PySpark data frame requires a session (SparkSession) as its entry point, and it processes the data in memory (RAM). You can install the PySpark module on Windows using the following command: pip install pyspark ...

Drop columns in DataFrame by label Names or by Index Positions

Devesh Chauhan
Updated on 05-May-2023 259 Views

A pandas data frame is a two-dimensional data structure made up of a series of entities. It is very useful for analysing numerical data. The data is arranged in tabular form, with each row representing one instance of the data. A pandas data frame stands out because it comes with numerous built-in functions, making it a very powerful programming asset. Each column in a data frame represents a labelled series of information. In this article, we will operate on these columns and discuss the various methods to drop columns from a pandas data frame. ...
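The article body is truncated, so this is only a minimal sketch under the assumption of a small, hypothetical data frame. It shows dropping columns by label with `DataFrame.drop(columns=...)`, and by index position either by looking the labels up in `df.columns` or by selecting the remaining positions with `iloc`.

```python
import pandas as pd

# Hypothetical data frame used only for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ravi"],
    "age": [25, 32],
    "city": ["Delhi", "Pune"],
    "salary": [50000, 62000],
})

# By label: pass one or more column names to drop()
by_label = df.drop(columns=["age", "salary"])

# By index position: translate positions into labels via df.columns
positions = [0, 2]  # "name" and "city"
by_position = df.drop(columns=df.columns[positions])

# Alternative: keep every column whose position is not listed, using iloc
keep = [i for i in range(df.shape[1]) if i not in positions]
by_iloc = df.iloc[:, keep]
```

The `iloc` variant is the safer choice when a data frame has duplicate column names, because dropping by label would remove every column that shares the looked-up name.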

Drop a list of rows from a Pandas DataFrame

Devesh Chauhan
Updated on 05-May-2023 568 Views

The pandas library in Python is widely used for representing data as tabular data structures, in which the dataset is arranged into a 2-D matrix of rows and columns. Pandas offers numerous functions that help the programmer analyse the dataset and gain valuable mathematical insights. This tabular data structure is known as a data frame, and it can be created with the pandas DataFrame() function. In this article, we will perform the simple operation of removing (dropping) multiple rows from a pandas data frame. First, we have to prepare a dataset and then ...
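As the article text stops here, the following is a minimal sketch rather than the author's implementation. It assumes a hypothetical data frame with string index labels and shows how a list passed to `DataFrame.drop()` removes several rows at once, either by label or by converting integer positions to labels through `df.index`.

```python
import pandas as pd

# Hypothetical dataset with string row labels
df = pd.DataFrame(
    {"product": ["pen", "book", "lamp", "mug", "bag"],
     "price": [10, 250, 800, 150, 600]},
    index=["a", "b", "c", "d", "e"],
)

# Drop rows by index label: pass a list of labels to drop()
by_label = df.drop(["b", "d"])

# Drop rows by integer position: look the labels up in df.index first
positions = [0, 4]
by_position = df.drop(df.index[positions])

# drop() returns a new data frame by default, so df still has all five rows
```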

Plotting stock charts in excel sheet using xlsxwriter module in python

Devesh Chauhan
Updated on 09-Mar-2023 281 Views

Factors such as data analysis and growth-rate monitoring are very important when it comes to plotting stock charts. For any business to flourish and expand, the right strategy is needed, and such strategies are built on deep fundamental research. Python programming helps us create and compare data, which in turn can be used to study a business model. Python offers several methods and functions with which we can plot graphs, analyse growth, and examine sudden changes. In this article, we will discuss one such operation, where we will plot a stock chart ...

POS tagging and lemmatization using spaCy in Python

Devesh Chauhan
Updated on 27-Feb-2023 1K+ Views

Python is an integral tool for understanding the concepts and applications of machine learning and deep learning. It offers numerous libraries and modules that provide an excellent platform for building useful techniques. In this article, we will discuss one such library, known as “spaCy”. spaCy is an open-source library used to analyse and compare textual data. We will discuss this library in detail, but before we dive deep into the topic, let’s quickly go through an overview of this article. This article is divided into two sections: In ...

Ways to create a dictionary of lists in python

Devesh Chauhan
Updated on 27-Feb-2023 3K+ Views

A dictionary in Python is a collection of data stored as key-value pairs, and different data types can be assigned as the value for a key. It helps the coder store data, categorise it, and build databases accordingly. A list, on the other hand, also stores data, but its elements are not associated with keys. Dictionaries are indexed by key, while lists are indexed by integer position. In a list, we store data as a sequence that can be traversed and manipulated. In this article, we will merge the two formats and create a dictionary of ...
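The article is truncated before its examples, so the following is a minimal sketch of several common ways to build a dictionary of lists, not necessarily the ones the author covers. All keys and values are hypothetical.

```python
from collections import defaultdict

# 1. Literal: each key maps directly to a list
marks = {"Asha": [88, 92], "Ravi": [75, 81]}

# 2. setdefault() creates an empty list the first time a key is seen, then appends
subjects = {}
for student, subject in [("Asha", "Maths"), ("Ravi", "Physics"), ("Asha", "Chemistry")]:
    subjects.setdefault(student, []).append(subject)

# 3. defaultdict(list) creates the empty list automatically for any missing key
grouped = defaultdict(list)
for word in ["apple", "avocado", "banana", "blueberry", "cherry"]:
    grouped[word[0]].append(word)

# 4. Dictionary comprehension: build a separate list for each key
powers = {n: [n, n ** 2, n ** 3] for n in range(1, 4)}

# 5. zip() pairs a list of keys with a list of lists
keys = ["x", "y"]
values = [[1, 2], [3, 4]]
zipped = dict(zip(keys, values))
```

One pitfall worth avoiding: `dict.fromkeys(keys, [])` makes every key share the same list object, so appending under one key changes all of them; the comprehension and `defaultdict` approaches create a fresh list per key.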
