Why do we use pandas in Python?


pandas is one of the most commonly used tools in data science and machine learning, where it is used for data cleaning and analysis.

Real-world data is often messy, and pandas is well suited to handling it. pandas is an open-source Python package built on top of NumPy.

Handling data with pandas is fast and effective thanks to its two core data structures, Series and DataFrame, which let you manipulate data in various ways.
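As a minimal sketch of the two data structures mentioned above, the snippet below builds a Series and a DataFrame by hand. The column names and values are made up for illustration and do not come from any real data set.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (illustrative values).
scores = pd.Series([3.5, 4.0, 4.5], index=[2008, 2009, 2010], name="score")

# A DataFrame is a two-dimensional table; each column is a Series.
df = pd.DataFrame(
    {
        "year": [2008, 2009, 2010],
        "score": [3.5, 4.0, 4.5],
    }
)

# Selecting a single column of a DataFrame gives back a Series.
column = df["score"]

print(scores.loc[2009])        # look up a Series value by its label
print(type(column).__name__)   # Series
print(df.shape)                # (rows, columns)
```

Both structures carry labels (an index, and column names for the DataFrame), which is what makes label-based selection such as scores.loc[2009] possible.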

Its feature set makes pandas a strong choice for handling data: it can deal with missing data, helps with cleaning data, and supports multiple file formats. This means it can read or load data in many formats, such as CSV, Excel, and SQL.
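To make the missing-data handling mentioned above concrete, here is a small sketch. It assumes a tiny hand-made table (the city and temp columns are hypothetical) and shows three common calls: isna, dropna, and fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing temperature reading.
df = pd.DataFrame(
    {
        "city": ["A", "B", "C"],
        "temp": [21.0, np.nan, 25.0],
    }
)

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill the gap, here with the mean of the known values.
filled = df.fillna({"temp": df["temp"].mean()})

print(missing_per_column)
print(dropped)
print(filled)
```

Whether dropping or filling is the better choice depends on the data set; filling with the column mean is only one of several possible strategies.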

Let’s take an example and see how pandas reads CSV data.

Example

import pandas as pd

data = pd.read_csv('world-happiness-report.csv')
print(data.shape)    # (rows, columns)
print(data.head())   # first five rows

Explanation

In the code above, the variable data holds the contents of the World Happiness Report CSV file (downloaded from Kaggle datasets), loaded with the read_csv function from the pandas package. data.shape returns a tuple with the number of rows and the number of columns, in that order.

Output

      Country name year  Life Ladder   Log GDP per capita Social support \
0   Afghanistan    2008    3.724               7.370           0.451
1   Afghanistan    2009    4.402               7.540           0.552
2   Afghanistan    2010    4.758               7.647           0.539
3   Afghanistan    2011    3.832               7.620           0.521
4   Afghanistan    2012    3.783               7.705           0.521

Healthy life expectancy at birth   Freedom to make life choices   Generosity \
                           50.80                          0.718       0.168
                           51.20                          0.679       0.190
                           51.60                          0.600       0.121
                           51.92                          0.496       0.162
                           52.24                          0.531       0.236

Perceptions of corruption   Positive affect   Negative affect
                   0.882             0.518             0.258
                   0.850             0.584             0.237
                   0.707             0.618             0.275
                   0.731             0.611             0.267
                   0.776             0.710             0.268

The block above shows the first five rows of the World Happiness Report data set, as returned by the pandas DataFrame.head() method.

pandas has many more features that help with large data sets in both machine learning and data science work, including merging and joining data sets, visualization, grouping, masking, and mathematical operations on the data.
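A few of the operations listed above (merging, grouping, masking, and arithmetic on columns) can be sketched as follows. The sales and managers tables are hypothetical and exist only to keep the example self-contained.

```python
import pandas as pd

# Two small hypothetical tables that share a "store" column.
sales = pd.DataFrame(
    {
        "store": ["north", "south", "north", "south"],
        "units": [10, 4, 6, 8],
    }
)
managers = pd.DataFrame(
    {
        "store": ["north", "south"],
        "manager": ["Ana", "Ben"],
    }
)

# Merging: attach the manager to every sales row.
merged = sales.merge(managers, on="store", how="left")

# Grouping: total units sold per store.
totals = merged.groupby("store")["units"].sum()

# Masking: keep only the rows where more than 5 units were sold.
large = merged[merged["units"] > 5]

# Mathematical operation: arithmetic applies to the whole column at once.
merged["units_doubled"] = merged["units"] * 2

print(totals)
print(large)
print(merged)
```

Each step returns a new pandas object, so the operations can be combined freely.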

Let’s take another example and see how to create an output file using pandas.

Example

data.to_json('output_file.json')  # writes the file; returns None when a path is given

Explanation

data.to_json is a pandas DataFrame method that writes our DataFrame object (data) to a JSON file.

Output

The resulting JSON file is created in our working directory; in the example above it is named output_file.json.
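One way to check that the file was written as expected is to read it back with pd.read_json. The sketch below uses a small hypothetical DataFrame and a temporary directory instead of the article's data set, so it can run anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data standing in for the article's DataFrame.
data = pd.DataFrame({"name": ["x", "y"], "value": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "output_file.json"

    # Write the DataFrame to a JSON file on disk.
    data.to_json(path)
    file_was_created = path.exists()

    # Read the same file back into a new DataFrame.
    restored = pd.read_json(path)

print(file_was_created)
print(restored)
```

The temporary directory is deleted when the with block ends, so nothing is left behind in the working directory.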

These are some of the reasons why we use pandas in Python.

Updated on: 18-Nov-2021
