How to combine Groupby and Multiple Aggregate Functions in Pandas?


The groupby() and aggregate() are the two functions available in the pandas library.

The groupby() function

The groupby() function allows you to group a DataFrame by one or more columns. It internally performs a combination of operations such as splitting the object, applying a function, and combining the results, on the dataframe object.

This function returns DataFrameGroupBy object which contains information about the groups. Once we obtain this object we can perform various operations such as calculating the mean, calculating the sum and average etc…

Syntax

Following is the syntax of the groupby() function –

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

The aggregate() function

The aggregate() function is used to apply one or more operations (functions) over a particular axis on the entire DataFrame or on specific columns in the DataFrame. These mathematical operations are used to summarize and compute the statistics for the given set of data. agg is the alias of the aggregate() function.

Syntax

The following is the syntax of groupby -

DataFrame.aggregate(func=None, axis=0, *args, **kwargs)

Where,

  • func − This is the function or, list of functions used to calculate the aggregate values.

  • axis − This value specifies the axis (column: 0 or 'index' and row − 1 or 'columns').

  • *args− Positional arguments to pass to func

  • **kwargs − Keyword arguments to pass to func

Combining Groupby and Multiple Aggregate Functions

We can perform multiple aggregate functions such as sum, mean, min max etc., on the results of a Groupby clause using the aggregate() or agg() function as shown below –

pandas.groupby(column_name).agg(column)

Example

In the following example, we use groupby function in pandas to group a Dataframe by the column name Fruits and perform the aggregate operation mean on two different columns 'Dozens' and 'Cost'. This will return the combined output of groupby and aggregate functions.

import pandas as pd
data = {'Fruits': ['Papaya','Apple','Banana','Grapes','Orange','Watermelon'],
        'Dozens': [25, 30, 35, 27, 32, 37],
        'Cost': [500, 6000, 270, 5500, 6500, 7500]}
df = pd.DataFrame(data)
grouped_df = df.groupby('Fruits').agg({'Dozens': ['mean'], 'Cost': ['mean']})
print(grouped_df)

Output

Following is the output of the combination of groupby and aggregate functions.

  	   Dozens    Cost
             mean    mean
Fruits                   
Apple        30.0  6000.0
Banana       35.0   270.0
Grapes       27.0  5500.0
Orange       32.0  6500.0
Papaya       25.0   500.0
Watermelon   37.0  7500.0

Example

In the following example we are considering the same dataset and grouping it by the same column (fruits) and calculating the "min", "max", "sum", "count", "mean" values of the column ‘cost’ using the agg() function –

import pandas as pd
data = {'Fruits': ['Papaya','Apple','Banana','Grapes','Orange','Watermelon'],
   'Dozens': [25, 30, 35, 27, 32, 37],
   'Cost': [500, 6000, 270, 5500, 6500, 7500]}
df = pd.DataFrame(data)
grouped_df = df.groupby('Fruits').Cost.agg(["min", "max", "sum", "count", "mean"])
print(grouped_df)

Output

When we run the above code, following output will be displayed -

              min   max   sum  count    mean
Fruits
Apple       6000  6000  6000      1  6000.0
Banana       270   270   270      1   270.0
Grapes      5500  5500  5500      1  5500.0
Orange      6500  6500  6500      1  6500.0
Papaya       500   500   500      1   500.0
Watermelon  7500  7500  7500      1  7500.0

Example

Let’s see another example for combining groupby and aggregate functions. Here we are considering the dataset “titanic.csv” dataset and performing the operations "mean", "max", "sum" on the columns 'Survived' and 'Pclass' grouped by the column 'fare'

import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
data_groupby = data.groupby('Fare')
data_agg = data_groupby.agg({'Survived':["mean", "max", "sum"],'Pclass':["min", "max", "sum"]})
print(data_agg)

Output

When we run the above code, following output will be generated -

          Survived         Pclass
              mean max sum    min max sum
Fare
0.0000    0.066667   1   1      1   3  29
4.0125    0.000000   0   0      3   3   3
5.0000    0.000000   0   0      1   1   1
6.2375    0.000000   0   0      3   3   3
6.4375    0.000000   0   0      3   3   3
...            ...  ..  ..    ...  ..  ..
227.5250  0.750000   1   3      1   1   4
247.5208  0.500000   1   1      1   1   2
262.3750  1.000000   1   2      1   1   2
263.0000  0.500000   1   2      1   1   4
512.3292  1.000000   1   3      1   1   3

[248 rows x 6 columns]

Updated on: 09-Aug-2023

230 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements