Python – Aggregate values by tuple keys


Introduction

Organizations that work with large volumes of data often need to summarize it, and Python is widely used for this task. The records in a dataset may or may not be related to one another. When they are related, for example because they share the same attributes, their values can be combined, which is known as aggregation. Aggregation groups the elements that share a key and combines their values, for instance by summing them. Python provides built-in functions and libraries for this purpose.

Aggregate values by tuple keys

A tuple is an ordered data structure whose elements cannot be changed after initialization; in other words, it is immutable. Because a tuple of hashable elements is itself hashable, it can be used as a dictionary key, which makes it convenient for grouping values by more than one attribute.
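The following minimal sketch illustrates the two tuple properties that matter for aggregation: a tuple cannot be modified in place, and it can act as a dictionary key. The names key, costs, and mutation_failed are chosen only for illustration.

```python
# A tuple groups several values into one immutable, hashable object.
key = ("Milk", "1 day")

# Because tuples are hashable, they can be used as dictionary keys.
costs = {key: 30}
costs[("Tomato", "3 day")] = 100

# Trying to change an element of a tuple raises a TypeError.
mutation_failed = False
try:
    key[0] = "Cheese"
except TypeError:
    mutation_failed = True

print(costs[("Milk", "1 day")])  # 30
print(mutation_failed)           # True
```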

Syntax

reduce()

The collections module in Python provides container classes such as “defaultdict”. The reduce() function belongs to the functools module, not to collections. It repeatedly applies a two-argument function to the items of an iterable and reduces them to a single value.
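As a rough illustration of how reduce() can aggregate values, the sketch below folds a list of (product, cost) pairs into a dictionary of totals. The helper add_cost and the sample list items are hypothetical names introduced for this example only.

```python
from functools import reduce

# Illustrative sample data: (product, cost) pairs.
items = [("Milk", 30), ("Tomato", 100), ("Milk", 320)]

def add_cost(totals, item):
    """Fold one (product, cost) pair into the running totals dictionary."""
    product, cost = item
    totals[product] = totals.get(product, 0) + cost
    return totals

# reduce() calls add_cost(accumulator, item) for every item,
# starting from the empty dictionary passed as the initial value.
totals = reduce(add_cost, items, {})
print(totals)  # {'Milk': 350, 'Tomato': 100}
```

For simple sums like this one, a plain for loop or defaultdict is usually easier to read; reduce() is shown here only to make its two-argument behaviour concrete.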

Approach

Approach 1 − Using defaultdict() method

Approach 2 − Using groupby() method

Approach 1: Python code to Aggregate value using defaultdict() method

The defaultdict class from Python's collections module is used to aggregate the values. The products are listed along with their cost, and the same product may appear more than once. The dictionary is created with float as its default factory, so a missing key starts at 0.0. Each product name is used as the key, and the cost of every matching entry is added to the value stored under that key. To group by more than one attribute, such as the product and its expiry day, a tuple of those attributes would be used as the key in the same way.

Algorithm

  • Step 1 − The input list Item_expiry is declared; it contains tuples of a product name and its cost.

  • Step 2 − The defaultdict class is imported from the collections module.

  • Step 3 − A for loop iterates through every tuple in the list and adds the cost to the running total stored under its key.

  • Step 4 − The resulting dictionary, which maps each key to its summed cost, is printed.

Example

# importing the defaultdict class from the collections module
from collections import defaultdict

# initializing Item_expiry as a list of (product, cost) tuples
Item_expiry = [
   ('Milk', 30),
   ('Tomato', 100),
   ('lentils', 345),
   ('Milk', 320)
]
# creating a defaultdict whose missing keys start at 0.0
sums_by_product_days = defaultdict(float)
# iterating through Item_expiry and adding each cost to the total stored under its product key
for product, cost in Item_expiry:
   sums_by_product_days[product] += cost
# printing the aggregated dictionary
print(sums_by_product_days)

Output

defaultdict(<class 'float'>, {'Milk': 350.0, 'Tomato': 100.0, 'lentils': 345.0})
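The same pattern extends to true tuple keys. The sketch below is an illustration rather than part of the original example: it assumes each record also carries an expiry period, and the names records and sums_by_product_expiry are hypothetical.

```python
from collections import defaultdict

# Illustrative records: (product, expiry, cost).
records = [
    ("Milk", "1 day", 30),
    ("Tomato", "3 day", 100),
    ("Lentils", "6 months", 345),
    ("Tomato", "3 day", 50),
]

# int is the default factory, so a missing key starts at 0.
sums_by_product_expiry = defaultdict(int)
for product, expiry, cost in records:
    # The (product, expiry) tuple is the dictionary key.
    sums_by_product_expiry[(product, expiry)] += cost

print(dict(sums_by_product_expiry))
# {('Milk', '1 day'): 30, ('Tomato', '3 day'): 150, ('Lentils', '6 months'): 345}
```

Both Tomato records share the key ('Tomato', '3 day'), so their costs are summed into one entry.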

Approach 2: Python code to Aggregate value using groupby() method

The pandas library is imported, and a DataFrame is created that lists each product with its expiry period and cost. The rows are grouped by the (product, expiry) pair using the groupby() function, and the cost column of each group is added up with the sum() method. The result is a Series indexed by product and expiry, which is then printed.

Algorithm

  • Step 1 − The input is declared as a DataFrame df with the columns product, expiry, and cost.

  • Step 2 − The only library required to aggregate the values is pandas.

  • Step 3 − The rows are grouped by product and expiry with groupby(), the costs in each group are summed, and the result is printed.

Example

# importing the pandas module
import pandas as pd
# creating a DataFrame with the product name, expiry period, and cost of each item
df = pd.DataFrame({
   'product': ['Milk', 'Tomato', 'Lentils', 'Tomato'],
   'expiry': ['1 day', '3 day', '6 months', '3 day'],
   'cost': [30, 100, 345, 50]
})
# grouping the rows by product and expiry and summing the cost within each group
sums_by_product_days = df.groupby(['product', 'expiry'])['cost'].sum()
# printing the resulting Series, indexed by (product, expiry)
print(sums_by_product_days)

Output

product  expiry  
Lentils  6 months    345
Milk     1 day        30
Tomato   3 day       150
Name: cost, dtype: int64
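When pandas is not available, a similar grouping can be sketched with the standard library. The example below uses itertools.groupby(), a different function from the pandas groupby() above, and assumes the same illustrative data stored as (product, expiry, cost) tuples. The names records, key_func, and sums are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative records: (product, expiry, cost).
records = [
    ("Milk", "1 day", 30),
    ("Tomato", "3 day", 100),
    ("Lentils", "6 months", 345),
    ("Tomato", "3 day", 50),
]

# The grouping key is the (product, expiry) tuple of each record.
key_func = itemgetter(0, 1)

# itertools.groupby only merges adjacent items, so sort by the key first.
sorted_records = sorted(records, key=key_func)

sums = {
    key: sum(cost for _, _, cost in group)
    for key, group in groupby(sorted_records, key=key_func)
}
print(sums)
# {('Lentils', '6 months'): 345, ('Milk', '1 day'): 30, ('Tomato', '3 day'): 150}
```

The sort step is what makes this work: without it, records with the same key that are not next to each other would end up in separate groups.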

Conclusion

In Python, a tuple is written by enclosing its elements in parentheses “()”. A tuple keeps its elements in the order in which they were defined and cannot be modified afterward, which is why it can serve as a dictionary key. Values can be aggregated by such keys with defaultdict from the collections module or with the groupby() function of pandas.

Updated on: 25-Aug-2023
