Find the Geometric mean of a given Pandas Dataframe.
Pandas, an open-source Python library, provides the DataFrame structure for storing, modifying, updating, and deleting data in tabular form. It is designed to integrate easily with Python programs for data analysis and offers a wide range of tools for manipulating and processing data.
The geometric mean is a useful measure of the average, or central tendency, of a set of positive numbers. It is computed by multiplying all the numbers in the data set together and then taking the nth root of that product, where n is the number of values in the set.
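Written as a formula, for n positive values x_1, ..., x_n the geometric mean is the nth root of their product. Taking logarithms turns it into the exponential of the arithmetic mean of the logs, which is the identity Approach 1 relies on. The worked numbers use column A from the first example.

```latex
\mathrm{GM}(x_1, \dots, x_n) = \left( \prod_{i=1}^{n} x_i \right)^{1/n}
  = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln x_i \right), \qquad x_i > 0

\text{Example: } \mathrm{GM}(2, 4, 6, 8) = (2 \cdot 4 \cdot 6 \cdot 8)^{1/4} = 384^{1/4} \approx 4.4267
```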
Syntax
Syntax to create DataFrame
df = pandas.DataFrame(data, index, columns)
“pandas.DataFrame” creates a DataFrame object (an empty one if no data is passed)
“data” holds the values to store, such as a list or a dictionary
“index” and “columns” are optional and specify the row and column labels
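A short sketch of the constructor with dictionary input, list input, and no input at all. The row labels "w" to "z" and the column order are arbitrary choices made only for illustration:

```python
import pandas as pd

# Dictionary input: each key becomes a column label
data = {"A": [2, 4, 6, 8], "B": [1, 3, 5, 7]}

# index supplies row labels; columns selects and orders the dictionary keys to keep
df = pd.DataFrame(data, index=["w", "x", "y", "z"], columns=["B", "A"])
print(df)

# List-of-lists input: each inner list is a row, labels come from columns
df2 = pd.DataFrame([[1, 5], [2, 6]], columns=["A", "B"])
print(df2)

# With no arguments, pandas.DataFrame() builds an empty DataFrame
empty = pd.DataFrame()
print(empty.empty)  # True
```

When data is a dictionary, passing columns only reorders or selects existing keys; it does not rename them.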
Approach 1 - Using NumPy
The following program finds the geometric mean of each column of a DataFrame using NumPy:
Algorithm
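Step 1 - Import the Pandas and NumPy modules.
Step 2 - Create a Pandas DataFrame to store the values.
Step 3 - Take the natural log of every value with np.log(), average the logs of each column with .mean(), convert back with np.exp(), and store the result in a variable called geometric_mean.
Step 4 - Print the output.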
Example
import pandas as pd
import numpy as np

# create a sample dataframe
df = pd.DataFrame({
    'A': [2, 4, 6, 8],
    'B': [1, 3, 5, 7]
})

# calculate the geometric mean for each column
# exp(mean(log(x))) equals the nth root of the product of the values
geometric_mean = np.exp(np.log(df).mean())

# display the result
print("Geometric mean for each column:\n", geometric_mean)
Output
Geometric mean for each column:
 A    4.426728
B    3.201086
dtype: float64
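Approach 1 returns one value per column because .mean() defaults to axis=0. A minimal sketch, assuming you also want row-wise results and an explicit check for non-positive values (np.log gives -inf for 0 and nan for negatives). The helper name geometric_mean_log is hypothetical, not a pandas or NumPy function:

```python
import numpy as np
import pandas as pd


def geometric_mean_log(df: pd.DataFrame, axis: int = 0) -> pd.Series:
    """Hypothetical helper: exp(mean(log(x))); axis=0 per column, axis=1 per row."""
    # log() is only finite for positive numbers: log(0) is -inf, log(x < 0) is nan
    if (df <= 0).any().any():
        raise ValueError("geometric mean requires strictly positive values")
    return np.exp(np.log(df).mean(axis=axis))


df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [1, 3, 5, 7]})

per_column = geometric_mean_log(df)        # same values as Approach 1
per_row = geometric_mean_log(df, axis=1)   # one value per row
print(per_column)
print(per_row)
```

Raising an error is one design choice; you could instead drop or mask non-positive values before taking logs, depending on what the data means.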
Approach 2 - Using a custom function
The following program defines a custom function called 'geometric_mean' that accepts a Pandas DataFrame and uses a loop to calculate a single geometric mean over all the values in the DataFrame.
Algorithm
Step 1 - Import the Pandas library.
Step 2 - Create a DataFrame and store the values.
Step 3 - Define the custom function.
Step 4 - Call the function and store the result in a variable “gm”.
Step 5 - Print “gm”.
Example
import pandas as pd

# create sample dataframe
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# define a function to calculate the geometric mean of all values
def geometric_mean(data):
    # flatten to 1-D; converting to float avoids silent integer overflow in the product
    values = data.to_numpy(dtype=float).ravel()
    product = 1.0
    for val in values:
        product *= val
    # nth root of the product, where n is the number of values
    return product ** (1.0 / len(values))

# calculate geometric mean of dataframe using custom function
gm = geometric_mean(df)
print(gm)
Output
3.764350599503129
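The custom function returns one value for the whole DataFrame. Because a Series also has to_numpy(), the same loop can be applied column by column with DataFrame.apply to get per-column results like Approach 1. A sketch, assuming converting the data to float is acceptable:

```python
import pandas as pd


def geometric_mean(data):
    """Loop-based geometric mean of every value in a DataFrame or Series."""
    values = data.to_numpy(dtype=float).ravel()  # floats avoid silent int overflow
    product = 1.0
    for val in values:
        product *= val
    return product ** (1.0 / values.size)


df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# DataFrame.apply passes each column as a Series, so this yields one value per column
per_column = df.apply(geometric_mean)
print(per_column)
```

For long columns of large numbers the running product can still overflow to inf even as a float; the log-based form from Approach 1 avoids building the product at all.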
Approach 3 - Using Scipy Library
SciPy is a Python library for scientific computing that provides numerical algorithms, optimization routines, and statistical analysis tools.
The following code computes the geometric mean of all the values in a Pandas DataFrame using the gmean() function from the scipy.stats module.
Algorithm
Step 1 - Import the Pandas, NumPy, and SciPy libraries.
Step 2 - Create a DataFrame “df”.
Step 3 - Flatten the DataFrame into a 1-D NumPy array with to_numpy().ravel() and pass it to gmean().
Step 4 - Print the output.
Example
import numpy as np
from scipy.stats import gmean
import pandas as pd

# create sample dataframe
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# calculate geometric mean of dataframe using Scipy
gm = gmean(df.to_numpy().ravel())
print(gm)
Output
3.764350599503128
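gmean() also accepts the 2-D array directly. Its axis parameter defaults to 0, so without flattening it returns one value per column, while axis=None ravels the input first and reproduces the single overall value. Wrapping the per-column result in a Series to keep the column labels is a choice of this sketch, not something gmean() does itself:

```python
import pandas as pd
from scipy.stats import gmean

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# axis=0 (the default) reduces down each column of the 2-D array
per_column = pd.Series(gmean(df.to_numpy(), axis=0), index=df.columns)

# axis=None flattens first, matching gmean(df.to_numpy().ravel())
overall = gmean(df.to_numpy(), axis=None)

print(per_column)
print(overall)
```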
Conclusion
The geometric mean, the nth root of the product of n values, is a valuable tool for data analysis in Pandas DataFrames. It is especially useful when several columns need to be analyzed, because it can be computed for every column at once. NumPy's log-mean-exp expression does this in a single vectorized step, the custom loop shows the definition directly, and SciPy's gmean() provides a ready-made function. All three assume strictly positive values: a zero drives the result to 0, and negative values make it undefined. Used on suitable data, the geometric mean can reveal patterns that an arithmetic average might hide, helping you make informed decisions based on the information.