Python - Calculate the standard deviation of a column in a Pandas DataFrame
Standard deviation measures how widely values are spread around their mean. In Pandas, you can calculate the standard deviation of a DataFrame column using the std() method.
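For reference, the quantity that std() returns can be written as follows. Here N is the number of non-missing values, \bar{x} is their mean, and ddof ("delta degrees of freedom") is the optional argument covered under Parameters, which defaults to 1:

```latex
s = \sqrt{\frac{1}{N - \mathrm{ddof}} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2},
\qquad
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
```

With ddof = 1 this gives the sample standard deviation. With ddof = 0 it gives the population standard deviation.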
Syntax
To calculate the standard deviation of a specific column:
dataframe['column_name'].std()
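The same method also exists on a whole DataFrame. Calling std() on the DataFrame returns one value per column, as a Series. Below is a minimal sketch using a small made-up DataFrame. Passing numeric_only=True skips non-numeric columns such as text labels; without it, newer pandas versions may raise an error when a text column is present.

```python
import pandas as pd

# Hypothetical example data: one text column and two numeric columns
df = pd.DataFrame({
    "Item": ["A", "B", "C", "D"],
    "Qty": [2, 4, 4, 6],
    "Cost": [10.0, 20.0, 30.0, 40.0],
})

# Standard deviation of every numeric column at once (returns a Series
# indexed by column name; the text column "Item" is skipped)
col_std = df.std(numeric_only=True)
print(col_std)
```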
Creating a Sample DataFrame
First, let's create a sample DataFrame with numerical data:
import pandas as pd
# Create DataFrame1 with car sales data
dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})
print("DataFrame1:")
print(dataFrame1)
DataFrame1:
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3    Tesla     80
4  Bentley    110
5   Jaguar     90
Calculating Standard Deviation
Use the std() method to calculate the standard deviation of the "Units" column:
import pandas as pd
dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})
# Calculate standard deviation of Units column
std_units = dataFrame1['Units'].std()
print("Standard Deviation of Units column:", std_units)
Standard Deviation of Units column: 24.22120283277228
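To see where 24.22120283277228 comes from, the sketch below recomputes it by hand. It applies the sample standard deviation formula (dividing by N - 1) and compares the result with pandas. It uses only the standard math module; the variable names are illustrative.

```python
import math
import pandas as pd

units = pd.Series([100, 150, 110, 80, 110, 90])

# Step 1: mean of the values
mean = sum(units) / len(units)

# Step 2: sum of squared deviations from the mean
squared_dev = sum((x - mean) ** 2 for x in units)

# Step 3: divide by N - 1 (sample variance), then take the square root
manual_std = math.sqrt(squared_dev / (len(units) - 1))

print("Manual:", manual_std)
print("Pandas:", units.std())
```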
Example with Multiple DataFrames
You can calculate the standard deviation of columns in separate DataFrames the same way:
import pandas as pd
# DataFrame1 - Car sales
dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})
# DataFrame2 - Product prices
dataFrame2 = pd.DataFrame({
    "Product": ['TV', 'PenDrive', 'HeadPhone', 'EarPhone', 'HDD', 'SSD'],
    "Price": [8000, 500, 3000, 1500, 3000, 4000]
})
print("DataFrame1:")
print(dataFrame1)
print("\nStandard Deviation of Units column:", dataFrame1['Units'].std())
print("\nDataFrame2:")
print(dataFrame2)
print("\nStandard Deviation of Price column:", dataFrame2['Price'].std())
DataFrame1:
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3    Tesla     80
4  Bentley    110
5   Jaguar     90

Standard Deviation of Units column: 24.22120283277228

DataFrame2:
     Product  Price
0         TV   8000
1   PenDrive    500
2  HeadPhone   3000
3   EarPhone   1500
4        HDD   3000
5        SSD   4000

Standard Deviation of Price column: 2601.281735352477
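Several numeric columns may live in the same DataFrame. In that case you can select them with a list of column names and call std() once. The result is a Series indexed by column name. Below is a minimal sketch, assuming the two numeric columns from above are combined into one hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical combined DataFrame holding both numeric columns from above
sales = pd.DataFrame({
    "Units": [100, 150, 110, 80, 110, 90],
    "Price": [8000, 500, 3000, 1500, 3000, 4000],
})

# Double brackets select several columns; std() then returns one value per column
std_per_column = sales[["Units", "Price"]].std()
print(std_per_column)
```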
Parameters
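The std() method accepts optional parameters. The most commonly used one is ddof (delta degrees of freedom): the sum of squared deviations is divided by N - ddof, where N is the number of values: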
import pandas as pd
data = pd.DataFrame({
    "Values": [10, 20, 30, 40, 50]
})
# Default: sample standard deviation (ddof=1)
print("Sample std (ddof=1):", data['Values'].std())
# Population standard deviation (ddof=0)
print("Population std (ddof=0):", data['Values'].std(ddof=0))
Sample std (ddof=1): 15.811388300841898
Population std (ddof=0): 14.142135623730951
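Another optional parameter is skipna. By default (skipna=True), missing values (NaN) are ignored, so N counts only the non-missing values. With skipna=False, any NaN makes the result NaN. Here is a small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value
data = pd.DataFrame({
    "Values": [10, np.nan, 30]
})

# Default skipna=True: the NaN is dropped, so only 10 and 30 are used
print("skipna=True:", data['Values'].std())

# skipna=False: the missing value propagates to the result
print("skipna=False:", data['Values'].std(skipna=False))
```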
Conclusion
Use dataframe['column'].std() to calculate the standard deviation of a Pandas DataFrame column. By default, it calculates the sample standard deviation (ddof=1); set ddof=0 for the population standard deviation.
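A common source of confusion is that NumPy's np.std() defaults to ddof=0 (population), while the pandas std() method defaults to ddof=1 (sample). The two can therefore return different numbers for the same data. The short sketch below uses the values from the Parameters example and converts the Series to a plain NumPy array before calling np.std():

```python
import numpy as np
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

# pandas default: sample standard deviation (ddof=1)
print("pandas std():", values.std())

# NumPy default: population standard deviation (ddof=0)
print("np.std():", np.std(values.to_numpy()))

# Passing ddof explicitly makes NumPy match the pandas default
print("np.std(ddof=1):", np.std(values.to_numpy(), ddof=1))
```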