How to find the standard deviation of specific columns in a dataframe in Pandas Python?
Standard deviation measures how spread out the values in a dataset are, that is, how far they typically lie from the arithmetic mean. In Pandas, you can calculate the standard deviation of specific columns with the std() method. By default, std() returns the sample standard deviation, which divides by n - 1 rather than n.
When working with DataFrames, you often need to find the standard deviation of particular numeric columns. The std() function can be applied to individual columns by indexing the DataFrame with the column name.
Example
Let's create a DataFrame and calculate the standard deviation of specific columns:
import pandas as pd
# 'Name' holds text; 'Age' and 'Value' are the numeric columns
my_data = {
'Name': pd.Series(['Tom', 'Jane', 'Vin', 'Eve', 'Will']),
'Age': pd.Series([45, 67, 89, 12, 23]),
'Value': pd.Series([8.79, 23.24, 31.98, 78.56, 90.20])
}
my_df = pd.DataFrame(my_data)
print("The dataframe is:")
print(my_df)
# Selecting one column gives a Series, so std() returns a single float
print("\nThe standard deviation of column 'Age' is:")
print(my_df['Age'].std())
print("\nThe standard deviation of column 'Value' is:")
print(my_df['Value'].std())
Output
The dataframe is:
   Name  Age  Value
0   Tom   45   8.79
1  Jane   67  23.24
2   Vin   89  31.98
3   Eve   12  78.56
4  Will   23  90.20

The standard deviation of column 'Age' is:
31.499206339207976

The standard deviation of column 'Value' is:
35.747101700697364
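The figures above are sample standard deviations, because std() uses ddof=1 unless told otherwise. The following sketch compares that default with the population version (ddof=0) and checks both against the textbook formulas. The variable names are illustrative and not part of the original example.

```python
import math

import pandas as pd

ages = pd.Series([45, 67, 89, 12, 23], name="Age")

# pandas default: sample standard deviation (ddof=1, divides by n - 1)
sample_std = ages.std()

# Population standard deviation (ddof=0, divides by n)
population_std = ages.std(ddof=0)

# Manual check of both formulas
squared_deviations = ((ages - ages.mean()) ** 2).sum()
manual_sample = math.sqrt(squared_deviations / (len(ages) - 1))
manual_population = math.sqrt(squared_deviations / len(ages))

print("Sample std (ddof=1):    ", sample_std)
print("Population std (ddof=0):", population_std)
print("Manual sample formula:  ", manual_sample)
print("Manual population:      ", manual_population)
```

NumPy's np.std() defaults to ddof=0, so results from the two libraries differ unless ddof is set explicitly in one of them.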
Multiple Columns at Once
You can also calculate the standard deviation for multiple columns simultaneously:
import pandas as pd
my_data = {
'Name': pd.Series(['Tom', 'Jane', 'Vin', 'Eve', 'Will']),
'Age': pd.Series([45, 67, 89, 12, 23]),
'Value': pd.Series([8.79, 23.24, 31.98, 78.56, 90.20])
}
my_df = pd.DataFrame(my_data)
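print("Standard deviation of numeric columns:")
# A list of column names selects a sub-DataFrame; std() then returns a Series indexed by column name
print(my_df[['Age', 'Value']].std())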
Output
Standard deviation of numeric columns:
Age      31.499206
Value    35.747102
dtype: float64
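When the numeric columns are not known in advance, listing them by name is impractical. The sketch below shows two ways to restrict the calculation to numeric columns, using the same data as above. Depending on the pandas version, calling std() on a DataFrame that still contains a text column such as 'Name' may raise an error, so filtering first is the safer habit.

```python
import pandas as pd

my_df = pd.DataFrame({
    'Name': ['Tom', 'Jane', 'Vin', 'Eve', 'Will'],
    'Age': [45, 67, 89, 12, 23],
    'Value': [8.79, 23.24, 31.98, 78.56, 90.20],
})

# Option 1: ask std() to ignore non-numeric columns
std_numeric_only = my_df.std(numeric_only=True)

# Option 2: keep only numeric columns, then call std()
std_selected = my_df.select_dtypes(include='number').std()

print(std_numeric_only)
print(std_selected)
```

Both calls return a Series indexed by column name, so a single value can be read with, for example, std_numeric_only['Age'].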
Using Column Index
You can also access columns by their integer index position, which is zero-based:
import pandas as pd
my_data = {
'Name': pd.Series(['Tom', 'Jane', 'Vin', 'Eve', 'Will']),
'Age': pd.Series([45, 67, 89, 12, 23]),
'Value': pd.Series([8.79, 23.24, 31.98, 78.56, 90.20])
}
my_df = pd.DataFrame(my_data)
print("Standard deviation using column index:")
# iloc[:, i] selects every row of the column at position i (counting from 0)
print("Age column (index 1):", my_df.iloc[:, 1].std())
print("Value column (index 2):", my_df.iloc[:, 2].std())
Output
Standard deviation using column index:
Age column (index 1): 31.499206339207976
Value column (index 2): 35.747101700697364
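Positional access also works for several columns at once. As a minimal sketch with the same data, passing a list of positions to iloc returns a sub-DataFrame, and std() then yields one value per selected column. The positions list is an illustrative name, not part of the original example.

```python
import pandas as pd

my_df = pd.DataFrame({
    'Name': ['Tom', 'Jane', 'Vin', 'Eve', 'Will'],
    'Age': [45, 67, 89, 12, 23],
    'Value': [8.79, 23.24, 31.98, 78.56, 90.20],
})

positions = [1, 2]  # 0-based positions of 'Age' and 'Value'

# A list of positions selects a sub-DataFrame, so std() returns a Series
std_by_position = my_df.iloc[:, positions].std()

# Look up which column labels those positions refer to
selected_names = list(my_df.columns[positions])

print(selected_names)
print(std_by_position)
```

Positions shift when columns are added, removed, or reordered, so selecting by name is usually more robust when the names are known.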
Conclusion
Use df['column_name'].std() to find the standard deviation of a specific column. For multiple columns, use df[['col1', 'col2']].std() or access by index with df.iloc[:, index].std().
