How to get the correlation between two columns in Pandas?
We can use the .corr() method to get the correlation between two columns in Pandas. By default it computes the Pearson correlation coefficient, which measures the strength of the linear relationship between two variables and ranges from -1 to 1.
Basic Syntax
# Method 1: Using .corr() on a Series
correlation = df['column1'].corr(df['column2'])

# Method 2: Using .corr() on DataFrame to get correlation matrix
correlation_matrix = df[['column1', 'column2']].corr()
Example
Let's create a DataFrame and calculate correlations between different columns:
import pandas as pd
# Create sample DataFrame
df = pd.DataFrame({
"x": [5, 2, 7, 0],
"y": [4, 7, 5, 1],
"z": [9, 3, 5, 1]
})
print("Input DataFrame:")
print(df)
Input DataFrame:
   x  y  z
0  5  4  9
1  2  7  3
2  7  5  5
3  0  1  1
Finding Correlation Between Two Columns
import pandas as pd
df = pd.DataFrame({
"x": [5, 2, 7, 0],
"y": [4, 7, 5, 1],
"z": [9, 3, 5, 1]
})
# Correlation between x and y
corr_xy = df['x'].corr(df['y'])
print(f"Correlation between x and y: {corr_xy:.2f}")
# Correlation between x and z
corr_xz = df['x'].corr(df['z'])
print(f"Correlation between x and z: {corr_xz:.2f}")
# Self-correlation (1.0 for any non-constant column)
corr_xx = df['x'].corr(df['x'])
print(f"Correlation between x and x: {corr_xx:.2f}")
Correlation between x and y: 0.41
Correlation between x and z: 0.72
Correlation between x and x: 1.00
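To see where the 0.41 for x and y comes from, the Pearson coefficient can be recomputed from its definition: the sum of the products of each column's deviations from its mean, divided by the square root of the product of the summed squared deviations. The following is a minimal sketch for checking the result, not a replacement for .corr(); the names dx, dy, manual_r and pandas_r are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [5, 2, 7, 0],
    "y": [4, 7, 5, 1],
})

# Deviations of each column from its own mean
dx = df["x"] - df["x"].mean()
dy = df["y"] - df["y"].mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Same quantity as computed by pandas
pandas_r = df["x"].corr(df["y"])

print(f"Manual: {manual_r:.4f}")   # 0.4074
print(f"pandas: {pandas_r:.4f}")   # 0.4074
```

Both values agree and round to 0.41.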
Getting Correlation Matrix
You can also get the correlation matrix for multiple columns at once:
import pandas as pd
df = pd.DataFrame({
"x": [5, 2, 7, 0],
"y": [4, 7, 5, 1],
"z": [9, 3, 5, 1]
})
# Get correlation matrix for all columns
correlation_matrix = df.corr()
print("Correlation Matrix:")
print(correlation_matrix)
# Get correlation matrix for specific columns
specific_corr = df[['x', 'y']].corr()
print("\nCorrelation between x and y columns:")
print(specific_corr)
Correlation Matrix:
          x         y         z
x  1.000000  0.407403  0.721930
y  0.407403  1.000000  0.253734
z  0.721930  0.253734  1.000000

Correlation between x and y columns:
          x         y
x  1.000000  0.407403
y  0.407403  1.000000
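When the matrix has already been computed, a single pair can be read out of it by label instead of calling .corr() again. This is a short sketch using .loc on the same sample DataFrame; the variable names corr_xz and corr_zx are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [5, 2, 7, 0],
    "y": [4, 7, 5, 1],
    "z": [9, 3, 5, 1]
})

correlation_matrix = df.corr()

# Look up one pair by row label and column label
corr_xz = correlation_matrix.loc["x", "z"]
print(f"x vs z: {corr_xz:.2f}")   # x vs z: 0.72

# The matrix is symmetric, so swapping the labels gives the same value
corr_zx = correlation_matrix.loc["z", "x"]
print(f"z vs x: {corr_zx:.2f}")   # z vs x: 0.72
```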
Understanding Correlation Values
| Correlation Value | Relationship | Meaning |
|---|---|---|
| 1.0 | Perfect Positive | Variables increase together |
| 0.0 | No Linear Relationship | Variables show no linear association (a nonlinear one may still exist) |
| -1.0 | Perfect Negative | One increases as the other decreases |
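The three rows of the table can be illustrated with a small made-up dataset. The DataFrame demo and its columns are assumptions chosen for this sketch: one column rises in step with a, one falls as a rises, and one depends on a only through its square.

```python
import pandas as pd

# Hypothetical data, symmetric around zero
demo = pd.DataFrame({"a": [-2, -1, 0, 1, 2]})
demo["double"] = 2 * demo["a"]      # rises in step with a
demo["negated"] = -demo["a"]        # falls as a rises
demo["squared"] = demo["a"] ** 2    # determined by a, but not linearly

r_pos = demo["a"].corr(demo["double"])
r_neg = demo["a"].corr(demo["negated"])
r_zero = demo["a"].corr(demo["squared"])

print(f"a vs double:  {r_pos:.2f}")
print(f"a vs negated: {r_neg:.2f}")
print(f"a vs squared: {r_zero:.2f}")
```

The first two results should come out at approximately 1 and -1. The third should be approximately 0 even though squared is fully determined by a, which shows that a correlation near zero rules out only a linear relationship.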
Conclusion
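Use df['col1'].corr(df['col2']) to get the correlation between two specific columns. Use df.corr() to get the complete correlation matrix for a DataFrame; if the DataFrame also contains non-numeric columns, pass numeric_only=True (needed in pandas 2.0 and later) so that only the numeric columns are used.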
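For a DataFrame that mixes text and numeric columns, one version-independent way to limit the matrix to numeric columns is to filter with select_dtypes before calling .corr(). This is a sketch under that assumption; the label column is a hypothetical addition to the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["a", "b", "c", "d"],   # hypothetical text column
    "x": [5, 2, 7, 0],
    "y": [4, 7, 5, 1],
})

# Keep only the numeric columns, then build the correlation matrix
numeric_corr = df.select_dtypes(include="number").corr()
print(numeric_corr)
```

The resulting matrix contains only x and y; the label column is left out.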
