How to get the correlation between two columns in Pandas?
We can use the Series.corr() method to get the correlation between two columns in Pandas. By default it computes the Pearson correlation coefficient. Let's take an example and see how to apply this method.
Steps
- Create a DataFrame, df (a two-dimensional, size-mutable, potentially heterogeneous tabular data structure).
- Print the input DataFrame, df.
- Initialize two variables, col1 and col2, and assign them the names of the two columns whose correlation you want to find.
- Find the correlation between col1 and col2 by using df[col1].corr(df[col2]) and save the correlation value in a variable, corr.
- Print the correlation value, corr.
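For context on what the value produced by these steps represents: with default arguments, Series.corr() returns the Pearson coefficient. The following is a minimal sketch, not part of the original walkthrough, that recomputes it from the definition and compares it with the built-in call. The series a and b hold hypothetical toy values chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration.
a = pd.Series([1.0, 2.0, 3.0, 4.0])
b = pd.Series([2.0, 1.0, 4.0, 3.0])

# Pearson coefficient from its definition: the sum of the products of the
# deviations from the mean, divided by the square root of the product of
# the sums of squared deviations.
da = a - a.mean()
db = b - b.mean()
manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

# Series.corr() uses method="pearson" unless told otherwise.
builtin = a.corr(b)

print(round(manual, 2), round(builtin, 2))
```

Both numbers should agree, which suggests the one-line call in the steps is shorthand for this calculation.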
Example
import pandas as pd

df = pd.DataFrame(
    {
        "x": [5, 2, 7, 0],
        "y": [4, 7, 5, 1],
        "z": [9, 3, 5, 1]
    }
)
print("Input DataFrame is:")
print(df)

# Compute the correlation for each pair of column names in turn.
for col1, col2 in [("x", "y"), ("x", "x"), ("x", "z"), ("y", "x")]:
    corr = df[col1].corr(df[col2])
    print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")
Output
Input DataFrame is:
   x  y  z
0  5  4  9
1  2  7  3
2  7  5  5
3  0  1  1
Correlation between x and y is: 0.41
Correlation between x and x is: 1.0
Correlation between x and z is: 0.72
Correlation between y and x is: 0.41
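The output shows that a column correlates perfectly with itself and that the order of the two columns does not matter. As a sketch of an alternative, assuming you want every pair at once rather than one pair at a time, DataFrame.corr() returns the full pairwise matrix, and individual values can be looked up by column name. The variable names corr_matrix, xy and yx are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"x": [5, 2, 7, 0], "y": [4, 7, 5, 1], "z": [9, 3, 5, 1]})

# DataFrame.corr() computes the pairwise correlation of all numeric columns
# (Pearson by default) and returns it as a square DataFrame.
corr_matrix = df.corr()
print(corr_matrix.round(2))

# Look up a single pair by row and column label; the matrix is symmetric,
# so ("x", "y") and ("y", "x") hold the same value.
xy = corr_matrix.loc["x", "y"]
yx = corr_matrix.loc["y", "x"]
print(round(xy, 2), round(yx, 2))
```

For a single pair, the Series.corr() call used in the example avoids computing values that are never read. The matrix form is more convenient when several pairs are needed.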