# How to Create a Correlation Matrix using Pandas?

Correlation analysis is a crucial technique in data analysis that helps identify relationships between the variables in a dataset. A correlation matrix is a table of the correlation coefficients between every pair of variables: the entry in row i and column j measures how strongly variable i and variable j move together. It gives a compact overview of the patterns in the data and is widely used in many fields, including finance, economics, the social sciences, and engineering.
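
By default, pandas computes the Pearson correlation coefficient. For two variables $x$ and $y$ observed $n$ times, with means $\bar{x}$ and $\bar{y}$, it is defined as:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The entry in row $j$ and column $k$ of a correlation matrix is this coefficient computed between the $j$-th and $k$-th variables. Since $r_{xy} = r_{yx}$ and $r_{xx} = 1$, the matrix is symmetric and has 1 on its diagonal.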

In this tutorial, we will explore how to create a correlation matrix using Pandas, a popular data manipulation library in Python.

To generate a correlation matrix with pandas, follow these steps −

1. Acquire the data.
2. Construct a pandas DataFrame.
3. Produce a correlation matrix using pandas.
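
The three steps can be sketched end to end as follows. This is a minimal illustration under assumptions: the data is assumed to arrive as CSV, an inline string stands in for a real file, and the `Quarter` label column and its values are hypothetical. Because correlation is only defined for numeric columns, the text column is dropped with `select_dtypes` before calling `.corr()`. In pandas 2.0 and later, calling `.corr()` on a DataFrame that still contains a text column raises an error; `df.corr(numeric_only=True)` is another way to skip such columns.

```python
import io

import pandas as pd

# Step 1: acquire the data. An inline CSV string stands in for a real file here;
# with a file on disk you would pass its path to pd.read_csv instead.
# The "Quarter" column is a hypothetical text label.
raw_csv = io.StringIO(
    "Quarter,Sales,Expenses,Profit\n"
    "Q1,25,30,15\n"
    "Q2,36,25,20\n"
    "Q3,12,20,10\n"
)

# Step 2: construct a pandas DataFrame from the data.
df = pd.read_csv(raw_csv)

# Step 3: produce the correlation matrix. Non-numeric columns are dropped first,
# because correlation is only defined for numeric data.
corr_matrix = df.select_dtypes(include="number").corr()
print(corr_matrix)
```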

## Example

Now let's work on different examples to understand how we can create correlation matrices using pandas.

This code demonstrates how to use the pandas library in Python to create a correlation matrix from a given dataset. The dataset contains three variables: Sales, Expenses, and Profit for three different time periods. The code creates a pandas DataFrame using the data and then uses the DataFrame to create a correlation matrix.

The code then uses `.loc` to extract two individual coefficients, one for Sales and Expenses and one for Sales and Profit, and displays them along with the correlation matrix. A correlation coefficient measures the strength and direction of the linear relationship between two variables: a value of "1" represents perfect positive correlation, "-1" represents perfect negative correlation, and "0" indicates no linear correlation.
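
To see where a single coefficient comes from, the sketch below computes the Pearson coefficient for Sales and Expenses directly from its definition with NumPy and compares it with the value pandas reports via `Series.corr`. The helper name `pearson` is hypothetical and exists only for this illustration; it assumes two equal-length numeric sequences with no missing values.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Hypothetical helper: Pearson coefficient of two equal-length numeric sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    # Sum of products of deviations, divided by the product of their root sums of squares.
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


sales = [25, 36, 12]
expenses = [30, 25, 20]

manual = pearson(sales, expenses)
from_pandas = pd.Series(sales).corr(pd.Series(expenses))
print(f"By hand: {manual:.6f}")
print(f"pandas:  {from_pandas:.6f}")
```

Both lines should print the same value, 0.541041.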

Consider the code shown below.

```python
# Import the pandas library
import pandas as pd

# Create a dictionary containing the data to be used in the correlation analysis
data = {
    'Sales': [25, 36, 12],     # Values for sales in three different time periods
    'Expenses': [30, 25, 20],  # Values for expenses in the same time periods
    'Profit': [15, 20, 10]     # Values for profit in the same time periods
}

# Create a pandas DataFrame using the dictionary
sales_data = pd.DataFrame(data)

# Use the DataFrame to create a correlation matrix
correlation_matrix = sales_data.corr()

# Display the correlation matrix
print("Correlation Matrix:")
print(correlation_matrix)

# Get the correlation coefficient between Sales and Expenses
sales_expenses_correlation = correlation_matrix.loc['Sales', 'Expenses']

# Get the correlation coefficient between Sales and Profit
sales_profit_correlation = correlation_matrix.loc['Sales', 'Profit']

# Display the correlation coefficients
print("Correlation Coefficients:")
print(f"Sales and Expenses: {sales_expenses_correlation:.2f}")
print(f"Sales and Profit: {sales_profit_correlation:.2f}")
```

## Output

On execution, you will get the following output −

```
Correlation Matrix:
             Sales  Expenses    Profit
Sales     1.000000  0.541041  0.998845
Expenses  0.541041  1.000000  0.500000
Profit    0.998845  0.500000  1.000000
Correlation Coefficients:
Sales and Expenses: 0.54
Sales and Profit: 1.00
```

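The values on the diagonal are the correlation of each variable with itself, so they are always 1. Off the diagonal, Sales and Profit are almost perfectly positively correlated (0.998845), while Sales and Expenses (0.541041) and Expenses and Profit (0.5) are only moderately positively correlated.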
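
Two structural properties follow from the definition of the coefficient: the diagonal is all 1s, and the matrix is symmetric because the correlation of X with Y equals that of Y with X. The sketch below checks both for the Sales data using NumPy; `np.allclose` is used instead of exact equality to tolerate floating-point rounding.

```python
import numpy as np
import pandas as pd

data = {
    "Sales": [25, 36, 12],
    "Expenses": [30, 25, 20],
    "Profit": [15, 20, 10],
}
correlation_matrix = pd.DataFrame(data).corr()
values = correlation_matrix.to_numpy()

# Each variable correlates perfectly with itself, so every diagonal entry is 1.
print("Diagonal:", np.diag(values))

# corr(X, Y) equals corr(Y, X), so the matrix equals its own transpose.
print("Symmetric:", np.allclose(values, values.T))
```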

## Example

Let's explore one more example. The code below creates a simple DataFrame with three columns and three rows, uses the .corr() method to calculate its correlation matrix, and prints the result to the console.

```python
# Import the pandas library
import pandas as pd

# Create a sample data frame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}
df = pd.DataFrame(data)

# Create the correlation matrix
corr_matrix = df.corr()

# Display the correlation matrix
print(corr_matrix)
```

## Output

On execution, you will get the following output −

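```
     A    B    C
A  1.0  1.0  1.0
B  1.0  1.0  1.0
C  1.0  1.0  1.0
```

Every entry is 1.0 because each column is the previous one shifted by a constant (B = A + 3 and C = A + 6), so every pair of columns has a perfect positive linear relationship.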
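
The matrix for columns A, B, and C contains only 1.0 values, so it never shows the negative end of the scale. As an illustrative variation, the sketch below adds a hypothetical column `D` that falls by 1 each time `A` rises by 1; its correlation with each of the other columns is -1.

```python
import pandas as pd

# The columns A, B and C from the example, plus a hypothetical column D
# that decreases by 1 each time A increases by 1.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [3, 2, 1],
})

corr_matrix = df.corr()
print(corr_matrix)
```

Row and column D hold -1 for every other variable and 1 on the diagonal, while the A, B, and C block is still all 1s.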

## Conclusion

In conclusion, creating a correlation matrix using pandas in Python is a straightforward process. First, build a pandas DataFrame from the data; then call its **.corr()** method to calculate the correlation matrix (by default, it computes the Pearson correlation coefficient). The resulting matrix summarizes the pairwise relationships between the variables, and its diagonal entries, the correlation of each variable with itself, are always 1.

The correlation coefficients range from -1 to 1. Values closer to -1 or 1 indicate a stronger negative or positive linear relationship, while values closer to 0 indicate a weak linear relationship or none at all. Correlation matrices are useful in a wide range of applications, such as data analysis, finance, and machine learning.
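
In practice, a common next step is to pick out the pairs whose coefficients are close to -1 or 1. The sketch below does this for the Sales data by visiting each unordered pair of columns once with `itertools.combinations`, which skips both the diagonal and the duplicate lower triangle. The cut-off of 0.8 is a hypothetical choice for illustration, not a standard value.

```python
from itertools import combinations

import pandas as pd

data = {
    "Sales": [25, 36, 12],
    "Expenses": [30, 25, 20],
    "Profit": [15, 20, 10],
}
corr = pd.DataFrame(data).corr()

# Hypothetical cut-off: treat |r| >= 0.8 as a strong correlation.
threshold = 0.8

# combinations() yields each unordered pair of column names exactly once.
strong_pairs = [
    (a, b, float(corr.loc[a, b]))
    for a, b in combinations(corr.columns, 2)
    if abs(corr.loc[a, b]) >= threshold
]

for a, b, r in strong_pairs:
    print(f"{a} and {b}: {r:.2f}")
```

With this data only Sales and Profit pass the cut-off; comparing the absolute value keeps strong negative correlations as well.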