Data Analysis and Visualization with Python
Data analysis and visualization are essential skills in Python programming. This tutorial covers the fundamentals using two powerful libraries: pandas for data manipulation and matplotlib for creating visualizations.
Installation
Install the required packages using pip:
pip install pandas matplotlib
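To confirm that both packages are importable after installation, a quick check along these lines can help. It is only a sanity check and the version strings it prints will depend on your environment.

```python
import matplotlib
import pandas as pd

# Print the installed versions; an ImportError here means the install failed
print("pandas:", pd.__version__)
print("matplotlib:", matplotlib.__version__)
```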
Introduction to Pandas
Pandas is an open-source library that provides high-performance data analysis tools. It offers data structures like DataFrame and Series for handling structured data efficiently.
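As a minimal sketch of the Series structure mentioned above, the snippet below builds a one-dimensional labeled array. The names and ages are sample values chosen for illustration only.

```python
import pandas as pd

# A Series is a one-dimensional array with an index of labels.
# The labels and values below are illustrative sample data.
ages = pd.Series([19, 21, 18], index=['Hafeez', 'Aslan', 'Kareem'], name='Age')

print(ages)
print(ages['Aslan'])  # look up a value by its label
print(ages.mean())    # aggregate over all values
```

Each column of a DataFrame is itself a Series, so the same operations apply to a single column.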
Creating DataFrames
A DataFrame is a two-dimensional data structure with labeled rows and columns. Here's how to create one from scratch:
import pandas as pd
# Creating individual records
hafeez = ['Hafeez', 19]
aslan = ['Aslan', 21]
kareem = ['Kareem', 18]
# Creating DataFrame with column names
data_frame = pd.DataFrame([hafeez, aslan, kareem], columns=['Name', 'Age'])
print(data_frame)
Name Age
0 Hafeez 19
1 Aslan 21
2 Kareem 18
Working with Sample Data
Let's create a sample dataset to demonstrate pandas functionality without external files:
import pandas as pd
# Creating sample sales data
sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}
df = pd.DataFrame(sales_data)
print("Sample Data:")
print(df)
print(f"\nDataset shape: {df.shape}")
Sample Data:
Product Price Quantity Year
0 Laptop 999 5 2020
1 Phone 699 8 2020
2 Tablet 399 12 2021
3 Laptop 1099 3 2021
4 Phone 799 6 2022

Dataset shape: (5, 4)
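With the sample data in a DataFrame, columns can be combined and rows filtered. The following is a small sketch on the same dataset; the Revenue column name and the 700 price threshold are assumptions made for illustration and are not part of the original data.

```python
import pandas as pd

sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}
df = pd.DataFrame(sales_data)

# Derived column: element-wise product of two existing columns
# ('Revenue' is a hypothetical name chosen for this example)
df['Revenue'] = df['Price'] * df['Quantity']

# Boolean filtering: keep rows whose price exceeds an arbitrary threshold
expensive = df[df['Price'] > 700]

print(df[['Product', 'Revenue']])
print(expensive)
```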
Data Analysis with Pandas
The describe() method provides a statistical summary of the numerical columns:
import pandas as pd
sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}
df = pd.DataFrame(sales_data)
print("Statistical Summary:")
print(df.describe())
Statistical Summary:
Price Quantity Year
count 5.000000 5.000000 5.000000
mean 799.000000 6.800000 2020.800000
std 273.861279 3.420526 0.836660
min 399.000000 3.000000 2020.000000
25% 699.000000 5.000000 2020.000000
50% 799.000000 6.000000 2021.000000
75% 999.000000 8.000000 2021.000000
max 1099.000000 12.000000 2022.000000
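Each row of the summary can also be computed on its own, which helps when only one statistic is needed. The sketch below recomputes a few values for the Price column and compares them with describe(). Note that std() in pandas returns the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}
df = pd.DataFrame(sales_data)
summary = df.describe()

# Individual statistics for one column
price_mean = df['Price'].mean()
price_std = df['Price'].std()        # sample standard deviation (ddof=1)
price_median = df['Price'].median()  # corresponds to the 50% row

print(price_mean, price_std, price_median)

# The same values can be read out of the describe() result by label
print(summary.loc['mean', 'Price'], summary.loc['std', 'Price'], summary.loc['50%', 'Price'])
```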
Data Visualization with Matplotlib
Matplotlib is a widely used plotting library for Python. It can create static, animated, and interactive visualizations.
Creating a Histogram
Histograms show the distribution of numerical data:
import pandas as pd
import matplotlib.pyplot as plt
sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}
df = pd.DataFrame(sales_data)
# Create histogram
plt.figure(figsize=(8, 5))
df['Price'].hist(bins=3, alpha=0.7, color='skyblue')
plt.title('Price Distribution')
plt.xlabel('Price ($)')
plt.ylabel('Frequency')
plt.show()
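To see which values fall into each bar without rendering a plot, the bin counts can be computed directly. This sketch assumes NumPy, which is installed as a pandas dependency, and uses np.histogram with the same number of bins as the plot above.

```python
import numpy as np
import pandas as pd

prices = pd.Series([999, 699, 399, 1099, 799], name='Price')

# np.histogram splits the data range into equal-width bins and
# returns the count per bin along with the bin edges
counts, edges = np.histogram(prices, bins=3)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

With only five data points the bars are coarse; on a larger dataset, a higher bins value usually gives a more informative picture of the distribution.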
Creating a Bar Chart
Bar charts are useful for comparing categorical data:
import pandas as pd
import matplotlib.pyplot as plt
sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}
df = pd.DataFrame(sales_data)
# Group by product and sum quantities
product_sales = df.groupby('Product')['Quantity'].sum()
plt.figure(figsize=(8, 5))
product_sales.plot(kind='bar', color=['coral', 'lightgreen', 'gold'])
plt.title('Total Sales by Product')
plt.xlabel('Product')
plt.ylabel('Total Quantity Sold')
plt.xticks(rotation=45)
plt.show()
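In a script or on a machine without a display, plt.show() may not open a window. One option is to write the chart to an image file instead. The following is a sketch under stated assumptions: it selects the non-interactive Agg backend, and the output filename and temporary directory are hypothetical choices for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, assumed here for headless use
import matplotlib.pyplot as plt
import pandas as pd

sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}
df = pd.DataFrame(sales_data)

# Group by product and sum quantities; groups come back sorted by product name
product_sales = df.groupby('Product')['Quantity'].sum()
print(product_sales)

# Draw on an explicit Axes object so the figure can be saved and closed cleanly
fig, ax = plt.subplots(figsize=(8, 5))
product_sales.plot(kind='bar', ax=ax, color=['coral', 'lightgreen', 'gold'])
ax.set_title('Total Sales by Product')
ax.set_xlabel('Product')
ax.set_ylabel('Total Quantity Sold')
fig.tight_layout()

# Hypothetical output location: a PNG file in the system temp directory
output_path = os.path.join(tempfile.gettempdir(), 'product_sales.png')
fig.savefig(output_path)
plt.close(fig)
```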
Key Pandas and Matplotlib Features
| Library | Primary Use | Key Methods |
|---|---|---|
| Pandas | Data manipulation | DataFrame(), describe(), groupby() |
| Matplotlib | Data visualization | plot(), hist(), show() |
Conclusion
Pandas and matplotlib form a powerful combination for data analysis in Python. Pandas handles data manipulation and analysis, while matplotlib creates compelling visualizations to communicate insights from your data.
