Data Analysis and Visualization with Python

Data analysis and visualization are essential skills in Python programming. This tutorial covers the fundamentals using two powerful libraries: pandas for data manipulation and matplotlib for creating visualizations.

Installation

Install the required packages using pip:

pip install pandas matplotlib
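To confirm that both packages are importable after installation, a quick check along these lines can help. This is a minimal sketch; the version numbers printed depend on your environment.

```python
import matplotlib
import pandas as pd

# Both packages expose their installed version as a string
print("pandas:", pd.__version__)
print("matplotlib:", matplotlib.__version__)
```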

Introduction to Pandas

Pandas is an open-source library that provides high-performance data analysis tools. It offers data structures like DataFrame and Series for handling structured data efficiently.
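A Series is the one-dimensional counterpart of a DataFrame: a single column of values with a label for each entry. The sketch below is illustrative only; the names and ages are sample values chosen for this example.

```python
import pandas as pd

# A Series holds one column of values, each addressed by an index label
ages = pd.Series([19, 21, 18], index=['Hafeez', 'Aslan', 'Kareem'], name='Age')

print(ages['Aslan'])    # look up a value by its label: 21
print(ages.max())       # largest value: 21
print(ages.idxmax())    # label of the largest value: Aslan
```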

Creating DataFrames

A DataFrame is a two-dimensional data structure with labeled rows and columns. Here's how to create one from scratch:

import pandas as pd

# Creating individual records
hafeez = ['Hafeez', 19]
aslan = ['Aslan', 21]
kareem = ['Kareem', 18]

# Creating DataFrame with column names
data_frame = pd.DataFrame([hafeez, aslan, kareem], columns=['Name', 'Age'])
print(data_frame)

Output:
     Name  Age
0  Hafeez   19
1   Aslan   21
2  Kareem   18
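Because rows and columns are labeled, values can be read back by those labels. A minimal sketch, rebuilding the same three records so that it runs on its own:

```python
import pandas as pd

data_frame = pd.DataFrame(
    [['Hafeez', 19], ['Aslan', 21], ['Kareem', 18]],
    columns=['Name', 'Age'],
)

# Select a whole column by its label
print(data_frame['Name'].tolist())   # ['Hafeez', 'Aslan', 'Kareem']

# Select a single cell by row label (the default integer index) and column label
print(data_frame.loc[1, 'Age'])      # 21
```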

Working with Sample Data

Let's create a sample dataset to demonstrate pandas functionality without external files:

import pandas as pd

# Creating sample sales data
sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}

df = pd.DataFrame(sales_data)
print("Sample Data:")
print(df)
print(f"\nDataset shape: {df.shape}")

Output:
Sample Data:
  Product  Price  Quantity  Year
0  Laptop    999         5  2020
1   Phone    699         8  2020
2  Tablet    399        12  2021
3  Laptop   1099         3  2021
4   Phone    799         6  2022

Dataset shape: (5, 4)
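The shape tuple is (rows, columns). A few other attributes give a quick overview of a new dataset; the sketch below uses the same sample data and is one possible way to inspect it.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022],
})

# shape unpacks into the number of rows and the number of columns
rows, cols = df.shape
print(rows, cols)             # 5 4

# Column labels, in the order they were defined
print(df.columns.tolist())    # ['Product', 'Price', 'Quantity', 'Year']

# First two rows, handy for a quick look at larger datasets
print(df.head(2))
```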

Data Analysis with Pandas

The describe() method returns a statistical summary of the numerical columns:

import pandas as pd

sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}

df = pd.DataFrame(sales_data)
print("Statistical Summary:")
print(df.describe())

Output:
Statistical Summary:
            Price   Quantity         Year
count    5.000000   5.000000     5.000000
mean   799.000000   6.800000  2020.800000
std    273.861279   3.420526     0.836660
min    399.000000   3.000000  2020.000000
25%    699.000000   5.000000  2020.000000
50%    799.000000   6.000000  2021.000000
75%    999.000000   8.000000  2021.000000
max   1099.000000  12.000000  2022.000000
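The std row reports the sample standard deviation, which divides by n - 1 rather than n. The following sketch recomputes the mean and standard deviation of the Price column by hand and compares them with what pandas returns; the variable names are illustrative.

```python
import math

import pandas as pd

prices = pd.Series([999, 699, 399, 1099, 799], name='Price')

n = len(prices)
mean = prices.sum() / n

# Sample variance: squared deviations divided by n - 1 (the pandas default, ddof=1)
variance = ((prices - mean) ** 2).sum() / (n - 1)
std = math.sqrt(variance)

summary = prices.describe()
print(math.isclose(mean, summary['mean']))   # True
print(math.isclose(std, summary['std']))     # True
```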

Data Visualization with Matplotlib

Matplotlib is one of the most widely used plotting libraries in Python. It can create static, animated, and interactive visualizations.
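For a first static figure, here is a minimal sketch using the Figure and Axes objects returned by plt.subplots(). It assumes the non-interactive Agg backend so the script can run without a display, and it writes to a hypothetical file name, units_per_year.png, in the system temp directory. The yearly totals are summed by hand from the sample sales data used earlier in this tutorial.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: render to a file instead of opening a window
import matplotlib.pyplot as plt

years = [2020, 2021, 2022]
units = [13, 15, 6]  # Quantity summed per year from the sample sales data

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, units, marker="o")
ax.set_xticks(years)
ax.set_title("Units Sold per Year")
ax.set_xlabel("Year")
ax.set_ylabel("Units")

# Hypothetical output location, chosen only for this example
out_path = os.path.join(tempfile.gettempdir(), "units_per_year.png")
fig.savefig(out_path)
plt.close(fig)
print("Saved figure to", out_path)
```

In an interactive session, leaving out the matplotlib.use("Agg") line and calling plt.show() displays the figure in a window instead.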

Creating a Histogram

Histograms show the distribution of numerical data:

import pandas as pd
import matplotlib.pyplot as plt

sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}

df = pd.DataFrame(sales_data)

# Create histogram
plt.figure(figsize=(8, 5))
df['Price'].hist(bins=3, alpha=0.7, color='skyblue')
plt.title('Price Distribution')
plt.xlabel('Price ($)')
plt.ylabel('Frequency')
plt.show()
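To see what a histogram with bins=3 is counting, the bins can be computed directly. This is a minimal sketch that assumes NumPy (installed as a pandas dependency) and its numpy.histogram function, which splits the data range into equal-width bins:

```python
import numpy as np

prices = [999, 699, 399, 1099, 799]  # the Price column from the sample data

# Split the range from 399 to 1099 into 3 equal-width bins and count values per bin
counts, edges = np.histogram(prices, bins=3)

print(counts.tolist())   # [1, 2, 2]
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:7.2f} to {right:7.2f}: {n} value(s)")
```

Each count corresponds to the height of one bar in the plotted histogram.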

Creating a Bar Chart

Bar charts are useful for comparing categorical data:

import pandas as pd
import matplotlib.pyplot as plt

sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Price': [999, 699, 399, 1099, 799],
    'Quantity': [5, 8, 12, 3, 6],
    'Year': [2020, 2020, 2021, 2021, 2022]
}

df = pd.DataFrame(sales_data)

# Group by product and sum quantities
product_sales = df.groupby('Product')['Quantity'].sum()

plt.figure(figsize=(8, 5))
product_sales.plot(kind='bar', color=['coral', 'lightgreen', 'gold'])
plt.title('Total Sales by Product')
plt.xlabel('Product')
plt.ylabel('Total Quantity Sold')
plt.xticks(rotation=45)
plt.show()
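The bar heights come straight from the grouped totals, so it can help to inspect them before plotting. A minimal sketch using only the two columns involved; note that groupby() sorts the group keys by default, which determines the left-to-right order of the bars and of the colors assigned to them.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Quantity': [5, 8, 12, 3, 6],
})

# One total per product; keys are sorted alphabetically by default
product_sales = df.groupby('Product')['Quantity'].sum()

for product, quantity in product_sales.items():
    print(product, quantity)
# Laptop 8
# Phone 14
# Tablet 12
```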

Key Pandas and Matplotlib Features

Library      Primary Use          Key Methods
Pandas       Data manipulation    DataFrame(), describe(), groupby()
Matplotlib   Data visualization   plot(), hist(), show()

Conclusion

Pandas and matplotlib form a powerful combination for data analysis in Python. Pandas handles data manipulation and analysis, while matplotlib creates compelling visualizations to communicate insights from your data.

Updated on: 2026-03-25T06:40:19+05:30
