What are the different types of Python data analysis libraries used?
Python is one of the most widely used languages for data science. Its success comes from combining an easy-to-learn, object-oriented syntax with specialized libraries for every data science task, from mathematical computations to data visualization.
Core Data Science Libraries
NumPy
NumPy (Numerical Python) forms the foundation of Python's data science ecosystem. It provides efficient arrays and mathematical functions for numerical computing:
import numpy as np
# Creating arrays and basic operations
data = np.array([1, 2, 3, 4, 5])
print("Array:", data)
print("Mean:", np.mean(data))
print("Standard deviation:", np.std(data))
Array: [1 2 3 4 5]
Mean: 3.0
Standard deviation: 1.4142135623730951
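Much of NumPy's efficiency comes from vectorized operations, which act on whole arrays without explicit Python loops. The following is a minimal sketch; the price and quantity values are made up purely for illustration.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Element-wise multiplication: no explicit loop needed
totals = prices * quantities

# Broadcasting: the scalar 0.9 is applied to every element
discounted = totals * 0.9

# Aggregating along an axis of a 2-D array
matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
column_means = matrix.mean(axis=0)    # one mean per column

print("Totals:", totals)
print("Discounted:", discounted)
print("Column means:", column_means)
```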
Pandas
Pandas provides powerful data structures like DataFrames for data manipulation and analysis:
import pandas as pd
# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)
print(df)
print("\nBasic statistics:")
print(df.describe())
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000

Basic statistics:
             Age        Salary
count   3.000000      3.000000
mean   30.000000  60000.000000
std     5.000000  10000.000000
min    25.000000  50000.000000
25%    27.500000  55000.000000
50%    30.000000  60000.000000
75%    32.500000  65000.000000
max    35.000000  70000.000000
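One detail worth knowing when comparing results across libraries: the `std` row reported by `describe()` is the sample standard deviation (divisor n - 1), whereas `np.std` defaults to the population form (divisor n). A short sketch of the difference, reusing the Age values from the DataFrame above:

```python
import numpy as np
import pandas as pd

ages = [25, 30, 35]

# NumPy default: population standard deviation (ddof=0)
population_std = np.std(ages)

# NumPy with ddof=1: sample standard deviation
sample_std_numpy = np.std(ages, ddof=1)

# Pandas default: sample standard deviation (ddof=1)
sample_std_pandas = pd.Series(ages).std()

print(f"NumPy (ddof=0): {population_std:.4f}")
print(f"NumPy (ddof=1): {sample_std_numpy:.4f}")
print(f"Pandas default: {sample_std_pandas:.4f}")
```

Passing `ddof` explicitly in either library is a simple way to keep the two consistent.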
Visualization Libraries
Matplotlib
Matplotlib is the foundational plotting library, offering extensive customization for creating publication-quality visualizations:
import matplotlib.pyplot as plt
import numpy as np
# Creating a simple line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(8, 4))
plt.plot(x, y, label='sin(x)')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Sine Wave')
plt.legend()
plt.grid(True)
plt.show()
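For publication use, a figure is usually saved rather than shown on screen. The sketch below redraws the same sine wave with Matplotlib's object-oriented interface and writes it as a PNG into an in-memory buffer. The `Agg` backend and the 300 dpi setting are assumptions chosen for a script that runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Object-oriented interface: work with explicit Figure and Axes objects
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("X values")
ax.set_ylabel("Y values")
ax.set_title("Sine Wave")
ax.legend()
ax.grid(True)

# Save as a high-resolution PNG into an in-memory buffer
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300)
plt.close(fig)

png_bytes = buffer.getvalue()
print(f"PNG size: {len(png_bytes)} bytes")
```

To write to disk instead, pass a file name such as `"sine_wave.png"` (a hypothetical name) to `fig.savefig`.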
Plotly
Plotly creates interactive, web-ready visualizations with minimal code. It's excellent for dashboards and exploratory data analysis:
import plotly.express as px
import pandas as pd
# Interactive scatter plot
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length",
                 color="species", title="Iris Dataset")
fig.show()
Machine Learning Libraries
Scikit-learn
Scikit-learn provides comprehensive machine learning algorithms with consistent APIs:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Generate sample data
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
Accuracy: 0.9150
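The consistent API means that different estimators can be swapped in without changing the surrounding code. As a rough sketch, the loop below evaluates two classifiers with the same call. The choice of `LogisticRegression` as the second model, the smaller sample size, and 5-fold cross-validation are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data, generated the same way as in the example above
X, y = make_classification(n_samples=300, n_features=4, n_classes=2, random_state=42)

# Two different estimators that share the same fit/predict interface
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# The evaluation code is identical for every estimator
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {scores[name]:.4f}")
```

The exact scores depend on the installed scikit-learn version, so none are quoted here.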
Deep Learning Libraries
TensorFlow and Keras
Keras provides a high-level interface for building neural networks, while TensorFlow handles the computational backend:
import tensorflow as tf
from tensorflow import keras
# Simple neural network
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
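To make the two layers concrete, here is a rough NumPy sketch of the arithmetic that a forward pass through this architecture performs. It illustrates the math only and is not how Keras implements it; the random weights and the batch of five inputs are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, weights, bias):
    """Affine transform computed by a fully connected layer."""
    return x @ weights + bias

def relu(z):
    """Replace negative values with zero."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Turn each row of scores into probabilities that sum to 1."""
    shifted = z - z.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Randomly initialised parameters with the same shapes as the model above
w1, b1 = rng.normal(scale=0.05, size=(784, 64)), np.zeros(64)
w2, b2 = rng.normal(scale=0.05, size=(64, 10)), np.zeros(10)

# Hypothetical batch of 5 flattened 28x28 inputs
batch = rng.random((5, 784))

hidden = relu(dense(batch, w1, b1))
probabilities = softmax(dense(hidden, w2, b2))

n_params = w1.size + b1.size + w2.size + b2.size
print("Output shape:", probabilities.shape)
print("Trainable parameters:", n_params)
```

The parameter count follows from the layer sizes: 784 × 64 + 64 for the first layer and 64 × 10 + 10 for the second.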
PyTorch
PyTorch offers dynamic computation graphs and is popular in research for its flexibility:
import torch
import torch.nn as nn
# Define a simple neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = Net()
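Because the graph is built as the code runs, gradients are available right after an ordinary forward pass. The sketch below repeats the network definition so it runs on its own, then pushes a random batch through it and backpropagates a loss. The batch size of 8, the random inputs and labels, and the use of `nn.CrossEntropyLoss` are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

torch.manual_seed(0)
model = Net()

# Hypothetical batch: 8 random inputs with random class labels
inputs = torch.randn(8, 784)
targets = torch.randint(0, 10, (8,))

# Forward pass: the computation graph is recorded as this line executes
logits = model(inputs)

# CrossEntropyLoss expects raw logits, so no softmax is applied in forward()
loss = nn.CrossEntropyLoss()(logits, targets)

# Backward pass: gradients are stored on each parameter's .grad attribute
loss.backward()

grad_shape = tuple(model.fc1.weight.grad.shape)
print("Logits shape:", tuple(logits.shape))
print("fc1 weight gradient shape:", grad_shape)
```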
Specialized Libraries
SciPy
SciPy extends NumPy with advanced scientific computing functions including statistics, optimization, and signal processing:
from scipy import stats
import numpy as np
# Statistical analysis
data = np.random.normal(100, 15, 1000)
mean, std = stats.norm.fit(data)
print(f"Fitted mean: {mean:.2f}")
print(f"Fitted std: {std:.2f}")
# Perform t-test
t_stat, p_value = stats.ttest_1samp(data, 100)
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
Fitted mean: 99.85
Fitted std: 15.03
T-statistic: -0.3139
P-value: 0.7537
The exact values differ from run to run because the sample is drawn without a fixed random seed.
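The optimization side of SciPy can be sketched just as briefly. The example below minimizes a one-dimensional function with `scipy.optimize.minimize_scalar`; the quadratic objective is a made-up example whose minimum is known to lie at x = 2.

```python
from scipy.optimize import minimize_scalar

def objective(x):
    """Hypothetical objective: a parabola with its minimum at x = 2."""
    return (x - 2.0) ** 2 + 1.0

# With no extra arguments, SciPy chooses its default scalar minimizer
result = minimize_scalar(objective)

print(f"Minimum found at x = {result.x:.4f}")
print(f"Objective value there: {result.fun:.4f}")
```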
BeautifulSoup
BeautifulSoup excels at web scraping and parsing HTML/XML documents for data extraction:
from bs4 import BeautifulSoup
# Parse HTML content
html = "<html><body><h1>Title</h1><p>Content</p></body></html>"
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('h1').text
content = soup.find('p').text
print(f"Title: {title}")
print(f"Content: {content}")
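Real pages usually contain many repeated elements, so `find_all` tends to be more useful than `find` for data extraction. A minimal sketch that collects every link into a list of dictionaries; the HTML snippet and its URLs are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment with several links
html = """
<ul>
  <li><a href="/numpy">NumPy</a></li>
  <li><a href="/pandas">Pandas</a></li>
  <li><a href="/scipy">SciPy</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; get() reads an attribute safely
links = [
    {"text": a.get_text(strip=True), "href": a.get("href")}
    for a in soup.find_all("a")
]

for link in links:
    print(f"{link['text']}: {link['href']}")
```

A structure like `links` can be passed straight to `pd.DataFrame` for further analysis.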
Library Comparison
| Library | Primary Use | Best For |
|---|---|---|
| NumPy | Numerical computing | Array operations, mathematical functions |
| Pandas | Data manipulation | Data cleaning, analysis, CSV/Excel handling |
| Matplotlib | Static plotting | Publication-quality charts |
| Plotly | Interactive visualization | Dashboards, web applications |
| Scikit-learn | Machine learning | Traditional ML algorithms |
| TensorFlow/Keras | Deep learning | Production neural networks |
| PyTorch | Deep learning research | Dynamic neural networks |
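The Pandas row in the table mentions data cleaning and CSV handling, which the earlier example does not show. Below is a small sketch of that workflow. The CSV text is invented for illustration and is read from an in-memory string rather than a file.

```python
import io

import pandas as pd

# Hypothetical CSV content; Bob's salary is deliberately missing
csv_text = """name,department,salary
Alice,Engineering,50000
Bob,Engineering,
Charlie,Sales,70000
Dana,Sales,65000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count the missing values, then drop the incomplete rows
missing_salaries = int(df["salary"].isna().sum())
clean = df.dropna(subset=["salary"])

# Aggregate the cleaned data by department
mean_by_department = clean.groupby("department")["salary"].mean()

print("Missing salaries:", missing_salaries)
print(mean_by_department)
```

With a real file, `pd.read_csv` takes the file path in place of the `io.StringIO` object.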
Conclusion
Python's data science ecosystem offers specialized libraries for every task, from NumPy's numerical computing to PyTorch's deep learning capabilities. The key is choosing the right combination of libraries based on your project requirements and learning the core ones like NumPy, Pandas, and Matplotlib first.
