Singular Value Decomposition
Singular Value Decomposition (SVD) is a powerful mathematical technique used in machine learning to analyze large and complex datasets. It decomposes a matrix into three simpler matrices, making it easier to understand patterns and reduce dimensionality.
For any matrix A, SVD factorizes it as A = UΣVᵀ, where:
- U contains the left singular vectors (the eigenvectors of AAᵀ)
- Σ is a diagonal matrix of singular values (the square roots of the eigenvalues of AᵀA)
- Vᵀ contains the right singular vectors (the eigenvectors of AᵀA)
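These factors have useful structure: the columns of U and V are orthonormal, and multiplying the three factors back together recovers A. As a quick numerical sanity check, here is a minimal sketch using NumPy's built-in SVD:

```python
import numpy as np

# Any matrix works; this one is 3x2
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Reduced SVD: U is 3x2, s holds 2 singular values, Vt is 2x2
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and rows of Vt are orthonormal
print(np.allclose(U.T @ U, np.eye(2)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(2)))  # True

# The factorization reproduces A exactly
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True
```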
Mathematical Algorithm
The SVD computation follows these steps:
1. Given matrix A, compute AᵀA (the transpose of A multiplied by A)
2. Find the eigenvalues and eigenvectors of AᵀA by solving det(AᵀA − λI) = 0
3. Calculate the singular values as σᵢ = √λᵢ, sorted in descending order
4. The right singular vectors V are the normalized eigenvectors of AᵀA
5. The left singular vectors are computed as uᵢ = Avᵢ / σᵢ
Basic SVD Example
Let's implement SVD from scratch and compare it with NumPy's built-in function:

```python
import numpy as np

def manual_svd(A):
    # Compute A^T A
    AtA = np.dot(A.T, A)
    # Find eigenvalues and eigenvectors (eigh is for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(AtA)
    # Sort in descending order
    idx = eigenvalues.argsort()[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    # Singular values are the square roots of the eigenvalues
    singular_values = np.sqrt(eigenvalues)
    # Right singular vectors (V)
    V = eigenvectors
    # Left singular vectors: U = A V / sigma
    U = np.dot(A, V) / singular_values
    return U, singular_values, V.T

# Test matrix
A = np.array([[1, 2], [3, 4], [5, 6]])

# Manual SVD
U_manual, s_manual, Vt_manual = manual_svd(A)
print("Manual SVD:")
print("U shape:", U_manual.shape)
print("Singular values:", s_manual)

# NumPy SVD for comparison
U_numpy, s_numpy, Vt_numpy = np.linalg.svd(A, full_matrices=False)
print("\nNumPy SVD:")
print("U shape:", U_numpy.shape)
print("Singular values:", s_numpy)
```
```
Manual SVD:
U shape: (3, 2)
Singular values: [9.52551809 0.51430058]

NumPy SVD:
U shape: (3, 2)
Singular values: [9.52551809 0.51430058]
```
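One caveat when comparing the two results: singular vectors are only defined up to sign, so a manual implementation and np.linalg.svd may return some columns of U (and the corresponding rows of Vᵀ) flipped relative to each other, even though both factorizations reproduce A. A small sketch of aligning signs before comparison (the helper name is our own):

```python
import numpy as np

def align_signs(U_ref, U_other):
    # Flip each column of U_other that points opposite to the same column of U_ref
    signs = np.sign(np.sum(U_ref * U_other, axis=0))
    return U_other * signs

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
U1, s1, Vt1 = np.linalg.svd(A, full_matrices=False)

# Simulate a sign flip in the second column, as another implementation might produce
U2 = U1 * np.array([1, -1])
print(np.allclose(align_signs(U1, U2), U1))  # True
```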
Data Visualization with SVD
SVD is commonly used for dimensionality reduction. Here's an example using a simple dataset:
```python
import numpy as np
import matplotlib.pyplot as plt

# Create sample data: 100 samples, 4 features
np.random.seed(42)
data = np.random.randn(100, 4)

# Add some correlation between features
data[:, 1] = data[:, 0] + 0.5 * np.random.randn(100)
data[:, 2] = data[:, 0] - data[:, 1] + 0.3 * np.random.randn(100)
data[:, 3] = 2 * data[:, 2] + 0.2 * np.random.randn(100)

# Standardize the data
data_std = (data - data.mean(axis=0)) / data.std(axis=0)

# Perform SVD
U, s, Vt = np.linalg.svd(data_std, full_matrices=False)

# Project data onto the first two principal components
# (columns of U are unit-norm; multiply by s[:2] for scaled PCA scores)
data_reduced = U[:, :2]
print("Original data shape:", data.shape)
print("Reduced data shape:", data_reduced.shape)
print("Singular values:", s)

# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Original data (first two features)
ax1.scatter(data_std[:, 0], data_std[:, 1], alpha=0.7)
ax1.set_title('Original Data (First 2 Features)')
ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
ax1.grid(True, alpha=0.3)

# SVD-reduced data
ax2.scatter(data_reduced[:, 0], data_reduced[:, 1], alpha=0.7, color='red')
ax2.set_title('SVD-Reduced Data (2D Projection)')
ax2.set_xlabel('1st Principal Component')
ax2.set_ylabel('2nd Principal Component')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```
```
Original data shape: (100, 4)
Reduced data shape: (100, 2)
Singular values: [1.96396101 1.42518095 0.97618506 0.56066851]
```
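The singular values themselves tell you how much structure each component captures: the squared singular values are proportional to the variance explained. A small sketch computing the explained-variance ratio for the same dataset:

```python
import numpy as np

# Rebuild the standardized dataset from the example above
np.random.seed(42)
data = np.random.randn(100, 4)
data[:, 1] = data[:, 0] + 0.5 * np.random.randn(100)
data[:, 2] = data[:, 0] - data[:, 1] + 0.3 * np.random.randn(100)
data[:, 3] = 2 * data[:, 2] + 0.2 * np.random.randn(100)
data_std = (data - data.mean(axis=0)) / data.std(axis=0)

U, s, Vt = np.linalg.svd(data_std, full_matrices=False)

# Fraction of total variance captured by each component
explained = s**2 / np.sum(s**2)
print("Explained variance ratio:", np.round(explained, 3))
print("First two components capture:", round(float(explained[:2].sum()), 3))
```

Because the four features were built from two essentially independent sources, the first two components carry most of the variance, which is why the 2D projection is a reasonable summary.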
Matrix Reconstruction
SVD allows you to reconstruct the original matrix and approximate it using fewer components:
```python
import numpy as np

# Original matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Perform SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("Original matrix A:")
print(A)
print("\nSingular values:", s)

# Reconstruct using all components
A_reconstructed = np.dot(U * s, Vt)
print("\nReconstructed matrix (all components):")
print(A_reconstructed)

# Low-rank approximation using only the first 2 components
k = 2
A_approx = np.dot(U[:, :k] * s[:k], Vt[:k, :])
print(f"\nLow-rank approximation (k={k}):")
print(A_approx)

# Calculate the reconstruction error (Frobenius norm)
error = np.linalg.norm(A - A_approx, 'fro')
print(f"\nReconstruction error: {error:.4f}")
```
```
Original matrix A:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

Singular values: [2.54368356e+01 1.72964631e+00 9.18829162e-16]

Reconstructed matrix (all components):
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]]

Low-rank approximation (k=2):
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]]

Reconstruction error: 0.0000
```

Note that the third singular value is numerically zero: A has rank 2, so the rank-2 approximation reconstructs it exactly.
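A common way to choose k in practice is to keep the smallest number of components whose squared singular values cover a target fraction of the total "energy". A sketch of this rule applied to the same matrix (the 0.99 threshold is an assumption for illustration, not from the article):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Cumulative fraction of squared singular values ("energy")
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.99) + 1)
print("Chosen k:", k)  # k = 1: the first component already covers > 99%

# Frobenius error of the rank-k approximation equals
# the norm of the discarded singular values
A_k = U[:, :k] * s[:k] @ Vt[:k, :]
print("Error at rank k:", np.linalg.norm(A - A_k))
```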
Key Applications
| Application | Use Case | Benefit |
|---|---|---|
| Dimensionality Reduction | PCA, Data Compression | Reduces storage and computation |
| Recommender Systems | Collaborative Filtering | Handles sparse data efficiently |
| Image Processing | Image Compression | Maintains quality with less data |
| Noise Reduction | Signal Processing | Separates signal from noise |
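To illustrate the collaborative-filtering row of the table, a truncated SVD can estimate missing ratings in a small user-item matrix. This is a toy sketch: the ratings matrix, the item-mean imputation, and the choice k=2 are all assumptions for demonstration, not a production recommender.

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = items, 0 = unrated
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Naive imputation: fill missing entries with each item's mean rating
mask = R > 0
item_means = R.sum(axis=0) / mask.sum(axis=0)
filled = R.copy()
filled[~mask] = np.broadcast_to(item_means, R.shape)[~mask]

# A rank-2 truncated SVD smooths the matrix toward its dominant taste
# patterns; the smoothed entries serve as rating predictions
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
R_hat = U[:, :k] * s[:k] @ Vt[:k, :]
print(np.round(R_hat, 2))
```

The two user groups in this toy matrix (users who like items 0-1 versus items 2-3) become the two dominant components, so the predicted values for the unrated cells follow each user's group.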
Conclusion
Singular Value Decomposition is a fundamental technique for matrix factorization that enables dimensionality reduction, data compression, and noise filtering. It decomposes any matrix into three components that reveal the underlying structure of the data, making it invaluable for machine learning and data analysis applications.
