Singular Value Decomposition
Singular Value Decomposition (SVD) is a powerful mathematical technique used in machine learning to analyze large and complex datasets. It decomposes a matrix into three simpler matrices, making it easier to understand patterns and reduce dimensionality.
For any matrix A, SVD factorizes it as A = UΣVᵀ, where:
- U contains the left singular vectors (the eigenvectors of AAᵀ)
- Σ is a diagonal matrix of singular values (the square roots of the eigenvalues of AᵀA)
- Vᵀ contains the right singular vectors (the eigenvectors of AᵀA)
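These factors have useful structure: the columns of U and V are orthonormal, and multiplying the three factors back together recovers A. As a quick numerical sanity check, here is a minimal sketch using NumPy's built-in SVD:

```python
import numpy as np

# Any matrix works; this one is 3x2
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Reduced SVD: U is 3x2, s holds 2 singular values, Vt is 2x2
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and rows of Vt are orthonormal
print(np.allclose(U.T @ U, np.eye(2)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(2)))  # True

# The factorization reproduces A exactly
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True
```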
Mathematical Algorithm
The SVD computation follows these steps:
1. Given matrix A, compute AᵀA (the transpose of A multiplied by A)
2. Find the eigenvalues and eigenvectors of AᵀA by solving det(AᵀA − λI) = 0
3. Calculate the singular values as σᵢ = √λᵢ, sorted in descending order
4. The right singular vectors V are the normalized eigenvectors of AᵀA
5. The left singular vectors are computed as uᵢ = Avᵢ / σᵢ
Basic SVD Example
Let's implement SVD from scratch and compare it with NumPy's built-in function:

```python
import numpy as np

def manual_svd(A):
    # Compute A^T A
    AtA = np.dot(A.T, A)
    # Find eigenvalues and eigenvectors (eigh is for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(AtA)
    # Sort in descending order
    idx = eigenvalues.argsort()[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    # Singular values are the square roots of the eigenvalues
    singular_values = np.sqrt(eigenvalues)
    # Right singular vectors (V)
    V = eigenvectors
    # Left singular vectors: U = A V / sigma
    U = np.dot(A, V) / singular_values
    return U, singular_values, V.T

# Test matrix
A = np.array([[1, 2], [3, 4], [5, 6]])

# Manual SVD
U_manual, s_manual, Vt_manual = manual_svd(A)
print("Manual SVD:")
print("U shape:", U_manual.shape)
print("Singular values:", s_manual)

# NumPy SVD for comparison
U_numpy, s_numpy, Vt_numpy = np.linalg.svd(A, full_matrices=False)
print("\nNumPy SVD:")
print("U shape:", U_numpy.shape)
print("Singular values:", s_numpy)
```
```
Manual SVD:
U shape: (3, 2)
Singular values: [9.52551809 0.51430058]

NumPy SVD:
U shape: (3, 2)
Singular values: [9.52551809 0.51430058]
```
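One caveat when comparing the two results: singular vectors are only defined up to sign, so a manual implementation and np.linalg.svd may return some columns of U (and the corresponding rows of Vᵀ) flipped relative to each other, even though both factorizations reproduce A. A small sketch of aligning signs before comparison (the helper name is our own):

```python
import numpy as np

def align_signs(U_ref, U_other):
    # Flip each column of U_other that points opposite to the same column of U_ref
    signs = np.sign(np.sum(U_ref * U_other, axis=0))
    return U_other * signs

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
U1, s1, Vt1 = np.linalg.svd(A, full_matrices=False)

# Simulate a sign flip in the second column, as another implementation might produce
U2 = U1 * np.array([1, -1])
print(np.allclose(align_signs(U1, U2), U1))  # True
```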
Data Visualization with SVD
SVD is commonly used for dimensionality reduction. Here's an example using a simple dataset:
```python
import numpy as np
import matplotlib.pyplot as plt

# Create sample data: 100 samples, 4 features
np.random.seed(42)
data = np.random.randn(100, 4)

# Add some correlation between features
data[:, 1] = data[:, 0] + 0.5 * np.random.randn(100)
data[:, 2] = data[:, 0] - data[:, 1] + 0.3 * np.random.randn(100)
data[:, 3] = 2 * data[:, 2] + 0.2 * np.random.randn(100)

# Standardize the data
data_std = (data - data.mean(axis=0)) / data.std(axis=0)

# Perform SVD
U, s, Vt = np.linalg.svd(data_std, full_matrices=False)

# Project data onto the first two principal components
# (columns of U are unit-norm; multiply by s[:2] for scaled PCA scores)
data_reduced = U[:, :2]
print("Original data shape:", data.shape)
print("Reduced data shape:", data_reduced.shape)
print("Singular values:", s)

# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Original data (first two features)
ax1.scatter(data_std[:, 0], data_std[:, 1], alpha=0.7)
ax1.set_title('Original Data (First 2 Features)')
ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
ax1.grid(True, alpha=0.3)

# SVD-reduced data
ax2.scatter(data_reduced[:, 0], data_reduced[:, 1], alpha=0.7, color='red')
ax2.set_title('SVD-Reduced Data (2D Projection)')
ax2.set_xlabel('1st Principal Component')
ax2.set_ylabel('2nd Principal Component')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```
```
Original data shape: (100, 4)
Reduced data shape: (100, 2)
Singular values: [1.96396101 1.42518095 0.97618506 0.56066851]
```
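The singular values themselves tell you how much structure each component captures: the squared singular values are proportional to the variance explained. A small sketch computing the explained-variance ratio for the same dataset:

```python
import numpy as np

# Rebuild the standardized dataset from the example above
np.random.seed(42)
data = np.random.randn(100, 4)
data[:, 1] = data[:, 0] + 0.5 * np.random.randn(100)
data[:, 2] = data[:, 0] - data[:, 1] + 0.3 * np.random.randn(100)
data[:, 3] = 2 * data[:, 2] + 0.2 * np.random.randn(100)
data_std = (data - data.mean(axis=0)) / data.std(axis=0)

U, s, Vt = np.linalg.svd(data_std, full_matrices=False)

# Fraction of total variance captured by each component
explained = s**2 / np.sum(s**2)
print("Explained variance ratio:", np.round(explained, 3))
print("First two components capture:", round(float(explained[:2].sum()), 3))
```

Because the four features were built from two essentially independent sources, the first two components carry most of the variance, which is why the 2D projection is a reasonable summary.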
Matrix Reconstruction
SVD allows you to reconstruct the original matrix and approximate it using fewer components:
```python
import numpy as np

# Original matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Perform SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("Original matrix A:")
print(A)
print("\nSingular values:", s)

# Reconstruct using all components
A_reconstructed = np.dot(U * s, Vt)
print("\nReconstructed matrix (all components):")
print(A_reconstructed)

# Low-rank approximation using only the first 2 components
k = 2
A_approx = np.dot(U[:, :k] * s[:k], Vt[:k, :])
print(f"\nLow-rank approximation (k={k}):")
print(A_approx)

# Calculate the reconstruction error (Frobenius norm)
error = np.linalg.norm(A - A_approx, 'fro')
print(f"\nReconstruction error: {error:.4f}")
```
```
Original matrix A:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

Singular values: [2.54368356e+01 1.72964631e+00 9.18829162e-16]

Reconstructed matrix (all components):
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]]

Low-rank approximation (k=2):
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]]

Reconstruction error: 0.0000
```

Note that the third singular value is numerically zero: A has rank 2, so the rank-2 approximation reconstructs it exactly.
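A common way to choose k in practice is to keep the smallest number of components whose squared singular values cover a target fraction of the total "energy". A sketch of this rule applied to the same matrix (the 0.99 threshold is an assumption for illustration, not from the article):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Cumulative fraction of squared singular values ("energy")
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.99) + 1)
print("Chosen k:", k)  # k = 1: the first component already covers > 99%

# Frobenius error of the rank-k approximation equals
# the norm of the discarded singular values
A_k = U[:, :k] * s[:k] @ Vt[:k, :]
print("Error at rank k:", np.linalg.norm(A - A_k))
```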
Key Applications
| Application | Use Case | Benefit |
|---|---|---|
| Dimensionality Reduction | PCA, Data Compression | Reduces storage and computation |
| Recommender Systems | Collaborative Filtering | Handles sparse data efficiently |
| Image Processing | Image Compression | Maintains quality with less data |
| Noise Reduction | Signal Processing | Separates signal from noise |
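To illustrate the collaborative-filtering row of the table, a truncated SVD can estimate missing ratings in a small user-item matrix. This is a toy sketch: the ratings matrix, the item-mean imputation, and the choice k=2 are all assumptions for demonstration, not a production recommender.

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = items, 0 = unrated
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Naive imputation: fill missing entries with each item's mean rating
mask = R > 0
item_means = R.sum(axis=0) / mask.sum(axis=0)
filled = R.copy()
filled[~mask] = np.broadcast_to(item_means, R.shape)[~mask]

# A rank-2 truncated SVD smooths the matrix toward its dominant taste
# patterns; the smoothed entries serve as rating predictions
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
R_hat = U[:, :k] * s[:k] @ Vt[:k, :]
print(np.round(R_hat, 2))
```

The two user groups in this toy matrix (users who like items 0-1 versus items 2-3) become the two dominant components, so the predicted values for the unrated cells follow each user's group.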
Conclusion
Singular Value Decomposition is a fundamental technique for matrix factorization that enables dimensionality reduction, data compression, and noise filtering. It decomposes any matrix into three components that reveal the underlying structure of the data, making it invaluable for machine learning and data analysis applications.
