Singular Value Decomposition

Singular Value Decomposition (SVD) is a powerful mathematical technique used in machine learning to analyze large and complex datasets. It decomposes a matrix into three simpler matrices, making it easier to understand patterns and reduce dimensionality.

For any matrix A, SVD factorizes it as A = UΣVᵀ, where:

  • U contains the left singular vectors (eigenvectors of AAᵀ)

  • Σ is a diagonal matrix of singular values (the square roots of the eigenvalues of AᵀA)

  • Vᵀ is the transpose of V, whose columns are the right singular vectors (eigenvectors of AᵀA)
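These properties can be checked numerically. The following sketch (using a small example matrix chosen here for illustration) verifies that the factors returned by NumPy's `np.linalg.svd` have orthonormal columns, reproduce A exactly, and match the eigenvalues of AᵀA:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])

# Reduced SVD: U is 3x2, s holds 2 singular values, Vt is 2x2
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and rows of Vt are orthonormal
print(np.allclose(U.T @ U, np.eye(2)))      # True
print(np.allclose(Vt @ Vt.T, np.eye(2)))    # True

# The three factors reproduce A exactly
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True

# Singular values are square roots of the eigenvalues of A^T A
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s, np.sqrt(eigvals)))     # True
```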

Mathematical Algorithm

The SVD computation follows these steps:

  1. Given an m×n matrix A, compute AᵀA (the transpose of A multiplied by A)

  2. Find the eigenvalues λ and eigenvectors of AᵀA by solving det(AᵀA − λI) = 0

  3. Calculate the singular values as σ = √λ, sorted in descending order

  4. The right singular vectors (the columns of V) are the normalized eigenvectors of AᵀA

  5. The left singular vectors are uᵢ = Avᵢ/σᵢ, one column of U for each nonzero singular value

Basic SVD Example

Let's implement SVD from scratch and compare the result with NumPy's built-in function:

import numpy as np

def manual_svd(A):
    # Compute A^T A
    AtA = np.dot(A.T, A)
    
    # Eigendecomposition of the symmetric matrix A^T A
    eigenvalues, eigenvectors = np.linalg.eigh(AtA)
    
    # Sort eigenpairs in descending order of eigenvalue
    idx = eigenvalues.argsort()[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    
    # Singular values (clip tiny negative eigenvalues caused by round-off)
    singular_values = np.sqrt(np.maximum(eigenvalues, 0))
    
    # Right singular vectors (columns of V)
    V = eigenvectors
    
    # Left singular vectors: u_i = A v_i / sigma_i
    # (assumes all singular values are nonzero, i.e. A has full column rank)
    U = np.dot(A, V) / singular_values
    
    return U, singular_values, V.T

# Test matrix
A = np.array([[1, 2], [3, 4], [5, 6]])

# Manual SVD
U_manual, s_manual, Vt_manual = manual_svd(A)
print("Manual SVD:")
print("U shape:", U_manual.shape)
print("Singular values:", s_manual)

# NumPy SVD for comparison
U_numpy, s_numpy, Vt_numpy = np.linalg.svd(A, full_matrices=False)
print("\nNumPy SVD:")
print("U shape:", U_numpy.shape)
print("Singular values:", s_numpy)
Manual SVD:
U shape: (3, 2)
Singular values: [9.52551809 0.51430058]

NumPy SVD:
U shape: (3, 2)
Singular values: [9.52551809 0.51430058]
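The singular values agree, but note one subtlety when comparing the two factorizations: singular vectors are only defined up to sign, so an element-wise comparison of U or V can fail even when both decompositions are correct. A self-contained sketch of a sign-insensitive check (repeating the eigendecomposition construction from above):

```python
import numpy as np

def manual_svd(A):
    # Same construction as above: eigendecomposition of A^T A
    eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)
    idx = eigenvalues.argsort()[::-1]
    s = np.sqrt(np.maximum(eigenvalues[idx], 0.0))
    V = eigenvectors[:, idx]
    U = (A @ V) / s  # assumes full column rank
    return U, s, V.T

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
U1, s1, Vt1 = manual_svd(A)
U2, s2, Vt2 = np.linalg.svd(A, full_matrices=False)

# Singular values match
print(np.allclose(s1, s2))                    # True

# Singular vectors match only up to a per-column sign flip
print(np.allclose(np.abs(U1), np.abs(U2)))    # True
print(np.allclose(np.abs(Vt1), np.abs(Vt2)))  # True

# Both factorizations reconstruct A
print(np.allclose(U1 @ np.diag(s1) @ Vt1, A)) # True
```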

Data Visualization with SVD

SVD is commonly used for dimensionality reduction. Here's an example using a simple dataset:

import numpy as np
import matplotlib.pyplot as plt

# Create correlated sample data
np.random.seed(42)
data = np.random.randn(100, 4)  # 100 samples, 4 features

# Add some correlation between features
data[:, 1] = data[:, 0] + 0.5 * np.random.randn(100)
data[:, 2] = data[:, 0] - data[:, 1] + 0.3 * np.random.randn(100)
data[:, 3] = 2 * data[:, 2] + 0.2 * np.random.randn(100)

# Standardize the data
data_std = (data - data.mean(axis=0)) / data.std(axis=0)

# Perform SVD
U, s, Vt = np.linalg.svd(data_std, full_matrices=False)

# Project data onto the first two principal directions
# (equivalent to data_std @ Vt[:2].T)
data_reduced = U[:, :2] * s[:2]

print("Original data shape:", data.shape)
print("Reduced data shape:", data_reduced.shape)
# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Original data (first two features)
ax1.scatter(data_std[:, 0], data_std[:, 1], alpha=0.7)
ax1.set_title('Original Data (First 2 Features)')
ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
ax1.grid(True, alpha=0.3)

# SVD-reduced data
ax2.scatter(data_reduced[:, 0], data_reduced[:, 1], alpha=0.7, color='red')
ax2.set_title('SVD-Reduced Data (2D Projection)')
ax2.set_xlabel('1st Principal Component')
ax2.set_ylabel('2nd Principal Component')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
Original data shape: (100, 4)
Reduced data shape: (100, 2)
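The singular values themselves tell you how much each component matters: their squares are proportional to the variance captured along each direction. A sketch computing the explained-variance ratio for the same dataset (no plotting needed):

```python
import numpy as np

# Rebuild the same correlated dataset as above
np.random.seed(42)
X = np.random.randn(100, 4)
X[:, 1] = X[:, 0] + 0.5 * np.random.randn(100)
X[:, 2] = X[:, 0] - X[:, 1] + 0.3 * np.random.randn(100)
X[:, 3] = 2 * X[:, 2] + 0.2 * np.random.randn(100)
X = (X - X.mean(axis=0)) / X.std(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Squared singular values are proportional to the variance per component
explained = s**2 / np.sum(s**2)
print("Explained variance ratio:", np.round(explained, 3))
print("First 2 components keep {:.1%} of the variance".format(explained[:2].sum()))
```

Because the ratios sum to 1 and are sorted in descending order, this gives a quick rule for choosing how many components to keep.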

Matrix Reconstruction

SVD allows you to reconstruct the original matrix and approximate it using fewer components:

import numpy as np

# Original matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Perform SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print("Original matrix A:")
print(A)
print("\nSingular values:", s)

# Reconstruct using all components
A_reconstructed = np.dot(U * s, Vt)
print("\nReconstructed matrix (all components):")
print(A_reconstructed)

# Low-rank approximation using only first 2 components
k = 2
A_approx = np.dot(U[:, :k] * s[:k], Vt[:k, :])
print(f"\nLow-rank approximation (k={k}):")
print(A_approx)

# Calculate reconstruction error
error = np.linalg.norm(A - A_approx, 'fro')
print(f"\nReconstruction error: {error:.4f}")
Original matrix A:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

Singular values: [2.54368356e+01 1.72964631e+00 9.18829162e-16]

Reconstructed matrix (all components):
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]]

Low-rank approximation (k=2):
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]]

Reconstruction error: 0.0000
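The zero error above is expected: A has rank 2, so its third singular value is numerically zero and nothing is lost by truncating to k = 2. More generally, the Frobenius error of a rank-k truncation equals exactly the root-sum-square of the discarded singular values (the Eckart-Young theorem). A sketch verifying this on a random full-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # generic matrix, full rank with probability 1

U, s, Vt = np.linalg.svd(A, full_matrices=False)

for k in range(1, 4):
    # Rank-k truncation: keep only the k largest singular components
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    err = np.linalg.norm(A - A_k, 'fro')
    # Eckart-Young: error equals the energy in the dropped singular values
    predicted = np.sqrt(np.sum(s[k:]**2))
    print(k, np.isclose(err, predicted))  # prints k, True
```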

Key Applications

| Application | Use Case | Benefit |
| --- | --- | --- |
| Dimensionality Reduction | PCA, Data Compression | Reduces storage and computation |
| Recommender Systems | Collaborative Filtering | Handles sparse data efficiently |
| Image Processing | Image Compression | Maintains quality with less data |
| Noise Reduction | Signal Processing | Separates signal from noise |
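As an illustration of the noise-reduction row, here is a minimal sketch (with synthetic data made up for this example): corrupt an exactly rank-2 matrix with small noise, then truncate the SVD of the noisy matrix back to rank 2. The truncation discards most of the noise while keeping the signal:

```python
import numpy as np

rng = np.random.default_rng(1)

# Exactly rank-2 "signal" matrix plus small Gaussian noise
signal = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))
noisy = signal + 0.1 * rng.standard_normal((50, 30))

# Keep only the top 2 singular components of the noisy matrix
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
denoised = (U[:, :2] * s[:2]) @ Vt[:2, :]

# Truncation should bring us closer to the clean signal
print(np.linalg.norm(noisy - signal))     # error of the noisy input
print(np.linalg.norm(denoised - signal))  # error after truncation (smaller)
```

This works because the signal concentrates in a few large singular values while the noise spreads thinly across all of them.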

Conclusion

Singular Value Decomposition is a fundamental technique for matrix factorization that enables dimensionality reduction, data compression, and noise filtering. It decomposes any matrix into three components that reveal the underlying structure of the data, making it invaluable for machine learning and data analysis applications.

Updated on: 2026-03-27T11:24:17+05:30
