Functional Transforms for Computer Vision using PyTorch
Computer vision tasks often require preprocessing and augmentation of image data to improve model performance and generalization. The torchvision package, PyTorch's companion library for vision, provides image transformations in its torchvision.transforms module. While the predefined transform classes cover many use cases, functional transforms offer greater flexibility: they are plain functions on tensors, so their parameters can change on every call.
Understanding Transforms in PyTorch
Transforms in PyTorch are operations that modify images or their properties. There are two main types: class transforms and functional transforms. Class transforms, such as Resize or CenterCrop, are objects whose parameters are set at construction. Functional transforms, found in torchvision.transforms.functional, are functions that receive the input and all parameters on each call.
Functional transforms offer more flexibility as they allow custom operations using PyTorch tensors and functions, making them ideal for complex or parameterized transformations.
Creating a Simple Functional Transform
Let's create a custom grayscale transform that converts RGB images to grayscale:
import torch

def grayscale(img):
    """Converts an RGB image to grayscale.

    Args:
        img (Tensor): Input RGB image tensor of shape (C, H, W).

    Returns:
        Tensor: Grayscale image tensor of shape (1, H, W).
    """
    if img.size(0) != 3:
        raise ValueError("Input image must have 3 channels (RGB).")
    # Apply grayscale transformation using mean across channels
    grayscale_img = torch.mean(img, dim=0, keepdim=True)
    return grayscale_img

# Example usage with a dummy RGB tensor
rgb_tensor = torch.rand(3, 100, 100)  # Random RGB image
gray_result = grayscale(rgb_tensor)
print(f"Original shape: {rgb_tensor.shape}")
print(f"Grayscale shape: {gray_result.shape}")
Original shape: torch.Size([3, 100, 100])
Grayscale shape: torch.Size([1, 100, 100])
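The channel mean treats red, green and blue equally. A common alternative weights the channels by perceived brightness. The sketch below assumes the widely used ITU-R BT.601 luma weights (0.299, 0.587, 0.114); the function name weighted_grayscale is hypothetical and not part of the original example.

```python
import torch

def weighted_grayscale(img):
    """Convert a (3, H, W) RGB tensor to (1, H, W) using luma weights."""
    if img.dim() != 3 or img.size(0) != 3:
        raise ValueError("Input image must have shape (3, H, W).")
    # Assumed BT.601 weights for R, G, B, shaped (3, 1, 1) to broadcast over H and W
    weights = torch.tensor(
        [0.299, 0.587, 0.114], dtype=img.dtype, device=img.device
    ).view(3, 1, 1)
    return (img * weights).sum(dim=0, keepdim=True)

# A pure-green image keeps more intensity than a pure-blue one
green = torch.zeros(3, 4, 4)
green[1] = 1.0
blue = torch.zeros(3, 4, 4)
blue[2] = 1.0

green_gray = weighted_grayscale(green)
blue_gray = weighted_grayscale(blue)
print(green_gray.shape, green_gray[0, 0, 0].item(), blue_gray[0, 0, 0].item())
```

Because the weights sum to 1, a white image stays white, which keeps the output in the same [0, 1] range as the input.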
Working with Real Images
Here's how to apply the grayscale function defined above to an actual image, using torchvision.transforms.functional to convert between PIL images and tensors:
from torchvision.transforms import functional as F
from PIL import Image
import torch

# Load the image (this requires an actual image file) and force 3 channels,
# since grayscale() rejects inputs that are not RGB
image = Image.open("sample_image.jpg").convert("RGB")
tensor_image = F.to_tensor(image)

# Apply the custom grayscale transform defined earlier
grayscale_image = grayscale(tensor_image)

# Convert back to PIL image
grayscale_pil = F.to_pil_image(grayscale_image)
grayscale_pil.save("grayscale_output.jpg")
Creating a Transform Pipeline
Chain custom transforms into a single pipeline. Compose accepts any callables, so custom functions can be mixed with predefined transforms:
import torch
from torchvision.transforms import Compose

# Define a simple resize function for demonstration
def simple_resize(img, size=224):
    """Simple resize using bilinear interpolation."""
    return torch.nn.functional.interpolate(
        img.unsqueeze(0), size=(size, size), mode='bilinear', align_corners=False
    ).squeeze(0)

# Create transform pipeline; Compose calls each function in order
transform_pipeline = Compose([
    simple_resize,  # Resize to 224x224
    grayscale,      # Convert to grayscale (defined earlier)
])

# Test with dummy data
test_image = torch.rand(3, 256, 256)
transformed = transform_pipeline(test_image)
print(f"Input: {test_image.shape}")
print(f"Output: {transformed.shape}")
Input: torch.Size([3, 256, 256])
Output: torch.Size([1, 224, 224])
Parameterized Custom Transforms
Create transforms that accept parameters for greater flexibility:
import torch

def brightness_adjustment(img, factor):
    """Adjusts image brightness.

    Args:
        img (Tensor): Input image tensor.
        factor (float): Brightness factor (> 1 brighter, < 1 darker).

    Returns:
        Tensor: Brightness-adjusted image tensor.
    """
    return torch.clamp(img * factor, 0, 1)

def noise_injection(img, noise_level=0.1):
    """Adds random noise to image.

    Args:
        img (Tensor): Input image tensor.
        noise_level (float): Standard deviation of noise.

    Returns:
        Tensor: Image with added noise.
    """
    noise = torch.randn_like(img) * noise_level
    return torch.clamp(img + noise, 0, 1)

# Example usage
sample_img = torch.rand(3, 64, 64)

# Make image brighter
bright_img = brightness_adjustment(sample_img, 1.5)

# Add noise
noisy_img = noise_injection(sample_img, 0.05)

print(f"Original range: [{sample_img.min():.3f}, {sample_img.max():.3f}]")
print(f"Bright range: [{bright_img.min():.3f}, {bright_img.max():.3f}]")
print(f"Noisy range: [{noisy_img.min():.3f}, {noisy_img.max():.3f}]")
Original range: [0.001, 0.999]
Bright range: [0.001, 1.000]
Noisy range: [0.000, 1.000]
Because the input and the noise are random, the exact values differ from run to run.
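When an augmentation has to be repeatable, for example in a test, the random source can be made an explicit parameter. The sketch below is one possible variant rather than part of the original example: the hypothetical seeded_noise_injection draws its noise from a torch.Generator passed in by the caller.

```python
import torch

def seeded_noise_injection(img, noise_level, generator):
    """Add Gaussian noise drawn from an explicit generator, clamped to [0, 1]."""
    # Sample with torch.randn so that the caller's generator controls the noise
    noise = torch.randn(img.shape, generator=generator, dtype=img.dtype)
    return torch.clamp(img + noise * noise_level, 0, 1)

img = torch.full((3, 8, 8), 0.5)  # flat gray dummy image

# The same seed reproduces the same noise; a different seed does not
first = seeded_noise_injection(img, 0.05, torch.Generator().manual_seed(0))
second = seeded_noise_injection(img, 0.05, torch.Generator().manual_seed(0))
third = seeded_noise_injection(img, 0.05, torch.Generator().manual_seed(1))

print(torch.equal(first, second), torch.equal(first, third))
```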
Comparison of Transform Types
| Feature | Class Transforms | Functional Transforms |
|---|---|---|
| Flexibility | Predefined operations | Custom operations |
| Parameters | Fixed at initialization | Dynamic per call |
| Use Case | Standard augmentation | Complex custom logic |
| Implementation | Class-based | Function-based |
Practical Example: Multi-Transform Function
Combine multiple transformations in a single function:
import torch
import random

def augment_image(img, augment_prob=0.5):
    """Apply random augmentations to an image.

    Args:
        img (Tensor): Input image tensor.
        augment_prob (float): Probability of applying each augmentation.

    Returns:
        Tensor: Augmented image tensor.
    """
    # Random brightness adjustment
    if random.random() < augment_prob:
        factor = random.uniform(0.7, 1.3)
        img = brightness_adjustment(img, factor)

    # Random noise injection
    if random.random() < augment_prob:
        noise_level = random.uniform(0.01, 0.05)
        img = noise_injection(img, noise_level)

    # Random grayscale conversion
    if random.random() < augment_prob:
        img = grayscale(img)
        # Convert back to 3 channels
        img = img.repeat(3, 1, 1)

    return img

# Test the augmentation function
test_img = torch.rand(3, 100, 100)
augmented = augment_image(test_img, augment_prob=0.8)
print(f"Input shape: {test_img.shape}")
print(f"Augmented shape: {augmented.shape}")
Input shape: torch.Size([3, 100, 100])
Augmented shape: torch.Size([3, 100, 100])
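An augmentation function like this is typically applied each time a sample is fetched, so every epoch sees a different variant. The sketch below shows one way to wire that up with torch.utils.data.Dataset. The class AugmentedTensorDataset and the stand-in random_brightness function are hypothetical names introduced for this illustration.

```python
import torch
from torch.utils.data import Dataset

class AugmentedTensorDataset(Dataset):
    """Hypothetical dataset that applies a transform function on every access."""

    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        img = self.images[index]
        if self.transform is not None:
            img = self.transform(img)  # augmentation runs per sample, per access
        return img, self.labels[index]

def random_brightness(img):
    """Scale brightness by a random factor in [0.7, 1.3] and clamp to [0, 1]."""
    factor = torch.empty(1).uniform_(0.7, 1.3).item()
    return torch.clamp(img * factor, 0, 1)

images = torch.rand(10, 3, 32, 32)  # dummy image batch
labels = torch.arange(10)           # dummy labels
dataset = AugmentedTensorDataset(images, labels, transform=random_brightness)
img, label = dataset[3]
print(len(dataset), img.shape, label.item())
```

Any function that maps a tensor to a tensor can be passed as the transform argument, including a Compose pipeline.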
Conclusion
Functional transforms in PyTorch give fine-grained control over image preprocessing and augmentation. Because parameters are passed on each call, they support dynamic, per-sample behavior and custom operations that the predefined transform classes do not cover. Combined with existing tools such as Compose, they can be assembled into data pipelines tailored to specific computer vision tasks.
