Functional Transforms for Computer Vision using PyTorch

Computer vision tasks often require preprocessing and augmentation of image data to improve model performance and generalization. The torchvision package, PyTorch's companion vision library, provides image transformations in torchvision.transforms. Its predefined transform classes cover many use cases; functional transforms, which are plain functions that operate on image tensors, give finer control when you need custom or per-call parameterized behavior.

Understanding Transforms in PyTorch

Transforms in PyTorch are operations that modify images or their properties. There are two main types: class transforms and functional transforms. Class transforms are objects whose parameters are set when they are constructed and then reused on every call. Functional transforms are stateless functions that receive the image and all parameters as arguments each time they are called.

Because every parameter is passed explicitly, functional transforms are well suited to custom or parameterized operations, such as applying the same randomly chosen crop to an image and its segmentation mask.

Creating a Simple Functional Transform

Let's create a custom transform that converts RGB images to grayscale:

import torch

def grayscale(img):
    """Converts an RGB image to grayscale.
    
    Args:
        img (Tensor): Input RGB image tensor of shape (C, H, W).
        
    Returns:
        Tensor: Grayscale image tensor of shape (1, H, W).
    """
    if img.size(0) != 3:
        raise ValueError("Input image must have 3 channels (RGB).")
        
    # Apply grayscale transformation using mean across channels
    grayscale_img = torch.mean(img, dim=0, keepdim=True)
    
    return grayscale_img

# Example usage with a dummy RGB tensor
rgb_tensor = torch.rand(3, 100, 100)  # Random RGB image
gray_result = grayscale(rgb_tensor)
print(f"Original shape: {rgb_tensor.shape}")
print(f"Grayscale shape: {gray_result.shape}")

Output:

Original shape: torch.Size([3, 100, 100])
Grayscale shape: torch.Size([1, 100, 100])
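
Averaging the channels weights red, green, and blue equally. A common alternative is a weighted sum that reflects how bright each color appears to the eye. The sketch below uses the ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114); the function name luminance_grayscale is made up for this example and is not part of torchvision.

```python
import torch

def luminance_grayscale(img):
    """Convert a (3, H, W) RGB tensor to (1, H, W) using BT.601 luma weights."""
    if img.size(0) != 3:
        raise ValueError("Input image must have 3 channels (RGB).")
    # One weight per channel, shaped (3, 1, 1) so it broadcasts over H and W
    weights = torch.tensor(
        [0.299, 0.587, 0.114], dtype=img.dtype, device=img.device
    ).view(3, 1, 1)
    return (img * weights).sum(dim=0, keepdim=True)

# A pure green image: only the green weight contributes to the result
green = torch.zeros(3, 2, 2)
green[1] = 1.0
print(luminance_grayscale(green))
```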

Working with Real Images

Here's how to apply functional transforms to actual images using torchvision.transforms.functional:

from torchvision.transforms import functional as F
from PIL import Image
import torch

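# Load the image (requires an actual image file) and force 3-channel RGB,
# because grayscale() rejects inputs without exactly 3 channels
image = Image.open("sample_image.jpg").convert("RGB")
tensor_image = F.to_tensor(image)

# Apply the custom grayscale transform defined in the previous section
grayscale_image = grayscale(tensor_image)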

# Convert back to PIL image
grayscale_pil = F.to_pil_image(grayscale_image)
grayscale_pil.save("grayscale_output.jpg")

Creating a Transform Pipeline

Chain custom transforms with Compose, which accepts any callable, including plain functions:

import torch
from torchvision.transforms import Compose

# Define a simple resize function for demonstration
def simple_resize(img, size=224):
    """Resize a (C, H, W) tensor to (C, size, size) with bilinear interpolation."""
    # interpolate expects a batch dimension, so add one and remove it afterwards
    return torch.nn.functional.interpolate(
        img.unsqueeze(0), size=(size, size), mode='bilinear', align_corners=False
    ).squeeze(0)

# Create transform pipeline; grayscale() is the function defined earlier
transform_pipeline = Compose([
    lambda img: simple_resize(img, 224),  # Resize to 224x224
    grayscale,                            # Convert to grayscale
])

# Test with dummy data
test_image = torch.rand(3, 256, 256)
transformed = transform_pipeline(test_image)
print(f"Input: {test_image.shape}")
print(f"Output: {transformed.shape}")

Output:

Input: torch.Size([3, 256, 256])
Output: torch.Size([1, 224, 224])

Parameterized Custom Transforms

Create transforms that accept parameters for greater flexibility:

import torch

def brightness_adjustment(img, factor):
    """Adjusts image brightness.
    
    Args:
        img (Tensor): Input image tensor.
        factor (float): Brightness factor (> 1 brighter, < 1 darker).
        
    Returns:
        Tensor: Brightness-adjusted image tensor.
    """
    return torch.clamp(img * factor, 0, 1)

def noise_injection(img, noise_level=0.1):
    """Adds random noise to image.
    
    Args:
        img (Tensor): Input image tensor.
        noise_level (float): Standard deviation of noise.
        
    Returns:
        Tensor: Image with added noise.
    """
    noise = torch.randn_like(img) * noise_level
    return torch.clamp(img + noise, 0, 1)

# Example usage
sample_img = torch.rand(3, 64, 64)

# Make image brighter
bright_img = brightness_adjustment(sample_img, 1.5)

# Add noise
noisy_img = noise_injection(sample_img, 0.05)

print(f"Original range: [{sample_img.min():.3f}, {sample_img.max():.3f}]")
print(f"Bright range: [{bright_img.min():.3f}, {bright_img.max():.3f}]")
print(f"Noisy range: [{noisy_img.min():.3f}, {noisy_img.max():.3f}]")

Sample output (exact values vary between runs because the input is random):

Original range: [0.001, 0.999]
Bright range: [0.001, 1.000]
Noisy range: [0.000, 1.000]
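
Passing parameters per call also makes it possible to reuse one randomly sampled value across several tensors, for instance to crop an image and its segmentation mask at the same location. The following is a sketch under assumed names: crop and paired_random_crop are hypothetical helpers written for this example, not torchvision functions.

```python
import torch

def crop(img, top, left, height, width):
    """Return the (height, width) window of a (C, H, W) tensor starting at (top, left)."""
    return img[..., top:top + height, left:left + width]

def paired_random_crop(img, mask, size):
    """Crop img and mask with the same randomly chosen offset."""
    _, h, w = img.shape
    if size > h or size > w:
        raise ValueError("Crop size must not exceed the image size.")
    # Sample the offset once, then pass it explicitly to both calls
    top = int(torch.randint(0, h - size + 1, (1,)))
    left = int(torch.randint(0, w - size + 1, (1,)))
    return crop(img, top, left, size, size), crop(mask, top, left, size, size)

image = torch.rand(3, 64, 64)
mask = (image[0:1] > 0.5).float()  # toy mask derived from the red channel

image_crop, mask_crop = paired_random_crop(image, mask, 32)
print(image_crop.shape, mask_crop.shape)
# Check that the mask still lines up with the image after cropping
print(torch.equal(mask_crop, (image_crop[0:1] > 0.5).float()))
```

A class transform that samples its own random offset internally would pick a different location on each call, which is why the offset is drawn outside the crop function here.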

Comparison of Transform Types

Feature          Class Transforms          Functional Transforms
---------------  ------------------------  ---------------------
Flexibility      Predefined operations     Custom operations
Parameters       Fixed at initialization   Dynamic per call
Use Case         Standard augmentation     Complex custom logic
Implementation   Class-based               Function-based
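
The two styles are easy to bridge. As a sketch, a functional transform can be wrapped in a small callable class when its parameter should be fixed once and reused, for example inside a pipeline. The class name AdjustBrightness is hypothetical, and brightness_adjustment is repeated here only so the snippet is self-contained.

```python
import torch

def brightness_adjustment(img, factor):
    """Functional form: the factor is supplied on every call."""
    return torch.clamp(img * factor, 0, 1)

class AdjustBrightness:
    """Class form: the factor is fixed at initialization."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, img):
        # Delegate to the functional version with the stored parameter
        return brightness_adjustment(img, self.factor)

img = torch.rand(3, 32, 32)
brighten = AdjustBrightness(1.5)

same = torch.equal(brighten(img), brightness_adjustment(img, 1.5))
print(same)
```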

Practical Example: Multi-Transform Function

Combine multiple transformations in a single function:

import torch
import random

def augment_image(img, augment_prob=0.5):
    """Apply random augmentations to an image.
    
    Args:
        img (Tensor): Input image tensor.
        augment_prob (float): Probability of applying each augmentation.
        
    Returns:
        Tensor: Augmented image tensor.
    """
    # Random brightness adjustment
    if random.random() < augment_prob:
        factor = random.uniform(0.7, 1.3)
        img = brightness_adjustment(img, factor)
    
    # Random noise injection
    if random.random() < augment_prob:
        noise_level = random.uniform(0.01, 0.05)
        img = noise_injection(img, noise_level)
    
    # Random grayscale conversion
    if random.random() < augment_prob:
        img = grayscale(img)
        # Convert back to 3 channels
        img = img.repeat(3, 1, 1)
    
    return img

# Test the augmentation function
test_img = torch.rand(3, 100, 100)
augmented = augment_image(test_img, augment_prob=0.8)
print(f"Input shape: {test_img.shape}")
print(f"Augmented shape: {augmented.shape}")

Output:

Input shape: torch.Size([3, 100, 100])
Augmented shape: torch.Size([3, 100, 100])
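
Random augmentations are harder to debug when every run differs. One option, sketched here, is to draw the random numbers from an explicitly seeded torch.Generator. The function seeded_noise_injection and its generator argument are assumptions made for this example; the augment_image function above uses Python's random module instead, which can be seeded with random.seed.

```python
import torch

def seeded_noise_injection(img, noise_level=0.1, generator=None):
    """Add Gaussian noise drawn from an optional torch.Generator."""
    # torch.randn accepts a generator, so the noise is reproducible for a given seed
    noise = torch.randn(img.shape, generator=generator, dtype=img.dtype)
    return torch.clamp(img + noise * noise_level, 0, 1)

img = torch.rand(3, 16, 16)

# Two generators with the same seed yield the same noise
gen_a = torch.Generator().manual_seed(0)
gen_b = torch.Generator().manual_seed(0)

noisy_a = seeded_noise_injection(img, 0.05, generator=gen_a)
noisy_b = seeded_noise_injection(img, 0.05, generator=gen_b)

print(torch.equal(noisy_a, noisy_b))
```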

Conclusion

Functional transforms in PyTorch give you explicit, per-call control over image preprocessing and augmentation. Because they are ordinary functions on tensors, they can take dynamic parameters and implement logic that predefined transforms do not cover. Combined with existing PyTorch tools, they let you build data pipelines tailored to specific computer vision tasks.

Updated on: 2026-03-27T12:26:11+05:30
