Activation Functions in PyTorch

PyTorch is an open-source machine learning framework that provides various activation functions for building neural networks. An activation function determines the output of a node in a neural network given an input, introducing non-linearity which is essential for solving complex machine learning problems.

What is an Activation Function?

Neural networks consist of input layers, hidden layers, and output layers. The activation function is applied to the weighted sum of inputs at each node, transforming the linear combination into a non-linear output. This non-linearity enables neural networks to learn complex patterns and relationships in data.
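The idea above can be sketched in a few lines: a linear layer computes the weighted sum of its inputs, and the activation function then transforms that result non-linearly. This is a minimal illustration with arbitrary layer sizes chosen for the example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A single node layer: weighted sum of inputs (Wx + b) ...
linear = nn.Linear(in_features=3, out_features=2)
# ... followed by a non-linear activation
activation = nn.ReLU()

x = torch.tensor([1.0, -2.0, 0.5])
z = linear(x)       # linear combination of the inputs
a = activation(z)   # non-linear output of the node
print("Weighted sum:", z)
print("Activated output:", a)
```

Without the activation step, stacking linear layers would still produce a linear function of the input; the activation is what lets deeper layers model non-linear patterns.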

(Diagram: input → activation function → output)

Types of Activation Functions in PyTorch

PyTorch provides several built-in activation functions through the torch.nn module:

  • ReLU – Rectified Linear Unit

  • Leaky ReLU – Modified ReLU with a small slope for negative inputs

  • Sigmoid – S-shaped curve between 0 and 1

  • Tanh – Hyperbolic tangent between -1 and 1

  • Softmax – Probability distribution for multi-class classification

ReLU Activation Function

The Rectified Linear Unit (ReLU) is defined as f(x) = max(0, x). It outputs zero for negative inputs and passes positive inputs unchanged. ReLU is computationally efficient and helps mitigate vanishing gradient problems.

Example

Here's how to implement and use ReLU activation:

import torch
import torch.nn as nn
import numpy as np

# Using PyTorch's built-in ReLU
relu = nn.ReLU()
x = torch.tensor([-1.0, 2.0, -3.0, 4.0, 0.0])
y = relu(x)
print("PyTorch ReLU:", y)

# Custom ReLU implementation
def custom_relu(x):
    return np.maximum(0, x)

x_np = np.array([-1, 2, -3, 4, 0])
y_custom = custom_relu(x_np)
print("Custom ReLU:", y_custom)

Output

PyTorch ReLU: tensor([0., 2., 0., 4., 0.])
Custom ReLU: [0 2 0 4 0]

Leaky ReLU Activation Function

Leaky ReLU solves the "dying ReLU" problem by allowing small negative values. It's defined as f(x) = max(αx, x), where α is typically 0.01.

Example

import torch
import torch.nn as nn

# Using PyTorch's Leaky ReLU
leaky_relu = nn.LeakyReLU(negative_slope=0.1)
x = torch.tensor([-1.0, 2.0, -3.0, 4.0, 0.0])
y = leaky_relu(x)
print("Leaky ReLU:", y)

Output

Leaky ReLU: tensor([-0.1000,  2.0000, -0.3000,  4.0000,  0.0000])

Sigmoid Activation Function

The sigmoid function maps any input to values between 0 and 1, making it useful for binary classification. It's defined as f(x) = 1/(1+e^(-x)).

Example

import torch
import torch.nn as nn

# Using PyTorch's Sigmoid
sigmoid = nn.Sigmoid()
x = torch.tensor([-1.0, 2.0, -3.0, 4.0, 0.0])
y = sigmoid(x)
print("Sigmoid:", y)

Output

Sigmoid: tensor([0.2689, 0.8808, 0.0474, 0.9820, 0.5000])

Tanh Activation Function

The hyperbolic tangent function outputs values between -1 and 1. It's defined as f(x) = (e^x - e^(-x))/(e^x + e^(-x)) and is zero-centered, making it preferred over sigmoid in hidden layers.

Example

import torch
import torch.nn as nn

# Using PyTorch's Tanh
tanh = nn.Tanh()
x = torch.tensor([-1.0, 2.0, -3.0, 4.0, 0.0])
y = tanh(x)
print("Tanh:", y)

Output

Tanh: tensor([-0.7616,  0.9640, -0.9951,  0.9993,  0.0000])

Softmax Activation Function

Softmax converts a vector of values into a probability distribution, commonly used in multi-class classification output layers. Each output represents the probability of belonging to a specific class.

Example

import torch
import torch.nn as nn

# Using PyTorch's Softmax
softmax = nn.Softmax(dim=0)
x = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
y = softmax(x)
print("Softmax:", y)
print("Sum of probabilities:", torch.sum(y))

Output

Softmax: tensor([0.0117, 0.0317, 0.0861, 0.2341, 0.6364])
Sum of probabilities: tensor(1.0000)

Comparison

Function     Range      Use Case                 Advantages
ReLU         [0, ∞)     Hidden layers            Fast, avoids vanishing gradients
Leaky ReLU   (-∞, ∞)    Hidden layers            Solves dying ReLU problem
Sigmoid      (0, 1)     Binary classification    Outputs probabilities
Tanh         (-1, 1)    Hidden layers            Zero-centered output
Softmax      (0, 1)     Multi-class output       Probability distribution
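Besides the nn module classes used above, each of these activations also has a functional form that can be applied directly to a tensor without creating a layer object. A quick sketch comparing all five on the same input:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 2.0, -3.0, 4.0, 0.0])

# Functional counterparts of the nn module classes
print("ReLU:      ", F.relu(x))
print("Leaky ReLU:", F.leaky_relu(x, negative_slope=0.1))
print("Sigmoid:   ", torch.sigmoid(x))
print("Tanh:      ", torch.tanh(x))
print("Softmax:   ", F.softmax(x, dim=0))
```

The module form (nn.ReLU()) is convenient inside nn.Sequential models, while the functional form is handy inside a custom forward() method; both compute the same values.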

Conclusion

PyTorch provides a comprehensive set of activation functions, each suited for specific tasks. ReLU variants work well for hidden layers, while Sigmoid and Softmax are ideal for classification outputs. Choose the activation function based on your network architecture and problem requirements.
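Putting these guidelines together, a typical multi-class classifier might use ReLU in the hidden layers and apply softmax only at the very end. The layer sizes and class count below are arbitrary, chosen just for illustration (note that nn.CrossEntropyLoss expects raw logits, so during training the final softmax is usually left out):

```python
import torch
import torch.nn as nn

# A small hypothetical 3-class classifier:
# ReLU in the hidden layers, raw logits at the output
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 3),
)

x = torch.randn(8, 4)              # batch of 8 samples, 4 features each
logits = model(x)
probs = torch.softmax(logits, dim=1)  # per-sample class probabilities
print(probs.shape)                 # one probability per class per sample
print(probs.sum(dim=1))            # each row sums to 1
```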

Updated on: 2026-03-27T01:04:55+05:30
