Types of Activation Functions in ANN

This article explores Artificial Neural Networks (ANN) and the various activation functions that enable them to learn complex patterns. We'll examine how different activation functions transform inputs and their specific use cases in neural network architectures.

What is an Artificial Neural Network (ANN)?

An Artificial Neural Network (ANN) is a machine learning model inspired by the human brain's structure. It consists of interconnected nodes (neurons) that process and transmit information through weighted connections. These networks learn by adjusting weights during training to produce desired outputs.
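As a minimal sketch of the idea above, a single artificial neuron computes a weighted sum of its inputs plus a bias, then applies an activation function. The inputs, weights, and bias below are purely illustrative values, and a simple step activation is used for clarity:

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    z = np.dot(inputs, weights) + bias
    return 1 if z > 0 else 0

# Illustrative values: three input features with hand-picked weights
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4, 0.6, 0.9])
bias = -0.5

print(neuron_output(inputs, weights, bias))  # weighted sum is 1.68, so the neuron fires: 1
```

During training, the network adjusts `weights` and `bias` so that outputs like this one match the desired targets.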

ANN Architecture: Three Essential Layers

Input Layer

The input layer receives raw data and feeds it into the network. The number of neurons equals the number of input features in your dataset.

Hidden Layer

Hidden layers perform the actual computation and feature extraction. Multiple hidden layers create "deep" neural networks that can learn complex patterns.

Output Layer

The output layer produces the final predictions. Its structure depends on the problem type: a single neuron for binary classification, multiple neurons for multi-class problems.

[Figure: ANN architecture showing an input layer (X1-X4), a hidden layer, and an output layer (Y1, Y2)]
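The three layers described above can be sketched as a single forward pass. The layer sizes (4 inputs, 5 hidden neurons, 2 outputs) and the random weights are illustrative assumptions, not values from any trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 4 input features, 5 hidden neurons, 2 outputs
W1 = rng.normal(size=(4, 5))   # input -> hidden weights
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 2))   # hidden -> output weights
b2 = np.zeros(2)

def forward(x):
    hidden = np.maximum(0, x @ W1 + b1)  # hidden layer with ReLU activation
    return hidden @ W2 + b2              # output layer: raw scores

x = rng.normal(size=4)          # one example with 4 input features
print(forward(x).shape)         # (2,) - one score per output neuron
```

Training consists of adjusting `W1`, `b1`, `W2`, and `b2` so these output scores match the desired targets.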

Types of Activation Functions

Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. Here are the most commonly used activation functions:

Sigmoid Activation Function

The sigmoid function maps any real number to a value between 0 and 1, making it ideal for binary classification problems. However, it suffers from vanishing gradient problems in deep networks.

import numpy as np
import matplotlib.pyplot as plt

def sigmoid_activation(x):
    return 1 / (1 + np.exp(-x))

# Generate input values
x = np.linspace(-10, 10, 100)
y = sigmoid_activation(x)

# Plot the function
plt.figure(figsize=(8, 5))
plt.plot(x, y, 'b-', linewidth=2, label='Sigmoid')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Sigmoid Activation Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Show some example values
print("Sigmoid examples:")
for val in [-2, 0, 2]:
    print(f"sigmoid({val}) = {sigmoid_activation(val):.3f}")
Sigmoid examples:
sigmoid(-2) = 0.119
sigmoid(0) = 0.500
sigmoid(2) = 0.881
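The vanishing gradient problem mentioned above can be seen directly from the sigmoid's derivative, which peaks at only 0.25. This short illustration (not part of the original example) shows why gradients shrink when many sigmoid layers are chained:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: s * (1 - s), maximal at x = 0."""
    s = sigmoid(x)
    return s * (1 - s)

print(f"max gradient at x=0: {sigmoid_grad(0):.3f}")  # 0.250
# Even in the best case, chaining 10 sigmoid layers multiplies
# gradients by at most 0.25 per layer:
print(f"0.25 ** 10 = {0.25 ** 10:.2e}")
```

Because each layer can scale the gradient by at most 0.25, deep stacks of sigmoids leave early layers with nearly zero gradient, which is why ReLU is preferred in hidden layers.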

ReLU (Rectified Linear Unit)

The ReLU function is the most popular activation function in modern deep learning. It outputs zero for negative inputs and the input value for positive inputs, solving the vanishing gradient problem.

import numpy as np
import matplotlib.pyplot as plt

def relu_activation(x):
    return np.maximum(0, x)

# Generate input values
x = np.linspace(-5, 5, 100)
y = relu_activation(x)

# Plot the function
plt.figure(figsize=(8, 5))
plt.plot(x, y, 'r-', linewidth=2, label='ReLU')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU Activation Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Show example values
print("ReLU examples:")
for val in [-2, 0, 3]:
    print(f"ReLU({val}) = {relu_activation(val):.3f}")
ReLU examples:
ReLU(-2) = 0.000
ReLU(0) = 0.000
ReLU(3) = 3.000

Tanh (Hyperbolic Tangent)

The tanh function maps inputs to values between -1 and 1, making it zero-centered unlike sigmoid. It's commonly used in hidden layers of neural networks.

import numpy as np
import matplotlib.pyplot as plt

def tanh_activation(x):
    return np.tanh(x)

# Generate input values
x = np.linspace(-5, 5, 100)
y = tanh_activation(x)

# Plot the function
plt.figure(figsize=(8, 5))
plt.plot(x, y, 'g-', linewidth=2, label='Tanh')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Tanh Activation Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Show example values
print("Tanh examples:")
for val in [-2, 0, 2]:
    print(f"tanh({val}) = {tanh_activation(val):.3f}")
Tanh examples:
tanh(-2) = -0.964
tanh(0) = 0.000
tanh(2) = 0.964

Leaky ReLU

Leaky ReLU addresses the "dying ReLU" problem by allowing small negative values instead of zero, helping gradients flow during backpropagation.

import numpy as np
import matplotlib.pyplot as plt

def leaky_relu_activation(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

# Generate input values
x = np.linspace(-5, 5, 100)
y = leaky_relu_activation(x)

# Plot the function
plt.figure(figsize=(8, 5))
plt.plot(x, y, 'purple', linewidth=2, label='Leaky ReLU (α=0.01)')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Leaky ReLU Activation Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Show example values
print("Leaky ReLU examples:")
for val in [-2, 0, 3]:
    print(f"Leaky ReLU({val}) = {leaky_relu_activation(val):.3f}")
Leaky ReLU examples:
Leaky ReLU(-2) = -0.020
Leaky ReLU(0) = 0.000
Leaky ReLU(3) = 3.000

Softmax Activation Function

The softmax function converts raw scores into probability distributions, ensuring all outputs sum to 1. It's essential for multi-class classification problems.

import numpy as np
import matplotlib.pyplot as plt

def softmax_activation(x):
    exp_x = np.exp(x - np.max(x))  # Subtract max for numerical stability
    return exp_x / np.sum(exp_x)

# Example with class scores
class_scores = np.array([2.0, 1.0, 0.1])
probabilities = softmax_activation(class_scores)

# Visualize probabilities
plt.figure(figsize=(8, 5))
classes = ['Class A', 'Class B', 'Class C']
plt.bar(classes, probabilities, color=['skyblue', 'lightcoral', 'lightgreen'])
plt.ylabel('Probability')
plt.title('Softmax Output Probabilities')
plt.ylim(0, 1)
for i, prob in enumerate(probabilities):
    plt.text(i, prob + 0.02, f'{prob:.3f}', ha='center')
plt.show()

print("Softmax example:")
print(f"Input scores: {class_scores}")
print(f"Probabilities: {probabilities}")
print(f"Sum of probabilities: {np.sum(probabilities):.3f}")
Softmax example:
Input scores: [2.  1.  0.1]
Probabilities: [0.659 0.242 0.099]
Sum of probabilities: 1.000

Activation Function Comparison

Function   | Range    | Best Use Case                | Advantages                               | Disadvantages
Sigmoid    | (0, 1)   | Binary classification output | Smooth, probabilistic                    | Vanishing gradients
ReLU       | [0, ∞)   | Hidden layers                | Fast, simple, avoids vanishing gradients | Dying ReLU problem
Tanh       | (-1, 1)  | Hidden layers                | Zero-centered, smooth                    | Vanishing gradients
Leaky ReLU | (-∞, ∞)  | Hidden layers                | Prevents dying neurons                   | Extra hyperparameter
Softmax    | (0, 1)   | Multi-class output           | Probability distribution                 | Only for classification
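To make the ranges in the table concrete, the snippet below evaluates four of the functions on the same three inputs; it simply combines the definitions given earlier in the article:

```python
import numpy as np

x = np.array([-2.0, 0.0, 2.0])

sigmoid = 1 / (1 + np.exp(-x))          # squashes into (0, 1)
tanh = np.tanh(x)                       # squashes into (-1, 1), zero-centered
relu = np.maximum(0, x)                 # zeroes out negatives
leaky = np.where(x >= 0, x, 0.01 * x)   # small slope for negatives

for name, y in [("sigmoid", sigmoid), ("tanh", tanh),
                ("ReLU", relu), ("leaky ReLU", leaky)]:
    print(f"{name:10s}: {np.round(y, 3)}")
```

Note how the negative input -2 is mapped very differently by each function: close to 0 by sigmoid, to -0.964 by tanh, to exactly 0 by ReLU, and to a small negative value by leaky ReLU.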

Conclusion

Activation functions are crucial for neural network performance. Choose ReLU for hidden layers, sigmoid for binary classification, and softmax for multi-class problems. Understanding each function's characteristics helps you build more effective neural network architectures.

---
Updated on: 2026-03-27T14:52:22+05:30
