Types Of Activation Functions in ANN
This article explores Artificial Neural Networks (ANN) and the various activation functions that enable them to learn complex patterns. We'll examine how different activation functions transform inputs and their specific use cases in neural network architectures.
What is an Artificial Neural Network (ANN)?
An Artificial Neural Network (ANN) is a machine learning model inspired by the human brain's structure. It consists of interconnected nodes (neurons) that process and transmit information through weighted connections. These networks learn by adjusting weights during training to produce desired outputs.
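To make the "weighted connections" idea concrete, here is a minimal sketch of a single neuron: it computes a weighted sum of its inputs plus a bias, then applies an activation function. The inputs, weights, and bias below are illustrative values, not from a trained model.

```python
import numpy as np

# A single neuron: weighted sum of inputs plus a bias,
# passed through an activation function (sigmoid here).
def neuron_forward(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias  # weighted sum
    return 1 / (1 + np.exp(-z))         # sigmoid activation

inputs = np.array([0.5, -1.2, 0.3])    # example feature values
weights = np.array([0.4, 0.1, -0.6])   # example (hypothetical) weights
bias = 0.2
output = neuron_forward(inputs, weights, bias)
print(f"Neuron output: {output:.3f}")  # -> 0.525
```

During training, it is these weights and the bias that get adjusted so the neuron's output moves toward the desired value.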
ANN Architecture: Three Essential Layers
Input Layer
The input layer receives raw data and feeds it into the network. The number of neurons equals the number of input features in your dataset.
Hidden Layer
Hidden layers perform the actual computation and feature extraction. Multiple hidden layers create "deep" neural networks that can learn complex patterns.
Output Layer
The output layer produces the final predictions. Its structure depends on the problem type: a single neuron for binary classification, multiple neurons for multi-class problems.
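The three layers above can be sketched as a single forward pass in NumPy. This is a minimal illustration with 3 input features, 4 hidden neurons, and 2 output neurons; the weights are random placeholders rather than trained values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Layer sizes: 3 input features -> 4 hidden neurons -> 2 outputs
W1 = rng.normal(size=(3, 4))  # input -> hidden weights (illustrative)
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 2))  # hidden -> output weights (illustrative)
b2 = np.zeros(2)

def forward(x):
    hidden = np.maximum(0, x @ W1 + b1)  # hidden layer with ReLU activation
    return hidden @ W2 + b2              # output layer (raw scores)

x = np.array([0.5, -0.3, 0.8])  # one input sample with 3 features
scores = forward(x)
print("Output scores:", scores)  # one score per output neuron
```

Note how the number of input neurons matches the number of features (3) and the output layer size matches the number of predictions needed (2).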
Types of Activation Functions
Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. Here are the most commonly used activation functions:
Sigmoid Activation Function
The sigmoid function maps any real number to a value between 0 and 1, making it ideal for binary classification problems. However, it suffers from vanishing gradient problems in deep networks.
```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid_activation(x):
    return 1 / (1 + np.exp(-x))

# Generate input values
x = np.linspace(-10, 10, 100)
y = sigmoid_activation(x)

# Plot the function
plt.figure(figsize=(8, 5))
plt.plot(x, y, 'b-', linewidth=2, label='Sigmoid')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Sigmoid Activation Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Show some example values
print("Sigmoid examples:")
for val in [-2, 0, 2]:
    print(f"sigmoid({val}) = {sigmoid_activation(val):.3f}")
```

Output:

```
Sigmoid examples:
sigmoid(-2) = 0.119
sigmoid(0) = 0.500
sigmoid(2) = 0.881
```
ReLU (Rectified Linear Unit)
The ReLU function is the most popular activation function in modern deep learning. It outputs zero for negative inputs and the input value for positive inputs, solving the vanishing gradient problem.
```python
import numpy as np
import matplotlib.pyplot as plt

def relu_activation(x):
    return np.maximum(0, x)

# Generate input values
x = np.linspace(-5, 5, 100)
y = relu_activation(x)

# Plot the function
plt.figure(figsize=(8, 5))
plt.plot(x, y, 'r-', linewidth=2, label='ReLU')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU Activation Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Show example values
print("ReLU examples:")
for val in [-2, 0, 3]:
    print(f"ReLU({val}) = {relu_activation(val):.3f}")
```

Output:

```
ReLU examples:
ReLU(-2) = 0.000
ReLU(0) = 0.000
ReLU(3) = 3.000
```
Tanh (Hyperbolic Tangent)
The tanh function maps inputs to values between -1 and 1, making it zero-centered unlike sigmoid. It's commonly used in hidden layers of neural networks.
```python
import numpy as np
import matplotlib.pyplot as plt

def tanh_activation(x):
    return np.tanh(x)

# Generate input values
x = np.linspace(-5, 5, 100)
y = tanh_activation(x)

# Plot the function
plt.figure(figsize=(8, 5))
plt.plot(x, y, 'g-', linewidth=2, label='Tanh')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Tanh Activation Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Show example values
print("Tanh examples:")
for val in [-2, 0, 2]:
    print(f"tanh({val}) = {tanh_activation(val):.3f}")
```

Output:

```
Tanh examples:
tanh(-2) = -0.964
tanh(0) = 0.000
tanh(2) = 0.964
```
Leaky ReLU
Leaky ReLU addresses the "dying ReLU" problem by allowing small negative values instead of zero, helping gradients flow during backpropagation.
```python
import numpy as np
import matplotlib.pyplot as plt

def leaky_relu_activation(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

# Generate input values
x = np.linspace(-5, 5, 100)
y = leaky_relu_activation(x)

# Plot the function
plt.figure(figsize=(8, 5))
plt.plot(x, y, 'purple', linewidth=2, label='Leaky ReLU (α=0.01)')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Leaky ReLU Activation Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Show example values
print("Leaky ReLU examples:")
for val in [-2, 0, 3]:
    print(f"Leaky ReLU({val}) = {leaky_relu_activation(val):.3f}")
```

Output:

```
Leaky ReLU examples:
Leaky ReLU(-2) = -0.020
Leaky ReLU(0) = 0.000
Leaky ReLU(3) = 3.000
```
Softmax Activation Function
The softmax function converts raw scores into probability distributions, ensuring all outputs sum to 1. It's essential for multi-class classification problems.
```python
import numpy as np
import matplotlib.pyplot as plt

def softmax_activation(x):
    exp_x = np.exp(x - np.max(x))  # Subtract max for numerical stability
    return exp_x / np.sum(exp_x)

# Example with class scores
class_scores = np.array([2.0, 1.0, 0.1])
probabilities = softmax_activation(class_scores)

# Visualize probabilities
plt.figure(figsize=(8, 5))
classes = ['Class A', 'Class B', 'Class C']
plt.bar(classes, probabilities, color=['skyblue', 'lightcoral', 'lightgreen'])
plt.ylabel('Probability')
plt.title('Softmax Output Probabilities')
plt.ylim(0, 1)
for i, prob in enumerate(probabilities):
    plt.text(i, prob + 0.02, f'{prob:.3f}', ha='center')
plt.show()

print("Softmax example:")
print(f"Input scores: {class_scores}")
print(f"Probabilities: {probabilities}")
print(f"Sum of probabilities: {np.sum(probabilities):.3f}")
```

Output:

```
Softmax example:
Input scores: [2.  1.  0.1]
Probabilities: [0.659 0.242 0.099]
Sum of probabilities: 1.000
```
Activation Function Comparison
| Function | Range | Best Use Case | Advantages | Disadvantages |
|---|---|---|---|---|
| Sigmoid | (0, 1) | Binary classification output | Smooth, probabilistic | Vanishing gradients |
| ReLU | [0, ∞) | Hidden layers | Fast, simple, avoids vanishing gradients | Dying ReLU problem |
| Tanh | (-1, 1) | Hidden layers | Zero-centered, smooth | Vanishing gradients |
| Leaky ReLU | (-∞, ∞) | Hidden layers | Prevents dying neurons | Extra hyperparameter |
| Softmax | (0, 1) | Multi-class output | Probability distribution | Only for classification |
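The vanishing-gradient entries in the table can be checked numerically: for a moderately large input, the derivatives of sigmoid and tanh are nearly zero, while ReLU's derivative stays at 1. A quick sketch (the derivative formulas are standard; the printed values are computed, not taken from the article):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = 5.0  # a moderately large input
sigmoid_grad = sigmoid(x) * (1 - sigmoid(x))  # derivative of sigmoid
tanh_grad = 1 - np.tanh(x) ** 2               # derivative of tanh
relu_grad = 1.0 if x > 0 else 0.0             # derivative of ReLU

print(f"sigmoid'(5) = {sigmoid_grad:.5f}")  # near zero -> gradient vanishes
print(f"tanh'(5)    = {tanh_grad:.5f}")     # even closer to zero
print(f"ReLU'(5)    = {relu_grad:.1f}")     # stays 1 for any positive input
```

When many such near-zero derivatives are multiplied through the layers of a deep network during backpropagation, the gradient reaching early layers becomes negligible, which is why ReLU and its variants dominate hidden layers.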
Conclusion
Activation functions are crucial for neural network performance. Choose ReLU for hidden layers, sigmoid for binary classification, and softmax for multi-class problems. Understanding each function's characteristics helps you build more effective neural network architectures.