Chainer - Forward & Backward Propagation



Forward Propagation in Chainer

Forward propagation in Chainer refers to the process of passing input data through the layers of a neural network to compute the output. Since Chainer is a flexible deep learning framework, it supports dynamic computation graphs, which means the graph is built on-the-fly as the data moves forward through the network.

During forward propagation each layer of the network applies a set of operations such as matrix multiplication and activation functions to the input data, progressively transforming it until the final output is produced. This output could be a prediction in tasks such as classification or regression.

In Chainer, forward propagation is typically performed by calling the model with the input data as an argument, and the computation graph is constructed dynamically as this happens.
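
As a quick illustration of this define-by-run behaviour, the small sketch below applies a couple of operations to a Chainer Variable; the graph nodes are recorded at the moment each line executes, and the input values here are arbitrary −

import numpy as np
import chainer.functions as F
from chainer import Variable

# The computation graph is built while these lines run (define-by-run)
x = Variable(np.array([[0.5, -1.0]], dtype=np.float32))
h = F.relu(x)   # a ReLU node is added to the graph here
y = h * 2       # a multiplication node is added here
print(y.data)   # [[1. 0.]]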

Steps involved in Forward Propagation

Forward propagation is a fundamental process in neural networks where input data is passed through the network layers to produce an output. The process involves applying a series of mathematical operations, typically matrix multiplications and activation functions, to transform the input into the desired output. Here are the detailed steps involved in forward propagation −

  • Input Layer: The process starts by feeding raw data into the network. Each input feature is assigned a weight that influences how it affects the next layers.
  • Weighted Sum (Linear Transformation): For each layer the network computes a weighted sum of the inputs, calculated as
    z = W . x + b

    where z is the weighted sum, W is the weight matrix, x is the input vector and b is the bias vector.

  • Activation Function: The weighted sum z is passed through an activation function to introduce non-linearity into the model. Common choices are ReLU (Rectified Linear Unit), Sigmoid and Tanh. For example, if we are using ReLU then the activation is applied as follows −
    a = ReLU(z)

    where a is the transformed output of the activation function. A small NumPy sketch of this weighted sum and activation follows this list.

  • Propagation Through Layers: The output from each layer serves as the input for the next layer. This process is iteratively applied across all hidden layers progressively refining the data representation.
  • Output Layer: The final layer produces the network's prediction. The choice of activation function here depends on the task as mentioned below −
    • Classification: Softmax is used to generate class probabilities.
    • Regression: A linear function is used to output continuous values.
  • Final Output: The output from the network is used to make predictions or decisions. During training this output is compared to the actual target values to compute the error which is used to update the weights through backpropagation.
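
The following is a minimal NumPy sketch of the weighted sum and ReLU activation described above; the weight matrix, bias and input values are made up purely for illustration −

import numpy as np

# Hypothetical layer with 3 inputs and 2 units; the numbers are arbitrary
W = np.array([[0.2, -0.5, 0.1],
              [0.4,  0.3, -0.2]])   # weight matrix (2 x 3)
b = np.array([0.1, -0.1])           # bias vector
x = np.array([1.0, 2.0, 3.0])       # input vector

z = W.dot(x) + b                    # weighted sum: z = W . x + b
a = np.maximum(z, 0)                # ReLU activation: a = ReLU(z)

print("z =", z)                     # approximately [-0.4, 0.3]
print("a =", a)                     # approximately [0.0, 0.3]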

Example

Here's an example of forward propagation in Chainer using a simple neural network. This network consists of an input layer, one hidden layer and an output layer. The code below shows how to perform forward propagation and obtain the network's output −

import chainer
import chainer.functions as F
import chainer.links as L
import numpy as np
from chainer import Variable

# Define the neural network model
class SimpleNN(chainer.Chain):
   def __init__(self):
      super(SimpleNN, self).__init__()
      with self.init_scope():
         self.l1 = L.Linear(3, 5)  # Input layer to hidden layer
         self.l2 = L.Linear(5, 2)  # Hidden layer to output layer

   def forward(self, x):
      # Compute the hidden layer output
      h = self.l1(x)
      print("Hidden layer (before activation):", h.data)
      
      # Apply ReLU activation function
      h = F.relu(h)
      print("Hidden layer (after ReLU):", h.data)
      
      # Compute the output layer
      y = self.l2(h)
      print("Output layer (before activation):", y.data)
      
      return y

# Create the model instance
model = SimpleNN()

# Prepare the input data
x = Variable(np.array([[1, 2, 3]], dtype=np.float32))  # Single sample with 3 features

# Perform forward propagation
output = model.forward(x)

# Display the final output
print("Final Output:", output.data)

Following is the output of the Forward Propagation −

Hidden layer (before activation): [[-3.2060928  -0.2460978   2.527906   -0.91410434  0.11754721]]
Hidden layer (after ReLU): [[0.       0.       2.527906   0.       0.11754721]]
Output layer (before activation): [[ 1.6746329  -0.21084023]]
Final Output: [[ 1.6746329  -0.21084023]]
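
Note that in recent Chainer versions a Chain that defines a forward method can also be called directly, so the forward pass above could equally be written as −

output = model(x)   # equivalent to model.forward(x) in recent Chainer versions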

Backward Propagation in Chainer

Backward propagation is a method used to compute the gradients of the loss function with respect to the parameters of a neural network. This process is essential for training the network by adjusting the weights to reduce the loss.

Steps in Backward Propagation

The Backward Propagation process consists of several key steps and each step is crucial for refining the model's parameters and enhancing its performance. Let's see them one by one in detail −

  • Forward Pass: Input data is fed through the network to produce predictions. These predictions are then compared to the true targets using a loss function to calculate the prediction error.
  • Loss Calculation: The loss function measures the discrepancy between predicted values and actual targets by providing a scalar value that reflects the model's performance.
  • Backward Pass: The gradients of the loss function with respect to each network parameter are computed using the chain rule, propagating the gradients backward through the network from the output layer to the input layer (a small worked sketch of these steps follows this list).
  • Parameter Update: The computed gradients are used to adjust the network's parameters such as weights and biases. This adjustment is typically performed by an optimizer such as SGD or Adam which updates the parameters to minimize the loss function.
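
To make these steps concrete, here is a minimal hand-worked sketch for a single linear unit y = w * x + b with a mean squared error loss and a plain SGD update; all the numbers are arbitrary −

# 1. Forward pass for a single linear unit (arbitrary numbers)
x, t = 2.0, 1.0          # input and target
w, b = 0.8, 0.0          # parameters
lr = 0.1                 # learning rate

y = w * x + b            # prediction: 1.6

# 2. Loss calculation (squared error for one sample)
loss = (y - t) ** 2      # about 0.36

# 3. Backward pass (chain rule)
dloss_dy = 2 * (y - t)   # dL/dy = 1.2
dw = dloss_dy * x        # dL/dw = dL/dy * dy/dw = 2.4
db = dloss_dy * 1.0      # dL/db = dL/dy * dy/db = 1.2

# 4. Parameter update (plain SGD)
w -= lr * dw             # w is now about 0.56
b -= lr * db             # b is now about -0.12
print(w, b)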

Example

Following is an example which shows how backward propagation works in the Chainer framework by computing and printing the loss −

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import Chain, optimizers
import numpy as np

# Define a simple neural network
class MLP(Chain):
   def __init__(self):
      super(MLP, self).__init__()
      with self.init_scope():
         self.l1 = L.Linear(2, 3)  # Input layer to hidden layer
         self.l2 = L.Linear(3, 1)  # Hidden layer to output layer

   def forward(self, x):
      h = F.relu(self.l1(x))  # Forward pass through hidden layer
      y = self.l2(h)  # Forward pass through output layer
      return y

# Create a model and an optimizer
model = MLP()
optimizer = optimizers.SGD()
optimizer.setup(model)

# Sample input and target data
x = chainer.Variable(np.array([[1.0, 2.0]], dtype=np.float32))
t = chainer.Variable(np.array([[1.0]], dtype=np.float32))

# Forward pass
y = model.forward(x)
loss = F.mean_squared_error(y, t)  # Compute loss

# Backward pass
model.cleargrads()  # Clear previous gradients
loss.backward()  # Compute gradients
optimizer.update()  # Update parameters using the optimizer

print("Loss:", loss.data)

Following is the loss value printed after the backward propagation step −

Loss: 1.0728482
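
To confirm that the backward pass actually filled in the gradients, you can inspect the grad attribute of each parameter after calling loss.backward(); the printed values will differ from run to run because the weights are initialized randomly −

# Gradients computed by loss.backward() are stored on each parameter
print("Gradient of l1.W:", model.l1.W.grad)
print("Gradient of l2.W:", model.l2.W.grad)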