Python API Autograd and Initializer

This chapter deals with the autograd and initializer APIs in MXNet.

mxnet.autograd

This is the MXNet autograd API for NDArray. It has the following class −

Class: Function()

It is used for customised differentiation in autograd and is available as mxnet.autograd.Function. If, for any reason, you do not want to use the gradients computed by the default chain rule, you can subclass the Function class of mxnet.autograd to customize differentiation for a computation. The class has two methods, namely forward() and backward().

Let us understand the working of this class with the help of following points −

First, we need to define our computation in the forward method.
Then, we need to provide the customized differentiation in the backward method.
Now, during gradient computation, mxnet.autograd will use the backward function defined by the user instead of the default chain-rule gradient. We can also convert to NumPy arrays and back for some operations in forward as well as backward.

Example

Before using the mxnet.autograd.Function class, let us define a stable sigmoid function with forward as well as backward methods as follows −

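import mxnet as mx

class sigmoid(mx.autograd.Function):
   def forward(self, x):
      y = 1 / (1 + mx.nd.exp(-x))
      # Keep the output; backward needs it to compute the gradient
      self.save_for_backward(y)
      return y
   
   def backward(self, dy):
      # Retrieve the output saved in forward
      y, = self.saved_tensors
      # Chain rule: d(sigmoid)/dx = y * (1 - y)
      return dy * y * (1-y)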
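
The backward rule used above, dy * y * (1 - y), can be checked independently of MXNet. The following is a minimal sketch in plain Python that compares the rule against a central finite difference. The helper names (sigmoid_backward, numerical_derivative) and the test points are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x):
    """Forward pass: the same formula as the forward method."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_backward(y, dy=1.0):
    """Backward pass: the same rule as the backward method, dy * y * (1 - y)."""
    return dy * y * (1.0 - y)

def numerical_derivative(f, x, eps=1e-6):
    """Central finite difference, used only as an independent reference."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# Hypothetical test points; any finite values would do.
points = [-2.0, 0.0, 0.5, 3.0]
max_error = max(
    abs(sigmoid_backward(sigmoid(x)) - numerical_derivative(sigmoid, x))
    for x in points
)
print(max_error < 1e-8)
```

If the hand-written rule were wrong (for example, y * (1 + y)), the comparison would fail, which makes this kind of check a cheap safeguard when writing a custom backward method.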

Now, the function class can be used as follows −

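func = sigmoid()
x = mx.nd.random.uniform(shape=(10,))
# Allocate a gradient buffer for x
x.attach_grad()
# Record the forward pass so that autograd can differentiate it
with mx.autograd.record():
   m = func(x)
# Run the custom backward and read the gradient as a NumPy array
m.backward()
dx_grad = x.grad.asnumpy()
dx_grad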

Output

When you run the code, you will see output similar to the following (the exact values differ from run to run because x is sampled randomly) −

array([0.21458015, 0.21291625, 0.23330082, 0.2361367 , 0.23086983,
0.24060014, 0.20326573, 0.21093895, 0.24968489, 0.24301809],
dtype=float32)
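
Because x is drawn from the uniform distribution on [0, 1) and the sigmoid derivative decreases for non-negative inputs, every gradient should lie between the derivative at 1 and the derivative at 0 (which is 0.25). The sketch below checks the printed values against that range using plain Python. It assumes the default range of mx.nd.random.uniform, and the helper name sigmoid_grad is hypothetical.

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid at x, written as y * (1 - y)."""
    y = 1.0 / (1.0 + math.exp(-x))
    return y * (1.0 - y)

# Values copied from the output shown above.
printed = [0.21458015, 0.21291625, 0.23330082, 0.2361367, 0.23086983,
           0.24060014, 0.20326573, 0.21093895, 0.24968489, 0.24301809]

# Bounds of the gradient when the input is between 0 and 1.
lower, upper = sigmoid_grad(1.0), sigmoid_grad(0.0)
in_range = all(lower <= g <= upper for g in printed)
print(in_range)
```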

Methods and their parameters

Following are the methods of the mxnet.autograd.Function class and the functions of the mxnet.autograd module, along with their parameters −

Methods and their Parameters	Definition
forward(*inputs)	This method of the Function class is used for forward computation.
backward(*output_grads)	This method of the Function class is used for backward computation. It takes as many inputs as forward's outputs and returns as many NDArrays as forward's inputs.
backward(heads[, head_grads, retain_graph, ...])	This function computes the gradients of heads with respect to previously marked variables.
get_symbol(x)	This function retrieves the recorded computation history as a Symbol.
grad(heads, variables[, head_grads, ...])	This function computes the gradients of heads with respect to variables. Once computed, the gradients are returned as new NDArrays instead of being stored in variable.grad.
is_recording()	This function returns whether autograd is currently recording or not.
is_training()	This function returns whether the current mode is training or predicting.
mark_variables(variables, gradients[, grad_reqs])	This function marks NDArrays as variables for which autograd computes gradients. It is the same as calling .attach_grad() on a variable, the only difference being that with this call we can set the gradient to any value.
pause([train_mode])	This function returns a scope context to be used in a with statement for code that does not need gradients to be calculated.
predict_mode()	This function returns a scope context to be used in a with statement in which forward pass behavior is set to inference mode, without changing the recording state.
record([train_mode])	This function returns an autograd recording scope context to be used in a with statement. It captures code that needs gradients to be calculated.
set_recording(is_recording)	Counterpart of is_recording(). With the help of this function we can set the status to recording or not recording.
set_training(is_training)	Counterpart of is_training(). With the help of this function we can set the status to training or predicting.
train_mode()	This function returns a scope context to be used in a with statement in which forward pass behavior is set to training mode, without changing the recording state.
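
The four scope functions in the table differ only in which of two flags they change: the recording flag and the training flag. The following toy model in plain Python illustrates that behaviour. It is not MXNet code. The state dictionary and the scope helper are hypothetical, and the defaults (record enables training mode, pause disables it) are assumptions based on the documented signatures.

```python
from contextlib import contextmanager

# Toy stand-in for autograd's two global flags; this is not MXNet code.
state = {"recording": False, "training": False}

@contextmanager
def scope(recording=None, training=None):
    """Set the given flags inside the with block and restore them on exit."""
    previous = dict(state)
    if recording is not None:
        state["recording"] = recording
    if training is not None:
        state["training"] = training
    try:
        yield
    finally:
        state.update(previous)

def record(train_mode=True):
    return scope(recording=True, training=train_mode)

def pause(train_mode=False):
    return scope(recording=False, training=train_mode)

def train_mode():
    return scope(training=True)      # recording flag is left untouched

def predict_mode():
    return scope(training=False)     # recording flag is left untouched

trace = []
with record():
    trace.append(dict(state))        # recording and training
    with predict_mode():
        trace.append(dict(state))    # still recording, inference behaviour
    with pause():
        trace.append(dict(state))    # neither recording nor training
trace.append(dict(state))            # flags restored after the outer scope

for snapshot in trace:
    print(snapshot)
```

The point of the model is that predict_mode() and train_mode() leave the recording flag alone, whereas record() and pause() set both flags, and every scope restores the previous values when it exits.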

Implementation Example

In the example below, we use the mxnet.autograd.grad() function to compute the gradient of the head with respect to the variables −

x = mx.nd.ones((2,))
x.attach_grad()
# Record the computation z = exp(x) + x
with mx.autograd.record():
   z = mx.nd.elemwise_add(mx.nd.exp(x), x)
# Return dz/dx as a new NDArray instead of writing it to x.grad
dx_grad = mx.autograd.grad(z, [x], create_graph=True)
dx_grad

Output

The output is mentioned below −

[
[3.7182817 3.7182817]
<NDArray 2 @cpu(0)>]
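
The value 3.7182817 can be confirmed by hand: for z = exp(x) + x the derivative is exp(x) + 1, which equals e + 1 at x = 1. A short plain-Python sketch of that check follows. The function names z and dz_dx are hypothetical.

```python
import math

def z(x):
    """The recorded computation: exp(x) + x."""
    return math.exp(x) + x

def dz_dx(x):
    """Its analytic derivative: exp(x) + 1."""
    return math.exp(x) + 1.0

# Central finite difference at x = 1 as an independent reference.
eps = 1e-6
numeric = (z(1.0 + eps) - z(1.0 - eps)) / (2.0 * eps)
analytic = dz_dx(1.0)

print(round(analytic, 6))               # 3.718282
print(abs(numeric - analytic) < 1e-8)   # True
```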

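We can use the mxnet.autograd.predict_mode() method to return a scope to be used in a with statement. In the snippet below, model and sampling are placeholders for a user-defined network and sampling function −

with mx.autograd.record():
   y = model(x)
   # Run sampling with inference behaviour while still recording
   with mx.autograd.predict_mode():
      y = sampling(y)
   mx.autograd.backward([y])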

mxnet.initializer

This is the MXNet API for weight initializers. It has the following classes −

Classes and their parameters

Following are the classes of mxnet.initializer and their parameters −

Classes and their Parameters	Definition
Bilinear()	With the help of this class we can initialize weights for up-sampling layers.
Constant(value)	This class initializes the weights to a given value. The value can be a scalar or an NDArray that matches the shape of the parameter to be set.
FusedRNN(init, num_hidden, num_layers, mode)	As the name implies, this class initializes parameters for fused Recurrent Neural Network (RNN) layers.
InitDesc	It acts as the descriptor for the initialization pattern.
Initializer(**kwargs)	This is the base class of an initializer.
LSTMBias([forget_bias])	This class initializes all biases of an LSTMCell to 0.0, except for the forget gate, whose bias is set to a custom value.
Load(param[, default_init, verbose])	This class initializes variables by loading data from a file or dictionary.
MSRAPrelu([factor_type, slope])	As the name implies, this class initializes the weights according to the MSRA paper.
Mixed(patterns, initializers)	It initializes the parameters using multiple initializers.
Normal([sigma])	This class initializes weights with random values sampled from a normal distribution with a mean of zero and a standard deviation (SD) of sigma.
One()	It initializes the weights of a parameter to one.
Orthogonal([scale, rand_type])	As the name implies, this class initializes the weight as an orthogonal matrix.
Uniform([scale])	It initializes weights with random values uniformly sampled from the range [-scale, scale].
Xavier([rnd_type, factor_type, magnitude])	It returns an initializer that performs Xavier initialization for weights.
Zero()	It initializes the weights of a parameter to zero.

Implementation Example

In the example below, we use the mxnet.init.Normal() class to create an initializer and retrieve its parameters −

init = mx.init.Normal(0.8)
init.dumps()

Output

The output is given below −

'["normal", {"sigma": 0.8}]'
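
The string returned by dumps() is JSON: a two-element list holding the initializer name and its keyword arguments. As a minimal sketch, it can be parsed with the standard json module. The variable names below are hypothetical, and the string is copied from the output above rather than produced by MXNet.

```python
import json

# String copied from the output shown above.
dumped = '["normal", {"sigma": 0.8}]'

# The first element is the initializer name, the second its keyword arguments.
name, kwargs = json.loads(dumped)
print(name)
print(kwargs["sigma"])
```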

Example

init = mx.init.Xavier(factor_type="in", magnitude=2.45)
init.dumps()

Output

The output is shown below −

'["xavier", {"rnd_type": "uniform", "factor_type": "in", "magnitude": 2.45}]'
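
To see what factor_type and magnitude control, the sketch below computes the sampling scale in plain Python. It assumes the formula scale = sqrt(magnitude / factor), where factor is the fan-in, the fan-out, or their average depending on factor_type. The function name xavier_scale and the weight shape (4, 6) are hypothetical, and this is an illustration of the formula rather than the library's code.

```python
import math

def xavier_scale(shape, factor_type="avg", magnitude=3.0):
    """Scale of the sampling range, assuming scale = sqrt(magnitude / factor)."""
    if len(shape) < 2:
        raise ValueError("Xavier initialization needs at least a 2-D shape")
    # Product of any trailing (kernel) dimensions; 1 for a plain 2-D weight.
    hw_scale = math.prod(shape[2:])
    fan_in, fan_out = shape[1] * hw_scale, shape[0] * hw_scale
    factors = {"avg": (fan_in + fan_out) / 2.0, "in": fan_in, "out": fan_out}
    return math.sqrt(magnitude / factors[factor_type])

# Hypothetical weight with 4 outputs and 6 inputs, using the settings from above.
scale = xavier_scale((4, 6), factor_type="in", magnitude=2.45)
print(scale)
```

Under this assumption, rnd_type="uniform" samples weights from [-scale, scale], so a larger fan-in gives a narrower range.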

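In the example below, we use the mxnet.initializer.Mixed() class to initialize parameters using multiple initializers. Here, module is assumed to be an existing Module instance that has already been bound −

# Parameter names matching 'bias' get zeros; everything else is uniform
init = mx.initializer.Mixed(['bias', '.*'], [mx.init.Zero(),
                                            mx.init.Uniform(0.1)])
module.init_params(init)

# get_params() returns the argument and auxiliary parameter dictionaries
for dictionary in module.get_params():
   for key in dictionary:
      print(key)
      print(dictionary[key].asnumpy())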

Output

The output is shown below −

fullyconnected1_weight
[[ 0.0097627 0.01856892 0.04303787]]
fullyconnected1_bias
[ 0.]
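
The idea behind Mixed is that each parameter name is compared against the patterns in order, and the initializer paired with the first matching pattern is applied. The toy sketch below illustrates that selection in plain Python. The function pick_initializer is hypothetical, the initializers are replaced by plain strings, and the patterns are written to match the whole name so that the result does not depend on how the library anchors its regular expressions.

```python
import re

def pick_initializer(name, patterns, initializers):
    """Return the entry paired with the first pattern that matches name."""
    for pattern, initializer in zip(patterns, initializers):
        if re.match(pattern, name):
            return initializer
    raise ValueError("no pattern matches parameter " + name)

patterns = [".*bias", ".*"]          # hypothetical patterns; order matters
labels = ["Zero", "Uniform(0.1)"]    # strings standing in for initializer objects

chosen = {}
for name in ["fullyconnected1_weight", "fullyconnected1_bias"]:
    chosen[name] = pick_initializer(name, patterns, labels)
    print(name, "->", chosen[name])
```

Placing the catch-all pattern '.*' last matters: if it came first, it would match every name and the bias pattern would never be reached.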
