Apache MXNet - Gluon


Another most important MXNet Python package is Gluon. In this chapter, we will be discussing this package. Gluon provides a clear, concise, and simple API for DL projects. It enables Apache MXNet to prototype, build, and train DL models without forfeiting the training speed.


Blocks form the basis of more complex network designs. In a neural network, as the complexity of neural network increases, we need to move from designing single to entire layers of neurons. For example, NN design like ResNet-152 have a very fair degree of regularity by consisting of blocks of repeated layers.


In the example given below, we will write code a simple block, namely block for a multilayer perceptron.

from mxnet import nd
from mxnet.gluon import nn
x = nd.random.uniform(shape=(2, 20))
N_net = nn.Sequential()
N_net.add(nn.Dense(256, activation='relu'))


This produces the following output:

[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038
0.08696645 -0.0190793 -0.04122177 0.05088576]
[ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431
0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>

Steps needed to go from defining layers to defining blocks of one or more layers −

Step 1 − Block take the data as input.

Step 2 − Now, blocks will store the state in the form of parameters. For example, in the above coding example the block contains two hidden layers and we need a place to store parameters for it.

Step 3 − Next block will invoke the forward function to perform forward propagation. It is also called forward computation. As a part of first forward call, blocks initialize the parameters in a lazy fashion.

Step 4 − At last the blocks will invoke backward function and calculate the gradient with reference to their input. Typically, this step is performed automatically.

Sequential Block

A sequential block is a special kind of block in which the data flows through a sequence of blocks. In this, each block applied to the output of one before with the first block being applied on the input data itself.

Let us see how sequential class works −

from mxnet import nd
from mxnet.gluon import nn
class MySequential(nn.Block):
   def __init__(self, **kwargs):
      super(MySequential, self).__init__(**kwargs)

   def add(self, block):
      self._children[block.name] = block
   def forward(self, x):
   for block in self._children.values():
      x = block(x)
   return x
x = nd.random.uniform(shape=(2, 20))
N_net = MySequential()
N_net.add(nn.Dense(256, activation


The output is given herewith −

[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038
0.08696645 -0.0190793 -0.04122177 0.05088576]
[ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431
0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>

Custom Block

We can easily go beyond concatenation with sequential block as defined above. But, if we would like to make customisations then the Block class also provides us the required functionality. Block class has a model constructor provided in nn module. We can inherit that model constructor to define the model we want.

In the following example, the MLP class overrides the __init__ and forward functions of the Block class.

Let us see how it works.

class MLP(nn.Block):

   def __init__(self, **kwargs):
      super(MLP, self).__init__(**kwargs)
      self.hidden = nn.Dense(256, activation='relu') # Hidden layer
      self.output = nn.Dense(10) # Output layer

   def forward(self, x):
      hidden_out = self.hidden(x)
      return self.output(hidden_out)
x = nd.random.uniform(shape=(2, 20))
N_net = MLP()


When you run the code, you will see the following output:

[[ 0.07787763 0.00216403 0.01682201 0.03059879 -0.00702019 0.01668715
0.04822846 0.0039432 -0.09300035 -0.04494302]
[ 0.08891078 -0.00625484 -0.01619131 0.0380718 -0.01451489 0.02006172
0.0303478 0.02463485 -0.07605448 -0.04389168]]
<NDArray 2x10 @cpu(0)>

Custom Layers

Apache MXNet’s Gluon API comes with a modest number of pre-defined layers. But still at some point, we may find that a new layer is needed. We can easily add a new layer in Gluon API. In this section, we will see how we can create a new layer from scratch.

The Simplest Custom Layer

To create a new layer in Gluon API, we must have to create a class inherits from the Block class which provides the most basic functionality. We can inherit all the pre-defined layers from it directly or via other subclasses.

For creating the new layer, the only instance method needed to be implemented is forward (self, x). This method defines, what exactly our layer is going to do during forward propagation. As discussed earlier also, the back-propagation pass for blocks will be done by Apache MXNet itself automatically.


In the example below, we will be defining a new layer. We will also implement forward() method to normalise the input data by fitting it into a range of [0, 1].

from __future__ import print_function
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon.nn import Dense
class NormalizationLayer(gluon.Block):
   def __init__(self):
      super(NormalizationLayer, self).__init__()

   def forward(self, x):
      return (x - nd.min(x)) / (nd.max(x) - nd.min(x))
x = nd.random.uniform(shape=(2, 20))
N_net = NormalizationLayer()


On executing the above program, you will get the following result −

[[0.5216355 0.03835821 0.02284337 0.5945146 0.17334817 0.69329053
0.7782702 1. 0.5508242 0. 0.07058554 0.3677264
0.4366546 0.44362497 0.7192635 0.37616986 0.6728799 0.7032008

 0.46907538 0.63514024]
[0.9157533 0.7667402 0.08980197   0.03593295 0.16176797 0.27679572
 0.07331014 0.3905285 0.6513384 0.02713427 0.05523694 0.12147208
 0.45582628 0.8139887 0.91629887 0.36665893 0.07873632 0.78268915
 0.63404864 0.46638715]]
 <NDArray 2x20 @cpu(0)>


It may be defined as a process used by Apache MXNet’s to create a symbolic graph of a forward computation. Hybridisation allows MXNet to upsurge the computation performance by optimising the computational symbolic graph. Rather than directly inheriting from Block, in fact, we may find that while implementing existing layers a block inherits from a HybridBlock.

Following are the reasons for this −

  • Allows us to write custom layers: HybridBlock allows us to write custom layers that can further be used in imperative and symbolic programming both.

  • Increase computation performance− HybridBlock optimise the computational symbolic graph which allows MXNet to increase computation performance.


In this example, we will be rewriting our example layer, created above, by using HybridBlock:

class NormalizationHybridLayer(gluon.HybridBlock):
   def __init__(self):
      super(NormalizationHybridLayer, self).__init__()

   def hybrid_forward(self, F, x):
      return F.broadcast_div(F.broadcast_sub(x, F.min(x)), (F.broadcast_sub(F.max(x), F.min(x))))

layer_hybd = NormalizationHybridLayer()
layer_hybd(nd.array([1, 2, 3, 4, 5, 6], ctx=mx.cpu()))


The output is stated below:

[0. 0.2 0.4 0.6 0.8 1. ]
<NDArray 6 @cpu(0)>

Hybridisation has nothing to do with computation on GPU and one can train hybridised as well as non-hybridised networks on both CPU and GPU.

Difference between Block and HybridBlock

If we will compare the Block Class and HybridBlock, we will see that HybridBlock already has its forward() method implemented. HybridBlock defines a hybrid_forward() method that needs to be implemented while creating the layers. F argument creates the main difference between forward() and hybrid_forward(). In MXNet community, F argument is referred to as a backend. F can either refer to mxnet.ndarray API (used for imperative programming) or mxnet.symbol API (used for Symbolic programming).

How to add custom layer to a network?

Instead of using custom layers separately, these layers are used with predefined layers. We can use either Sequential or HybridSequential containers to from a sequential neural network. As discussed earlier also, Sequential container inherit from Block and HybridSequential inherit from HybridBlock respectively.


In the example below, we will be creating a simple neural network with a custom layer. The output from Dense (5) layer will be the input of NormalizationHybridLayer. The output of NormalizationHybridLayer will become the input of Dense (1) layer.

net = gluon.nn.HybridSequential()
with net.name_scope():
input = nd.random_uniform(low=-10, high=10, shape=(10, 2))


You will see the following output −


 [-1.316593 ]]
<NDArray 10x1 @cpu(0)>

Custom layer parameters

In a neural network, a layer has a set of parameters associated with it. We sometimes refer them as weights, which is internal state of a layer. These parameters play different roles −

  • Sometimes these are the ones that we want to learn during backpropagation step.

  • Sometimes these are just constants we want to use during forward pass.

If we talk about the programming concept, these parameters (weights) of a block are stored and accessed via ParameterDict class which helps in initialisation, updation, saving, and loading of them.


In the example below, we will be defining two following sets of parameters −

  • Parameter weights − This is trainable, and its shape is unknown during construction phase. It will be inferred on the first run of forward propagation.

  • Parameter scale − This is a constant whose value doesn’t change. As opposite to parameter weights, its shape is defined during construction.

class NormalizationHybridLayer(gluon.HybridBlock):
   def __init__(self, hidden_units, scales):
      super(NormalizationHybridLayer, self).__init__()
      with self.name_scope():
      self.weights = self.params.get('weights',
      shape=(hidden_units, 0),
      self.scales = self.params.get('scales',
      def hybrid_forward(self, F, x, weights, scales):
         normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)),
         (F.broadcast_sub(F.max(x), F.min(x))))
         weighted_data = F.FullyConnected(normalized_data, weights, num_hidden=self.weights.shape[0], no_bias=True)
         scaled_data = F.broadcast_mul(scales, weighted_data)
return scaled_data