Apache MXNet - System Components


This chapter explains the system components of Apache MXNet in detail, starting with the execution engine.

Execution Engine

Apache MXNet’s execution engine is versatile. It can be used for deep learning as well as for any domain-specific problem that amounts to executing a set of functions in an order that respects their dependencies. It is designed so that functions with dependencies are serialized, whereas functions with no dependencies can be executed in parallel.

Core Interface

The API given below is the core interface for Apache MXNet’s execution engine −

virtual void PushSync(Fn exec_fun, Context exec_ctx,
std::vector<VarHandle> const& const_vars,
std::vector<VarHandle> const& mutate_vars) = 0;

The parameters of the above API are as follows −

  • exec_fun − The function to be pushed to the execution engine, along with its context information and dependencies.

  • exec_ctx − The context in which exec_fun should be executed.

  • const_vars − The variables that the function reads from.

  • mutate_vars − The variables that the function modifies.

The execution engine provides its user the guarantee that the execution of any two functions that modify a common variable is serialized in their push order.


Following is the function type of the execution engine of Apache MXNet −

using Fn = std::function<void(RunContext)>;

In the above function type, RunContext contains the runtime information, which is determined by the execution engine. The definition of RunContext is as follows −

struct RunContext {
   // stream pointer which could be safely cast to
   // cudaStream_t* type
   void *stream;
};

Some important points about the execution engine’s functions are given below −

  • All the functions are executed by the execution engine’s internal threads.

  • It is not advisable to push a blocking function to the execution engine, because the function will occupy an execution thread and reduce the total throughput.

For such cases, MXNet provides an asynchronous function type as follows −

using Callback = std::function<void()>;
using AsyncFn = std::function<void(RunContext, Callback)>;

With AsyncFn, the heavy part of the work can be handed off to our own threads, but the execution engine does not consider the function finished until we call the callback function.
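The callback contract can be illustrated with a minimal sketch. It uses a toy engine rather than MXNet's real one: `ToyEngine`, `PushAsync`, `pending()`, and `DeferredWork` are hypothetical names introduced here, and only `RunContext`, `Callback`, and `AsyncFn` mirror the declarations in the text.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Mirrors the declarations in the text.
struct RunContext {
  void* stream;  // would be cast to cudaStream_t* on a GPU
};
using Callback = std::function<void()>;
using AsyncFn = std::function<void(RunContext, Callback)>;

// Hypothetical toy engine: it only counts operations that were started
// but whose callback has not been invoked yet.
class ToyEngine {
 public:
  void PushAsync(const AsyncFn& fn) {
    ++pending_;
    RunContext ctx{nullptr};
    // The operation counts as finished only when this callback runs.
    fn(ctx, [this]() { --pending_; });
  }
  int pending() const { return pending_; }

 private:
  int pending_ = 0;
};

// Hypothetical stand-in for work that completes later (for example on an
// I/O thread): it keeps the callback instead of calling it right away.
struct DeferredWork {
  Callback on_complete;
  void Start(RunContext, Callback cb) { on_complete = std::move(cb); }
  void Finish() {
    assert(static_cast<bool>(on_complete));  // Start must come first
    on_complete();
  }
};
```

In this sketch, pushing a function that calls `work.Start(ctx, cb)` leaves `engine.pending()` at 1 even though the pushed function has already returned. The count drops to 0 only after `work.Finish()` invokes the stored callback.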


In Context, we can specify the context in which the function is to be executed. This usually includes the following −

  • Whether the function should be run on a CPU or a GPU.

  • If a GPU is specified, which GPU to use.

Context should not be confused with RunContext. Context holds the device type and device id, whereas RunContext holds information that can be decided only at runtime.


VarHandle, used to specify the dependencies of functions, is like a token (provided by the execution engine) that we can use to represent the external resources a function can use or modify.

Why do we need VarHandle? Because the Apache MXNet engine is designed to be decoupled from the other MXNet modules.

Following are some important points about VarHandle −

  • It is lightweight, so creating, deleting, or copying a variable incurs little cost.

  • The immutable variables, i.e. the variables that the function only reads, are specified in const_vars.

  • The mutable variables, i.e. the variables that the function modifies, are specified in mutate_vars.

  • The rule used by the execution engine to resolve dependencies is that any two functions sharing a variable are executed in their push order whenever at least one of them modifies that variable.

  • To create a new variable, we can use the NewVar() API.

  • To delete a variable, we can use the PushDelete API.

Let us understand its working with a simple example −

Suppose we have two functions, F1 and F2, that both mutate the variable V2. In that case, F2 is guaranteed to be executed after F1 if F2 is pushed after F1. On the other hand, if F1 and F2 both only read V2, their actual execution order could be arbitrary.
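The dependency rule can be written as a small predicate. This is a minimal sketch and not the engine's actual scheduler: `Op`, `Contains`, and `MustSerialize` are hypothetical names, and `VarHandle` is reduced to an `int` for illustration.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in; the real VarHandle is an opaque token issued by the engine.
using VarHandle = int;

struct Op {
  std::vector<VarHandle> const_vars;   // variables the op only reads
  std::vector<VarHandle> mutate_vars;  // variables the op modifies
};

static bool Contains(const std::vector<VarHandle>& vars, VarHandle v) {
  return std::find(vars.begin(), vars.end(), v) != vars.end();
}

// True if `a` and `b` must run in push order: they share a variable and
// at least one of them mutates it.
bool MustSerialize(const Op& a, const Op& b) {
  for (VarHandle v : a.mutate_vars) {
    if (Contains(b.mutate_vars, v) || Contains(b.const_vars, v)) return true;
  }
  for (VarHandle v : b.mutate_vars) {
    if (Contains(a.const_vars, v)) return true;
  }
  return false;
}
```

Two operations that both list V2 in `mutate_vars` must be serialized. Two operations that only list V2 in `const_vars` are free to run in any order or in parallel.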

Push and Wait

Push and Wait are two more useful APIs of the execution engine.

Following are two important features of the Push API:

  • All the Push APIs are asynchronous, which means that the API call returns immediately regardless of whether the pushed function has finished or not.

  • The Push API is not thread-safe, which means that only one thread should make engine API calls at a time.

For the Wait API, the following points apply −

  • To wait for a specific function to finish, the user should include a callback function in the closure and call it at the end of the function.

  • To wait for all functions that involve a certain variable to finish, the user should use the WaitForVar(var) API.

  • To wait for all the pushed functions to finish, the user should use the WaitForAll() API.

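The push/wait contract can be modelled with a minimal single-threaded sketch. `DeferredEngine` and `pending()` are hypothetical names. The real engine runs pushed functions on its own worker threads, whereas this toy version only queues them and runs them when the caller waits.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical single-threaded model of the push/wait contract.
class DeferredEngine {
 public:
  using Fn = std::function<void()>;

  // Returns immediately; the pushed function has not run yet.
  void Push(Fn fn) {
    assert(static_cast<bool>(fn));  // an empty function cannot be run
    queue_.push(std::move(fn));
  }

  // Returns only after every pushed function has finished, in push order.
  void WaitForAll() {
    while (!queue_.empty()) {
      Fn fn = std::move(queue_.front());
      queue_.pop();
      fn();
    }
  }

  std::size_t pending() const { return queue_.size(); }

 private:
  std::queue<Fn> queue_;
};
```

After two calls to `Push`, none of the side effects are visible and `pending()` is 2. Once `WaitForAll()` returns, both functions have run in the order they were pushed.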


An operator in Apache MXNet is a class that contains the actual computation logic as well as auxiliary information that helps the system perform optimizations.

Operator Interface

Forward is the core operator interface whose syntax is as follows:

virtual void Forward(const OpContext &ctx,
const std::vector<TBlob> &in_data,
const std::vector<OpReqType> &req,
const std::vector<TBlob> &out_data,
const std::vector<TBlob> &aux_states) = 0;

The structure of OpContext, used in Forward(), is as follows:

struct OpContext {
   int is_train;
   RunContext run_ctx;
   std::vector<Resource> requested;
};

The OpContext describes the state of the operator (whether it is in the train or test phase), the device the operator should be run on, and the requested resources.

The remaining arguments of the Forward interface can be understood as follows −

  • in_data and out_data represent the input and output tensors.

  • req denotes how the result of the computation is written into out_data.

The OpReqType is defined as −

enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };
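Here is a minimal sketch of how an operator might honor req when storing a result. The enumerator names follow MXNet's OpReqType. The helper `ApplyRequest` and the use of `std::vector<float>` in place of TBlob are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Request types, named as in MXNet's operator headers.
enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

// Hypothetical helper: stores `result` into `out` in the way `req` asks.
void ApplyRequest(std::vector<float>& out, OpReqType req,
                  const std::vector<float>& result) {
  if (req == kNullOp) return;  // the caller does not want this output
  assert(out.size() == result.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (req == kAddTo) {
      out[i] += result[i];  // accumulate into the existing values
    } else {
      out[i] = result[i];   // kWriteTo and kWriteInplace both overwrite
    }
  }
}
```

In this sketch kWriteTo and kWriteInplace behave identically. They differ only in whether the output buffer is shared with an input, which is decided before the operator runs.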

Like the Forward interface, we can optionally implement the Backward interface as follows −

virtual void Backward(const OpContext &ctx,
const std::vector<TBlob> &out_grad,
const std::vector<TBlob> &in_data,
const std::vector<TBlob> &out_data,
const std::vector<OpReqType> &req,
const std::vector<TBlob> &in_grad,
const std::vector<TBlob> &aux_states);

Various tasks

The operator interface allows users to do the following tasks −

  • Specify in-place updates and thereby reduce memory allocation cost.

  • Hide some internal arguments from Python to make the interface cleaner.

  • Define the relationship between the input tensors and the output tensors.

  • Acquire additional temporary space from the system to perform computation.

Operator Property

In a convolutional neural network (CNN), one convolution can have several implementations. To achieve the best performance, we might want to switch among those implementations.

For this reason, Apache MXNet separates the operator semantic interface from the implementation interface. This separation takes the form of the OperatorProperty class, which consists of the following −

InferShape − The InferShape interface has two purposes, as given below:

  • The first purpose is to tell the system the size of each input and output tensor, so that the space can be allocated before the Forward and Backward calls.

  • The second purpose is to perform a size check to make sure that there is no error before running.

The syntax is given below −

virtual bool InferShape(mxnet::ShapeVector *in_shape,
mxnet::ShapeVector *out_shape,
mxnet::ShapeVector *aux_shape) const = 0;
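Both purposes of shape inference can be seen in a minimal sketch for a fully connected layer. `Shape`, `ShapeVector`, and `InferFullyConnectedShape` are toy stand-ins defined here, not MXNet types, and the assumed input layout is data (batch, in_dim) followed by weight (num_hidden, in_dim).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins for mxnet::TShape and mxnet::ShapeVector.
using Shape = std::vector<std::int64_t>;
using ShapeVector = std::vector<Shape>;

// Hypothetical shape inference for a fully connected layer with inputs
// {data: (batch, in_dim), weight: (num_hidden, in_dim)}.
// Returns false if the input shapes are missing or inconsistent.
bool InferFullyConnectedShape(const ShapeVector& in_shape,
                              ShapeVector* out_shape) {
  assert(out_shape != nullptr);
  if (in_shape.size() != 2) return false;
  const Shape& data = in_shape[0];
  const Shape& weight = in_shape[1];
  if (data.size() != 2 || weight.size() != 2) return false;
  if (data[1] != weight[1]) return false;  // size check before running
  out_shape->clear();
  out_shape->push_back({data[0], weight[0]});  // (batch, num_hidden)
  return true;
}
```

For data of shape (32, 100) and a weight of shape (10, 100), the sketch reports an output of shape (32, 10). With a weight of shape (10, 64) it returns false, so the mismatch is caught before any memory is allocated.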

Request Resource − Some operations, such as cudnnConvolutionForward, need a workspace for computation. If the system manages that workspace, it can perform optimizations such as reusing the space. MXNet achieves this with the help of the following two interfaces −

virtual std::vector<ResourceRequest> ForwardResource(
   const mxnet::ShapeVector &in_shape) const;
virtual std::vector<ResourceRequest> BackwardResource(
   const mxnet::ShapeVector &in_shape) const;

What if ForwardResource and BackwardResource return non-empty arrays? In that case, the system provides the corresponding resources through the ctx parameter of the Forward and Backward interfaces of Operator.

Backward dependency − To see how Apache MXNet deals with backward dependencies, consider the following two pairs of operator signatures −

void FullyConnectedForward(TBlob weight, TBlob in_data, TBlob out_data);
void FullyConnectedBackward(TBlob weight, TBlob in_data, TBlob out_grad, TBlob in_grad);
void PoolingForward(TBlob in_data, TBlob out_data);
void PoolingBackward(TBlob in_data, TBlob out_data, TBlob out_grad, TBlob in_grad);

Here, there are two important points to note −

  • The out_data in FullyConnectedForward is not used by FullyConnectedBackward, and

  • PoolingBackward requires all the arguments of PoolingForward.

That is why, for FullyConnectedForward, the out_data tensor can be safely freed once it has been consumed, because the backward function will not need it. This gives the system a chance to garbage-collect some tensors as early as possible.
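The reasoning can be sketched as a set difference between the tensors a forward pass touches and the tensors its backward pass takes as arguments. `FreeableAfterForward` is a hypothetical helper, and tensors are identified by name purely for illustration. It is not how MXNet declares dependencies internally.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: given the tensors a forward pass touches and the
// tensors its backward pass takes as arguments, return the forward
// tensors that can be released once the forward result is consumed.
std::vector<std::string> FreeableAfterForward(
    const std::vector<std::string>& forward_tensors,
    const std::vector<std::string>& backward_args) {
  std::vector<std::string> freeable;
  for (const std::string& name : forward_tensors) {
    const bool needed = std::find(backward_args.begin(), backward_args.end(),
                                  name) != backward_args.end();
    if (!needed) freeable.push_back(name);
  }
  return freeable;
}
```

Applied to the signatures above, the fully connected pair yields only out_data as freeable, while the pooling pair yields nothing because PoolingBackward takes both in_data and out_data.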

In place Option − Apache MXNet provides another interface that lets users save the cost of memory allocation. The interface is appropriate for element-wise operations in which the input and output tensors have the same shape.
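To see why element-wise operations suit in-place updates, consider the following minimal sketch. `Relu` is a hypothetical free function written for illustration and is not an MXNet operator.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element-wise operator (ReLU). Each output element depends
// only on the input element at the same index, so `out` may point to the
// same buffer as `in` and no separate output allocation is needed.
void Relu(const float* in, float* out, std::size_t n) {
  assert(in != nullptr && out != nullptr);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i] > 0.0f ? in[i] : 0.0f;
  }
}
```

Calling `Relu(buf, buf, n)` gives the same values as writing into a separate buffer, which is the property an in-place option relies on.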

Following is the syntax for specifying the in-place update −

virtual std::vector<std::pair<int, void*>> ElewiseOpProperty::ForwardInplaceOption(
   const std::vector<int> &in_data,
   const std::vector<void*> &out_data) const {
   return { {in_data[0], out_data[0]} };
}
virtual std::vector<std::pair<int, void*>> ElewiseOpProperty::BackwardInplaceOption(
   const std::vector<int> &out_grad,
   const std::vector<int> &in_data,
   const std::vector<int> &out_data,
   const std::vector<void*> &in_grad) const {
   return { {out_grad[0], in_grad[0]} };
}

Example for Creating an Operator

With the help of OperatorProperty we can create an operator. To do so, follow the steps given below −

Step 1

Create Operator

First implement the following interface in OperatorProperty:

virtual Operator* CreateOperator(Context ctx) const = 0;

The example is given below −

class ConvolutionOp {
   public:
      void Forward( ... ) { ... }
      void Backward( ... ) { ... }
};
class ConvolutionOpProperty : public OperatorProperty {
   public:
      Operator* CreateOperator(Context ctx) const {
         return new ConvolutionOp;
      }
};

Step 2

Parameterize Operator

To implement a convolution operator, we need to know the kernel size, the stride size, the padding size, and so on, because these parameters must be passed to the operator before any Forward or Backward interface is called.

For this, we need to define a ConvolutionParam structure as below −

#include <dmlc/parameter.h>
struct ConvolutionParam : public dmlc::Parameter<ConvolutionParam> {
   mxnet::TShape kernel, stride, pad;
   uint32_t num_filter, num_group, workspace;
   bool no_bias;
};

Now, we need to put this in ConvolutionOpProperty and pass it to the operator as follows −

class ConvolutionOp {
   public:
      ConvolutionOp(ConvolutionParam p): param_(p) {}
      void Forward( ... ) { ... }
      void Backward( ... ) { ... }
   private:
      ConvolutionParam param_;
};
class ConvolutionOpProperty : public OperatorProperty {
   public:
      void Init(const vector<pair<string, string>>& kwargs) {
         // initialize param_ using kwargs
      }
      Operator* CreateOperator(Context ctx) const {
         return new ConvolutionOp(param_);
      }
   private:
      ConvolutionParam param_;
};
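The same pattern can be tried without MXNet headers using a self-contained sketch. All names here (`ToyConvParam`, `ToyConvOp`, `ToyConvProperty`, `OutputLength`) are hypothetical, and the operator is reduced to a 1-D output-length calculation so that the effect of the parameters is visible.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified parameter struct for a 1-D convolution.
struct ToyConvParam {
  int kernel = 1;
  int stride = 1;
  int pad = 0;
};

class ToyConvOp {
 public:
  explicit ToyConvOp(ToyConvParam p) : param_(p) {}
  // Output length of a 1-D convolution over `in_len` elements.
  int OutputLength(int in_len) const {
    assert(param_.stride > 0);
    return (in_len + 2 * param_.pad - param_.kernel) / param_.stride + 1;
  }

 private:
  ToyConvParam param_;
};

class ToyConvProperty {
 public:
  // Parse string keyword arguments into the parameter struct.
  void Init(const std::vector<std::pair<std::string, std::string>>& kwargs) {
    for (const auto& kv : kwargs) {
      if (kv.first == "kernel") param_.kernel = std::stoi(kv.second);
      else if (kv.first == "stride") param_.stride = std::stoi(kv.second);
      else if (kv.first == "pad") param_.pad = std::stoi(kv.second);
    }
  }
  // The property hands its parameters to every operator it creates.
  std::unique_ptr<ToyConvOp> CreateOperator() const {
    return std::unique_ptr<ToyConvOp>(new ToyConvOp(param_));
  }

 private:
  ToyConvParam param_;
};
```

Keeping the parameters in the property class means the operator can be created only after `Init` has run, so `Forward` and `Backward` never see unset values.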

Step 3

Register the Operator Property Class and the Parameter Class to Apache MXNet

Finally, we need to register the operator property class and the parameter class with MXNet. This can be done with the help of the following macros −

DMLC_REGISTER_PARAMETER(ConvolutionParam);
MXNET_REGISTER_OP_PROPERTY(Convolution, ConvolutionOpProperty);

In the second macro, the first argument is the name string and the second is the property class name.
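Conceptually, registration associates a name string with a way to construct the property class. The following is a minimal sketch of that idea only. `ToyProperty`, `ToyConvolutionProperty`, and `ToyRegistry` are hypothetical, and the sketch makes no claim about how MXNet's registration macro is actually implemented.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class standing in for OperatorProperty.
struct ToyProperty {
  virtual ~ToyProperty() = default;
  virtual std::string TypeString() const = 0;
};

struct ToyConvolutionProperty : ToyProperty {
  std::string TypeString() const override { return "Convolution"; }
};

// Hypothetical registry: maps an operator name to a factory that builds
// the matching property object.
class ToyRegistry {
 public:
  using Factory = std::function<std::unique_ptr<ToyProperty>()>;

  static ToyRegistry& Get() {
    static ToyRegistry instance;
    return instance;
  }
  void Register(const std::string& name, Factory factory) {
    assert(static_cast<bool>(factory));
    factories_[name] = std::move(factory);
  }
  // Returns nullptr if nothing was registered under `name`.
  std::unique_ptr<ToyProperty> Create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};
```

Once a factory is registered under "Convolution", any code that knows only the name string can obtain a property object through `ToyRegistry::Get().Create("Convolution")`.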