# Apache MXNet - Unified Operator API

This chapter provides information about the unified operator application programming interface (API) in Apache MXNet.

## SimpleOp

SimpleOp is a new unified operator API that unifies the different invoking processes of NDArray and symbolic operators by returning to the fundamental elements of an operator: its function and its gradient. The unified operator is specially designed for unary and binary operations, because most mathematical operators attend to one or two operands, and more operands make dependency-related optimization useful.

We will look at how SimpleOp works through an example: an operator that computes the smooth l1 loss, which is a mixture of l1 and l2 loss. The loss and its gradient can be written as follows:

```
loss = outside_weight .* f(inside_weight .* (data - label))
grad = outside_weight .* inside_weight .* f'(inside_weight .* (data - label))
```


In the formulas above:

• .* stands for element-wise multiplication.

• f and f' are the smooth l1 loss function and its derivative, which we assume for now are available in mshadow.

At first glance it looks impossible to implement this loss as a unary or binary operator. However, MXNet provides automatic differentiation in symbolic execution, which simplifies the loss to f and f' directly. The loss is then no more complex than a function such as sin or abs, so we can implement it as a unary operator.
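To make the reduction concrete, the sketch below evaluates the two formulas element by element for an arbitrary scalar function f and its derivative. It is a plain C++ illustration, not MXNet code: `WeightedLoss`, `f`, and `fprime` are hypothetical names, and the weights are ordinary vectors. It shows that once f and f' are known, the weighting around them is just element-wise multiplication.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: evaluates
//   loss = outside_weight .* f(inside_weight .* (data - label))
//   grad = outside_weight .* inside_weight .* f'(inside_weight .* (data - label))
// for any scalar f and its derivative fprime. All vectors are assumed to have equal length.
template <typename F, typename FPrime>
void WeightedLoss(const std::vector<float>& data, const std::vector<float>& label,
                  const std::vector<float>& inside_w, const std::vector<float>& outside_w,
                  F f, FPrime fprime,
                  std::vector<float>* loss, std::vector<float>* grad) {
  loss->resize(data.size());
  grad->resize(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    const float z = inside_w[i] * (data[i] - label[i]);      // argument of f and f'
    (*loss)[i] = outside_w[i] * f(z);                         // forward formula
    (*grad)[i] = outside_w[i] * inside_w[i] * fprime(z);      // gradient w.r.t. data
  }
}
```

With f(z) = z² as a stand-in, data 3, label 1, inside weight 2, and outside weight 0.5, z is 4, so the loss is 0.5 · 16 = 8 and the gradient is 0.5 · 2 · 8 = 8.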

## Defining Shapes

MXNet's mshadow library requires explicit memory allocation, so all data shapes must be provided before any calculation occurs. Before defining functions and gradients, we therefore check input shape consistency and provide the output shape through a shape function:

```cpp
typedef mxnet::TShape (*UnaryShapeFunction)(const mxnet::TShape& src,
                                            const EnvArguments& env);
typedef mxnet::TShape (*BinaryShapeFunction)(const mxnet::TShape& lhs,
                                             const mxnet::TShape& rhs,
                                             const EnvArguments& env);
```


A shape function uses mxnet::TShape to check the input data shape and to designate the output data shape. If you do not define a shape function, the default output shape is the same as the input shape. For a binary operator, the shapes of lhs and rhs are checked to be the same by default.
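The default behavior described above can be mimicked in a few lines. This is only an illustration under stated assumptions: `Shape` is a hypothetical stand-in for mxnet::TShape, and the mismatch is reported with an exception here, whereas MXNet performs its own internal check.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for mxnet::TShape: one extent per dimension.
using Shape = std::vector<std::size_t>;

// Default unary behavior: the output has the same shape as the input.
Shape DefaultUnaryShape(const Shape& src) { return src; }

// Default binary behavior: lhs and rhs must match, and the output inherits that shape.
Shape DefaultBinaryShape(const Shape& lhs, const Shape& rhs) {
  if (lhs != rhs) throw std::invalid_argument("lhs and rhs shapes differ");
  return lhs;
}
```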

Now let's move on to our smooth l1 loss example. First, we define XPU as cpu or gpu in the header smooth_l1_unary-inl.h, so that the same code can be reused in smooth_l1_unary.cc and smooth_l1_unary.cu.

```cpp
#include <mxnet/operator_util.h>
#if defined(__CUDACC__)
#define XPU gpu
#else
#define XPU cpu
#endif
```
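The selection works because nvcc defines `__CUDACC__` when it compiles a .cu file, so one templated body is instantiated for whichever device XPU names. The standalone sketch below reproduces the pattern with hypothetical `cpu` and `gpu` tag types in place of mshadow's device types; `DeviceName` and `CompiledFor` are illustrative names only.

```cpp
#include <string>

// Hypothetical device tags standing in for mshadow::cpu and mshadow::gpu.
struct cpu { static constexpr const char* name = "cpu"; };
struct gpu { static constexpr const char* name = "gpu"; };

// Same selection as in the header: nvcc defines __CUDACC__ for .cu files.
#if defined(__CUDACC__)
#define XPU gpu
#else
#define XPU cpu
#endif

// One templated implementation, usable for either device.
template <typename xpu>
std::string DeviceName() { return xpu::name; }

// The instantiation this translation unit actually uses.
std::string CompiledFor() { return DeviceName<XPU>(); }
```

Compiled with an ordinary host compiler, `CompiledFor()` returns "cpu"; the same file built by nvcc would pick `gpu`.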


In our smooth l1 loss example, the output has the same shape as the source, so the default behavior is sufficient. Written explicitly, it looks as follows −

```cpp
inline mxnet::TShape SmoothL1Shape_(const mxnet::TShape& src,
                                    const EnvArguments& env) {
  return mxnet::TShape(src);
}
```


## Defining Functions

We can create a unary or binary function with one output, a TBlob. The two kinds of function are differentiated by the types of their input arguments:

```cpp
typedef void (*UnaryFunction)(const TBlob& src,
                              const EnvArguments& env,
                              TBlob* ret,
                              OpReqType req,
                              RunContext ctx);
typedef void (*BinaryFunction)(const TBlob& lhs,
                               const TBlob& rhs,
                               const EnvArguments& env,
                               TBlob* ret,
                               OpReqType req,
                               RunContext ctx);
```
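A fixed signature is what lets the framework call any registered operator without knowing its body. The toy version below keeps only that idea. `ToyBlob`, `ToyEnv`, `ToyUnaryFunction`, `ScaleForward`, and `Invoke` are hypothetical, simplified stand-ins and not MXNet types; `req` and `ctx` are omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplified stand-ins for TBlob and EnvArguments (not MXNet types).
using ToyBlob = std::vector<float>;
struct ToyEnv { float scalar; };

// Same idea as UnaryFunction: every unary operator shares one signature.
typedef void (*ToyUnaryFunction)(const ToyBlob& src, const ToyEnv& env, ToyBlob* ret);

// An example operator body: ret = scalar * src.
void ScaleForward(const ToyBlob& src, const ToyEnv& env, ToyBlob* ret) {
  ret->resize(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) (*ret)[i] = env.scalar * src[i];
}

// A caller that knows only the signature, not the operator it runs.
void Invoke(ToyUnaryFunction fn, const ToyBlob& src, const ToyEnv& env, ToyBlob* ret) {
  fn(src, env, ret);
}
```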


RunContext ctx contains the information needed during runtime for execution. An abridged view of the struct:

```cpp
struct RunContext {
  void *stream;  // the stream of the device, can be NULL or Stream<gpu>* in GPU mode
  template<typename xpu>
  inline mshadow::Stream<xpu>* get_stream() const;  // get mshadow stream from Context
};
// inside an operator: mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
```
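The struct stores the stream as a type-erased `void*` and recovers a typed pointer through a member template. The sketch below shows that mechanism in isolation. `Stream`, `cpu`, `gpu`, and `ToyRunContext` are hypothetical stand-ins, and the cast-based body is an assumption about how such a getter can be written, not a copy of MXNet's source.

```cpp
// Hypothetical device tags and stream type (not mshadow's).
struct cpu {};
struct gpu {};
template <typename Device>
struct Stream { int id; };

// Type-erased stream pointer plus a templated getter, mirroring the idea
// behind RunContext::get_stream<xpu>().
struct ToyRunContext {
  void* stream;
  template <typename xpu>
  Stream<xpu>* get_stream() const { return static_cast<Stream<xpu>*>(stream); }
};
```

The caller must ask for the same device type that was stored; the cast performs no check.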


OpReqType req denotes how the computation results are written into ret:

```cpp
enum OpReqType {
  kNullOp,        // no operation, do not write anything
  kWriteTo,       // write gradient to provided space
  kWriteInplace,  // perform an in-place write
  kAddTo          // add to the provided space
};
```
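Operators rarely switch on `req` by hand. As we understand it, the `ASSIGN_DISPATCH(out, req, exp)` macro in operator_util.h does it for mshadow expressions: it skips kNullOp, overwrites for kWriteTo and kWriteInplace, and accumulates for kAddTo. The plain-array sketch below mirrors that behavior. `AssignDispatch` is a hypothetical name, and `out` and `value` are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

// Hypothetical plain-array counterpart of ASSIGN_DISPATCH.
void AssignDispatch(std::vector<float>* out, OpReqType req,
                    const std::vector<float>& value) {
  switch (req) {
    case kNullOp:
      break;  // leave out untouched
    case kWriteTo:
    case kWriteInplace:
      *out = value;  // overwrite
      break;
    case kAddTo:
      for (std::size_t i = 0; i < out->size(); ++i) (*out)[i] += value[i];  // accumulate
      break;
  }
}
```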


Now, let’s move on to our smooth l1 loss example. For this, we will use UnaryFunction to define the function of this operator as follows:

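```cpp
template<typename xpu>
void SmoothL1Forward_(const TBlob& src,
                      const EnvArguments& env,
                      TBlob *ret,
                      OpReqType req,
                      RunContext ctx) {
  using namespace mshadow::expr;  // for F<> and ScalarExp<>
  mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
  // env.scalar is sigma; the loss mapper expects sigma^2
  real_t sigma2 = env.scalar * env.scalar;
  // MSHADOW_TYPE_SWITCH binds DType to the element type of ret
  MSHADOW_TYPE_SWITCH(ret->type_flag_, DType, {
    mshadow::Tensor<xpu, 2, DType> out = ret->get<xpu, 2, DType>(s);
    mshadow::Tensor<xpu, 2, DType> in = src.get<xpu, 2, DType>(s);
    // ASSIGN_DISPATCH writes, adds, or skips according to req
    ASSIGN_DISPATCH(out, req,
                    F<mshadow_op::smooth_l1_loss>(in, ScalarExp<DType>(sigma2)));
  });
}
```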


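Gradient functions are differentiated by the types of input they need. For a unary operator, the gradient can depend on out_grad alone, on out_grad and the output value, or on out_grad and the input data:

```cpp
// depends on out_grad only
typedef void (*UnaryGradFunctionT0)(const OutputGrad& out_grad,
                                    const EnvArguments& env,
                                    TBlob* in_grad,
                                    OpReqType req,
                                    RunContext ctx);
// depends on out_grad and the output value
typedef void (*UnaryGradFunctionT1)(const OutputGrad& out_grad,
                                    const OutputValue& out_value,
                                    const EnvArguments& env,
                                    TBlob* in_grad,
                                    OpReqType req,
                                    RunContext ctx);
// depends on out_grad and the input data
typedef void (*UnaryGradFunctionT2)(const OutputGrad& out_grad,
                                    const Input0& in_data0,
                                    const EnvArguments& env,
                                    TBlob* in_grad,
                                    OpReqType req,
                                    RunContext ctx);
```

Gradient functions of binary operators have a similar structure, except that Input, TBlob, and OpReqType are doubled.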


Input0, Input, OutputValue, and OutputGrad all share the structure of GradFunctionArgument, which is defined as follows −

```cpp
struct GradFunctionArgument {
  TBlob data;
};
```


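Now let's return to our smooth l1 loss example. Its gradient f'(x) uses the input data, so the in_data form of the gradient function is the suitable one. To apply the chain rule, we also multiply the out_grad coming from the layer above into the result written to in_grad.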

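```cpp
template<typename xpu>
void SmoothL1BackwardUseIn_(const OutputGrad& out_grad,
                            const Input0& in_data0,
                            const EnvArguments& env,
                            TBlob *in_grad,
                            OpReqType req,
                            RunContext ctx) {
  using namespace mshadow::expr;  // for F<> and ScalarExp<>
  mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
  real_t sigma2 = env.scalar * env.scalar;
  MSHADOW_TYPE_SWITCH(in_grad->type_flag_, DType, {
    mshadow::Tensor<xpu, 2, DType> src = in_data0.data.get<xpu, 2, DType>(s);
    mshadow::Tensor<xpu, 2, DType> ograd = out_grad.data.get<xpu, 2, DType>(s);
    mshadow::Tensor<xpu, 2, DType> igrad = in_grad->get<xpu, 2, DType>(s);
    // chain rule: in_grad = out_grad * f'(in_data)
    ASSIGN_DISPATCH(igrad, req,
                    ograd * F<mshadow_op::smooth_l1_gradient>(src, ScalarExp<DType>(sigma2)));
  });
}
```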
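Stripped of mshadow, the chain-rule step is a single multiplication per element. The sketch below assumes plain vectors in place of tensors. `BackwardUseIn` is a hypothetical name, and `fprime` is any callable taking `(x, sigma2)`, so the smooth l1 derivative itself is left abstract.

```cpp
#include <cstddef>
#include <vector>

// Chain rule for a unary element-wise op: in_grad[i] = out_grad[i] * f'(x[i]).
// sigma2 = scalar * scalar, matching the MXNet function above.
template <typename FPrime>
std::vector<float> BackwardUseIn(const std::vector<float>& out_grad,
                                 const std::vector<float>& in_data,
                                 float scalar, FPrime fprime) {
  const float sigma2 = scalar * scalar;
  std::vector<float> in_grad(in_data.size());
  for (std::size_t i = 0; i < in_data.size(); ++i) {
    in_grad[i] = out_grad[i] * fprime(in_data[i], sigma2);
  }
  return in_grad;
}
```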


## Register SimpleOp to MXNet

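Once we have created the shape, function, and gradient, we need to register them as both an NDArray operator and a symbolic operator. The registration macro defined in operator_util.h simplifies this −

```cpp
MXNET_REGISTER_SIMPLE_OP(Name, DEV)
.set_shape_function(Shape)
.set_function(DEV::kDevMask, Function<XPU>, SimpleOpInplaceOption)
.set_gradient(DEV::kDevMask, Gradient<XPU>, SimpleOpInplaceOption)
.describe("description");
```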
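The registration reads as a fluent builder: each setter appears to return the entry it modified, so calls chain until `describe(...)`. The toy registry below illustrates only that chaining pattern. `ToyOpEntry`, `Registry`, and `RegisterOp` are hypothetical names, not MXNet's implementation, and only two setters are modeled.

```cpp
#include <map>
#include <string>

// Hypothetical registry entry; each setter returns *this so calls can chain.
struct ToyOpEntry {
  std::string name;
  std::string description;
  bool enable_scalar = false;
  ToyOpEntry& set_enable_scalar(bool v) { enable_scalar = v; return *this; }
  ToyOpEntry& describe(const std::string& d) { description = d; return *this; }
};

// Process-wide table of operators, keyed by name.
std::map<std::string, ToyOpEntry>& Registry() {
  static std::map<std::string, ToyOpEntry> registry;
  return registry;
}

// Creates (or fetches) the entry and returns it for chaining.
ToyOpEntry& RegisterOp(const std::string& name) {
  ToyOpEntry& e = Registry()[name];
  e.name = name;
  return e;
}
```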


SimpleOpInplaceOption is defined as follows −

```cpp
enum SimpleOpInplaceOption {
  kNoInplace,      // do not allow inplace in arguments
  kInplaceInOut,   // allow inplace in with out (unary)
  kInplaceOutIn,   // allow inplace out_grad with in_grad (unary)
  kInplaceLhsOut,  // allow inplace left operand with out (binary)
  kInplaceOutLhs   // allow inplace out_grad with lhs_grad (binary)
};
```


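In our smooth l1 loss example, the gradient function relies on the input data, so the forward function cannot be written in place (kNoInplace). The output gradient has no further use after the gradient computation, so the gradient can be written in place (kInplaceOutIn).

```cpp
MXNET_REGISTER_SIMPLE_OP(smooth_l1, XPU)
.set_function(XPU::kDevMask, SmoothL1Forward_<XPU>, kNoInplace)
.set_gradient(XPU::kDevMask, SmoothL1BackwardUseIn_<XPU>, kInplaceOutIn)
.set_enable_scalar(true)
.describe("Calculate Smooth L1 Loss(lhs, scalar)");
```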
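Why the in-place gradient is safe can be seen on plain arrays. In the sketch below the in_grad result overwrites the out_grad buffer. This works because element i reads only `grad_buf[i]` and `in_data[i]` before writing `grad_buf[i]`. The input is only read, and the backward pass still needs it, which is the reason the forward pass is kept out of place. `BackwardInPlace` is a hypothetical name, and `fprime` stands in for the derivative.

```cpp
#include <cstddef>
#include <vector>

// grad_buf holds out_grad on entry and in_grad on exit (kInplaceOutIn-style reuse).
// in_data is never written, so it is still intact for any later use.
template <typename FPrime>
void BackwardInPlace(std::vector<float>* grad_buf,
                     const std::vector<float>& in_data,
                     float sigma2, FPrime fprime) {
  for (std::size_t i = 0; i < in_data.size(); ++i) {
    (*grad_buf)[i] *= fprime(in_data[i], sigma2);  // out_grad[i] becomes in_grad[i]
  }
}
```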


## SimpleOp on EnvArguments

Some operations might need the following:

• A scalar as input, such as a gradient scale.

• A set of keyword arguments controlling behavior.

• A temporary space to speed up calculations.

The benefit of using EnvArguments is that it provides additional arguments and resources to make calculations more scalable and efficient.

### Example

EnvArguments is defined as follows −

```cpp
struct EnvArguments {
  real_t scalar;  // scalar argument, if enabled
  std::vector<std::pair<std::string, std::string> > kwargs;  // keyword arguments
  std::vector<Resource> resource;  // pointer to the resources requested
};
```
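Keyword arguments arrive as string pairs, so an operator has to look up and parse the values it needs. Below is a minimal sketch, assuming a hypothetical helper `GetKwarg` that is not part of MXNet. It returns the value for a key, or a fallback when the key is absent.

```cpp
#include <string>
#include <utility>
#include <vector>

// Same layout as EnvArguments::kwargs.
using KwArgs = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: linear search for `key`; values stay strings, so callers parse them.
std::string GetKwarg(const KwArgs& kwargs, const std::string& key,
                     const std::string& fallback) {
  for (const auto& kv : kwargs) {
    if (kv.first == key) return kv.second;
  }
  return fallback;
}
```

A linear scan is adequate here because operators typically receive only a handful of keyword arguments.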


An operator can request additional resources, such as mshadow::Random<xpu> and temporary memory space, through EnvArguments.resource. Requests are declared with the ResourceRequest structure:

```cpp
struct ResourceRequest {
  enum Type {    // Resource type, indicating what the pointer type is
    kRandom,     // mshadow::Random<xpu> object
    kTempSpace   // A dynamic temp space that can be arbitrary size
  };
  Type type;     // type of resources
};
```


Registration then requests the declared resources from mxnet::ResourceManager and places them in std::vector<Resource> resource in EnvArguments.

The resources can then be accessed as follows:

```cpp
// each line assumes the requested resource of that type sits at index 0
auto tmp_space_res = env.resource[0].get_space(some_shape, some_stream);
auto rand_res = env.resource[0].get_random(some_stream);
```


In our smooth l1 loss example, a scalar input is needed to mark the turning point of the loss function. That is why the registration uses set_enable_scalar(true), and the function and gradient read env.scalar.
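With σ = env.scalar, the code passes b = σ² to the smooth_l1_loss mapper at the end of this chapter. That mapper computes the piecewise function below, and differentiating each branch gives its gradient:

$$
f(x) = \begin{cases}
0.5\,\sigma^2 x^2, & |x| \le 1/\sigma^2 \\
|x| - 0.5/\sigma^2, & |x| > 1/\sigma^2
\end{cases}
\qquad
f'(x) = \begin{cases}
\sigma^2 x, & |x| \le 1/\sigma^2 \\
\operatorname{sign}(x), & |x| > 1/\sigma^2
\end{cases}
$$

At the turning points |x| = 1/σ², both pieces of f equal 0.5/σ², and both pieces of f' have magnitude 1. The loss is therefore smooth, and the scalar only moves where the quadratic region ends.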

## Building Tensor Operation

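Why do we need to craft tensor operations? There are two common cases:

• Computation uses the mshadow library, and sometimes the function we need is not readily available. If the function is element-wise, it can be implemented as an mshadow_op expression mapper, which handles the scalar case of the desired function.

• The operation cannot be done element-wise, as with the softmax loss and its gradient. In that case a new tensor operation has to be written for mshadow directly.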

### Example

Returning to our smooth l1 loss example, we create two mappers, namely the scalar cases of the smooth l1 loss and its gradient. The loss mapper is shown below; the gradient mapper follows the same pattern:

```cpp
namespace mshadow_op {
struct smooth_l1_loss {
  // a is x, b is sigma2
  MSHADOW_XINLINE static real_t Map(real_t a, real_t b) {
    if (a > 1.0f / b) {
      return a - 0.5f / b;
    } else if (a < -1.0f / b) {
      return -a - 0.5f / b;
    } else {
      return 0.5f * a * a * b;
    }
  }
};
}  // namespace mshadow_op
```
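The gradient mapper follows from differentiating each branch of smooth_l1_loss. The sketch below is derived that way and is not copied from MXNet. To compile outside mshadow, it assumes stand-ins for `MSHADOW_XINLINE` (plain `inline`) and `real_t` (`float`). A hypothetical `ApplyMap` helper applies a mapper element by element, roughly as `F<OP>(tensor, scalar)` does for mshadow expressions.

```cpp
#include <cstddef>
#include <vector>

// Stand-ins so the mapper compiles without mshadow (assumption).
#ifndef MSHADOW_XINLINE
#define MSHADOW_XINLINE inline
#endif
typedef float real_t;

namespace mshadow_op {
struct smooth_l1_gradient {
  // a is x, b is sigma2: derivative of smooth_l1_loss with respect to x
  MSHADOW_XINLINE static real_t Map(real_t a, real_t b) {
    if (a > 1.0f / b) {
      return 1.0f;
    } else if (a < -1.0f / b) {
      return -1.0f;
    } else {
      return a * b;
    }
  }
};
}  // namespace mshadow_op

// Hypothetical helper: apply OP::Map(x, b) to every element.
template <typename OP>
std::vector<real_t> ApplyMap(const std::vector<real_t>& in, real_t b) {
  std::vector<real_t> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = OP::Map(in[i], b);
  return out;
}
```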