CNTK - Creating First Neural Network


This chapter will elaborate on creating a neural network in CNTK.

Build the network structure

In order to apply CNTK concepts to build our first NN, we are going to use NN to classify species of iris flowers based on the physical properties of sepal width and length, and petal width and length. The dataset which we will be using iris dataset that describes the physical properties of different varieties of iris flowers −

  • Sepal length
  • Sepal width
  • Petal length
  • Petal width
  • Class i.e. iris setosa or iris versicolor or iris virginica

Here, we will be building a regular NN called a feedforward NN. Let us see the implementation steps to build the structure of NN −

Step 1 − First, we will import the necessary components such as our layer types, activation functions, and a function that allows us to define an input variable for our NN, from CNTK library.

from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu

Step 2 − After that, we will create our model using sequential function. Once created, we will feed it with the layers we want. Here, we are going to create two distinct layers in our NN; one with four neurons and another with three neurons.

model = Sequential([Dense(4, activation=relu), Dense(3, activation=log_sogtmax)])

Step 3 − At last, in order to compile the NN, we will bind the network to the input variable. It has an input layer with four neurons and an output layer with three neurons.

feature= input_variable(4)
z = model(feature)

Applying an activation function

There are lots of activation functions to choose from and choosing the right activation function will definitely make a big difference to how well our deep learning model will perform.

At the output layer

Choosing an activation function at the output layer will depend upon the kind of problem we are going to solve with our model.

  • For a regression problem, we should use a linear activation function on the output layer.

  • For a binary classification problem, we should use a sigmoid activation function on the output layer.

  • For multi-class classification problem, we should use a softmax activation function on the output layer.

  • Here, we are going to build a model for predicting one of the three classes. It means we need to use softmax activation function at output layer.

At the hidden layer

Choosing an activation function at the hidden layer requires some experimentation for monitoring the performance to see which activation function works well.

  • In a classification problem, we need to predict the probability a sample belongs to a specific class. That’s why we need an activation function that gives us probabilistic values. To reach this goal, sigmoid activation function can help us.

  • One of the major problems associated with sigmoid function is vanishing gradient problem. To overcome such problem, we can use ReLU activation function that coverts all negative values to zero and works as a pass-through filter for positive values.

Picking a loss function

Once, we have the structure for our NN model, we must have to optimise it. For optimising we need a loss function. Unlike activation functions, we have very less loss functions to choose from. However, choosing a loss function will depend upon the kind of problem we are going to solve with our model.

For example, in a classification problem, we should use a loss function that can measure the difference between a predicted class and an actual class.

loss function

For the classification problem, we are going to solve with our NN model, categorical cross entropy loss function is the best candidate. In CNTK, it is implemented as cross_entropy_with_softmax which can be imported from cntk.losses package, as follows−

label= input_variable(3)
loss = cross_entropy_with_softmax(z, label)


With having the structure for our NN model and a loss function to apply, we have all the ingredients to start making the recipe for optimising our deep learning model. But, before getting deep dive into this, we should learn about metrics.


CNTK has the package named cntk.metrics from which we can import the metrics we are going to use. As we are building a classification model, we will be using classification_error matric that will produce a number between 0 and 1. The number between 0 and 1 indicates the percentage of samples correctly predicted −

First, we need to import the metric from cntk.metrics package −

from cntk.metrics import classification_error
error_rate = classification_error(z, label)

The above function actually needs the output of the NN and the expected label as input.