CNTK - Monitoring the Model

In this chapter, we will understand how to monitor a model in CNTK.

Introduction

In previous sections, we validated our NN models. But is it also necessary, and possible, to monitor a model during training?

Yes. We have already used the ProgressPrinter class to monitor our model, and there are several other ways to do so. Before going deeper into them, let’s first look at how monitoring works in CNTK and how we can use it to detect problems in our NN model.

Callbacks in CNTK

During training and validation, CNTK lets us specify callbacks at several points in the API. First, let’s take a closer look at when CNTK invokes them.

When does CNTK invoke callbacks?

During training and testing, CNTK invokes the callbacks at the moments when−

A minibatch is completed.
A full sweep over the dataset is completed during training.
A minibatch of testing is completed.
A full sweep over the dataset is completed during testing.
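The four moments listed above can be illustrated with a small toy loop. This is only a sketch in plain Python and does not use CNTK: the class RecordingCallback, its methods on_minibatch_end and on_sweep_end, and the function run_sweep are hypothetical names invented for this illustration, not part of the CNTK API.

```python
# Toy illustration only: none of these names are CNTK APIs.
class RecordingCallback:
    """Records every moment at which the loop notifies it."""

    def __init__(self):
        self.events = []

    def on_minibatch_end(self, phase, index):
        self.events.append((phase, "minibatch", index))

    def on_sweep_end(self, phase):
        self.events.append((phase, "sweep", None))


def run_sweep(phase, num_samples, minibatch_size, callbacks):
    """Walk once over a dataset of num_samples and notify the callbacks."""
    num_minibatches = -(-num_samples // minibatch_size)  # ceiling division
    for index in range(num_minibatches):
        # The real work (training step or evaluation) would happen here.
        for cb in callbacks:
            cb.on_minibatch_end(phase, index)
    # The whole dataset has been seen once: a full sweep is complete.
    for cb in callbacks:
        cb.on_sweep_end(phase)


recorder = RecordingCallback()
run_sweep("train", num_samples=48, minibatch_size=16, callbacks=[recorder])
run_sweep("test", num_samples=32, minibatch_size=16, callbacks=[recorder])
for event in recorder.events:
    print(event)
```

In this sketch the callback is notified after each minibatch and once more at the end of each sweep, for both the training and the testing phase.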

Specifying callbacks

While working with CNTK, we can specify callbacks in several spots in the API. For example−

When calling train on a loss function

Here, when we call train on a loss function, we can specify a set of callbacks through the callbacks argument as follows−

# progress_writer is a callback created earlier, e.g. a ProgressPrinter instance
training_summary = loss.train((x_train, y_train),
                              parameter_learners=[learner],
                              callbacks=[progress_writer],
                              minibatch_size=16, max_epochs=15)

When working with minibatch sources or using a manual minibatch loop

In this case, we can specify callbacks for monitoring purposes while creating the Trainer as follows−

from cntk import Trainer
from cntk.logging import ProgressPrinter

# Callbacks are passed as a flat list of progress writers
callbacks = [
   ProgressPrinter(0)
]
trainer = Trainer(z, (loss, metric), [learner], callbacks)

Various monitoring tools

Let us look at the different monitoring tools.

ProgressPrinter

Throughout this tutorial, ProgressPrinter is the most frequently used monitoring tool. Some of the characteristics of the ProgressPrinter monitoring tool are−

The ProgressPrinter class implements basic console-based logging to monitor our model. It can also log to disk if we want it to.

It is especially useful when working in a distributed training scenario.

It is also very useful in scenarios where we can’t log in on the console to see the output of our Python program.

With the help of the following code, we can create an instance of ProgressPrinter−

from cntk.logging import ProgressPrinter
ProgressPrinter(0, log_to_file='test.txt')

The log file will contain output similar to what we have seen in the earlier sections−

test.txt
CNTKCommandTrainInfo: train : 300
CNTKCommandTrainInfo: CNTKNoMoreCommands_Total : 300
CNTKCommandTrainBegin: train
-------------------------------------------------------------------
 average      since    average      since      examples
    loss       last     metric       last
 ------------------------------------------------------
Learning rate per minibatch: 0.1
     1.45       1.45     -0.189     -0.189            16
     1.24       1.13    -0.0382     0.0371            48
[………]
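When the log is written to a file, its numeric rows can be read back for further inspection. The following is a minimal sketch in plain Python, assuming the five-column layout shown in the sample above (average loss, loss since last, average metric, metric since last, examples). The function parse_progress_rows and the dictionary keys are hypothetical names chosen for this illustration and are not part of CNTK.

```python
# A few lines copied from the sample log above.
sample_log = """\
CNTKCommandTrainBegin: train
 average      since    average      since      examples
    loss       last     metric       last
 ------------------------------------------------------
Learning rate per minibatch: 0.1
     1.45       1.45     -0.189     -0.189            16
     1.24       1.13    -0.0382     0.0371            48
"""


def parse_progress_rows(log_text):
    """Return one dict per numeric row; headers and separators are skipped."""
    rows = []
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # a header or informational line, not a data row
        rows.append({
            "average_loss": values[0],
            "loss_since_last": values[1],
            "average_metric": values[2],
            "metric_since_last": values[3],
            "examples": int(values[4]),
        })
    return rows


rows = parse_progress_rows(sample_log)
for row in rows:
    print(row["examples"], row["average_loss"], row["average_metric"])
```

If the log layout differs between CNTK versions or ProgressPrinter settings, the column assumption in this sketch would need to be adjusted.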

TensorBoard

One of the disadvantages of ProgressPrinter is that it is hard to get a good view of how the loss and metric progress over time. TensorBoardProgressWriter is a great alternative to the ProgressPrinter class in CNTK.

Before using it, we first need to install TensorBoard with the help of the following command −

pip install tensorboard

Now, in order to use TensorBoard, we need to set up TensorBoardProgressWriter in our training code as follows−

import time
from cntk.logging import TensorBoardProgressWriter
# Use a timestamped sub-directory so that each run gets its own logs
tensorbrd_writer = TensorBoardProgressWriter(log_dir='logs/{}'.format(time.time()), freq=1, model=z)

It is good practice to call the close method on the TensorBoardProgressWriter instance once the training of the NN model is done.

We can visualise the TensorBoard logging data with the help of the following command −
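One way to make sure close is always called, even if training raises an error, is to wrap the training step with contextlib.closing from the Python standard library. The sketch below uses a hypothetical stand-in class, StandInWriter, and a hypothetical failing_training function so that it runs without CNTK installed. With CNTK, the same pattern should apply to any writer object that exposes a close method.

```python
from contextlib import closing


class StandInWriter:
    """Hypothetical stand-in for a progress writer; not a CNTK class."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def failing_training():
    """Hypothetical training step that fails part-way through."""
    raise RuntimeError("training interrupted")


writer = StandInWriter()
try:
    # closing() calls writer.close() when the block exits,
    # whether it finishes normally or raises an exception.
    with closing(writer):
        failing_training()
except RuntimeError as error:
    print("training failed:", error)

print("writer closed:", writer.closed)
```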

tensorboard --logdir logs