spaCy - Updating Neural Network Model



In this chapter, we will learn how to update the neural network model in spaCy.

Reasons to update

Following are the reasons to update an existing model −

  • An updated model gives better results on your specific domain.

  • Updating lets the model learn classification schemes that are specific to your problem.

  • It is essential for text classification, where the categories usually depend on the task.

  • It is especially useful for named entity recognition.

  • It is less critical for part-of-speech (POS) tagging and dependency parsing.

Updating an existing model

With spaCy, we can update an existing pre-trained model with more data, for example to improve its predictions on a different kind of text.

Updating an existing pre-trained model is very useful if you want to improve the categories that the model already knows, for example "person" or "organization". We can also update an existing pre-trained model to add new categories.

When adding a new category, always update the pre-trained model with examples of the new category as well as examples of the other categories that the model previously predicted correctly. Otherwise, the model may improve on the new category while getting worse on the others.
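As a minimal sketch of this advice, the training set below mixes examples of a new GADGET category with examples of categories a pre-trained English model already predicts (PERSON and ORG). The texts, the character offsets and the two helper functions are made up for illustration; the (text, annotations) tuple layout is the one used by the update call shown later in this chapter.

```python
from collections import Counter

# Hypothetical training examples: (text, {"entities": [(start, end, label)]}).
# Offsets are character positions; end is exclusive.
TRAIN_DATA = [
    # Examples of the new category
    ("I need a new phone", {"entities": [(13, 18, "GADGET")]}),
    ("Is the tablet any good?", {"entities": [(7, 13, "GADGET")]}),
    # Examples of categories the model already knows
    ("Maria works at Google", {"entities": [(0, 5, "PERSON"), (15, 21, "ORG")]}),
    # Old and new categories in the same text
    ("Google released a new phone", {"entities": [(0, 6, "ORG"), (22, 27, "GADGET")]}),
]


def label_counts(data):
    """Count how often each entity label appears in the data."""
    counts = Counter()
    for _text, annotations in data:
        for _start, _end, label in annotations["entities"]:
            counts[label] += 1
    return counts


def entity_texts(data):
    """Return (span text, label) pairs so the offsets can be checked by eye."""
    return [
        (text[start:end], label)
        for text, annotations in data
        for start, end, label in annotations["entities"]
    ]


print(label_counts(TRAIN_DATA))
print(entity_texts(TRAIN_DATA))
```

Checking the label counts before training is a cheap way to notice that the data contains only the new category.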

Setting up a new pipeline

The following steps show how to set up a new pipeline from scratch and train it −

  • First, we start with a blank English model by using the spacy.blank method. It has only the language data and tokenization rules, and no pipeline components.

  • After that, we create a blank entity recognizer and add it to the pipeline. Next, we add the new string labels to the model by using add_label.

  • Now, we can initialize the model with random weights by calling nlp.begin_training.

  • Next, we randomly shuffle the data on each iteration, so that the model does not see the examples in the same order every time. This generally gives better accuracy.

  • Once the data is shuffled, we divide the examples into batches by using spaCy’s minibatch function. Finally, we update the model with the texts and annotations of each batch, and continue the loop.
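The shuffle, batch and update steps above can be sketched without spaCy installed. In the sketch below, minibatch is a plain-Python stand-in for spacy.util.minibatch, and train, the seed parameter, the sample data and the recording lambda are hypothetical names introduced for illustration. In real use, the update argument would be the model's own update method.

```python
import random


def minibatch(items, size):
    """Yield successive lists of at most `size` items.

    Stand-in for spacy.util.minibatch so the sketch runs without spaCy.
    """
    for start in range(0, len(items), size):
        yield items[start:start + size]


def train(update, examples, n_iter=10, batch_size=2, seed=0):
    """Shuffle, batch, and call update(texts, annotations) for every batch."""
    rng = random.Random(seed)
    examples = list(examples)  # copy, so the caller's list is not reordered
    n_updates = 0
    for _ in range(n_iter):
        rng.shuffle(examples)  # new order on each iteration
        for batch in minibatch(examples, batch_size):
            texts = [text for text, annotation in batch]
            annotations = [annotation for text, annotation in batch]
            update(texts, annotations)
            n_updates += 1
    return n_updates


# Hypothetical data in the (text, annotations) format
examples = [
    ("I need a new phone", {"entities": [(13, 18, "GADGET")]}),
    ("Is the tablet any good?", {"entities": [(7, 13, "GADGET")]}),
    ("Maria works at Google", {"entities": [(0, 5, "PERSON"), (15, 21, "ORG")]}),
]

# Recording stand-in for the model update: stores the size of each batch
seen = []
n = train(lambda texts, annotations: seen.append(len(texts)), examples, n_iter=2)
print(n, seen)  # 4 [2, 1, 2, 1]
```

With three examples and a batch size of 2, each iteration produces one batch of two and one batch of one, so two iterations give four updates.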

Examples

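The snippets below use the spaCy v2 API. spaCy v3 renamed several of these calls, for example nlp.begin_training became nlp.initialize.

Given below is an example for starting with a blank English model by using spacy.blank −

import spacy

nlp = spacy.blank("en")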

Following is an example for creating a blank entity recognizer and adding it to the pipeline −

ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)

Here is an example for adding a new label by using add_label

ner.add_label("GADGET")

An example for starting the training by using nlp.begin_training is as follows

nlp.begin_training()

This is an example for training over several iterations and shuffling the data on each iteration −

import random

for itn in range(10):
   # put the examples in a new random order on each iteration
   random.shuffle(examples)

This is an example for dividing the examples into batches by using the minibatch utility function. This loop runs inside the iteration loop above −

   for batch in spacy.util.minibatch(examples, size=2):
      # split each batch into parallel lists of texts and annotations
      texts = [text for text, annotation in batch]
      annotations = [annotation for text, annotation in batch]

Given below is an example for updating the model with the texts and annotations of each batch −

      nlp.update(texts, annotations)