Doc Class Context Manager and Properties



In this chapter, let us learn about the context manager and the properties of the Doc class in spaCy.

Context Manager

Doc.retokenize is a context manager that handles the retokenization of a Doc object. Let us now learn about it in detail.

Doc.retokenize

When you use this context manager, changes to the Doc's tokenization are not applied immediately. They are stored and then made all at once when the context manager exits.

The advantage of this approach is that it is much more efficient and less error-prone.
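The deferred behaviour described above can be seen in a short sketch. It assumes spaCy v3 and uses spacy.blank("en"), a tokenizer-only pipeline, instead of a trained model so that nothing has to be downloaded; the sentence "New York is big" is only an illustrative input. Inside the with block the merge is merely recorded, and the token list changes once the block exits.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# so no trained model such as en_core_web_sm needs to be installed.
nlp = spacy.blank("en")
doc = nlp("New York is big")

with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])
    # The merge is only recorded at this point; the Doc is still unchanged.
    tokens_inside = [token.text for token in doc]

# Leaving the with block applies all recorded changes at once.
tokens_after = [token.text for token in doc]

print(tokens_inside)  # ['New', 'York', 'is', 'big']
print(tokens_after)   # ['New York', 'is', 'big']
```

Because every change is collected first and applied together on exit, token indices such as doc[0:2] stay valid for the whole block even when several merges are queued.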

Example 1

Refer to the example for the Doc.retokenize context manager given below −

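import spacy

nlp_model = spacy.load("en_core_web_sm")
doc = nlp_model("This is Tutorialspoint.com.")

# Merge the first two tokens ("This" and "is") into a single token.
# The span passed to merge() must not be empty: doc[0:0] raises an error.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])

# The first token now covers both words.
print(doc[0].text)

Output

You will see the following output −

This is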

Example 2

Here is another example of Doc.retokenize context manager −

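import spacy

nlp_model = spacy.load("en_core_web_sm")
doc = nlp_model("This is Tutorialspoint.com.")

# Merge the tokens "This" and "is" into one token.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])

# Merging changes only the tokenization; the document text stays the same.
print(doc)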

Output

You will see the following output −

This is Tutorialspoint.com.

Retokenize Methods

The table given below summarizes the two retokenizer methods.

Sr.No.  Method               Description
1       Retokenizer.merge    Marks a span for merging.
2       Retokenizer.split    Marks a token for splitting into the specified orths.
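As a sketch of Retokenizer.split, assuming spaCy v3, a blank English pipeline, and an illustrative input sentence, the token "NewYork" is split into "New" and "York". The heads argument supplies one head per new subtoken; a (token, index) tuple points at a subtoken of the token being split. The orths must together spell the original token text, so the document text itself does not change.

```python
import spacy

# Assumption: a blank English pipeline; no trained model is required.
nlp = spacy.blank("en")
doc = nlp("I live in NewYork")

with doc.retokenize() as retokenizer:
    # One head per new subtoken: "New" attaches to subtoken 1 ("York")
    # of the token being split, and "York" attaches to "in" (doc[2]).
    heads = [(doc[3], 1), doc[2]]
    retokenizer.split(doc[3], ["New", "York"], heads=heads)

tokens = [token.text for token in doc]
print(tokens)    # ['I', 'live', 'in', 'New', 'York']
print(doc.text)  # I live in NewYork
```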

Properties

The properties of the Doc class in spaCy are described below −

Sr.No.  Doc Property      Description
1       Doc.ents          The named entities in the document.
2       Doc.noun_chunks   Iterates over the base noun phrases in the document (requires a dependency parse).
3       Doc.sents         Iterates over the sentences in the document.
4       Doc.has_vector    A Boolean value indicating whether a word vector is associated with the object.
5       Doc.vector        A real-valued meaning representation; by default, the average of the token vectors.
6       Doc.vector_norm   The L2 norm of the document's vector representation.
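The following sketch exercises several of these properties. It assumes spaCy v3 and uses a blank English pipeline with the rule-based sentencizer component, so it has no named entity recognizer, dependency parser, or word vectors. For that reason the entity is set by hand through the Doc.ents setter, Doc.noun_chunks is left out (it needs a dependency parse), and Doc.has_vector is False. With a trained pipeline such as en_core_web_sm, Doc.ents, Doc.sents and Doc.noun_chunks are filled in by the model's components instead.

```python
import spacy
from spacy.tokens import Span

# Assumption: blank English pipeline plus the rule-based sentencizer.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("New York is big. It never sleeps.")

# Doc.sents: iterate over sentences (boundaries come from the sentencizer).
sentences = [sent.text for sent in doc.sents]
print(sentences)

# Doc.ents: empty without an NER component, so set one entity by hand.
print(doc.ents)
doc.ents = [Span(doc, 0, 2, label="GPE")]
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)  # [('New York', 'GPE')]

# Doc.has_vector / Doc.vector_norm: the blank pipeline has no word vectors.
print(doc.has_vector)
print(doc.vector_norm)
```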
