spaCy - Container Span Class



This chapter explains the Span class in spaCy.

Span Class

A Span is a slice of the Doc object discussed earlier.
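As a minimal sketch of this idea, the example below slices a Doc to obtain a Span. It assumes spaCy is installed and uses a blank English pipeline (tokenizer only), so no trained model is required. The sample sentence is only illustrative.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here.
nlp = spacy.blank("en")
doc = nlp("Give it back! He pleaded.")

# Slicing a Doc with token indices yields a Span; the end index is exclusive.
span = doc[1:4]

print(type(span).__name__)  # Span
print(span.text)            # it back!
print(span.doc is doc)      # True: the span keeps a reference to its parent Doc
```

The span does not copy the text. It is a view onto tokens 1 to 3 of the parent document.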

Attributes

The table below describes its attributes −

NAME TYPE DESCRIPTION
doc Doc It represents the parent document.
tensor Ndarray The span’s slice of the parent Doc’s tensor. Introduced in version 2.1.7.
sent Span The sentence span that this span is a part of.
start Int This attribute is the token offset for the start of the span.
end Int This attribute is the token offset for the end of the span.
start_char Int Integer type attribute representing the character offset for the start of the span.
end_char Int Integer type attribute representing the character offset for the end of the span.
text Unicode The span text as a Unicode string.
text_with_ws Unicode It represents the text content of the span with a trailing whitespace character if the last token has one.
orth Int This attribute is the ID of the verbatim text content.
orth_ Unicode The verbatim text content, which is identical to Span.text. It exists mostly for consistency with the other attributes.
label Int This integer attribute is the hash value of the span’s label.
label_ Unicode The label of the span.
lemma_ Unicode The lemma of the span.
kb_id Int It represents the hash value of the knowledge base ID, which is referred to by the span.
kb_id_ Unicode It represents the knowledge base ID, which is referred to by the span.
ent_id Int This attribute represents the hash value of the named entity the token is an instance of.
ent_id_ Unicode This attribute represents the string ID of the named entity the token is an instance of.
sentiment Float A scalar float value that indicates the positivity or negativity of the span.
_ Underscore The user space for adding custom attribute extensions.
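The following sketch reads a few of the attributes listed above. It assumes spaCy is installed and uses a blank English pipeline. The label string "REQUEST" is a hypothetical label chosen only for illustration. Attributes that depend on trained components or sentence boundaries (for example sent, lemma_ and tensor) are left out because a blank pipeline does not set them.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank English pipeline (tokenizer only).
nlp = spacy.blank("en")
doc = nlp("Give it back! He pleaded.")

# "REQUEST" is a made-up label used only for this illustration.
span = Span(doc, 1, 4, label="REQUEST")

print(span.start, span.end)            # 1 4   (token offsets)
print(span.start_char, span.end_char)  # 5 13  (character offsets)
print(repr(span.text))                 # 'it back!'
print(repr(span.text_with_ws))         # 'it back! '
print(span.label_)                     # REQUEST

# label is the hash of label_ in the vocabulary's string store.
print(span.label == doc.vocab.strings["REQUEST"])  # True
```

Note that end and end_char are exclusive offsets, so doc.text[span.start_char:span.end_char] gives back the span text.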

Methods

Following are the methods used in Span class −

Sr.No. Method & Description
1 Span.__init__

To construct a Span object from the slice doc[start : end].

2 Span.__getitem__

To get the token object at position n, where n is an integer.

3 Span.__iter__

To iterate over the token objects in the span, from which the annotations can be easily accessed.

4 Span.__len__

To get the number of tokens in the span.

5 Span.similarity

To make a semantic similarity estimate.

6 Span.merge

To retokenize the document so that the span is merged into a single token. This method was deprecated in spaCy v2.1 in favor of Doc.retokenize and removed in v3.
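The methods above can be sketched as follows, assuming spaCy is installed and using a blank English pipeline. Span.similarity is not shown because a meaningful estimate needs a pipeline with word vectors. Because Span.merge was deprecated in spaCy v2.1 and removed in v3, the merge step here uses the Doc.retokenize context manager instead, which is a substitution and not the method listed in the table.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank English pipeline (tokenizer only).
nlp = spacy.blank("en")
doc = nlp("I like New York in Autumn.")

span = Span(doc, 2, 4)           # __init__: same tokens as doc[2:4]
print(span.text)                 # New York
print(len(span))                 # __len__: 2
print(span[1].text)              # __getitem__: York
print([t.text for t in span])    # __iter__: ['New', 'York']

# Merge the span into one token with the retokenizer (replacement for Span.merge).
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[2:4])

tokens_after = [t.text for t in doc]
print(tokens_after)  # ['I', 'like', 'New York', 'in', 'Autumn', '.']
```

After retokenization the token indices of the Doc change, so spans created beforehand should not be reused.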

Class Methods

Following are the classmethods used in Span class −

Sr.No. Classmethod & Description
1 Span.set_extension

It defines a custom attribute on the Span.

2 Span.get_extension

To look up a previously registered extension by name.

3 Span.has_extension

To check whether an extension has been registered on the Span class or not.

4 Span.remove_extension

To remove a previously registered extension on the Span class.
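A short sketch of the four classmethods working together is given below. It assumes spaCy is installed and uses a blank English pipeline. The extension name is_city is hypothetical and chosen only for illustration. force=True is passed so that re-running the snippet does not fail on an already registered name.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank English pipeline (tokenizer only).
nlp = spacy.blank("en")
doc = nlp("I like New York in Autumn.")
span = doc[2:4]

# "is_city" is a made-up attribute name for this illustration.
Span.set_extension("is_city", default=False, force=True)
registered = Span.has_extension("is_city")
print(registered)            # True

print(span._.is_city)        # False (the default value)
span._.is_city = True        # custom attributes live under the ._ namespace
print(span._.is_city)        # True

# get_extension returns a (default, method, getter, setter) tuple.
default, method, getter, setter = Span.get_extension("is_city")
print(default)               # False

Span.remove_extension("is_city")
still_registered = Span.has_extension("is_city")
print(still_registered)      # False
```

Extensions are registered on the Span class, not on one instance, so every span created afterwards exposes the attribute through span._.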
