spaCy - Container Span Class
This chapter describes the Span class in spaCy.
Span Class
A Span is a slice of a Doc object, which was discussed earlier in this tutorial.
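As a minimal sketch of what "slice" means here, the snippet below takes a Span from a Doc with ordinary slice syntax. It assumes spaCy is installed and uses a blank English pipeline (tokenizer only) so that no trained model is needed; the sample sentence is only an illustration.

```python
import spacy
from spacy.tokens import Span

# A blank pipeline only tokenizes; no model download is required.
nlp = spacy.blank("en")
doc = nlp("Give it back! He pleaded.")

# Slicing a Doc with a token range returns a Span, not a list of tokens.
span = doc[1:4]

print(type(span).__name__)  # the class of the slice
print(span.text)            # the text covered by the slice
print(span.doc is doc)      # the span points back at its parent Doc
```

The span does not copy the tokens; it is a view onto the parent Doc, which is why `span.doc` is the same object as `doc`.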
Attributes
The table below explains its attributes:
NAME | TYPE | DESCRIPTION |
---|---|---|
doc | Doc | It represents the parent document. |
tensor | ndarray | Introduced in version 2.1.7. It represents the span’s slice of the parent Doc’s tensor. |
sent | Span | The sentence span that this span is a part of. |
start | Int | This attribute is the token offset for the start of the span. |
end | Int | This attribute is the token offset for the end of the span. |
start_char | Int | Integer type attribute representing the character offset for the start of the span. |
end_char | Int | Integer type attribute representing the character offset for the end of the span. |
text | Unicode | The Unicode text of the span. |
text_with_ws | Unicode | It represents the text content of the span with a trailing whitespace character if the last token has one. |
orth | Int | This attribute is the ID of the verbatim text content. |
orth_ | Unicode | The verbatim Unicode text content, which is identical to Span.text. It exists mostly for consistency with the other attributes. |
label | Int | This integer attribute is the hash value of the span’s label. |
label_ | Unicode | The label of the span. |
lemma_ | Unicode | The lemma of the span. |
kb_id | Int | It represents the hash value of the knowledge base ID, which is referred to by the span. |
kb_id_ | Unicode | It represents the knowledge base ID, which is referred to by the span. |
ent_id | Int | This attribute represents the hash value of the named entity the token is an instance of. |
ent_id_ | Unicode | This attribute represents the string ID of the named entity the token is an instance of. |
sentiment | Float | A scalar float value that indicates the positivity or negativity of the span. |
_ | Underscore | The user space for adding custom attribute extensions. |
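The difference between the token offsets and the character offsets in the table above is easiest to see on a concrete span. The following is a small sketch, again assuming a blank English pipeline and an illustrative sentence:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]  # the tokens "it", "back" and "!"

# start and end count tokens in the parent Doc.
print(span.start, span.end)

# start_char and end_char count characters in the parent Doc's text.
print(span.start_char, span.end_char)

# text drops the trailing whitespace, text_with_ws keeps it.
print(repr(span.text))
print(repr(span.text_with_ws))

# The character offsets index directly into the parent text.
print(doc.text[span.start_char:span.end_char] == span.text)
```

Character offsets are useful when the span has to be mapped back onto the original string, for example to highlight it in a user interface.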
Methods
The following methods are available on the Span class:
Sr.No. | Method & Description |
---|---|
1 | Span.__init__ To construct a Span object from the slice doc[start : end]. |
2 | Span.__getitem__ To get a token object at a particular position, say n, where n is an integer. |
3 | Span.__iter__ To iterate over the token objects, from which the annotations can be easily accessed. |
4 | Span.__len__ To get the number of tokens in the span. |
5 | Span.similarity To make a semantic similarity estimate. |
6 | Span.merge To retokenize the document so that the span is merged into a single token. This method is deprecated since version 2.1 in favor of Doc.retokenize. |
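The constructor and the container methods listed above can be sketched as follows. The label "EXAMPLE" is a hypothetical value chosen for illustration, and the blank English pipeline is an assumption made to keep the snippet free of model downloads. Span.similarity is left out because a meaningful estimate needs word vectors, which a blank pipeline does not have.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Give it back! He pleaded.")

# Span.__init__: covers the same tokens as doc[1:4], with an optional label.
# "EXAMPLE" is a made-up label used only for this sketch.
span = Span(doc, 1, 4, label="EXAMPLE")

# Span.__len__: number of tokens in the span.
print(len(span))

# Span.__getitem__: an integer index returns a Token, relative to the span.
first = span[0]
print(first.text)

# Span.__iter__: iterate over the Token objects in the span.
tokens = [token.text for token in span]
print(tokens)

# The string label passed to the constructor is available as label_.
print(span.label_)
```

Note that indices passed to a span are relative to the span itself, so `span[0]` here is the token at position 1 of the parent Doc.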
ClassMethods
The following classmethods are available on the Span class:
Sr.No. | Classmethod & Description |
---|---|
1 | Span.set_extension It defines a custom attribute on the Span. |
2 | Span.get_extension To look up a previously registered extension by name. |
3 | Span.has_extension To check whether an extension has been registered on the Span class or not. |
4 | Span.remove_extension To remove a previously registered extension on the Span class. |
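The four classmethods above work together with the `_` attribute. The sketch below registers, reads, and removes a custom attribute; the name `is_request` is hypothetical and was picked only for this example, and the blank English pipeline is again an assumption.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]

# "is_request" is a hypothetical attribute name used for illustration.
# Guard the registration so that re-running the snippet does not raise.
if not Span.has_extension("is_request"):
    Span.set_extension("is_request", default=False)

registered = Span.has_extension("is_request")
print(registered)

# Custom attributes live under the "_" user space of each span.
span._.is_request = True
print(span._.is_request)

# Other spans keep the registered default value.
other_value = doc[4:6]._.is_request
print(other_value)

# get_extension returns a (default, method, getter, setter) tuple.
ext = Span.get_extension("is_request")
print(ext)

# remove_extension unregisters the attribute from the Span class.
Span.remove_extension("is_request")
print(Span.has_extension("is_request"))
```

Because extensions are registered on the class, they are global to the process; removing them after use, as done here, keeps one example from leaking into another.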