spaCy - Retokenizer.split Method
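This retokenizer method marks a token for splitting into the specified orths, that is, the verbatim texts of the new sub-tokens. The split is carried out when the `with doc.retokenize()` block exits.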
Arguments
The table below explains its arguments −
| NAME | TYPE | DESCRIPTION |
|---|---|---|
| token | Token | The token to split. |
| orths | List | The verbatim texts of the new sub-tokens. Joined together, they must match the text of the original token exactly. |
| heads | List | One entry per sub-token, giving the token it attaches to: either a Token, or a (Token, int) tuple that points at one of the newly split sub-tokens (the int is that sub-token's index). |
| attrs | Dict | Attributes to set on the split tokens. Each attribute name maps to a list of per-token values, one value per sub-token. |
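The rules in the table can be summarised as a few checks on the arguments. The sketch below is plain Python and is not part of spaCy: `check_split_args` is a hypothetical helper, heads are written as plain indices or (index, sub-token) tuples instead of Token objects, and the exact errors spaCy raises may differ.

```python
def check_split_args(token_text, orths, heads, attrs=None):
    """Hypothetical helper mirroring the argument rules of Retokenizer.split.

    Raises ValueError on the first violated rule; returns None otherwise.
    """
    # orths: the sub-token texts must join back to the original token text
    if "".join(orths) != token_text:
        raise ValueError(f"orths {orths!r} do not match {token_text!r}")
    # heads: exactly one head entry per new sub-token
    if len(heads) != len(orths):
        raise ValueError("need exactly one head per sub-token")
    # attrs: every attribute maps to one value per sub-token
    for name, values in (attrs or {}).items():
        if len(values) != len(orths):
            raise ValueError(f"attribute {name!r} needs {len(orths)} values")


# Valid arguments for splitting "Tutorialspoint.com" into two sub-tokens
check_split_args("Tutorialspoint.com", ["Tutorials", "point.com"],
                 heads=[(3, 1), 1],
                 attrs={"POS": ["PROPN", "PROPN"]})
```

Passing `["Tutorial", "point.com"]` as orths would raise, because the joined text "Tutorialpoint.com" no longer matches the original token.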
Example
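An example of the Retokenizer.split method is as follows −
```python
import spacy

nlp_model = spacy.load("en_core_web_sm")
doc = nlp_model("I like the Tutorialspoint.com")

with doc.retokenize() as retokenizer:
    # "Tutorials" attaches to sub-token 1 of doc[3] ("point.com");
    # "point.com" attaches to doc[1] ("like")
    heads = [(doc[3], 1), doc[1]]
    attrs = {"POS": ["PROPN", "PROPN"],
             "DEP": ["compound", "dobj"]}
    retokenizer.split(doc[3], ["Tutorials", "point.com"], heads=heads, attrs=attrs)

# The text is unchanged; only the tokenization differs
print(doc)
print([token.text for token in doc])
```
Output
You will receive the following output −
```
I like the Tutorialspoint.com
['I', 'like', 'the', 'Tutorials', 'point.com']
```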
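To see how the `heads` entries resolve to positions in the new token sequence, the split can be modelled on plain Python lists. This is a simplified sketch, not spaCy's implementation: `split_tokens` is a hypothetical helper, tokens are strings rather than Token objects, and it ignores whitespace, attributes and the heads of the other tokens in the Doc. The rule that a plain index may not point at the token being split is an assumption of this sketch.

```python
def split_tokens(tokens, index, orths, heads):
    """Hypothetical model of a split on plain lists (not spaCy's code).

    tokens: list of token texts; index: position of the token to split.
    heads: one entry per new sub-token, either an int (an original token
    other than the one being split) or a tuple (index, i) meaning the
    i-th new sub-token.
    Returns (new_tokens, head_positions) for the new sub-tokens.
    """
    if "".join(orths) != tokens[index]:
        raise ValueError("orths must match the original token text")
    shift = len(orths) - 1  # tokens after the split move right by this much

    def resolve(head):
        if isinstance(head, tuple):
            tok, sub = head
            if tok != index or not 0 <= sub < len(orths):
                raise ValueError(f"invalid sub-token reference {head!r}")
            return index + sub
        # Assumption of this sketch: use a tuple to point at a sub-token
        if head == index:
            raise ValueError("use an (index, i) tuple to point at a sub-token")
        return head if head < index else head + shift

    new_tokens = tokens[:index] + list(orths) + tokens[index + 1:]
    return new_tokens, [resolve(h) for h in heads]


tokens = ["I", "like", "the", "Tutorialspoint.com"]
new_tokens, head_positions = split_tokens(
    tokens, 3, ["Tutorials", "point.com"], [(3, 1), 1])
print(new_tokens)      # ['I', 'like', 'the', 'Tutorials', 'point.com']
print(head_positions)  # [4, 1]
```

With heads `[(3, 1), 1]`, "Tutorials" points at position 4 ("point.com") and "point.com" points at position 1 ("like"). Tokens after the split point would shift right by `len(orths) - 1`.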