spaCy - Retokenizer.split Method
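This `Retokenizer` method marks a token for splitting into the specified orths, that is, the verbatim texts of the new sub-tokens. It is called on the retokenizer returned by the `doc.retokenize()` context manager, and the split is only applied when that `with` block exits.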
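A minimal sketch of this deferred behaviour follows. It assumes spaCy v3 and uses a blank English pipeline (`spacy.blank("en")`) so that no trained model has to be downloaded. The token count only changes once the `with` block ends:

```python
import spacy

# Assumption: a blank English pipeline stands in for a trained model.
nlp = spacy.blank("en")
doc = nlp("I like the Tutorialspoint.com")

with doc.retokenize() as retokenizer:
    # One head per sub-token: "Tutorials" attaches to the second
    # sub-token ("point.com"), which in turn attaches to "like".
    retokenizer.split(doc[3], ["Tutorials", "point.com"], heads=[(doc[3], 1), doc[1]])
    # The split is only queued at this point; the Doc still has 4 tokens.
    inside = len(doc)

# Leaving the with block applies the queued split.
after = len(doc)
print(inside, after)  # 4 5
```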
Arguments
The table below explains its arguments:
NAME | TYPE | DESCRIPTION |
---|---|---|
token | Token | The token to split. |
orths | List[str] | The verbatim text of the new sub-tokens. Joined together, the strings must exactly match the text of the original token. |
heads | List[Union[Token, Tuple[Token, int]]] | One head per sub-token, specifying what each newly split sub-token attaches to. An entry is either an existing token or a (token, index) tuple that points to one of the new sub-tokens, e.g. (doc[3], 1) for the second piece of doc[3]. |
attrs | Dict[str, List[Any]] | Optional. Attributes to set on the split tokens. Each attribute name maps to a list holding one value per sub-token. |
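The orths constraint is checked when `split()` is called. The sketch below, again assuming spaCy v3 and `spacy.blank("en")`, passes orths that drop one character; the call is expected to raise a `ValueError` and leave the `Doc` untouched:

```python
import spacy

# Assumption: a blank English pipeline stands in for a trained model.
nlp = spacy.blank("en")
doc = nlp("I like the Tutorialspoint.com")

rejected = False
try:
    with doc.retokenize() as retokenizer:
        # "Tutorial" + "point.com" drops the "s", so the orths do not
        # join back into "Tutorialspoint.com".
        retokenizer.split(doc[3], ["Tutorial", "point.com"], heads=[(doc[3], 1), doc[1]])
except ValueError as err:
    rejected = True
    print("Split rejected:", err)

# The Doc keeps its original tokenization.
print(rejected, [token.text for token in doc])
```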
Example
An example of the Retokenizer.split method is as follows:
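```python
import spacy

nlp_model = spacy.load("en_core_web_sm")
doc = nlp_model("I like the Tutorialspoint.com")

with doc.retokenize() as retokenizer:
    # "Tutorials" attaches to the second sub-token ("point.com"),
    # and "point.com" attaches to the verb "like" (doc[1]).
    heads = [(doc[3], 1), doc[1]]
    # One value per sub-token, in the same order as the orths.
    attrs = {"POS": ["PROPN", "PROPN"], "DEP": ["compound", "dobj"]}
    retokenizer.split(doc[3], ["Tutorials", "point.com"], heads=heads, attrs=attrs)

print(doc)
```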
Output
The output shows the same text as before, because the two new sub-tokens are not separated by whitespace:
I like the Tutorialspoint.com
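Because the text alone does not reveal the split, it can help to inspect the tokens themselves. This is a sketch under the same assumptions as before: spaCy v3 and `spacy.blank("en")` instead of `en_core_web_sm`, so the POS tags and heads of the new sub-tokens come only from the arguments passed to `split()`. Attaching "point.com" to "like" is an illustrative choice:

```python
import spacy

# Assumption: a blank English pipeline stands in for en_core_web_sm.
nlp = spacy.blank("en")
doc = nlp("I like the Tutorialspoint.com")
print("Before:", [token.text for token in doc])

with doc.retokenize() as retokenizer:
    heads = [(doc[3], 1), doc[1]]
    attrs = {"POS": ["PROPN", "PROPN"], "DEP": ["compound", "dobj"]}
    retokenizer.split(doc[3], ["Tutorials", "point.com"], heads=heads, attrs=attrs)

print("After:", [token.text for token in doc])
for token in doc[3:]:
    # Each sub-token now carries its own POS tag and head.
    print(token.text, token.pos_, token.head.text)

# The text is unchanged: only the last sub-token keeps the trailing whitespace.
print(doc.text)
```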