spaCy - Retokenizer.split Method

This retokenizer method marks a token for splitting into the specified orths. The split is applied when the doc.retokenize() context manager exits.

Arguments

The table below explains its arguments −

NAME	TYPE	DESCRIPTION
token	Token	The token to split.
orths	List	The verbatim text of each sub-token. Joined together, the strings must match the text of the original token exactly.
heads	List	One entry per sub-token, each either a Token or a (Token, sub-token index) tuple, specifying the token that the new sub-token attaches to.
attrs	Dict	Attributes to set on the split tokens. Each attribute name must map to a list of per-token values, one for each sub-token.
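The constraint on orths can be checked before calling the method. The sketch below is plain Python and does not use spaCy; orths_match is a hypothetical helper name introduced here for illustration only, and it simply tests that the pieces join back into the original token text.

```python
def orths_match(token_text, orths):
    """Return True if the sub-token strings join back into the token text."""
    # orths_match is a hypothetical helper, not part of spaCy.
    return "".join(orths) == token_text


# The pieces cover the whole token text, so this split is acceptable.
valid = orths_match("Tutorialspoint.com", ["Tutorials", "point.com"])

# ".com" is missing from the pieces, so this split would not be acceptable.
invalid = orths_match("Tutorialspoint.com", ["Tutorials", "point"])

print(valid, invalid)
```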

Example

An example of the Retokenizer.split method is as follows −

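import spacy

nlp_model = spacy.load("en_core_web_sm")
doc = nlp_model("I like the Tutorialspoint.com")
with doc.retokenize() as retokenizer:
    # "Tutorials" attaches to sub-token 1 ("point.com"); "point.com" attaches to "the"
    heads = [(doc[3], 1), doc[2]]
    # One value per sub-token for each attribute
    attrs = {"POS": ["PROPN", "PROPN"], "DEP": ["pobj", "compound"]}
    retokenizer.split(doc[3], ["Tutorials", "point.com"], heads=heads, attrs=attrs)
# The split is applied once the with block exits
print(doc)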

Output

You will receive the following output −

I like the Tutorialspoint.com
