spaCy - Retokenizer.split Method



The Retokenizer.split method marks a token for splitting into the specified orths. The split itself is applied when the doc.retokenize() context manager exits.

Arguments

The table below explains its arguments −

NAME     TYPE     DESCRIPTION
token    Token    The token to split.
orths    list     The verbatim text of each split sub-token. Joined together, the strings must match the text of the original token.
heads    list     A list of tokens or (token, subtoken) tuples, one per sub-token, specifying the token that each newly split sub-token attaches to.
attrs    dict     The attributes to set on the split tokens. Each attribute name is mapped to a list of per-token values, one for each sub-token.
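The constraint on orths can be checked with a small experiment. The sketch below is illustrative only: the helper names make_doc and split_is_accepted are hypothetical, and the Doc is built by hand from a bare Vocab (an assumption made so that no trained pipeline is needed). It relies on spaCy raising a ValueError when the joined orths differ from the token text.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def make_doc():
    # Hypothetical helper: build the sentence by hand, without a trained pipeline.
    words = ["I", "like", "the", "Tutorialspoint.com"]
    spaces = [True, True, True, False]
    return Doc(Vocab(), words=words, spaces=spaces)


def split_is_accepted(orths):
    # Hypothetical helper: True if spaCy accepts the orths for doc[3], else False.
    doc = make_doc()
    try:
        with doc.retokenize() as retokenizer:
            retokenizer.split(doc[3], orths, heads=[(doc[3], 1), doc[2]])
    except ValueError:
        # Raised when "".join(orths) differs from the text of the token.
        return False
    return True


ok = split_is_accepted(["Tutorials", "point.com"])  # joins back to the token text
bad = split_is_accepted(["Tutorials", "point"])     # drops ".com"
print(ok, bad)
```

The heads list has one entry per sub-token in both calls, so only the orths differ between the accepted and the rejected split.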

Example

An example of the Retokenizer.split method is as follows −

import spacy

nlp_model = spacy.load("en_core_web_sm")
doc = nlp_model("I like the Tutorialspoint.com")
with doc.retokenize() as retokenizer:
    # "Tutorials" attaches to the second sub-token; "point.com" attaches to doc[2]
    heads = [(doc[3], 1), doc[2]]
    # Each attribute maps to one value per sub-token
    attrs = {"POS": ["PROPN", "PROPN"],
             "DEP": ["pobj", "compound"]}
    retokenizer.split(doc[3], ["Tutorials", "point.com"], heads=heads, attrs=attrs)
print(doc)

Output

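You will receive the following output. The text is unchanged because the split only alters token boundaries; the document now has five tokens instead of four −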

I like the Tutorialspoint.com