spaCy - Container Lexeme Class

This chapter explains the Lexeme class in spaCy in detail.

Lexeme Class

A Lexeme is an entry in the vocabulary. It is a word type, as opposed to a word token, so it has no string context. For that reason it has no part-of-speech (POS) tag, dependency parse, or lemma.
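The difference between a word type and a word token can be sketched in plain Python. This is a toy illustration only: ToyVocab and its fields are hypothetical names and do not reflect how spaCy implements its vocabulary. It shows that every occurrence of a word in a text can point to a single shared, context-free entry.

```python
class ToyVocab:
    """Toy vocabulary: one shared entry per distinct string (hypothetical, not spaCy's Vocab)."""

    def __init__(self):
        self._entries = {}

    def __getitem__(self, text):
        # Create the entry on first lookup and reuse it afterwards.
        if text not in self._entries:
            self._entries[text] = {
                "text": text,
                "lower": text.lower(),
                "is_alpha": text.isalpha(),
            }
        return self._entries[text]

    def __len__(self):
        return len(self._entries)


vocab = ToyVocab()
tokens = "the cat saw the dog".split()

# Each token occurrence points to the entry for its word type.
entries = [vocab[token] for token in tokens]

print(len(tokens))               # number of word tokens
print(len(vocab))                # number of word types
print(entries[0] is entries[3])  # both occurrences of "the" share one entry
```

The entries hold only context-independent information, which is why nothing like a POS tag or a dependency label appears in them.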

Attributes

The table below explains its attributes −

NAME	TYPE	DESCRIPTION
vocab	Vocab	The vocabulary that the lexeme belongs to.
text	unicode	The verbatim text content.
orth	int	The integer ID of the verbatim text content.
orth_	unicode	The verbatim text content, identical to Lexeme.text. It exists mostly for consistency with the other attributes.
rank	int	The sequential ID of the lexeme’s lexical type, used to index into tables.
flags	int	The container of the lexeme’s binary flags.
norm	int	The integer ID of the lexeme’s norm, that is, a normalized form of the lexeme text.
norm_	unicode	The lexeme’s norm, that is, a normalized form of the lexeme text.
lower	int	The integer ID of the lowercase form of the word.
lower_	unicode	The lowercase form of the word.
shape	int	The integer ID of the word shape, a transform of the word’s string that shows orthographic features.
shape_	unicode	The word shape, a transform of the word’s string that shows orthographic features.
prefix	int	The hash value of a length-N substring from the start of the word. The default value is N=1.
prefix_	unicode	A length-N substring from the start of the word. The default value is N=1.
suffix	int	The hash value of a length-N substring from the end of the word. The default value is N=3.
suffix_	unicode	A length-N substring from the end of the word. The default value is N=3.
is_alpha	bool	Whether the lexeme consists of alphabetic characters. Equivalent to lexeme.text.isalpha().
is_ascii	bool	Whether the lexeme consists of ASCII characters. Equivalent to all(ord(c) < 128 for c in lexeme.text).
is_digit	bool	Whether the lexeme consists of digits. Equivalent to lexeme.text.isdigit().
is_lower	bool	Whether the lexeme is in lowercase. Equivalent to lexeme.text.islower().
is_upper	bool	Whether the lexeme is in uppercase. Equivalent to lexeme.text.isupper().
is_title	bool	Whether the lexeme is in titlecase. Equivalent to lexeme.text.istitle().
is_punct	bool	Whether the lexeme is a punctuation mark.
is_left_punct	bool	Whether the lexeme is a left punctuation mark, e.g. '('.
is_right_punct	bool	Whether the lexeme is a right punctuation mark, e.g. ')'.
is_space	bool	Whether the lexeme consists of whitespace characters. Equivalent to lexeme.text.isspace().
is_bracket	bool	Whether the lexeme is a bracket.
is_quote	bool	Whether the lexeme is a quotation mark.
is_currency	bool	Whether the lexeme is a currency symbol. Introduced in version 2.0.8.
like_url	bool	Whether the lexeme resembles a URL.
like_num	bool	Whether the lexeme represents a number.
like_email	bool	Whether the lexeme resembles an email address.
is_oov	bool	Whether the lexeme is out-of-vocabulary, that is, whether it lacks a word vector.
is_stop	bool	Whether the lexeme is part of a “stop list”.
lang	int	The integer ID of the language of the parent vocabulary.
lang_	unicode	The language of the parent vocabulary.
prob	float	The smoothed log probability estimate of the lexeme’s word type.
cluster	int	The Brown cluster ID.
sentiment	float	A scalar value that indicates the positivity or negativity of the lexeme.
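Several attributes in the table above are defined purely in terms of the lexeme’s text, so they can be approximated with ordinary string operations. The sketch below is a plain-Python illustration, not spaCy code: string_features and simple_shape are hypothetical helper names, and the shape transform in particular is an assumed approximation (letters become x or X, digits become d, and long runs are shortened), so it may differ from what spaCy returns in edge cases.

```python
def simple_shape(text: str) -> str:
    """Approximate a word-shape transform: letters become x/X, digits become d."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:  # keep at most four consecutive copies of a shape character
            shape.append(shape_char)
    return "".join(shape)


def string_features(text: str, prefix_len: int = 1, suffix_len: int = 3) -> dict:
    """Compute text-only features using the equivalences listed in the table."""
    return {
        "lower_": text.lower(),
        "prefix_": text[:prefix_len],   # default N=1
        "suffix_": text[-suffix_len:],  # default N=3
        "shape_": simple_shape(text),
        "is_alpha": text.isalpha(),
        "is_ascii": all(ord(c) < 128 for c in text),
        "is_digit": text.isdigit(),
        "is_lower": text.islower(),
        "is_upper": text.isupper(),
        "is_space": text.isspace(),
    }


features = string_features("Apple2024")
for name, value in features.items():
    print(name, value)
```

Attributes such as is_stop, like_url or prob are not covered here because they depend on language data or corpus statistics rather than on the string alone.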

Methods

The Lexeme class has the following methods −

Sr.No.	Methods & Description
1	Lexeme.__init__ To construct a Lexeme object.
2	Lexeme.set_flag To change the value of a Boolean flag.
3	Lexeme.check_flag To check the value of a Boolean flag.
4	Lexeme.similarity To compute a semantic similarity estimate.

Lexeme.__init__

As the name implies, this method is used to construct a Lexeme object from a vocabulary and an orth ID.

Arguments

The table below explains its arguments −

NAME	TYPE	DESCRIPTION
vocab	Vocab	This argument represents the parent vocabulary.
orth	int	It is the orth ID of the lexeme.

Example

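An example of Lexeme.__init__ method is given below −

import spacy
from spacy.lexeme import Lexeme

nlp_model = spacy.load("en_core_web_sm")
doc = nlp_model("The website is Tutorialspoint.com.")
# doc[3] is a Token; its orth ID identifies the matching vocabulary entry.
lexeme = Lexeme(nlp_model.vocab, doc[3].orth)
lexeme.text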

Output

When you run the code, you will see the following output −

'Tutorialspoint.com'

Lexeme.set_flag

This method is used to change the value of a Boolean flag.

Arguments

The table below explains its arguments −

NAME	TYPE	DESCRIPTION
flag_id	int	It represents the attribute ID of the flag, which is to be set.
value	bool	It is the new value of the flag.

Example

An example of Lexeme.set_flag method is given below −

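import spacy
nlp_model = spacy.load("en_core_web_sm")
# Register a new flag whose getter returns False for every string.
New_FLAG = nlp_model.vocab.add_flag(lambda text: False)
# Override the flag for a single lexeme.
nlp_model.vocab["Tutorialspoint.com"].set_flag(New_FLAG, True)
New_FLAG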

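Output

When you run the code, the last expression returns the integer ID that the vocabulary assigned to the new flag. The exact value depends on the flags already registered in the vocabulary.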

Lexeme.check_flag

This method is used to check the value of a Boolean flag.

Argument

The table below explains its argument −

NAME	TYPE	DESCRIPTION
flag_id	int	It represents the attribute ID of the flag which is to be checked.

Example 1

An example of Lexeme.check_flag method is given below −

import spacy
nlp_model = spacy.load("en_core_web_sm")
library = lambda text: text in ["Website", "Tutorialspoint.com"]
my_library = nlp_model.vocab.add_flag(library)
nlp_model.vocab["Tutorialspoint.com"].check_flag(my_library)

Output

When you run the code, you will see the following output −

True

Example 2

Given below is another example of Lexeme.check_flag method −

nlp_model.vocab["Hello"].check_flag(my_library)

Output

When you run the code, you will see the following output −

False
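The flags attribute is described earlier as a container of the lexeme’s binary flags, and set_flag and check_flag both address a flag by an integer ID. One common way to pack many Boolean flags into a single integer is a bit field, which the plain-Python sketch below illustrates. ToyLexeme and IS_SITE_NAME are hypothetical names, and the assumption that each flag ID maps to one bit is made for illustration; it is not a statement about spaCy’s internal storage.

```python
class ToyLexeme:
    """Toy lexeme that keeps all of its Boolean flags in one integer (hypothetical)."""

    def __init__(self, text):
        self.text = text
        self.flags = 0  # every flag starts cleared

    def set_flag(self, flag_id, value):
        mask = 1 << flag_id  # the bit that belongs to this flag ID
        if value:
            self.flags |= mask   # switch the bit on
        else:
            self.flags &= ~mask  # switch the bit off

    def check_flag(self, flag_id):
        return bool(self.flags & (1 << flag_id))


IS_SITE_NAME = 5  # hypothetical flag ID chosen for this sketch

lexeme = ToyLexeme("Tutorialspoint.com")
lexeme.set_flag(IS_SITE_NAME, True)
print(lexeme.check_flag(IS_SITE_NAME))  # the flag that was set
print(lexeme.check_flag(2))             # a flag that was never set
print(lexeme.flags)                     # the integer that holds all flags
```

Storing the flags this way keeps each lexeme compact, because adding another Boolean property costs one bit rather than a separate field.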

Lexeme.similarity

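This method is used to compute a semantic similarity estimate. The default is cosine similarity over the word vectors, so the estimate is only meaningful when the loaded pipeline includes word vectors.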

Argument

The table below explains its argument −

NAME	TYPE	DESCRIPTION
other	-	It is the object with which the comparison will be done. By default, it will accept Doc, Span, Token, and Lexeme objects.

Example

An example of Lexeme.similarity method is as follows −

import spacy
nlp_model = spacy.load("en_core_web_sm")
apple = nlp_model.vocab["apple"]
orange = nlp_model.vocab["orange"]
# Similarity is symmetric, so both directions give the same value.
apple_orange = apple.similarity(orange)
orange_apple = orange.similarity(apple)
apple_orange == orange_apple

Output

When you run the code, you will see the following output −

True
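The symmetry shown above follows directly from the definition of cosine similarity. The sketch below computes it in plain Python for two made-up vectors; cosine_similarity is a hypothetical helper, the vector values are invented for illustration, and returning 0.0 for a zero-length vector is a choice made for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no direction to compare (choice made for this sketch)
    return dot / (norm_a * norm_b)


# Invented example vectors; real word vectors have many more dimensions.
apple_vec = [0.2, 0.7, 0.1]
orange_vec = [0.3, 0.6, 0.2]

apple_orange = cosine_similarity(apple_vec, orange_vec)
orange_apple = cosine_similarity(orange_vec, apple_vec)
print(apple_orange == orange_apple)
```

Swapping the arguments only swaps the order of the factors in each product, which is why both calls give the same value.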

Properties

The Lexeme class has the following properties −

Sr.No.	Property & Description
1	Lexeme.vector It will return a 1-dimensional array representing the lexeme’s semantics.
2	Lexeme.vector_norm It represents the L2 norm of the lexeme’s vector representation.

Lexeme.vector

This Lexeme property represents a real-valued meaning. It will return a one-dimensional array representing the lexeme’s semantics.

Example

An example of Lexeme.vector property is given below −

import spacy
nlp_model = spacy.load("en_core_web_sm")
apple = nlp_model.vocab["apple"]
apple.vector.dtype

Output

You will see the following output −

dtype('float32')

Lexeme.vector_norm

This Lexeme property represents the L2 norm of the lexeme’s vector representation.

Example

An example of Lexeme.vector_norm property is as follows −

import spacy
nlp_model = spacy.load("en_core_web_sm")
apple = nlp_model.vocab["apple"]
pasta = nlp_model.vocab["pasta"]
# Different vectors generally have different L2 norms.
apple.vector_norm != pasta.vector_norm

Output

You will see the following output −

True
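The L2 norm itself is simply the square root of the sum of the squared vector components. A minimal plain-Python sketch follows; l2_norm is a hypothetical helper and the vectors are invented for illustration.

```python
import math


def l2_norm(vector):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


print(l2_norm([3.0, 4.0]))       # sqrt(3^2 + 4^2)
print(l2_norm([0.0, 0.0, 0.0]))  # a zero vector has norm 0
```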