Lucene - Token



A Token represents a word or other unit of text from a document, together with its metadata: position, start offset, end offset, token type, and position increment.
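To make the offset metadata concrete, here is a minimal plain-Java sketch. It does not use Lucene: `SimpleToken` and `tokenize` are hypothetical names introduced only for illustration, and splitting on whitespace is an assumption rather than what any particular Lucene analyzer does. It records, for each word, the start offset and the end offset, where the end offset is one past the last character.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; SimpleToken is a hypothetical stand-in, not Lucene's Token.
class OffsetSketch {

    /** Hypothetical value holder mirroring the metadata a token carries. */
    static final class SimpleToken {
        final String term;
        final int startOffset;       // index of the first character in the source text
        final int endOffset;         // one past the index of the last character
        final int positionIncrement; // distance from the previous token, in positions

        SimpleToken(String term, int startOffset, int endOffset, int positionIncrement) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Splits on whitespace (an assumption for this sketch) and records offsets for every word. */
    static List<SimpleToken> tokenize(String text) {
        List<SimpleToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between words.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= text.length()) {
                break;
            }
            int start = i;
            // Advance to the end of the current word.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            // Every token follows its predecessor directly, so the increment is 1.
            tokens.add(new SimpleToken(text.substring(start, i), start, i, 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "quick brown fox";
        for (SimpleToken t : tokenize(text)) {
            System.out.println(t.term + " [" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

Because the end offset is exclusive, `text.substring(startOffset, endOffset)` recovers the original token text, which is what makes offsets useful for tasks such as highlighting.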

Class Declaration

Following is the declaration for the org.apache.lucene.analysis.Token class:

public class Token
   extends TermAttributeImpl
      implements TypeAttribute, PositionIncrementAttribute, 
         FlagsAttribute, OffsetAttribute, 
         PayloadAttribute, PositionLengthAttribute

Fields

Following are the fields for the org.apache.lucene.analysis.Token class −

  • static AttributeSource.AttributeFactory TOKEN_ATTRIBUTE_FACTORY − Convenience factory that returns Token as the implementation for the basic attributes and returns the default implementation (with "Impl" appended) for all other attributes.

Class Constructors

The following table shows the different class constructors −

S.No. Constructor & Description
1

Token()

Constructs a Token with null text.

2

Token(char[] startTermBuffer, int termBufferOffset, int termBufferLength, int start, int end)

Constructs a Token with the given term buffer (offset and length) and start and end offsets.

3

Token(int start, int end)

Constructs a Token with null text and start & end offsets.

4

Token(int start, int end, int flags)

Constructs a Token with null text, start and end offsets, and flags.

5

Token(int start, int end, String typ)

Constructs a Token with null text, start and end offsets, and the Token type.

6

Token(String text, int start, int end)

Constructs a Token with the given term text and start and end offsets.

7

Token(String text, int start, int end, int flags)

Constructs a Token with the given text, start and end offsets, and flags.

8

Token(String text, int start, int end, String typ)

Constructs a Token with the given text, start and end offsets, and type.

Class Methods

The following table shows the different class methods −

S.No. Method & Description
1

void clear()

Resets the term text, payload, flags, positionIncrement, startOffset, endOffset, and token type to their defaults.

2

Object clone()

This is a shallow clone.

3

Token clone(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset)

Makes a clone, but replaces the term buffer and the start and end offsets in the process.

4

void copyTo(AttributeImpl target)

Copies the values from this Attribute into the passed-in target attribute.

5

int endOffset()

Returns the Token's ending offset; one greater than the position of the last character corresponding to this token in the source text.

6

boolean equals(Object obj)

7

int getFlags()

Gets the bitset for any bits that have been set.

8

Payload getPayload()

Returns this Token's payload.

9

int getPositionIncrement()

Returns the position increment of this Token.

10

int getPositionLength()

Get the position length.

11

int hashCode()

12

void reflectWith(AttributeReflector reflector)

This method is for introspection of attributes. It should simply add the key/value pairs that this attribute holds to the given AttributeReflector.

13

Token reinit(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset)

Shorthand for calling clear(), CharTermAttributeImpl.copyBuffer(char[], int, int), setStartOffset(int), setEndOffset(int), and setType(java.lang.String) with Token.DEFAULT_TYPE.

14

Token reinit(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset, String newType)

Shorthand for calling clear(), CharTermAttributeImpl.copyBuffer(char[], int, int), setStartOffset(int), setEndOffset(int), and setType(java.lang.String).

15

Token reinit(String newTerm, int newStartOffset, int newEndOffset)

Shorthand for calling clear(), CharTermAttributeImpl.append(CharSequence), setStartOffset(int), setEndOffset(int), and setType(java.lang.String) with Token.DEFAULT_TYPE.

16

Token reinit(String newTerm, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset)

Shorthand for calling clear(), CharTermAttributeImpl.append(CharSequence, int, int), setStartOffset(int), setEndOffset(int), and setType(java.lang.String) with Token.DEFAULT_TYPE.

17

Token reinit(String newTerm, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset, String newType)

Shorthand for calling clear(), CharTermAttributeImpl.append(CharSequence, int, int), setStartOffset(int), setEndOffset(int), and setType(java.lang.String).

18

Token reinit(String newTerm, int newStartOffset, int newEndOffset, String newType)

Shorthand for calling clear(), CharTermAttributeImpl.append(CharSequence), setStartOffset(int), setEndOffset(int), and setType(java.lang.String).

19

void reinit(Token prototype)

Copies the prototype token's fields into this one.

20

void reinit(Token prototype, char[] newTermBuffer, int offset, int length)

Copies the prototype token's fields into this one, with a different term.

21

void reinit(Token prototype, String newTerm)

Copies the prototype token's fields into this one, with a different term.

22

void setEndOffset(int offset)

Sets the ending offset.

23

void setFlags(int flags)

24

void setOffset(int startOffset, int endOffset)

Sets the starting and ending offset.

25

void setPayload(Payload payload)

Sets this Token's payload.

26

void setPositionIncrement(int positionIncrement)

Sets the position increment.

27

void setPositionLength(int positionLength)

Set the position length.

28

void setStartOffset(int offset)

Set the starting offset.

29

void setType(String type)

Sets the lexical type.

30

int startOffset()

Returns this Token's starting offset, the position of the first character corresponding to this token in the source text.

31

String type()

Returns this Token's lexical type.
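The position increment handled by getPositionIncrement() and setPositionIncrement(int) is relative to the previous token, so it can help to see how increments turn into absolute positions. The following is a minimal plain-Java sketch that uses no Lucene classes; `absolutePositions` is a hypothetical helper, and the sample terms and increments are made up for illustration. It assumes the usual convention that counting starts at -1, so a first token with increment 1 lands at position 0.

```java
// Plain-Java illustration only; no Lucene classes are used.
class PositionSketch {

    /**
     * Converts per-token position increments into absolute positions.
     * Counting starts at -1 so that a first increment of 1 yields position 0.
     */
    static int[] absolutePositions(int[] increments) {
        int[] positions = new int[increments.length];
        int current = -1;
        for (int i = 0; i < increments.length; i++) {
            current += increments[i];
            positions[i] = current;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical stream for "the quick fox": "the" was dropped as a stop word
        // (so "quick" carries increment 2) and "fast" was injected as a synonym of
        // "quick" (increment 0 keeps it at the same position).
        String[] terms = {"quick", "fast", "fox"};
        int[] increments = {2, 0, 1};

        int[] positions = absolutePositions(increments);
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i] + " @ position " + positions[i]);
        }
    }
}
```

In this sketch "quick" and "fast" share position 1 and "fox" lands at position 2. An increment of 0 stacks a token on top of the previous one, while an increment greater than 1 leaves a gap where tokens were removed.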

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.tokenattributes.TermAttributeImpl
  • org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl
  • org.apache.lucene.util.AttributeImpl
  • java.lang.Object