Lucene - Analyzer Class

The Analyzer class is responsible for analyzing a document and extracting the tokens/words from the text that is to be indexed. Without analysis, IndexWriter cannot create an index.
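
For example, a concrete analyzer such as StandardAnalyzer is normally handed to IndexWriterConfig so that IndexWriter can tokenize field text while building the index. The following is a minimal sketch, assuming a recent Lucene version (5.x or later) and a hypothetical local index directory named index −

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class AnalyzerIndexingSketch {
   public static void main(String[] args) throws Exception {
      // The analyzer breaks field text into tokens before IndexWriter adds it to the index.
      Analyzer analyzer = new StandardAnalyzer();
      Directory directory = FSDirectory.open(Paths.get("index"));   // hypothetical index path

      IndexWriterConfig config = new IndexWriterConfig(analyzer);
      IndexWriter writer = new IndexWriter(directory, config);

      Document doc = new Document();
      doc.add(new TextField("contents", "Lucene analyzes text before indexing it.", Field.Store.YES));
      writer.addDocument(doc);

      writer.close();
      directory.close();
   }
}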

Class declaration

Following is the declaration for the org.apache.lucene.analysis.Analyzer class −

public abstract class Analyzer
   extends Object
      implements Closeable

Example

// CustomTokenizer, CustomFilter and BarFilter are placeholder classes.
Analyzer analyzer = new Analyzer() {
   @Override
   protected TokenStreamComponents createComponents(String fieldName) {
      Tokenizer source = new CustomTokenizer();   // tokenizers no longer take a Reader here
      TokenStream filter = new CustomFilter(source);
      filter = new BarFilter(filter);
      return new TokenStreamComponents(source, filter);
   }

   @Override
   protected TokenStream normalize(String fieldName, TokenStream in) {
      // Assuming CustomFilter is about normalization and BarFilter is about
      // stemming, only CustomFilter should be applied
      return new CustomFilter(in);
   }
};
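
Tokens produced by such an analyzer are consumed through tokenStream(String, String) or tokenStream(String, Reader). The following is a minimal sketch of that pattern; the field name "content" and the sample text are arbitrary, and imports for TokenStream and CharTermAttribute are assumed −

// Consume the tokens produced by the analyzer defined above.
try (TokenStream stream = analyzer.tokenStream("content", "some text to analyze")) {
   // CharTermAttribute exposes the text of the current token.
   CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
   stream.reset();                     // must be called before the first incrementToken()
   while (stream.incrementToken()) {
      System.out.println(term.toString());
   }
   stream.end();                       // records end-of-stream state such as the final offset
}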

Class Fields

The following table shows the class fields for Analyzer −

1. static final Analyzer.ReuseStrategy GLOBAL_REUSE_STRATEGY
   A predefined Analyzer.ReuseStrategy that reuses the same components for every field.

2. static final Analyzer.ReuseStrategy PER_FIELD_REUSE_STRATEGY
   A predefined Analyzer.ReuseStrategy that reuses components per field by maintaining a Map of TokenStreamComponents per field name.

Class Constructors

The following table shows the class constructors for Analyzer −

1. protected Analyzer()
   Create a new Analyzer, reusing the same set of components per-thread across calls to tokenStream(String, Reader).

2. protected Analyzer(Analyzer.ReuseStrategy reuseStrategy)
   Expert: create a new Analyzer with a custom Analyzer.ReuseStrategy (see the sketch after this table).
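
As a sketch of the second constructor, a subclass can pass one of the predefined reuse strategies to super(). The tokenizer and filter used here are only illustrative, and the package locations assume Lucene 7 or later −

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Keeps separate components for each field by passing the predefined
// PER_FIELD_REUSE_STRATEGY to the protected constructor.
public class PerFieldReuseAnalyzer extends Analyzer {
   public PerFieldReuseAnalyzer() {
      super(Analyzer.PER_FIELD_REUSE_STRATEGY);
   }

   @Override
   protected TokenStreamComponents createComponents(String fieldName) {
      Tokenizer source = new StandardTokenizer();
      return new TokenStreamComponents(source, new LowerCaseFilter(source));
   }
}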

Class Methods

The following table shows the class methods for Analyzer −

1. protected AttributeFactory attributeFactory(String fieldName)
   Returns the AttributeFactory to be used for analysis and normalization of the given field name.

2. void close()
   Frees persistent resources used by this Analyzer.

3. protected abstract Analyzer.TokenStreamComponents createComponents(String fieldName)
   Creates a new Analyzer.TokenStreamComponents instance for this analyzer.

4. int getOffsetGap(String fieldName)
   Just like getPositionIncrementGap(java.lang.String), except for token offsets instead.

5. int getPositionIncrementGap(String fieldName)
   Invoked before indexing an IndexableField instance if terms have already been added to that field.

6. final Analyzer.ReuseStrategy getReuseStrategy()
   Returns the Analyzer.ReuseStrategy in use.

7. protected Reader initReader(String fieldName, Reader reader)
   Override this if you want to add a CharFilter chain (see the sketch after this table).

8. protected Reader initReaderForNormalization(String fieldName, Reader reader)
   Wraps the given Reader with CharFilters that make sense for normalization.

9. final BytesRef normalize(String fieldName, String text)
   Normalizes a string down to the representation that it would have in the index.

10. protected TokenStream normalize(String fieldName, TokenStream in)
    Wraps the given TokenStream in order to apply normalization filters.

11. final TokenStream tokenStream(String fieldName, Reader reader)
    Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.

12. final TokenStream tokenStream(String fieldName, String text)
    Returns a TokenStream suitable for fieldName, tokenizing the contents of text.
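
Two of these hooks are commonly overridden: initReader(String, Reader) to prepend a CharFilter chain, and getPositionIncrementGap(String) to keep phrase queries from matching across the values of a multi-valued field. The following is a hedged sketch; HTMLStripCharFilter comes from the lucene-analyzers-common module, the package locations assume Lucene 7 or later, and the gap value of 100 is only illustrative −

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class HtmlAwareAnalyzer extends Analyzer {

   @Override
   protected TokenStreamComponents createComponents(String fieldName) {
      Tokenizer source = new StandardTokenizer();
      return new TokenStreamComponents(source, new LowerCaseFilter(source));
   }

   @Override
   protected Reader initReader(String fieldName, Reader reader) {
      // Strip HTML markup before the tokenizer sees the text.
      return new HTMLStripCharFilter(reader);
   }

   @Override
   public int getPositionIncrementGap(String fieldName) {
      // Gap inserted between the values of a multi-valued field.
      return 100;
   }
}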

Methods Inherited

This class inherits methods from the following classes −

  • java.lang.Object