Lucene - Analyzer
The Analyzer class is responsible for analyzing a document and extracting the tokens (words) from the text that is to be indexed. Without analysis, the IndexWriter cannot create an index.
Following is the declaration for the org.apache.lucene.analysis.Analyzer class −
public abstract class Analyzer extends Object implements Closeable
The following table shows a class constructor −
|S.No.|Constructor & Description|
The following table shows the different class methods −
|S.No.|Method & Description|
|1|void close(): Frees persistent resources used by the Analyzer.|
|2|int getOffsetGap(Fieldable field): Similar to getPositionIncrementGap(String), but applies to token offsets instead of positions.|
|3|int getPositionIncrementGap(String fieldName): Invoked before indexing a Fieldable instance if terms have already been added to that field.|
|4|protected Object getPreviousTokenStream(): Used by Analyzers that implement reusableTokenStream to retrieve previously saved TokenStreams for re-use by the same thread.|
|5|TokenStream reusableTokenStream(String fieldName, Reader reader): Creates a TokenStream that may be re-used from the previous time the same thread called this method.|
|6|protected void setPreviousTokenStream(Object obj): Used by Analyzers that implement reusableTokenStream to save a TokenStream for later re-use by the same thread.|
|7|abstract TokenStream tokenStream(String fieldName, Reader reader): Creates a TokenStream that tokenizes all the text in the provided Reader.|
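The role of getPositionIncrementGap may be easier to see with a small model. The sketch below is plain Java and does not use Lucene; the class name PositionGapSketch and the method positions are hypothetical names chosen for illustration. It assumes a simplified view in which each token advances the position by one, and the gap is added only when an earlier value of the same field has already produced terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; this is not Lucene code.
class PositionGapSketch {

    // Assigns a position to every whitespace-separated token across several
    // values of the same field, adding `gap` extra positions between values.
    static List<Integer> positions(List<String> values, int gap) {
        List<Integer> result = new ArrayList<>();
        int position = -1;              // position of the last token emitted
        boolean fieldHasTerms = false;  // true once any value has produced a token
        for (String value : values) {
            if (fieldHasTerms) {
                position += gap;        // the gap applies only after earlier terms
            }
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;           // skip the empty token of a blank value
                }
                position += 1;          // each token advances the position by one
                result.add(position);
                fieldHasTerms = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("quick brown", "fox jumps");
        System.out.println(positions(values, 0));   // [0, 1, 2, 3]
        System.out.println(positions(values, 100)); // [0, 1, 102, 103]
    }
}
```

With a gap of 0 the tokens of both values sit in consecutive positions, so a phrase such as "brown fox" would appear adjacent even though the words come from different values. A larger gap keeps the values apart.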
This class inherits methods from java.lang.Object.
WhitespaceAnalyzer, a concrete Analyzer subclass, splits the text in a document on whitespace.
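Whitespace-based splitting, as described above, can be sketched without Lucene at all. The following is a minimal stand-alone illustration, not the library's implementation; WhitespaceSplitSketch and tokenize are hypothetical names. It assumes that a token is any maximal run of non-whitespace characters, and that case and punctuation are left unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is not Lucene code.
class WhitespaceSplitSketch {

    // Returns the maximal runs of non-whitespace characters in `text`.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // Flush the last token when the text does not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [Lucene, is, simple,, yet, powerful.]
        System.out.println(tokenize("Lucene is  simple, yet powerful."));
    }
}
```

Note that "simple," keeps its comma and "Lucene" keeps its capital letter: splitting on whitespace alone performs no lowercasing and no punctuation removal.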