Lucene - Analyzer Class
The Analyzer class is responsible for analyzing a document and extracting the tokens (words) from the text that is to be indexed. IndexWriter needs an Analyzer to tokenize text fields before it can add them to the index.
Class declaration
Following is the declaration for org.apache.lucene.analysis.Analyzer class −
public abstract class Analyzer
extends Object
implements Closeable
Example
Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // CustomTokenizer, CustomFilter and BarFilter are placeholder classes.
        // No Reader is passed here; Lucene supplies the input to the tokenizer later.
        Tokenizer source = new CustomTokenizer();
        TokenStream filter = new CustomFilter(source);
        filter = new BarFilter(filter);
        return new TokenStreamComponents(source, filter);
    }

    @Override
    protected TokenStream normalize(String fieldName, TokenStream in) {
        // Assuming CustomFilter is about normalization and BarFilter is about
        // stemming, only CustomFilter should be applied
        return new CustomFilter(in);
    }
};
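The placeholder classes above can be swapped for real components to get something runnable. The following is a minimal sketch, assuming Lucene 8 or later with only lucene-core on the classpath; the class name AnalyzerTokensDemo and the helper methods lowercaseAnalyzer() and tokens() are hypothetical names chosen for illustration. It builds an Analyzer from StandardTokenizer and LowerCaseFilter and then consumes the TokenStream in the usual reset / incrementToken / end / close order:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerTokensDemo {

    // Hypothetical analyzer for illustration: StandardTokenizer followed by LowerCaseFilter.
    static Analyzer lowercaseAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                TokenStream filter = new LowerCaseFilter(source);
                return new TokenStreamComponents(source, filter);
            }

            @Override
            protected TokenStream normalize(String fieldName, TokenStream in) {
                // Lowercasing is the only normalization step in this chain.
                return new LowerCaseFilter(in);
            }
        };
    }

    // Collects the terms the analyzer produces for the given field and text.
    static List<String> tokens(Analyzer analyzer, String field, String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                       // must be called before the first incrementToken()
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();                         // signals end of stream; close() follows via try-with-resources
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Analyzer implements Closeable, so try-with-resources releases it.
        try (Analyzer analyzer = lowercaseAnalyzer()) {
            System.out.println(tokens(analyzer, "body", "Hello Lucene World"));
            // prints [hello, lucene, world]
        }
    }
}
```

The field name "body" is arbitrary here, because this analyzer builds the same chain for every field.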
Class Fields
Following table shows the class fields for Analyzer −
| S.No. | Field | Description |
|---|---|---|
| 1 | static final Analyzer.ReuseStrategy GLOBAL_REUSE_STRATEGY | A predefined Analyzer.ReuseStrategy that reuses the same components for every field. |
| 2 | static final Analyzer.ReuseStrategy PER_FIELD_REUSE_STRATEGY | A predefined Analyzer.ReuseStrategy that reuses components per field by maintaining a Map of TokenStreamComponents per field name. |
Class Constructors
Following table shows the class constructors for Analyzer −
| S.No. | Constructor | Description |
|---|---|---|
| 1 | protected Analyzer() | Create a new Analyzer, reusing the same set of components per-thread across calls to tokenStream(String, Reader). |
| 2 | protected Analyzer(Analyzer.ReuseStrategy reuseStrategy) | Expert: create a new Analyzer with a custom Analyzer.ReuseStrategy. |
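The second constructor matters when createComponents(String fieldName) builds a different chain depending on the field: with the default global strategy, the components created for the first field would be reused for every other field. The sketch below assumes Lucene 8 or later with lucene-core on the classpath; the class name PerFieldReuseDemo, the helper methods, and the field names "code" and "body" are hypothetical and exist only for illustration:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PerFieldReuseDemo {

    // Hypothetical analyzer: keeps case for the "code" field, lowercases everything else.
    static Analyzer perFieldAnalyzer() {
        return new Analyzer(Analyzer.PER_FIELD_REUSE_STRATEGY) {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                TokenStream result = source;
                if (!"code".equals(fieldName)) {
                    result = new LowerCaseFilter(result);
                }
                return new TokenStreamComponents(source, result);
            }
        };
    }

    // Collects the terms produced for one field.
    static List<String> tokens(Analyzer analyzer, String field, String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = perFieldAnalyzer()) {
            System.out.println(tokens(analyzer, "code", "ABC Def"));   // prints [ABC, Def]
            System.out.println(tokens(analyzer, "body", "ABC Def"));   // prints [abc, def]
            System.out.println(
                analyzer.getReuseStrategy() == Analyzer.PER_FIELD_REUSE_STRATEGY); // prints true
        }
    }
}
```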
Class Methods
| S.No. | Method | Description |
|---|---|---|
| 1 | protected AttributeFactory attributeFactory(String fieldName) | Return the AttributeFactory to be used for analysis and normalization on the given field name. |
| 2 | void close() | Frees persistent resources used by this Analyzer. |
| 3 | protected abstract Analyzer.TokenStreamComponents createComponents(String fieldName) | Creates a new Analyzer.TokenStreamComponents instance for this analyzer. |
| 4 | int getOffsetGap(String fieldName) | Just like getPositionIncrementGap(java.lang.String), except for Token offsets instead. |
| 5 | int getPositionIncrementGap(String fieldName) | Invoked before indexing an IndexableField instance if terms have already been added to that field. |
| 6 | final Analyzer.ReuseStrategy getReuseStrategy() | Returns the used Analyzer.ReuseStrategy. |
| 7 | protected Reader initReader(String fieldName, Reader reader) | Override this if you want to add a CharFilter chain. |
| 8 | protected Reader initReaderForNormalization(String fieldName, Reader reader) | Wrap the given Reader with CharFilters that make sense for normalization. |
| 9 | final BytesRef normalize(String fieldName, String text) | Normalize a string down to the representation that it would have in the index. |
| 10 | protected TokenStream normalize(String fieldName, TokenStream in) | Wrap the given TokenStream in order to apply normalization filters. |
| 11 | final TokenStream tokenStream(String fieldName, Reader reader) | Returns a TokenStream suitable for fieldName, tokenizing the contents of reader. |
| 12 | final TokenStream tokenStream(String fieldName, String text) | Returns a TokenStream suitable for fieldName, tokenizing the contents of text. |
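Two of these methods are easy to try in isolation: normalize(String, String), which runs a single string through the normalization chain, and getPositionIncrementGap(String), which a subclass can override to separate multiple values of the same field. The following is a rough sketch, again assuming Lucene 8 or later with lucene-core; the class name AnalyzerMethodsDemo, the helper gapAnalyzer() and the gap value 100 are illustrative assumptions, not recommendations from the Lucene documentation:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.BytesRef;

public class AnalyzerMethodsDemo {

    // Hypothetical analyzer: lowercases terms and leaves a position gap between field values.
    static Analyzer gapAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                return new TokenStreamComponents(source, new LowerCaseFilter(source));
            }

            @Override
            protected TokenStream normalize(String fieldName, TokenStream in) {
                // Applied by normalize(String, String) to a single, untokenized string.
                return new LowerCaseFilter(in);
            }

            @Override
            public int getPositionIncrementGap(String fieldName) {
                // Arbitrary illustrative value: keeps phrase queries from matching
                // across two separate values of the same field.
                return 100;
            }
        };
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = gapAnalyzer()) {
            BytesRef term = analyzer.normalize("body", "LUCENE");
            System.out.println(term.utf8ToString());                      // prints lucene
            System.out.println(analyzer.getPositionIncrementGap("body")); // prints 100
        }
    }
}
```

normalize(String, String) is typically useful for query terms that are not tokenized, such as wildcard or prefix terms, so that they match the form stored in the index.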
Methods Inherited
This class inherits methods from the following classes −
- java.lang.Object