
Lucene - Analyzer Class
The Analyzer class analyzes a document's text and extracts the tokens (words) that are to be indexed. IndexWriter relies on an Analyzer to turn text into indexable terms; without analysis, it cannot index text fields.
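To make the relationship with IndexWriter concrete, here is a minimal sketch that hands an Analyzer to an IndexWriter through IndexWriterConfig and then checks which term ended up in the index. It assumes Lucene 8 or later (for ByteBuffersDirectory) and uses StandardAnalyzer; the class name AnalyzerIndexingDemo, the field name "content" and the helper method are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class AnalyzerIndexingDemo {

    // Indexes one document into an in-memory directory and returns how many
    // documents contain the given term in the (hypothetical) "content" field.
    static int docFreqAfterIndexing(String text, String term) {
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory directory = new ByteBuffersDirectory()) {
            // The analyzer is passed to the writer through its configuration.
            try (IndexWriter writer =
                     new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new TextField("content", text, Field.Store.NO));
                writer.addDocument(doc);
            } // closing the writer commits the document
            try (DirectoryReader reader = DirectoryReader.open(directory)) {
                return reader.docFreq(new Term("content", term));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // StandardAnalyzer lowercases tokens, so the indexed term is "lucene".
        System.out.println(docFreqAfterIndexing("Lucene Analyzer", "lucene"));
        System.out.println(docFreqAfterIndexing("Lucene Analyzer", "Lucene"));
    }
}
```

The second lookup finds nothing because the index stores the analyzed form of the text, not the original string.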
Class declaration
Following is the declaration for org.apache.lucene.analysis.Analyzer class −
public abstract class Analyzer extends Object implements Closeable
Example
    // CustomTokenizer, CustomFilter and BarFilter are placeholder classes.
    Analyzer analyzer = new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            // createComponents receives no Reader; the tokenizer gets its
            // input later, when tokenStream(...) is called.
            Tokenizer source = new CustomTokenizer();
            TokenStream filter = new CustomFilter(source);
            filter = new BarFilter(filter);
            return new TokenStreamComponents(source, filter);
        }

        @Override
        protected TokenStream normalize(String fieldName, TokenStream in) {
            // Assuming CustomFilter is about normalization and BarFilter is about
            // stemming, only CustomFilter should be applied
            return new CustomFilter(in);
        }
    };
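The example above relies on placeholder classes, so it does not compile by itself. As a runnable variant, the sketch below builds the same kind of analyzer from classes that ship in lucene-core (StandardTokenizer and LowerCaseFilter, assuming Lucene 7 or later) and then consumes the resulting TokenStream. The class name LowercasingAnalyzerDemo and the helper methods are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LowercasingAnalyzerDemo {

    // Illustrative analyzer: StandardTokenizer followed by LowerCaseFilter.
    static Analyzer newLowercasingAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                TokenStream result = new LowerCaseFilter(source);
                return new TokenStreamComponents(source, result);
            }

            @Override
            protected TokenStream normalize(String fieldName, TokenStream in) {
                // Lowercasing is a normalization step, so it is applied here too.
                return new LowerCaseFilter(in);
            }
        };
    }

    // Runs the analyzer over the text and collects the emitted terms.
    static List<String> analyze(Analyzer analyzer, String fieldName, String text) {
        List<String> terms = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream(fieldName, text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                      // required before incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();                        // finalize offsets and state
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = newLowercasingAnalyzer()) {
            // prints [lucene, analyzer, class]
            System.out.println(analyze(analyzer, "content", "Lucene Analyzer Class"));
        }
    }
}
```

The reset, incrementToken, end, close sequence is the consumer workflow that TokenStream expects; skipping reset() typically results in an exception.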
Class Fields
Following table shows the class fields for Analyzer −
S.No. | Field & Description |
---|---|
1 | `static final Analyzer.ReuseStrategy GLOBAL_REUSE_STRATEGY` A predefined Analyzer.ReuseStrategy that reuses the same components for every field. |
2 | `static final Analyzer.ReuseStrategy PER_FIELD_REUSE_STRATEGY` A predefined Analyzer.ReuseStrategy that reuses components per field by maintaining a Map of TokenStreamComponents per field name. |
Class Constructors
Following table shows the class constructors for Analyzer −
S.No. | Constructor & Description |
---|---|
1 | `protected Analyzer()` Create a new Analyzer, reusing the same set of components per-thread across calls to tokenStream(String, Reader). |
2 | `protected Analyzer(Analyzer.ReuseStrategy reuseStrategy)` Expert: create a new Analyzer with a custom Analyzer.ReuseStrategy. |
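The second constructor is mostly useful when createComponents builds a different chain depending on the field name, because a global reuse strategy would hand the components created for one field to every other field. The sketch below, assuming Lucene 7 or later, passes PER_FIELD_REUSE_STRATEGY to the constructor and lowercases only a hypothetical "title" field. The class and method names are invented for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ReuseStrategyDemo {

    // Illustrative analyzer whose chain depends on the field name, so
    // components are cached per field rather than globally.
    static Analyzer newFieldAwareAnalyzer() {
        return new Analyzer(Analyzer.PER_FIELD_REUSE_STRATEGY) {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                if ("title".equals(fieldName)) {
                    // Only the hypothetical "title" field is lowercased.
                    return new TokenStreamComponents(source, new LowerCaseFilter(source));
                }
                return new TokenStreamComponents(source);
            }
        };
    }

    // Collects the terms the analyzer emits for the given field and text.
    static List<String> analyze(Analyzer analyzer, String fieldName, String text) {
        List<String> terms = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream(fieldName, text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = newFieldAwareAnalyzer()) {
            System.out.println(analyze(analyzer, "title", "Hello World")); // [hello, world]
            System.out.println(analyze(analyzer, "body", "Hello World"));  // [Hello, World]
            System.out.println(
                analyzer.getReuseStrategy() == Analyzer.PER_FIELD_REUSE_STRATEGY); // true
        }
    }
}
```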
Class Methods
S.No. | Method & Description |
---|---|
1 | `protected AttributeFactory attributeFactory(String fieldName)` Returns the AttributeFactory to be used for analysis and normalization on the given field name. |
2 | `void close()` Frees persistent resources used by this Analyzer. |
3 | `protected abstract Analyzer.TokenStreamComponents createComponents(String fieldName)` Creates a new Analyzer.TokenStreamComponents instance for this analyzer. |
4 | `int getOffsetGap(String fieldName)` Like getPositionIncrementGap(String), but for token offsets instead of positions. |
5 | `int getPositionIncrementGap(String fieldName)` Invoked before indexing an IndexableField instance if terms have already been added to that field. |
6 | `final Analyzer.ReuseStrategy getReuseStrategy()` Returns the Analyzer.ReuseStrategy in use. |
7 | `protected Reader initReader(String fieldName, Reader reader)` Override this if you want to add a CharFilter chain. |
8 | `protected Reader initReaderForNormalization(String fieldName, Reader reader)` Wraps the given Reader with CharFilters that make sense for normalization. |
9 | `final BytesRef normalize(String fieldName, String text)` Normalizes a string down to the representation that it would have in the index. |
10 | `protected TokenStream normalize(String fieldName, TokenStream in)` Wraps the given TokenStream in order to apply normalization filters. |
11 | `final TokenStream tokenStream(String fieldName, Reader reader)` Returns a TokenStream suitable for fieldName, tokenizing the contents of reader. |
12 | `final TokenStream tokenStream(String fieldName, String text)` Returns a TokenStream suitable for fieldName, tokenizing the contents of text. |
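Two of the methods in the table are easier to grasp with a small example. The sketch below, assuming Lucene 7 or later, overrides getPositionIncrementGap (the default implementation returns 0) so that values of a multi-valued field are kept apart, and calls normalize(String, String) to get the indexed form of a string without tokenizing it. The gap value of 100 and the class name AnalyzerMethodsDemo are arbitrary choices for illustration.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.BytesRef;

public class AnalyzerMethodsDemo {

    // Arbitrary illustrative gap between values of a multi-valued field.
    static final int GAP = 100;

    static Analyzer newAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                return new TokenStreamComponents(source, new LowerCaseFilter(source));
            }

            @Override
            protected TokenStream normalize(String fieldName, TokenStream in) {
                // Only the lowercasing step is applied during normalization.
                return new LowerCaseFilter(in);
            }

            @Override
            public int getPositionIncrementGap(String fieldName) {
                // A large gap keeps phrase queries from matching across
                // separate values added to the same field.
                return GAP;
            }
        };
    }

    // Returns the normalized form of the text as a Java String.
    static String normalizeTerm(Analyzer analyzer, String fieldName, String text) {
        BytesRef bytes = analyzer.normalize(fieldName, text);
        return bytes.utf8ToString();
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = newAnalyzer()) {
            // normalize() does not split the text; it only applies the
            // normalization filters, so this prints "wild card".
            System.out.println(normalizeTerm(analyzer, "content", "Wild Card"));
            System.out.println(analyzer.getPositionIncrementGap("content")); // 100
        }
    }
}
```

Normalization of this kind is typically what a query parser needs for wildcard, prefix or range terms, which must match the indexed form but should not be split into several tokens.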
Methods Inherited
This class inherits methods from the following classes −
- java.lang.Object