Lucene - Analysis
In one of our previous chapters, we saw that Lucene uses IndexWriter to analyze Documents with an Analyzer and then creates, opens, or edits indexes as required. In this chapter, we discuss the various types of Analyzer objects and other relevant objects used during the analysis process. Understanding the analysis process and how analyzers work gives you great insight into how Lucene indexes documents.
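A minimal sketch of where the analyzer enters the indexing pipeline, assuming Lucene 9.x with lucene-core on the classpath and an in-memory ByteBuffersDirectory. The class name AnalyzerInIndexWriter, the method countHits and the field name contents are hypothetical and used only for illustration. The sketch indexes one sentence with StandardAnalyzer, then looks up raw terms with a TermQuery (which is not analyzed) to show that the index stores the analyzer's lowercased tokens rather than the original text.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo class: not part of Lucene itself.
public class AnalyzerInIndexWriter {

    // Indexes one document whose "contents" field holds `text`, then counts
    // the documents containing the exact, un-analyzed term `term`.
    public static int countHits(String text, String term) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory dir = new ByteBuffersDirectory()) {
            // The analyzer given to IndexWriterConfig is applied to every
            // tokenized field (such as TextField) when a document is added.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new TextField("contents", text, Field.Store.YES));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // TermQuery bypasses analysis, so it must match an indexed token exactly.
                return searcher.count(new TermQuery(new Term("contents", term)));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "Lucene is a Java library";
        System.out.println(countHits(text, "java")); // StandardAnalyzer indexed "Java" as "java"
        System.out.println(countHits(text, "Java")); // no uppercase token exists in the index
    }
}
```

The lowercase lookup finds the document and the capitalized one does not, which is why queries are normally passed through the same analyzer used at index time.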
Important Analyzers
Following is the list of analyzers that we'll discuss in due course.
| S.No. | Class | Description |
|---|---|---|
| 1 | WhitespaceAnalyzer | Splits the text in a document on whitespace. Tokens keep their original case. |
| 2 | SimpleAnalyzer | Splits the text in a document at non-letter characters and lowercases the tokens. |
| 3 | StopAnalyzer | Works like SimpleAnalyzer and also removes common words (stop words) such as 'a', 'an' and 'the'. |
| 4 | StandardAnalyzer | The most sophisticated of the general-purpose analyzers listed here. It splits text on word boundaries defined by the Unicode text segmentation rules (UAX #29), discards punctuation, and lowercases each token. It can also remove stop words, although since Lucene 8.0 its default stop word set is empty. |
| 5 | KeywordAnalyzer | Treats the entire input as a single token. Best suited for identifiers, zip codes, product names, etc. |
| 6 | CustomAnalyzer | Lets you assemble your own analyzer for custom requirements from a tokenizer and token filters, using the CustomAnalyzer.builder() method. |
| 7 | EnglishAnalyzer | Analyzer for English text. Besides tokenizing and lowercasing, it removes English stop words and stems words. |
| 8 | FrenchAnalyzer | Analyzer for French text. |
| 9 | SpanishAnalyzer | Analyzer for Spanish text. |
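To see the differences between these analyzers, the sketch below feeds one sentence to several of them and prints the tokens each one produces. It assumes Lucene 9.x with lucene-core and lucene-analysis-common on the classpath (in 8.x the latter artifact is named lucene-analyzers-common). The class AnalyzerComparison, its helpers tokens and whitespaceLowercase, and the field name contents are hypothetical. The tokens helper follows the TokenStream consumer contract: reset(), a loop over incrementToken(), end(), then close(). StopAnalyzer is given EnglishAnalyzer.ENGLISH_STOP_WORDS_SET explicitly, and the CustomAnalyzer is built from the SPI names "whitespace" and "lowercase".

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class: not part of Lucene itself.
public class AnalyzerComparison {

    // Runs `text` through `analyzer` and collects the text of every token.
    public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream("contents", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                 // required before the first incrementToken()
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();                   // records end-of-stream state before close()
        }
        return result;
    }

    // Hypothetical custom analyzer: whitespace tokenizer followed by a lowercase filter.
    public static Analyzer whitespaceLowercase() throws IOException {
        return CustomAnalyzer.builder()
                .withTokenizer("whitespace")
                .addTokenFilter("lowercase")
                .build();
    }

    public static void main(String[] args) throws IOException {
        String text = "The Quick-Brown fox jumped over the dogs";
        Analyzer[] analyzers = {
            new WhitespaceAnalyzer(),
            new SimpleAnalyzer(),
            new StopAnalyzer(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET),
            new StandardAnalyzer(),
            new KeywordAnalyzer(),
            whitespaceLowercase(),
            new EnglishAnalyzer()
        };
        for (Analyzer analyzer : analyzers) {
            System.out.println(analyzer.getClass().getSimpleName() + " -> " + tokens(analyzer, text));
            analyzer.close();
        }
    }
}
```

In the output, WhitespaceAnalyzer keeps `Quick-Brown` as one token with its capitals, SimpleAnalyzer splits it into `quick` and `brown`, StopAnalyzer additionally drops `the`, KeywordAnalyzer returns the whole sentence as a single token, and EnglishAnalyzer also stems `jumped` and `dogs` to `jump` and `dog`.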