Lucene - Quick Guide



Lucene - Overview

Lucene is a simple yet powerful Java-based search library. It can be used in any application to add search capability. Lucene is an open-source, scalable project. This high-performance library is used to index and search virtually any kind of text. The Lucene library provides the core operations required by any search application: indexing and searching.

How Does a Search Application Work?

A Search application performs all or a few of the following operations −

Step 1 − Acquire Raw Content

The first step of any search application is to collect the target content on which the search is to be conducted.

Step 2 − Build the Document

The next step is to build document(s) from the raw content, which the search application can understand and interpret easily.

Step 3 − Analyze the Document

Before the indexing process starts, the document is analyzed to determine which parts of the text are candidates for indexing.

Step 4 − Index the Document

Once documents are built and analyzed, the next step is to index them so that they can be retrieved based on certain keys instead of their entire content. The indexing process is similar to the index at the end of a book, where common words are shown with their page numbers so that they can be located quickly instead of searching the complete book.

Step 5 − User Interface for Search

Once a database of indexes is ready, the application can perform searches. To facilitate this, the application must provide a user interface where a user can enter text and start the search process.

Step 6 − Build the Query

Once a user makes a request to search for text, the application should prepare a Query object from that text, which can then be used to query the index database for the relevant details.

Step 7 − Search the Query

Using the query object, the index database is checked to get the relevant details and the matching documents.

Step 8 − Render the Results

Once the results are received, the application should decide how to show them to the user, including how much information is shown at first glance, and so on.

Apart from these basic operations, a search application can also provide an administration user interface to help administrators of the application control the level of search based on user profiles. Analytics of search results is another important and advanced aspect of any search application.

Lucene's Role in Search Application

Lucene plays a role in steps 2 to 7 mentioned above and provides classes to perform the required operations. In a nutshell, Lucene is the heart of any search application and provides the vital operations pertaining to indexing and searching. Acquiring content and displaying the results are left to the application to handle.
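This division of labor can be sketched end-to-end in a few lines. The following minimal, illustrative example (it assumes a recent Lucene release on the classpath and uses an in-memory ByteBuffersDirectory instead of an on-disk index; the class and field names are made up for the sketch) builds, indexes, and searches a single document:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.util.QueryBuilder;

public class PipelineSketch {
   public static void main(String[] args) throws Exception {
      ByteBuffersDirectory dir = new ByteBuffersDirectory();
      StandardAnalyzer analyzer = new StandardAnalyzer();

      // steps 2-4: build, analyze and index a document
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
         Document doc = new Document();
         doc.add(new TextField("contents", "Mohan is a student", Field.Store.YES));
         writer.addDocument(doc);
      }

      // steps 6-7: build a query and search the index
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
         IndexSearcher searcher = new IndexSearcher(reader);
         Query query = new QueryBuilder(analyzer).createPhraseQuery("contents", "Mohan");
         TopDocs hits = searcher.search(query, 10);
         System.out.println(hits.scoreDocs.length);   // number of matching documents
      }
   }
}
```

Note that steps 1 (acquiring the raw content) and 8 (rendering the results) involve no Lucene classes, which mirrors the division of labor described above.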

In the next chapter, we will build a simple search application using the Lucene search library.

Lucene - Environment

This chapter will guide you on how to prepare a development environment to start your work with Lucene. It will also teach you how to set up the JDK on your machine before you set up Apache Lucene −

Setup Java Development Kit (JDK)

You can download the latest version of the JDK from Oracle's Java site − Java SE Downloads. You will find instructions for installing the JDK in the downloaded files; follow the given instructions to install and configure the setup. Finally, set the PATH and JAVA_HOME environment variables to refer to the directories that contain java and javac, typically java_install_dir/bin and java_install_dir respectively.

If you are running an older version of Windows and have installed the JDK in C:\jdk-24, you would have to put the following lines in your C:\autoexec.bat file.

set PATH=C:\jdk-24;%PATH% 
set JAVA_HOME=C:\jdk-24

Alternatively, on Windows NT/2000/XP, you will have to right-click on My Computer, select Properties → Advanced → Environment Variables. Then, you will have to update the PATH value and click the OK button.

On Unix (Solaris, Linux, etc.), if the SDK is installed in /usr/local/jdk-24 and you use the C shell, you will have to put the following into your .cshrc file.

setenv PATH /usr/local/jdk-24/bin:$PATH 
setenv JAVA_HOME /usr/local/jdk-24

Alternatively, if you use an Integrated Development Environment (IDE) like Borland JBuilder, Eclipse, IntelliJ IDEA, or Sun ONE Studio, you will have to compile and run a simple program to confirm that the IDE knows where you have installed Java. Otherwise, you will have to carry out a proper setup as given in the document of the IDE.

Popular Java Editors

To write your Java programs, you need a text editor. There are many sophisticated IDEs available in the market. But for now, you can consider one of the following −

  • Notepad − On Windows machines, you can use any simple text editor like Notepad (recommended for this tutorial) or TextPad.

  • Netbeans − It is a Java IDE that is open-source and free, which can be downloaded from www.netbeans.org/index.html.

  • Eclipse − It is also a Java IDE developed by the Eclipse open-source community and can be downloaded from www.eclipse.org.

Setup Lucene Framework Libraries

If the JDK setup was successful, you can proceed to set up your Lucene framework. Following are the simple steps to download and install the framework on your machine.


  • Make a choice whether you want to install Lucene on Windows or Unix, and then proceed to the next step to download the .zip file for Windows or the .tgz file for Unix.

  • Download the suitable version of Lucene framework binaries from https://downloads.apache.org/lucene/java/10.2.2/.

  • At the time of writing this tutorial, I downloaded lucene-10.2.2.tgz on my Windows machine and extracted it to C:\lucene.

Lucene Directories

You will find all the Lucene libraries in the directory C:\lucene\modules. Make sure you set your CLASSPATH variable to this directory properly; otherwise, you will face problems while running your application. If you are using Eclipse, then it is not required to set CLASSPATH because all the settings will be done through Eclipse.

Once you are done with this last step, you are ready to proceed to your first example, which you will see in the next chapter.

Lucene - First Application

In this chapter, we will learn actual programming with the Lucene framework. Before you start writing your first example using the Lucene framework, make sure that you have set up your Lucene environment properly as explained in the Lucene - Environment Setup tutorial. It is recommended that you have a working knowledge of the Eclipse IDE.

Let us now proceed by writing a simple search application which will print the number of search results found. We'll also see the list of index files created during this process.

Step 1 - Create Java Project

The first step is to create a simple Java project using the Eclipse IDE. Follow the option File -> New -> Project and select the Java Project wizard from the wizard list. Now name your project LuceneFirstApplication using the wizard window as follows −

Create Project Wizard

Once your project is created successfully, you will have the following content in your Project Explorer −

Lucene First Application Directories

Step 2 - Add Required Libraries

Let us now add the Lucene core framework library to our project. To do this, right-click on your project name LuceneFirstApplication and then follow the option available in the context menu: Build Path -> Configure Build Path to display the Java Build Path window as follows −

Java Build Path

Now use Add External JARs button available under Libraries tab to add the following core JAR from the Lucene installation directory −

  • lucene-core-10.2.2.jar

Step 3 - Create Source Files

Let us now create the actual source files under the LuceneFirstApplication project. First we need to create a package called com.tutorialspoint.lucene. To do this, right-click on src in the Package Explorer section and follow the option: New -> Package.

Next we will create LuceneTester.java and other Java classes under the com.tutorialspoint.lucene package.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

TextFileFilter.java

This class is used as a .txt file filter.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {

   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}
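Because accept() looks only at the file name, the filter logic can be sanity-checked without touching the file system. The FilterDemo class below is a small illustrative sketch using the same predicate, not part of the tutorial application:

```java
import java.io.File;
import java.io.FileFilter;

public class FilterDemo {
   public static void main(String[] args) {
      // same predicate as TextFileFilter: accept names ending in ".txt", case-insensitively
      FileFilter txtOnly = pathname -> pathname.getName().toLowerCase().endsWith(".txt");
      System.out.println(txtOnly.accept(new File("record1.txt")));  // true
      System.out.println(txtOnly.accept(new File("RECORD2.TXT")));  // true
      System.out.println(txtOnly.accept(new File("notes.pdf")));    // false
   }
}
```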

Indexer.java

This class is used to index the raw data so that we can make it searchable using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Indexer {

   private IndexWriter writer;

   public Indexer(String indexDirectoryPath) throws IOException {
      //this directory will contain the indexes
      Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
      StandardAnalyzer analyzer = new StandardAnalyzer();
      IndexWriterConfig config = new IndexWriterConfig(analyzer);
      writer = new IndexWriter(indexDirectory, config);
   }

   public void close() throws CorruptIndexException, IOException {
      writer.close();
   }

   private Document getDocument(File file) throws IOException {
      Document document = new Document();

      //index file contents
      Field contentField = new TextField(LuceneConstants.CONTENTS, new FileReader(file));
      //index file name
      Field fileNameField = new StringField(LuceneConstants.FILE_NAME,
         file.getName(),Field.Store.YES);
      //index file path
      Field filePathField = new StringField(LuceneConstants.FILE_PATH,
         file.getCanonicalPath(),Field.Store.YES);

      document.add(contentField);
      document.add(fileNameField);
      document.add(filePathField);

      return document;
   }   

   private void indexFile(File file) throws IOException {
      System.out.println("Indexing "+file.getCanonicalPath());
      Document document = getDocument(file);
      writer.addDocument(document);
   }

   public int createIndex(String dataDirPath, FileFilter filter) 
      throws IOException {
      //get all files in the data directory
      File[] files = new File(dataDirPath).listFiles();

      for (File file : files) {
         if(!file.isDirectory()
            && !file.isHidden()
            && file.exists()
            && file.canRead()
            && filter.accept(file)
         ){
            indexFile(file);
         }
      }
      return writer.getDocStats().numDocs;
   }
}

Searcher.java

This class is used to search the indexes created by the Indexer for the requested content.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {
	
   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;
   
   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }
   
   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) 
      throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the indexing and search capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {
	
   String indexDir = "D:\\lucene\\Index";
   String dataDir = "D:\\lucene\\Data";
   Indexer indexer;
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.createIndex();
         tester.search("Mohan");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void createIndex() throws IOException {
      indexer = new Indexer(indexDir);
      int numIndexed;
      long startTime = System.currentTimeMillis();	
      numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
      long endTime = System.currentTimeMillis();
      indexer.close();
      System.out.println(numIndexed+" File indexed, time taken: "
         +(endTime-startTime)+" ms");		
   }

   private void search(String searchQuery) throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();
      TopDocs hits = searcher.search(searchQuery);
      long endTime = System.currentTimeMillis();
   
      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime));
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
            System.out.println("File: "
            + doc.get(LuceneConstants.FILE_PATH));
      }
   }
}

Step 4 - Data & Index directory creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. After running this program, you can see the list of index files created in that folder.

Step 5 - Running the program

Once you are done with the creation of the source, the raw data, the data directory and the index directory, you are ready to compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

Sept 08, 2025 5:39:24 PM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
WARNING: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
Indexing D:\lucene\Data\record1.txt
Indexing D:\lucene\Data\record10.txt
Indexing D:\lucene\Data\record2.txt
Indexing D:\lucene\Data\record3.txt
Indexing D:\lucene\Data\record4.txt
Indexing D:\lucene\Data\record5.txt
Indexing D:\lucene\Data\record6.txt
Indexing D:\lucene\Data\record7.txt
Indexing D:\lucene\Data\record8.txt
Indexing D:\lucene\Data\record9.txt
10 File indexed, time taken: 88 ms
1 hits documents found. Time :22
File: D:\lucene\Data\record4.txt

Once you've run the program successfully, you will have the following content in your index directory

Lucene Index Directory

Lucene - Indexing Classes

Indexing process is one of the core functionalities provided by Lucene. The following diagram illustrates the indexing process and the use of classes. IndexWriter is the most important and the core component of the indexing process.

Indexing Process

We add Document(s) containing Field(s) to IndexWriter, which analyzes the Document(s) using the Analyzer, and then creates/opens/edits indexes as required and stores/updates them in a Directory. IndexWriter is used to update or create indexes. It is not used to read indexes.

Indexing Classes

Following is a list of commonly-used classes during the indexing process.

1. IndexWriter − This class acts as the core component which creates/updates indexes during the indexing process.

2. Directory − This class represents the storage location of the indexes.

3. Analyzer − This class is responsible for analyzing a document and getting the tokens/words from the text to be indexed. Without analysis, IndexWriter cannot create an index.

4. Document − This class represents a virtual document with Fields, where a Field is an object which can contain the physical document's contents, its metadata and so on. The Analyzer can understand only a Document.

5. Field − This is the lowest unit, or the starting point, of the indexing process. It represents a key-value pair, where the key is used to identify the value to be indexed. Assume a field used to represent the contents of a document has the key "contents"; its value may contain part or all of the text or numeric content of the document. Lucene can index only text or numeric content.

6. TokenStream − TokenStream is the output of the analysis process and comprises a series of tokens. It is an abstract class.
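The analysis step can be observed directly by asking an Analyzer for a TokenStream. The sketch below is illustrative (it assumes a recent Lucene release on the classpath; the field name "contents" is arbitrary) and prints one token per line:

```java
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDemo {
   public static void main(String[] args) throws Exception {
      try (StandardAnalyzer analyzer = new StandardAnalyzer();
           TokenStream stream = analyzer.tokenStream("contents", "Mohan Sharma Student")) {
         CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
         stream.reset();
         while (stream.incrementToken()) {
            System.out.println(term.toString());   // prints: mohan, sharma, student
         }
         stream.end();
      }
   }
}
```

Note that StandardAnalyzer has lowercased the terms; the same analysis is applied to queries at search time, which is why searches behave case-insensitively.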

Lucene - Searching Classes

The process of searching is again one of the core functionalities provided by Lucene. Its flow is similar to that of the indexing process. A basic search in Lucene can be made using the following classes, which can also be termed foundation classes for all search-related operations.

Searching Classes

Following is a list of commonly-used classes during searching process.

1. IndexSearcher − This class acts as the core component which reads/searches indexes created after the indexing process. It takes a Directory instance pointing to the location containing the indexes.

2. Term − This class is the lowest unit of searching. It is similar to Field in the indexing process.

3. Query − Query is an abstract class that contains various utility methods and is the parent of all types of queries that Lucene uses during the search process.

4. TermQuery − TermQuery is the most commonly-used query object and is the foundation of many complex queries that Lucene can make use of.

5. TopDocs − TopDocs points to the top N search results which match the search criteria. It is a simple container of pointers to the documents which are the output of a search.
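As an illustrative sketch (assuming a recent Lucene release on the classpath; the field and term values are made up), a Term pairs a field name with a value, a TermQuery matches documents containing that term, and complex queries are composed from simple ones:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class QueryDemo {
   public static void main(String[] args) {
      // a Term is a (field, text) pair; TermQuery matches documents containing it
      TermQuery byName = new TermQuery(new Term("filename", "record4.txt"));

      // complex queries are built by combining simple ones, e.g. with BooleanQuery
      BooleanQuery combined = new BooleanQuery.Builder()
         .add(byName, BooleanClause.Occur.MUST)
         .add(new TermQuery(new Term("contents", "mohan")), BooleanClause.Occur.SHOULD)
         .build();

      System.out.println(combined);
   }
}
```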

Lucene - Indexing Process

The indexing process is one of the core functionalities provided by Lucene. The following diagram illustrates the indexing process and the use of classes. IndexWriter is the most important and core component of the indexing process.

Indexing Process

We add Document(s) containing Field(s) to IndexWriter, which analyzes the Document(s) using the Analyzer, and then creates/opens/edits indexes as required and stores/updates them in a Directory. IndexWriter is used to update or create indexes. It is not used to read indexes.

Now we'll show you a step-by-step process to get a kick start in understanding the indexing process, using a basic example.

Create a document

  • Create a method to get a Lucene document from a text file.

  • Create various types of fields which are key value pairs containing keys as names and values as contents to be indexed.

  • Set whether the field is to be analyzed. In our case, only the contents field is to be analyzed, as it can contain data such as a, am, are, an, etc. which are not required in search operations.

  • Add the newly created fields to the document object and return it to the caller method.

private Document getDocument(File file) throws IOException {
   Document document = new Document();

   //index file contents
   Field contentField = new TextField(LuceneConstants.CONTENTS, new FileReader(file));
   //index file name
   Field fileNameField = new StringField(LuceneConstants.FILE_NAME,
      file.getName(),Field.Store.YES);
   //index file path
   Field filePathField = new StringField(LuceneConstants.FILE_PATH,
      file.getCanonicalPath(),Field.Store.YES);

   document.add(contentField);
   document.add(fileNameField);
   document.add(filePathField);

   return document;
}    

Create an IndexWriter

The IndexWriter class acts as the core component which creates/updates indexes during the indexing process. Follow these steps to create an IndexWriter −

Step 1 − Create object of IndexWriter.

Step 2 − Create a Lucene directory which should point to location where indexes are to be stored.

Step 3 − Initialize the IndexWriter object created with the index directory, a standard analyzer and other required/optional parameters.

private IndexWriter writer;

public Indexer(String indexDirectoryPath) throws IOException {
   //this directory will contain the indexes
   Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
   StandardAnalyzer analyzer = new StandardAnalyzer();
   IndexWriterConfig config = new IndexWriterConfig(analyzer);
   writer = new IndexWriter(indexDirectory, config);
}
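IndexWriterConfig also carries optional settings. For example, if you want each run to rebuild the index from scratch rather than append to an existing one, you can set the open mode explicitly. The snippet below is a sketch, assuming a recent Lucene release on the classpath:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;

public class ConfigSketch {
   public static void main(String[] args) {
      IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
      // CREATE discards any existing index; the default is CREATE_OR_APPEND
      config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
      System.out.println(config.getOpenMode());   // prints: CREATE
   }
}
```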

Start Indexing Process

The following program shows how to start an indexing process −

private void indexFile(File file) throws IOException {
   System.out.println("Indexing "+file.getCanonicalPath());
   Document document = getDocument(file);
   writer.addDocument(document);
}

Example Application

To test the indexing process, let us create a Lucene test application.

Step 1 − Create a project with the name LuceneFirstApplication under the package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in that chapter as-is for this chapter to understand the indexing process.

Step 2 − Create LuceneConstants.java, TextFileFilter.java and Indexer.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

Step 3 − Create LuceneTester.java as mentioned below.

Step 4 − Clean and build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

TextFileFilter.java

This class is used as a .txt file filter.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {

   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}

Indexer.java

This class is used to index the raw data so that we can make it searchable using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Indexer {

   private IndexWriter writer;

   public Indexer(String indexDirectoryPath) throws IOException {
      //this directory will contain the indexes
      Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
      StandardAnalyzer analyzer = new StandardAnalyzer();
      IndexWriterConfig config = new IndexWriterConfig(analyzer);
      writer = new IndexWriter(indexDirectory, config);
   }

   public void close() throws CorruptIndexException, IOException {
      writer.close();
   }

   private Document getDocument(File file) throws IOException {
      Document document = new Document();

      //index file contents
      Field contentField = new TextField(LuceneConstants.CONTENTS, new FileReader(file));
      //index file name
      Field fileNameField = new StringField(LuceneConstants.FILE_NAME,
         file.getName(),Field.Store.YES);
      //index file path
      Field filePathField = new StringField(LuceneConstants.FILE_PATH,
         file.getCanonicalPath(),Field.Store.YES);

      document.add(contentField);
      document.add(fileNameField);
      document.add(filePathField);

      return document;
   }   

   private void indexFile(File file) throws IOException {
      System.out.println("Indexing "+file.getCanonicalPath());
      Document document = getDocument(file);
      writer.addDocument(document);
   }

   public int createIndex(String dataDirPath, FileFilter filter) 
      throws IOException {
      //get all files in the data directory
      File[] files = new File(dataDirPath).listFiles();

      for (File file : files) {
         if(!file.isDirectory()
            && !file.isHidden()
            && file.exists()
            && file.canRead()
            && filter.accept(file)
         ){
            indexFile(file);
         }
      }
      return writer.getDocStats().numDocs;
   }
}

LuceneTester.java

This class is used to test the indexing capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Indexer indexer;
   
   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.createIndex();
      } catch (IOException e) {
         e.printStackTrace();
      } 
   }

   private void createIndex() throws IOException {
      indexer = new Indexer(indexDir);
      int numIndexed;
      long startTime = System.currentTimeMillis();	
      numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
      long endTime = System.currentTimeMillis();
      indexer.close();
      System.out.println(numIndexed+" File indexed, time taken: "
         +(endTime-startTime)+" ms");		
   }
}

Data & Index Directory Creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. After running this program, you can see the list of index files created in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory and the index directory, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

Indexing D:\Lucene\Data\record1.txt
Indexing D:\Lucene\Data\record10.txt
Indexing D:\Lucene\Data\record2.txt
Indexing D:\Lucene\Data\record3.txt
Indexing D:\Lucene\Data\record4.txt
Indexing D:\Lucene\Data\record5.txt
Indexing D:\Lucene\Data\record6.txt
Indexing D:\Lucene\Data\record7.txt
Indexing D:\Lucene\Data\record8.txt
Indexing D:\Lucene\Data\record9.txt
10 File indexed, time taken: 109 ms

Once you've run the program successfully, you will have the following content in your index directory −

Lucene Index Directory

Lucene - Search Operation

The process of searching is one of the core functionalities provided by Lucene. The following diagram illustrates the process and the use of classes. IndexSearcher is one of the core components of the searching process.

Searching Process

We first create Directory(s) containing indexes and then pass them to IndexSearcher, which opens the Directory using IndexReader. Then we create a Query with a Term and make a search using IndexSearcher by passing the Query to the searcher. IndexSearcher returns a TopDocs object which contains the search details along with the document ID(s) of the Document(s) that are the result of the search operation.

We will now show you a step-wise approach to help you understand the searching process using a basic example.

Create a QueryBuilder

The QueryBuilder class is used to build a query, in a format Lucene understands, from user-entered input. Follow these steps to create a QueryBuilder −

Step 1 − Create object of QueryBuilder.

Step 2 − Initialize the QueryBuilder object created with a standard analyzer.

QueryBuilder queryBuilder;

public Searcher(String indexDirectoryPath) throws IOException {
   StandardAnalyzer analyzer = new StandardAnalyzer();
   queryBuilder = new QueryBuilder(analyzer);
}
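Besides createPhraseQuery, QueryBuilder offers other factory methods such as createBooleanQuery, which analyzes the text and combines the resulting terms instead of requiring them to appear in order. A short sketch (assuming a recent Lucene release on the classpath; the field and text values are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

public class QueryBuilderDemo {
   public static void main(String[] args) {
      QueryBuilder queryBuilder = new QueryBuilder(new StandardAnalyzer());

      // matches documents containing any of the analyzed terms
      Query anyTerm = queryBuilder.createBooleanQuery("contents", "mohan sharma");
      // matches documents containing the terms adjacent and in order
      Query phrase = queryBuilder.createPhraseQuery("contents", "mohan sharma");

      System.out.println(anyTerm);
      System.out.println(phrase);
   }
}
```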

Create an IndexSearcher

The IndexSearcher class acts as the core component which searches indexes created during the indexing process. Follow these steps to create an IndexSearcher −

Step 1 − Create object of IndexSearcher.

Step 2 − Create a Lucene directory which should point to the location where the indexes are stored.

Step 3 − Initialize the IndexSearcher object created with the index directory.

IndexSearcher indexSearcher;

public Searcher(String indexDirectoryPath) throws IOException {
   DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
   indexSearcher = new IndexSearcher(indexDirectory);
}

Make a Search

Follow these steps to make a search −

Step 1 − Create a Query object by parsing the search expression through QueryBuilder.

Step 2 − Make the search by calling the IndexSearcher.search() method.

Query query;

public TopDocs search( String searchQuery) throws IOException, ParseException {
   query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
   return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
}

Get the Document

The following program shows how to get the document.

public Document getDocument(ScoreDoc scoreDoc) 
   throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
}

Example Application

Let us create a Lucene test application to test the searching process.

Step 1 − Create a project with the name LuceneFirstApplication under the package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in that chapter as-is for this chapter to understand the searching process.

Step 2 − Create LuceneConstants.java, TextFileFilter.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

Step 3 − Create LuceneTester.java as mentioned below.

Step 4 − Clean and build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

TextFileFilter.java

This class is used as a .txt file filter.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {

   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}

Searcher.java

This class is used to read the indexes created on the raw data and to search that data using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {
	
   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;
   
   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }
   
   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) 
      throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;

import org.apache.lucene.document.Document;
import java.text.ParseException;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.search("Mohan");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void search(String searchQuery) throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();
      TopDocs hits = searcher.search(searchQuery);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) +" ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
      }
   }	
}

Data & Index Directory Creation

We have used 10 text files named record1.txt to record10.txt, containing names and other details of students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. After running the indexing program from the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or use Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in Eclipse IDE's console −

Output

1 hits documents found. Time :30 ms
File: D:\lucene\Data\record4.txt

Lucene - Sorting

By default, Lucene returns search results sorted by relevance; this ordering can be changed as required.

Sorting by relevance is the default sorting mode used by Lucene: results are returned with the most relevant hit at the top.

Steps to sort Search results

Step 1: Index the field to be sorted.

Add SortedDocValuesField for the field to be sorted.

//index file name
Field fileNameField = new Field(LuceneConstants.FILE_NAME, file.getName(),type);
//sort file name
Field sortedFileNameField = new SortedDocValuesField(LuceneConstants.FILE_NAME, new BytesRef(file.getName()));

// add fields
document.add(fileNameField);
document.add(sortedFileNameField);

Step 2: Create SortField and Sort Objects

Create a Sort object for the field on which the results are to be sorted.

// Sort by a string field  
SortField fileNameSort = new SortField(LuceneConstants.FILE_NAME, SortField.Type.STRING); 
Sort sort = new Sort(fileNameSort);

Step 3: Search using Sort Object

// sort and return search results
return indexSearcher.search(query, LuceneConstants.MAX_SEARCH, sort);
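The three steps above can be combined into a self-contained sketch. This is an illustration only, not the chapter's Indexer and Searcher classes: it assumes an in-memory ByteBuffersDirectory and its own field names, and IndexSearcher.storedFields() requires Lucene 9.5 or later.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class SortDemo {
   // Indexes a few file names out of order, then searches with a string sort.
   public static List<String> sortedFileNames() throws Exception {
      Directory dir = new ByteBuffersDirectory();   // in-memory index
      IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
      try (IndexWriter writer = new IndexWriter(dir, config)) {
         for (String name : new String[] {"record2.txt", "record3.txt", "record1.txt"}) {
            Document doc = new Document();
            doc.add(new StringField("filename", name, Field.Store.YES));
            // the doc-values field is what makes string sorting possible
            doc.add(new SortedDocValuesField("filename", new BytesRef(name)));
            doc.add(new TextField("contents", "student record", Field.Store.NO));
            writer.addDocument(doc);
         }
      }
      IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
      Query query = new TermQuery(new Term("contents", "student"));
      Sort sort = new Sort(new SortField("filename", SortField.Type.STRING));
      TopDocs hits = searcher.search(query, 10, sort);
      List<String> names = new ArrayList<>();
      for (ScoreDoc scoreDoc : hits.scoreDocs) {
         names.add(searcher.storedFields().document(scoreDoc.doc).get("filename"));
      }
      return names;   // record1.txt, record2.txt, record3.txt
   }

   public static void main(String[] args) throws Exception {
      System.out.println(sortedFileNames());
   }
}
```

Even though the documents were added out of order, the hits come back in ascending filename order because the search used the Sort object instead of relevance.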

Example Application

To test sorting of search results, let us create a test Lucene application.

Step Description
1 Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in the Lucene - First Application chapter as such for this chapter to understand the searching process.
2 Create LuceneConstants.java, TextFileFilter.java, Indexer.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.
3 Create LuceneTester.java as mentioned below.
4 Clean and Build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

TextFileFilter.java

This class is used as a .txt file filter.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {

   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}

Indexer.java

This class is used to create the index using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class Indexer {

   private IndexWriter writer;

   public Indexer(String indexDirectoryPath) throws IOException {
      //this directory will contain the indexes
      Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
      StandardAnalyzer analyzer = new StandardAnalyzer();
      IndexWriterConfig config = new IndexWriterConfig(analyzer);
      writer = new IndexWriter(indexDirectory, config);
   }

   public void close() throws CorruptIndexException, IOException {
      writer.close();
   }
   
   private Document getDocument(File file) throws IOException {
      Document document = new Document();

      //index file contents
      Field contentField = new TextField(LuceneConstants.CONTENTS, 
      new FileReader(file));

      FieldType type = new FieldType();
      type.setStored(true);
      type.setTokenized(false);
      type.setIndexOptions(IndexOptions.DOCS);
      type.setOmitNorms(true);

      //index file name
      Field fileNameField = new Field(LuceneConstants.FILE_NAME,
      file.getName(),type);

      //sort file name
      Field sortedFileNameField = new SortedDocValuesField(LuceneConstants.FILE_NAME, new BytesRef(file.getName()));

      //index file path
      Field filePathField = new Field(LuceneConstants.FILE_PATH,
      file.getCanonicalPath(),type);

      document.add(contentField);
      document.add(fileNameField);
      document.add(sortedFileNameField);
      document.add(filePathField);

      return document;
   }   

   private void indexFile(File file) throws IOException {
      System.out.println("Indexing "+file.getCanonicalPath());
      Document document = getDocument(file);
      writer.addDocument(document);
   }

   public int createIndex(String dataDirPath, FileFilter filter) 
      throws IOException {
      //get all files in the data directory
      File[] files = new File(dataDirPath).listFiles();

      for (File file : files) {
         if(!file.isDirectory()
            && !file.isHidden()
            && file.exists()
            && file.canRead()
            && filter.accept(file)
         ){
            indexFile(file);
         }
      }
      return writer.getDocStats().numDocs;
   }
}

Searcher.java

This class is used to read the indexes created on the raw data and to search that data using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {
	
   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;
   
   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }
   
   public TopDocs search(Query query) throws IOException, ParseException {
      // Sort by a string field
      SortField fileNameSort = new SortField(LuceneConstants.FILE_NAME, SortField.Type.STRING);   
      Sort sort = new Sort(fileNameSort);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH, sort);
   }

   public Document getDocument(ScoreDoc scoreDoc) 
      throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;

public class LuceneTester {

   String indexDir = "D:\\lucene\\Index";
   String dataDir = "D:\\lucene\\Data";
   Indexer indexer;
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.createIndex();
         tester.searchUsingWildCardQuery("record1*");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void createIndex() throws IOException {
      indexer = new Indexer(indexDir);
      int numIndexed;
      long startTime = System.currentTimeMillis();	
      numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
      long endTime = System.currentTimeMillis();
      indexer.close();
      System.out.println(numIndexed+" File indexed, time taken: "
         +(endTime-startTime)+" ms");		
   }

   private void searchUsingWildCardQuery(String searchQuery) 
      throws IOException, ParseException { 
      searcher = new Searcher(indexDir); 
      long startTime = System.currentTimeMillis(); 

      //create a term to search file name 
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery); 
      //create the term query object 
      Query query = new WildcardQuery(term); 
      //do the search 
      TopDocs hits = searcher.search(query); 
      long endTime = System.currentTimeMillis(); 

      System.out.println(hits.totalHits + 
         " documents found. Time :" + (endTime - startTime) + "ms"); 

      for(ScoreDoc scoreDoc : hits.scoreDocs) { 
         Document doc = searcher.getDocument(scoreDoc); 
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH)); 
      } 
   } 
}

Data & Index Directory Creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. Before running this program, delete any index files present in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or use Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in Eclipse IDE's console −

Indexing D:\lucene\Data\record1.txt
Indexing D:\lucene\Data\record10.txt
Indexing D:\lucene\Data\record2.txt
Indexing D:\lucene\Data\record3.txt
Indexing D:\lucene\Data\record4.txt
Indexing D:\lucene\Data\record5.txt
Indexing D:\lucene\Data\record6.txt
Indexing D:\lucene\Data\record7.txt
Indexing D:\lucene\Data\record8.txt
Indexing D:\lucene\Data\record9.txt
10 File indexed, time taken: 63 ms
2 hits documents found. Time :69ms
File: D:\lucene\Data\record1.txt
File: D:\lucene\Data\record10.txt

Lucene - Indexing Operations

In this chapter, we'll discuss the four major operations of indexing. These operations are useful at various times and are used throughout the life of a search application.

Indexing Operations

Following is a list of operations commonly used during the indexing process.

S.No. Operation & Description
1 Add Document

This operation is used in the initial stage of the indexing process to create the indexes on the newly available content.

2 Update Document

This operation is used to update indexes to reflect the changes in the updated contents. It is similar to recreating the index.

3 Delete Document

This operation is used to update indexes to exclude the documents which are not required to be indexed/searched.

4 Field Options

Field options control the ways in which the contents of a field are made searchable.
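The Delete Document operation listed above has no code sample in this guide; the following is a minimal sketch of IndexWriter.deleteDocuments, assuming an in-memory ByteBuffersDirectory and an illustrative filename field rather than any code from the later chapters.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DeleteDemo {
   // Deletes every document whose "filename" term matches exactly,
   // then reports how many live documents remain.
   public static int deleteByFileName(Directory dir, String name) throws Exception {
      IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
      try (IndexWriter writer = new IndexWriter(dir, config)) {
         writer.deleteDocuments(new Term("filename", name));
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
         return reader.numDocs();   // deleted documents are excluded
      }
   }

   public static int demo() throws Exception {
      Directory dir = new ByteBuffersDirectory();   // in-memory index
      try (IndexWriter writer = new IndexWriter(dir,
            new IndexWriterConfig(new StandardAnalyzer()))) {
         for (String name : new String[] {"record1.txt", "record2.txt"}) {
            Document doc = new Document();
            doc.add(new StringField("filename", name, Field.Store.YES));
            writer.addDocument(doc);
         }
      }
      return deleteByFileName(dir, "record1.txt");
   }

   public static void main(String[] args) throws Exception {
      System.out.println(demo() + " document(s) remain after delete");
   }
}
```

A StringField is indexed as a single untokenized term, which is why the exact-match Term lookup used for the delete works here.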

Lucene - Add Document Operation

Add document is one of the core operations of the indexing process.

We add Document(s) containing Field(s) to the IndexWriter, which is used to create or update indexes.

We will now show you a step-wise approach and help you understand how to add a document using a basic example.

Add a document to an index

Follow these steps to add a document to an index −

Step 1 − Create a method to get a Lucene document from a text file.

Step 2 − Create various fields which are key value pairs containing keys as names and values as contents to be indexed.

Step 3 − Set whether each field is to be analyzed. In our case, only the contents field is analyzed, as it can contain terms such as a, am, are, an, etc. which are not required in search operations.

Step 4 − Add the newly-created fields to the document object and return it to the caller method.

private Document getDocument(File file) throws IOException {
   Document document = new Document();

   //index file contents
   Field contentField = new TextField(LuceneConstants.CONTENTS, new FileReader(file));
   //index file name
   Field fileNameField = new StringField(LuceneConstants.FILE_NAME,
      file.getName(),Field.Store.YES);
   //index file path
   Field filePathField = new StringField(LuceneConstants.FILE_PATH,
      file.getCanonicalPath(),Field.Store.YES);

   document.add(contentField);
   document.add(fileNameField);
   document.add(filePathField);

   return document;
}    

Create an IndexWriter

IndexWriter class acts as a core component which creates/updates indexes during the indexing process.

Follow these steps to create an IndexWriter −

Step 1 − Create object of IndexWriter.

Step 2 − Create a Lucene directory which should point to location where indexes are to be stored.

Step 3 − Initialize the IndexWriter object created with the index directory, a standard analyzer and other required/optional parameters.

private IndexWriter writer;

public Indexer(String indexDirectoryPath) throws IOException {
   //this directory will contain the indexes
   Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
   StandardAnalyzer analyzer = new StandardAnalyzer();
   IndexWriterConfig config = new IndexWriterConfig(analyzer);
   writer = new IndexWriter(indexDirectory, config);
}

Add Document and Start Indexing Process

Following are the two ways to add a document.

  • addDocument(Document) − Adds the document using the default analyzer (specified when the index writer is created).

  • addDocument(Document, Analyzer) − Adds the document using the provided analyzer.

private void indexFile(File file) throws IOException {
   System.out.println("Indexing "+file.getCanonicalPath());
   Document document = getDocument(file);
   writer.addDocument(document);
}
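The addDocument(Document) variant above can be exercised end-to-end. The sketch below is an illustration under assumptions, an in-memory ByteBuffersDirectory and its own field names, not the chapter's Indexer class.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class AddDemo {
   // Adds one document per (name, contents) pair and returns the doc count.
   public static int addAll(String[][] records) throws Exception {
      Directory dir = new ByteBuffersDirectory();              // in-memory index
      IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
      try (IndexWriter writer = new IndexWriter(dir, config)) {
         for (String[] record : records) {
            Document doc = new Document();
            doc.add(new StringField("filename", record[0], Field.Store.YES));
            doc.add(new TextField("contents", record[1], Field.Store.NO));
            writer.addDocument(doc);                           // default analyzer
         }
         return writer.getDocStats().numDocs;
      }
   }

   public static void main(String[] args) throws Exception {
      int n = addAll(new String[][] {
         {"record1.txt", "Mohan, first year"},
         {"record2.txt", "Sohan, second year"}
      });
      System.out.println(n + " documents indexed");
   }
}
```

Note that getDocStats().numDocs counts documents still buffered in RAM, so the count is available before the writer is closed.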

Example Application

To test the indexing process, let us create a test Lucene application.

Step Description
1 Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. To understand the indexing process, you can also use the project created in Lucene - First Application chapter as such for this chapter.
2 Create LuceneConstants.java, TextFileFilter.java and Indexer.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.
3 Create LuceneTester.java as mentioned below.
4 Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

TextFileFilter.java

This class is used as a .txt file filter.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {

   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}

Indexer.java

This class is used to index the raw data so that we can make it searchable using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Indexer {

   private IndexWriter writer;

   public Indexer(String indexDirectoryPath) throws IOException {
      //this directory will contain the indexes
      Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
      StandardAnalyzer analyzer = new StandardAnalyzer();
      IndexWriterConfig config = new IndexWriterConfig(analyzer);
      writer = new IndexWriter(indexDirectory, config);
   }

   public void close() throws CorruptIndexException, IOException {
      writer.close();
   }

   private Document getDocument(File file) throws IOException {
      Document document = new Document();

      //index file contents
      Field contentField = new TextField(LuceneConstants.CONTENTS, new FileReader(file));
      //index file name
      Field fileNameField = new StringField(LuceneConstants.FILE_NAME,
         file.getName(),Field.Store.YES);
      //index file path
      Field filePathField = new StringField(LuceneConstants.FILE_PATH,
         file.getCanonicalPath(),Field.Store.YES);

      document.add(contentField);
      document.add(fileNameField);
      document.add(filePathField);

      return document;
   }   

   private void indexFile(File file) throws IOException {
      System.out.println("Indexing "+file.getCanonicalPath());
      Document document = getDocument(file);
      writer.addDocument(document);
   }

   public int createIndex(String dataDirPath, FileFilter filter) 
      throws IOException {
      //get all files in the data directory
      File[] files = new File(dataDirPath).listFiles();

      for (File file : files) {
         if(!file.isDirectory()
            && !file.isHidden()
            && file.exists()
            && file.canRead()
            && filter.accept(file)
         ){
            indexFile(file);
         }
      }
      return writer.getDocStats().numDocs;
   }
}

LuceneTester.java

This class is used to test the indexing capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Indexer indexer;
   
   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.createIndex();
      } catch (IOException e) {
         e.printStackTrace();
      } 
   }

   private void createIndex() throws IOException {
      indexer = new Indexer(indexDir);
      int numIndexed;
      long startTime = System.currentTimeMillis();	
      numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
      long endTime = System.currentTimeMillis();
      indexer.close();
      System.out.println(numIndexed+" File indexed, time taken: "
         +(endTime-startTime)+" ms");		
   }
}

Data & Index Directory Creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. After running this program, you can see the list of index files created in that folder.

Running the Program

Once you are done with creating the source, the raw data, the data directory and the index directory, you are ready to compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or use Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in Eclipse IDE's console −

Indexing D:\lucene\Data\record1.txt
Indexing D:\lucene\Data\record10.txt
Indexing D:\lucene\Data\record2.txt
Indexing D:\lucene\Data\record3.txt
Indexing D:\lucene\Data\record4.txt
Indexing D:\lucene\Data\record5.txt
Indexing D:\lucene\Data\record6.txt
Indexing D:\lucene\Data\record7.txt
Indexing D:\lucene\Data\record8.txt
Indexing D:\lucene\Data\record9.txt
10 File indexed, time taken: 88 ms

Once you've run the program successfully, you will have the following content in your index directory −

(Image: listing of the Lucene index directory)

Lucene - Update Document Operation



Update document is another important operation of the indexing process. It is used when already indexed contents are updated and the indexes become invalid. This operation is also known as re-indexing.

We pass updated Document(s) containing Field(s) to the IndexWriter, which is used to update indexes.

We will now show you a step-wise approach and help you understand how to update document using a basic example.

Update a Document to an Index

Follow this step to update a document in an index −

Step 1 − Create a method to update a Lucene document from an updated text file.

private void updateDocument(File file) throws IOException {
   Document document = new Document();
   String contents = "Updated Contents : ";
   try(BufferedReader reader = new BufferedReader(new FileReader(file))) {
      String line;
      while((line = reader.readLine()) != null) {
         contents += line;
      }
   }

   //update indexes for file contents
   writer.updateDocument(new Term(LuceneConstants.CONTENTS, contents), document);
}

Create an IndexWriter

The IndexWriter class acts as a core component which creates/updates indexes during the indexing process. Follow these steps to create an IndexWriter −

Step 1 − Create object of IndexWriter.

Step 2 − Create a Lucene directory which should point to the location where the indexes are stored.

Step 3 − Initialize the IndexWriter object created with the index directory, a standard analyzer and other required/optional parameters.

private IndexWriter writer;

public Indexer(String indexDirectoryPath) throws IOException {
   //this directory will contain the indexes
   Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
   StandardAnalyzer analyzer = new StandardAnalyzer();
   IndexWriterConfig config = new IndexWriterConfig(analyzer);
   writer = new IndexWriter(indexDirectory, config);
}

Update document and start reindexing process

Following are the two ways to update the document.

  • updateDocument(Term, Document) − Delete the document containing the term and add the document using the default analyzer (specified when index writer is created).

  • updateDocument(Term, Document, Analyzer) − Delete the document containing the term and add the document using the provided analyzer.

private void indexFile(File file) throws IOException {
   System.out.println("Updating index for "+file.getCanonicalPath());
   updateDocument(file);
}
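Because updateDocument deletes by term and then adds, the term should normally identify exactly one document. The following hedged sketch keys the update on a unique identifier field, the file path, using an in-memory ByteBuffersDirectory and illustrative field names of its own.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class UpdateDemo {
   // Re-indexes a file's contents keyed on its unique path: the old document
   // with the same "filepath" term is deleted and the new one is added.
   public static void upsert(IndexWriter writer, String path, String contents)
         throws Exception {
      Document doc = new Document();
      doc.add(new StringField("filepath", path, Field.Store.YES));
      doc.add(new TextField("contents", contents, Field.Store.NO));
      writer.updateDocument(new Term("filepath", path), doc);
   }

   public static int upsertTwiceCount() throws Exception {
      Directory dir = new ByteBuffersDirectory();   // in-memory index
      try (IndexWriter writer = new IndexWriter(dir,
            new IndexWriterConfig(new StandardAnalyzer()))) {
         upsert(writer, "D:\\Lucene\\Data\\record1.txt", "original contents");
         upsert(writer, "D:\\Lucene\\Data\\record1.txt", "updated contents");
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
         return reader.numDocs();   // 1: the second upsert replaced the first
      }
   }

   public static void main(String[] args) throws Exception {
      System.out.println(upsertTwiceCount() + " document in the index after two upserts");
   }
}
```

Running main shows that a single document remains after two upserts of the same path, which is the re-indexing behavior the two bullets above describe.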

Example Application

To test the indexing process, let us create a test Lucene application.

Step Description
1 Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in the Lucene - First Application chapter as such for this chapter to understand the indexing process.
2 Create LuceneConstants.java, TextFileFilter.java and Indexer.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.
3 Create LuceneTester.java as mentioned below.
4 Clean and Build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

TextFileFilter.java

This class is used as a .txt file filter.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {

   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}

Indexer.java

This class is used to index the raw data so that we can make it searchable using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Indexer {

   private IndexWriter writer;

   public Indexer(String indexDirectoryPath) throws IOException {
      //this directory will contain the indexes
      Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
      StandardAnalyzer analyzer = new StandardAnalyzer();
      IndexWriterConfig config = new IndexWriterConfig(analyzer);
      writer = new IndexWriter(indexDirectory, config);
   }

   public void close() throws CorruptIndexException, IOException {
      writer.close();
   }

   private void indexFile(File file) throws IOException {
      System.out.println("Updating index for "+file.getCanonicalPath());
      updateDocument(file);
   }
   
   private void updateDocument(File file) throws IOException {
      Document document = new Document();
      String contents = "Updated Contents : ";
      try(BufferedReader reader = new BufferedReader(new FileReader(file))) {
         String line;
         while((line = reader.readLine()) != null) {
            contents += line;
         }
      }

      //update indexes for file contents
      writer.updateDocument(new Term(LuceneConstants.CONTENTS, contents), document);
   }

   public int createIndex(String dataDirPath, FileFilter filter) 
      throws IOException {
      //get all files in the data directory
      File[] files = new File(dataDirPath).listFiles();

      for (File file : files) {
         if(!file.isDirectory()
            && !file.isHidden()
            && file.exists()
            && file.canRead()
            && filter.accept(file)
         ){
            indexFile(file);
         }
      }
      return writer.getDocStats().numDocs;
   }
}

LuceneTester.java

This class is used to test the indexing capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Indexer indexer;
   
   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.createIndex();
      } catch (IOException e) {
         e.printStackTrace();
      } 
   }

   private void createIndex() throws IOException {
      indexer = new Indexer(indexDir);
      int numIndexed;
      long startTime = System.currentTimeMillis();
      numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
      long endTime = System.currentTimeMillis();
      indexer.close();
      System.out.println(numIndexed+" File indexed, time taken: "
         +(endTime-startTime)+" ms");
   }
}

Data & Index Directory Creation

Here, we have used 10 text files, record1.txt to record10.txt, containing names and other details of the students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. After running this program, you can see the list of index files created in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory and the index directory, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

Updating index for D:\lucene\Data\record1.txt
Updating index for D:\lucene\Data\record10.txt
Updating index for D:\lucene\Data\record2.txt
Updating index for D:\lucene\Data\record3.txt
Updating index for D:\lucene\Data\record4.txt
Updating index for D:\lucene\Data\record5.txt
Updating index for D:\lucene\Data\record6.txt
Updating index for D:\lucene\Data\record7.txt
Updating index for D:\lucene\Data\record8.txt
Updating index for D:\lucene\Data\record9.txt
10 File indexed, time taken: 50 ms

Once you've run the above program successfully, you will have the following content in your index directory

Lucene Index Directory

Lucene - Delete Document Operation

Deleting documents is another important operation of the indexing process. It is used when already-indexed content becomes stale, or when the index grows too large; in order to reduce the size and refresh the index, the obsolete documents are deleted.

We ask the IndexWriter, which is used to update indexes, to delete the document(s) matching a given Term or Query.

We will now show you a step-wise approach to deleting a document, using a basic example.

Delete a document from an index

Follow these steps to delete a document from an index −

Step 1 − Create a method to delete a Lucene document of an obsolete text file.

private void deleteDocument(File file) throws IOException {
   //delete indexes for a file
   writer.deleteDocuments(new Term(LuceneConstants.FILE_NAME, file.getName()));
   writer.commit();
}

Create an IndexWriter

IndexWriter class acts as a core component which creates/updates indexes during the indexing process.

Follow these steps to create an IndexWriter −

Step 1 − Create object of IndexWriter.

Step 2 − Create a Lucene directory which should point to a location where indexes are to be stored.

Step 3 − Initialize the IndexWriter object with the index directory, a standard analyzer wrapped in an IndexWriterConfig, and other required/optional parameters.

private IndexWriter writer;

public Indexer(String indexDirectoryPath) throws IOException {
   //this directory will contain the indexes
   Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
   StandardAnalyzer analyzer = new StandardAnalyzer();
   IndexWriterConfig config = new IndexWriterConfig(analyzer);
   writer = new IndexWriter(indexDirectory, config);
}

Delete Document and Start Reindexing Process

Following are the IndexWriter methods used to delete documents −

  • deleteDocuments(Term) − Delete all the documents containing the term.

  • deleteDocuments(Term[]) − Delete all the documents containing any of the terms in the array.

  • deleteDocuments(Query) − Delete all the documents matching the query.

  • deleteDocuments(Query[]) − Delete all the documents matching any of the queries in the array.

  • deleteAll() − Delete all the documents.

private void indexFile(File file) throws IOException {
   System.out.println("Deleting index for "+file.getCanonicalPath());
   deleteDocument(file);   
}
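
The variants above all boil down to removing every document whose indexed terms match a criterion. The following self-contained sketch (plain Java, not the Lucene API; the list of maps merely stands in for the index) illustrates what deleteDocuments(Term) does conceptually:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeleteByTermSketch {
   public static void main(String[] args) {
      // each "document" is a map of field name -> value, like a Lucene Document
      List<Map<String, String>> index = new ArrayList<>();
      for (int i = 1; i <= 3; i++) {
         Map<String, String> doc = new HashMap<>();
         doc.put("filename", "record" + i + ".txt");
         doc.put("contents", "student record " + i);
         index.add(doc);
      }
      // deleteDocuments(new Term("filename", "record2.txt")) conceptually removes
      // every document whose "filename" field holds exactly that term
      index.removeIf(doc -> "record2.txt".equals(doc.get("filename")));
      System.out.println("remaining=" + index.size()); // remaining=2
   }
}
```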

Example Application

To test the indexing process, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the indexing process.

2

Create LuceneConstants.java, TextFileFilter.java and Indexer.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class provides various constants that can be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

TextFileFilter.java

This class is used as a .txt file filter.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {

   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}

Indexer.java

This class is used to index the raw data thereby, making it searchable using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Indexer {

   private IndexWriter writer;

   public Indexer(String indexDirectoryPath) throws IOException {
      //this directory will contain the indexes
      Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
      StandardAnalyzer analyzer = new StandardAnalyzer();
      IndexWriterConfig config = new IndexWriterConfig(analyzer);
      writer = new IndexWriter(indexDirectory, config);
   }

   public void close() throws CorruptIndexException, IOException {
      writer.close();
   }

   private void deleteDocument(File file) throws IOException {
      //delete indexes for a file
      writer.deleteDocuments(
      new Term(LuceneConstants.FILE_NAME,file.getName())); 
      writer.commit();  
   }  

   private void indexFile(File file) throws IOException {
      System.out.println("Deleting index: "+file.getCanonicalPath());
      deleteDocument(file);      
   }

   public int createIndex(String dataDirPath, FileFilter filter) 
      throws IOException {
      //get all files in the data directory
      File[] files = new File(dataDirPath).listFiles();

      for (File file : files) {
         if(!file.isDirectory()
            && !file.isHidden()
            && file.exists()
            && file.canRead()
            && filter.accept(file)
         ){
            indexFile(file);
         }
      }
      return writer.getDocStats().numDocs;
   }
}

LuceneTester.java

This class is used to test the indexing capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Indexer indexer;
   
   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.createIndex();
      } catch (IOException e) {
         e.printStackTrace();
      } 
   }

   private void createIndex() throws IOException {
      indexer = new Indexer(indexDir);
      int numIndexed;
      long startTime = System.currentTimeMillis();
      numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
      long endTime = System.currentTimeMillis();
      indexer.close();
      System.out.println(numIndexed+" File indexed, time taken: "
         +(endTime-startTime)+" ms");
   }
}

Data & Index Directory Creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of the students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. After running this program, you can see the list of index files created in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory and the index directory, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

Deleting index: D:\lucene\Data\record1.txt
Deleting index: D:\lucene\Data\record10.txt
Deleting index: D:\lucene\Data\record2.txt
Deleting index: D:\lucene\Data\record3.txt
Deleting index: D:\lucene\Data\record4.txt
Deleting index: D:\lucene\Data\record5.txt
Deleting index: D:\lucene\Data\record6.txt
Deleting index: D:\lucene\Data\record7.txt
Deleting index: D:\lucene\Data\record8.txt
Deleting index: D:\lucene\Data\record9.txt
10 File indexed, time taken: 325 ms

Once you've run the program successfully, you will have the following content in your index directory −

Lucene Index Directory

Lucene - Field Options / FieldType

A Field is the most important unit of the indexing process. It is the actual object containing the content to be indexed. When we add a field, Lucene provides numerous controls through Field Options, which state how a field is to be analyzed, indexed and made searchable.

We add Document(s) containing Field(s) to IndexWriter where IndexWriter is used to update or create indexes.

We will now show you a step-wise approach and help you understand the various Field Options using a basic example.

Various Field Options using FieldType Object

Following are the various field options −

  • FieldType.setTokenized(true) − The field's value is first analyzed and then indexed. This is used for normal text indexing. The analyzer breaks the field's value into a stream of tokens, and each token is searchable separately.

  • FieldType.setTokenized(false) − The field's value is indexed as a single term, without analysis. This is used for exact-match values, for example, a person's name or a URL.

  • FieldType.setOmitNorms(true) − The field is indexed, but its norms are not stored. Norms record index-time boost and field-length normalization information used in scoring, and they can consume a lot of memory.

  • FieldType.setOmitNorms(false) − The field is indexed and its norms are stored in the index (this is the default).

  • FieldType.setIndexOptions(IndexOptions.NONE) − The field's value is not indexed and hence not searchable.
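
To make the tokenized/untokenized distinction concrete, here is a small plain-Java sketch (no Lucene API is used; lower-casing and whitespace splitting stand in for what StandardAnalyzer does):

```java
import java.util.Arrays;
import java.util.List;

public class TokenizeSketch {
   public static void main(String[] args) {
      String value = "Apache Lucene Library";

      // setTokenized(true): the analyzer splits the value into separately searchable tokens
      List<String> tokens = Arrays.asList(value.toLowerCase().split("\\s+"));
      System.out.println(tokens.contains("lucene")); // true

      // setTokenized(false): the whole value is indexed as one exact term
      List<String> singleTerm = List.of(value);
      System.out.println(singleTerm.contains("lucene"));                // false
      System.out.println(singleTerm.contains("Apache Lucene Library")); // true
   }
}
```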

Use of Field Options

Following are the different ways in which the Field Options can be used −

  • To create a method to get a Lucene document from a text file.

  • To create various types of fields which are key value pairs containing keys as names and values as contents to be indexed.

  • To set field to be analyzed or not. In our case, only content is to be analyzed as it can contain data such as a, am, are, an, etc. which are not required in search operations.

  • To add the newly-created fields to the document object and return it to the caller method.

private Document getDocument(File file) throws IOException {
   Document document = new Document();

   //index file contents
   Field contentField = new TextField(LuceneConstants.CONTENTS, 
      new FileReader(file));

   FieldType type = new FieldType();
   type.setStored(true);
   type.setTokenized(false);
   type.setIndexOptions(IndexOptions.DOCS);
   type.setOmitNorms(true);


   //index file name
   Field fileNameField = new Field(LuceneConstants.FILE_NAME,
   file.getName(),type);

   //index file path
   Field filePathField = new Field(LuceneConstants.FILE_PATH,
   file.getCanonicalPath(),type);

   document.add(contentField);
   document.add(fileNameField);
   document.add(filePathField);

   return document;
}   

Example Application

To test the indexing process, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the indexing process.

2

Create LuceneConstants.java, TextFileFilter.java and Indexer.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

TextFileFilter.java

This class is used as a .txt file filter.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {

   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}

Indexer.java

This class is used to index the raw data so that we can make it searchable using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Indexer {

   private IndexWriter writer;

   public Indexer(String indexDirectoryPath) throws IOException {
      //this directory will contain the indexes
      Directory indexDirectory = FSDirectory.open(Paths.get(indexDirectoryPath));
      StandardAnalyzer analyzer = new StandardAnalyzer();
      IndexWriterConfig config = new IndexWriterConfig(analyzer);
      writer = new IndexWriter(indexDirectory, config);
   }

   public void close() throws CorruptIndexException, IOException {
      writer.close();
   }

   private Document getDocument(File file) throws IOException {
      Document document = new Document();

      //index file contents
      Field contentField = new TextField(LuceneConstants.CONTENTS, 
         new FileReader(file));

      FieldType type = new FieldType();
      type.setStored(true);
      type.setTokenized(false);
      type.setIndexOptions(IndexOptions.DOCS);
      type.setOmitNorms(true);

      //index file name
      Field fileNameField = new Field(LuceneConstants.FILE_NAME,
      file.getName(),type);

      //index file path
      Field filePathField = new Field(LuceneConstants.FILE_PATH,
      file.getCanonicalPath(),type);

      document.add(contentField);
      document.add(fileNameField);
      document.add(filePathField);

      return document;
   }   

   private void indexFile(File file) throws IOException {
      System.out.println("Indexing "+file.getCanonicalPath());
      Document document = getDocument(file);
      writer.addDocument(document);
   }

   public int createIndex(String dataDirPath, FileFilter filter) 
      throws IOException {
      //get all files in the data directory
      File[] files = new File(dataDirPath).listFiles();

      for (File file : files) {
         if(!file.isDirectory()
            && !file.isHidden()
            && file.exists()
            && file.canRead()
            && filter.accept(file)
         ){
            indexFile(file);
         }
      }
      return writer.getDocStats().numDocs;
   }
}

LuceneTester.java

This class is used to test the indexing capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Indexer indexer;
   
   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.createIndex();
      } catch (IOException e) {
         e.printStackTrace();
      } 
   }

   private void createIndex() throws IOException {
      indexer = new Indexer(indexDir);
      int numIndexed;
      long startTime = System.currentTimeMillis();	
      numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
      long endTime = System.currentTimeMillis();
      indexer.close();
      System.out.println(numIndexed+" File indexed, time taken: "
         +(endTime-startTime)+" ms");		
   }
}

Data & Index Directory Creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of the students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. After running this program, you can see the list of index files created in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory and the index directory, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Indexing D:\lucene\Data\record1.txt
Indexing D:\lucene\Data\record10.txt
Indexing D:\lucene\Data\record2.txt
Indexing D:\lucene\Data\record3.txt
Indexing D:\lucene\Data\record4.txt
Indexing D:\lucene\Data\record5.txt
Indexing D:\lucene\Data\record6.txt
Indexing D:\lucene\Data\record7.txt
Indexing D:\lucene\Data\record8.txt
Indexing D:\lucene\Data\record9.txt
10 File indexed, time taken: 60 ms

Once you've run the program successfully, you will have the following content in your index directory −

Lucene Index Directory

Lucene - Query Programming

We saw in the previous chapter, Lucene - Search Operation, that Lucene uses an IndexSearcher to perform searches, taking as input the Query object created by a QueryParser. In this chapter, we discuss the various types of Query objects and the different ways to create them programmatically. Creating different types of Query objects gives you control over the kind of search to be made.

Consider the case of an Advanced Search, provided by many applications, where users are given multiple options to narrow down the search results. With query programming, we can achieve the same very easily.

Query Types

Following is the list of Query types that we'll discuss in due course.

S.No. Class & Description
1 TermQuery

TermQuery is the most elementary query; it matches documents that contain an exact term in a particular field.

2 TermRangeQuery

TermRangeQuery is used when a range of textual terms are to be searched.

3 PrefixQuery

PrefixQuery is used to match documents whose indexed terms start with a specified string.

4 BooleanQuery

BooleanQuery is used to search documents which are result of multiple queries using AND, OR or NOT operators.

5 PhraseQuery

Phrase query is used to search documents which contain a particular sequence of terms.

6 WildCardQuery

WildcardQuery is used to search documents using wildcards, such as '*' for any character sequence and '?' for a single character.

7 FuzzyQuery

FuzzyQuery is used to search documents using a fuzzy implementation, that is, an approximate search based on an edit-distance algorithm.

8 MatchAllDocsQuery

MatchAllDocsQuery, as the name suggests, matches all the documents.

9 MatchNoDocsQuery

MatchNoDocsQuery, as the name suggests, matches no documents.

10 RegexpQuery

RegexpQuery provides a fast regular expression based query.
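
Before looking at each class in detail, it helps to see what the AND/OR/NOT combinations expressed by BooleanQuery amount to. The sketch below is plain Java over a list of strings, not the Lucene API; the predicate combinators stand in for BooleanClause occurrences (and() for MUST + MUST, negate() for MUST_NOT):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class BooleanSketch {
   public static void main(String[] args) {
      List<String> docs = List.of("red apple", "green apple", "red car");

      Predicate<String> hasRed = s -> s.contains("red");
      Predicate<String> hasApple = s -> s.contains("apple");

      // MUST + MUST behaves like AND
      List<String> and = docs.stream().filter(hasRed.and(hasApple)).collect(Collectors.toList());
      // MUST + MUST_NOT behaves like AND NOT
      List<String> not = docs.stream().filter(hasRed.and(hasApple.negate())).collect(Collectors.toList());

      System.out.println(and); // [red apple]
      System.out.println(not); // [red car]
   }
}
```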

Lucene - TermQuery

TermQuery is the most commonly-used query object and is the foundation of many of the complex queries that Lucene supports. It is used to retrieve documents containing an exact term in a given field; the match is case sensitive, because the term is compared against the indexed terms without any analysis.
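
Because no analyzer runs over a TermQuery's term, the comparison is an exact one against whatever bytes were indexed. A plain-Java sketch (no Lucene involved) of why casing matters:

```java
public class TermCaseSketch {
   public static void main(String[] args) {
      // a TermQuery compares the raw term, with no analysis applied,
      // so the casing must match the indexed term exactly
      String indexedTerm = "record4.txt";
      System.out.println(indexedTerm.equals("record4.txt")); // true: exact match
      System.out.println(indexedTerm.equals("Record4.txt")); // false: case differs
   }
}
```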

Class Declaration

Following is the declaration for org.apache.lucene.search.TermQuery class −

public class TermQuery
   extends Query
S.No. Constructor & Description
1

TermQuery(Term t)

Constructs a query for the term t.

2

TermQuery(Term t, TermStates states)

Expert: constructs a TermQuery that will use the provided docFreq instead of looking up the docFreq against the searcher.

S.No. Method & Description
1

Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)

Expert: Constructs an appropriate Weight implementation for this query.

2

boolean equals(Object other)

Returns true iff other is equal to this.

3

Term getTerm()

Returns the term of this query.

4

TermStates getTermStates()

Returns the TermStates passed to the constructor, or null if it was not passed.

5

int hashCode()

Override and implement query hash code properly in a subclass.

6

String toString(String field)

Prints a user-readable version of this query.

7

void visit(QueryVisitor visitor)

Recurse through the query tree, visiting any child queries.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.Query
  • java.lang.Object

Usage of TermQuery

private void searchUsingTermQuery(
   String searchQuery)throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();
   
   //create a term to search file name
   Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
   //create the term query object
   Query query = new TermQuery(term);
   //do the search
   TopDocs hits = searcher.search(query);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
   }
}

Example Application

To test search using TermQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes made on raw data and searches data using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {

   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.searchUsingTermQuery("record4.txt");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void searchUsingTermQuery(
      String searchQuery)throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();

      //create a term to search file name
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
      //create the term query object
      Query query = new TermQuery(term);
      //do the search
      TopDocs hits = searcher.search(query);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
      }
   }
}

Data & Index Directory Creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of the students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. After running the indexing program in the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

1 documents found. Time :13 ms
File: D:\Lucene\Data\record4.txt

Lucene - TermRangeQuery

TermRangeQuery is used when a range of textual terms are to be searched.
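
A term range is simply a lexicographic interval over the indexed terms. This plain-Java sketch (not the Lucene API) shows the includeLower=true, includeUpper=false semantics used later in this chapter:

```java
public class RangeSketch {
   public static void main(String[] args) {
      String lower = "record1.txt";
      String upper = "record5.txt";
      String candidate = "record3.txt";

      // TermRangeQuery with includeLower=true, includeUpper=false conceptually tests:
      boolean inRange = candidate.compareTo(lower) >= 0
                     && candidate.compareTo(upper) < 0;
      System.out.println(inRange); // true

      // the upper bound itself is excluded
      System.out.println(upper.compareTo(lower) >= 0 && upper.compareTo(upper) < 0); // false
   }
}
```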

Class Declaration

Following is the declaration for org.apache.lucene.search.TermRangeQuery class −

public class TermRangeQuery
   extends AutomatonQuery
S.No. Constructor & Description
1

TermRangeQuery(String field, BytesRef lowerTerm, BytesRef upperTerm, boolean includeLower, boolean includeUpper)

Constructs a query selecting all terms greater/equal than lowerTerm but less/equal than upperTerm.

2

TermRangeQuery(String field, BytesRef lowerTerm, BytesRef upperTerm, boolean includeLower, boolean includeUpper, MultiTermQuery.RewriteMethod rewriteMethod)

Constructs a query selecting all terms greater/equal than lowerTerm but less/equal than upperTerm.

S.No. Method & Description
1

boolean equals(Object obj)

Override and implement query instance equivalence properly in a subclass.

2

BytesRef getLowerTerm()

Returns the lower value of this range query.

3

BytesRef getUpperTerm()

Returns the upper value of this range query.

4

int hashCode()

Override and implement query hash code properly in a subclass.

5

boolean includesLower()

Returns true if the lower endpoint is inclusive.

6

boolean includesUpper()

Returns true if the upper endpoint is inclusive.

7

static TermRangeQuery newStringRange(String field, String lowerTerm, String upperTerm, boolean includeLower, boolean includeUpper)

Factory that creates a new TermRangeQuery using Strings for term text.

8

static TermRangeQuery newStringRange(String field, String lowerTerm, String upperTerm, boolean includeLower, boolean includeUpper, MultiTermQuery.RewriteMethod rewriteMethod)

Factory that creates a new TermRangeQuery using Strings for term text.

9

static Automaton toAutomaton(BytesRef lowerTerm, BytesRef upperTerm, boolean includeLower, boolean includeUpper)

Returns an Automaton that accepts all terms within the given range.

10

String toString(String field)

Prints a user-readable version of this query.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.AutomatonQuery
  • org.apache.lucene.search.MultiTermQuery
  • org.apache.lucene.search.Query
  • java.lang.Object

Usage of TermRangeQuery

private void searchUsingTermRangeQuery(String searchQueryMin,
   String searchQueryMax)throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();

   //create the term range query object
   Query query = TermRangeQuery.newStringRange(LuceneConstants.FILE_NAME, 
   searchQueryMin,searchQueryMax,true,false);
   //do the search
   TopDocs hits = searcher.search(query);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
   }
}
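The includeLower/includeUpper flags used above (true, false) make the lower bound inclusive and the upper bound exclusive. As a rough illustration of that semantics only, the following sketch models a term-range check with plain String comparison; Lucene itself evaluates the range against the sorted term dictionary, and the class and method names here are hypothetical, not Lucene API.

```java
// Illustration only: the inclusive/exclusive semantics of a term range,
// modelled with plain lexicographic String comparison. Mirrors the
// includeLower/includeUpper flags of TermRangeQuery.newStringRange.
public class TermRangeSketch {
   static boolean inRange(String term, String lower, String upper,
         boolean includeLower, boolean includeUpper) {
      int lo = term.compareTo(lower);
      int hi = term.compareTo(upper);
      boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
      boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
      return aboveLower && belowUpper;
   }

   public static void main(String[] args) {
      // Matches the example query: lower bound inclusive, upper exclusive.
      System.out.println(inRange("record2.txt", "record2.txt", "record6.txt", true, false)); // true
      System.out.println(inRange("record6.txt", "record2.txt", "record6.txt", true, false)); // false
   }
}
```

Note that the comparison is lexicographic, not numeric: "record10.txt" sorts before "record2.txt", which is why record10.txt is not among the hits for this range.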

Example Application

To test search using TermRangeQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes created from the raw data and to search them using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.searchUsingTermRangeQuery("record2.txt","record6.txt");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void searchUsingTermRangeQuery(String searchQueryMin,
      String searchQueryMax)throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();

      //create the term query object
      Query query = TermRangeQuery.newStringRange(LuceneConstants.FILE_NAME, 
      searchQueryMin,searchQueryMax,true,false);
      //do the search
      TopDocs hits = searcher.search(query);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
      }
   }
}

Data & Index Directory Creation

Ten text files, record1.txt through record10.txt, containing names and other details of students, are placed in the directory D:\Lucene\Data. An index directory should be created at D:\Lucene\Index. After running the indexing program from the Lucene - Indexing Process chapter, you can see the list of index files created in that folder.

Running the Program

Once the source, the raw data, the data directory, the index directory and the indexes are ready, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option in the Eclipse IDE or press Ctrl + F11 to compile and run the LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE console −

Output

4 hits documents found. Time :75ms
File: D:\lucene\Data\record2.txt
File: D:\lucene\Data\record3.txt
File: D:\lucene\Data\record4.txt
File: D:\lucene\Data\record5.txt

Lucene - PrefixQuery

PrefixQuery class is used to match documents containing indexed terms that start with a specified prefix string.

Class Declaration

Following is the declaration for org.apache.lucene.search.PrefixQuery class −

public class PrefixQuery
   extends AutomatonQuery
S.No. Constructor & Description
1

PrefixQuery(Term prefix)

Constructs a query for terms starting with prefix.

2

PrefixQuery(Term prefix, MultiTermQuery.RewriteMethod rewriteMethod)

Constructs a query for terms starting with prefix using a defined RewriteMethod

S.No. Method & Description
1

boolean equals(Object obj)

Override and implement query instance equivalence properly in a subclass.

2

Term getPrefix()

Returns the prefix of this query.

3

int hashCode()

Override and implement query hash code properly in a subclass.

4

static Automaton toAutomaton(BytesRef prefix)

Build an automaton accepting all terms with the specified prefix.

5

String toString(String field)

Prints a user-readable version of this query.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.AutomatonQuery
  • org.apache.lucene.search.MultiTermQuery
  • org.apache.lucene.search.Query
  • java.lang.Object

Usage of PrefixQuery

private void searchUsingPrefixQuery(String searchQuery)
   throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();

   //create a term to search file name
   Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
   //create the term query object
   Query query = new PrefixQuery(term);
   //do the search
   TopDocs hits = searcher.search(query);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
   }
}
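A prefix query matches every indexed term that starts with the given string, which is why searching for "record1" later returns both record1.txt and record10.txt. As a plain-Java illustration of that matching behavior only (not the automaton Lucene actually builds, and with hypothetical names), this is the effect on a small term list:

```java
// Illustration only: PrefixQuery semantics over a list of indexed terms.
// Every term that startsWith the prefix is a hit.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixSketch {
   static List<String> matching(List<String> terms, String prefix) {
      List<String> hits = new ArrayList<>();
      for (String term : terms) {
         if (term.startsWith(prefix)) hits.add(term); // prefix match, not whole-term equality
      }
      return hits;
   }

   public static void main(String[] args) {
      List<String> terms = Arrays.asList("record1.txt", "record2.txt", "record10.txt");
      System.out.println(matching(terms, "record1")); // [record1.txt, record10.txt]
   }
}
```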

Example Application

To test search using PrefixQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes created from the raw data and to search them using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {

   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.searchUsingPrefixQuery("record1");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void searchUsingPrefixQuery(String searchQuery)
      throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();

      //create a term to search file name
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
      //create the term query object
      Query query = new PrefixQuery(term);
      //do the search
      TopDocs hits = searcher.search(query);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
      }
   }
}

Data & Index Directory Creation

Ten text files, record1.txt through record10.txt, containing names and other details of students, are placed in the directory D:\Lucene\Data. An index directory should be created at D:\Lucene\Index. After running the indexing program from the Lucene - Indexing Process chapter, you can see the list of index files created in that folder.

Running the Program

Once the source, the raw data, the data directory, the index directory and the indexes are ready, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option in the Eclipse IDE or press Ctrl + F11 to compile and run the LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE console −

Output

2 hits documents found. Time :87ms
File: D:\lucene\Data\record1.txt
File: D:\lucene\Data\record10.txt

Lucene - BooleanQuery

BooleanQuery class is used to search documents that match a boolean combination of other queries, built from clauses such as MUST (AND), SHOULD (OR) and MUST_NOT (NOT).

Class Declaration

Following is the declaration for org.apache.lucene.search.BooleanQuery class −

public class BooleanQuery
   extends Query
      implements Iterable<BooleanClause>
S.No. Method & Description
1

List<BooleanClause> clauses()

Return a list of the clauses of this BooleanQuery.

2

Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)

Expert: Constructs an appropriate Weight implementation for this query.

3

boolean equals(Object o)

Compares the specified object with this boolean query for equality.

4

Collection<Query> getClauses(BooleanClause.Occur occur)

Return the collection of queries for the given BooleanClause.Occur.

5

int getMinimumNumberShouldMatch()

Gets the minimum number of the optional BooleanClauses which must be satisfied.

6

int hashCode()

Override and implement query hash code properly in a subclass.

7

final Iterator<BooleanClause> iterator()

Returns an iterator on the clauses in this query.

8

Query rewrite(IndexSearcher indexSearcher)

Expert: called to re-write queries into primitive queries.

9

String toString(String field)

Prints a user-readable version of this query.

10

void visit(QueryVisitor visitor)

Recurse through the query tree, visiting any child queries.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.Query
  • java.lang.Object

Usage of BooleanQuery

private void searchUsingBooleanQuery(String searchQuery1,
   String searchQuery2) throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();

   //create a term to search file name
   Term term1 = new Term(LuceneConstants.FILE_NAME, searchQuery1);
   //create the term query object
   Query query1 = new TermQuery(term1);

   Term term2 = new Term(LuceneConstants.FILE_NAME, searchQuery2);
   //create the term query object
   Query query2 = new PrefixQuery(term2);

   BooleanQuery query = new BooleanQuery.Builder()
      .add(query1,BooleanClause.Occur.MUST_NOT)
      .add(query2,BooleanClause.Occur.MUST)
      .build();

   TopDocs hits = searcher.search(query);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
   }
}
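The two clauses above keep every document matched by the prefix query (MUST) and then exclude any document matched by the term query (MUST_NOT). As a minimal sketch of that set arithmetic only, with hypothetical names and plain java.util sets rather than anything from Lucene:

```java
// Illustration only: how MUST and MUST_NOT clauses combine per-clause
// hit sets. MUST keeps its hits; MUST_NOT subtracts its hits from the
// result. (SHOULD, not used here, would contribute a union instead.)
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class BooleanSketch {
   static Set<String> combine(Set<String> mustHits, Set<String> mustNotHits) {
      Set<String> result = new TreeSet<>(mustHits); // start from the MUST clause's hits
      result.removeAll(mustNotHits);                // drop everything the MUST_NOT clause matched
      return result;
   }

   public static void main(String[] args) {
      // Hits of new PrefixQuery(filename:"record1") over the sample data:
      Set<String> prefixHits = new TreeSet<>(Arrays.asList("record1.txt", "record10.txt"));
      // Hits of new TermQuery(filename:"record1.txt"):
      Set<String> termHits = new TreeSet<>(Arrays.asList("record1.txt"));
      System.out.println(combine(prefixHits, termHits)); // [record10.txt]
   }
}
```

This mirrors the sample output below, where record10.txt is the only hit.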

Example Application

To test search using BooleanQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes created from the raw data and to search them using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {

   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.searchUsingBooleanQuery("record1.txt","record1");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void searchUsingBooleanQuery(String searchQuery1,
      String searchQuery2)throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();

      //create a term to search file name
      Term term1 = new Term(LuceneConstants.FILE_NAME, searchQuery1);
      //create the term query object
      Query query1 = new TermQuery(term1);

      Term term2 = new Term(LuceneConstants.FILE_NAME, searchQuery2);
      //create the term query object
      Query query2 = new PrefixQuery(term2);

      BooleanQuery query = new BooleanQuery.Builder()
         .add(query1,BooleanClause.Occur.MUST_NOT)
         .add(query2,BooleanClause.Occur.MUST)
         .build();

      TopDocs hits = searcher.search(query);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
      }
   }
}

Data & Index Directory Creation

Ten text files, record1.txt through record10.txt, containing names and other details of students, are placed in the directory D:\Lucene\Data. An index directory should be created at D:\Lucene\Index. After running the indexing program from the Lucene - Indexing Process chapter, you can see the list of index files created in that folder.

Running the Program

Once the source, the raw data, the data directory, the index directory and the indexes are ready, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option in the Eclipse IDE or press Ctrl + F11 to compile and run the LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE console −

Output

1 hits documents found. Time :96ms
File: D:\lucene\Data\record10.txt

Lucene - PhraseQuery

PhraseQuery class is used to search documents which contain a particular sequence of terms.

Class Declaration

Following is the declaration for org.apache.lucene.search.PhraseQuery class −

public class PhraseQuery
   extends Query
S.No. Constructor & Description
1

PhraseQuery(int slop, String field, String... terms)

Create a phrase query which will match documents that contain the given list of terms at consecutive positions in field, and at a maximum edit distance of slop.

2

PhraseQuery(int slop, String field, BytesRef... terms)

Create a phrase query which will match documents that contain the given list of terms at consecutive positions in field, and at a maximum edit distance of slop.

3

PhraseQuery(String field, String... terms)

Create a phrase query which will match documents that contain the given list of terms at consecutive positions in field.

4

PhraseQuery(String field, BytesRef... terms)

Create a phrase query which will match documents that contain the given list of terms at consecutive positions in field.

S.No. Method & Description
1

List<BooleanClause> clauses()

Return a list of the clauses of this BooleanQuery.

2

Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)

Expert: Constructs an appropriate Weight implementation for this query.

3

boolean equals(Object other)

Returns true iff o is equal to this.

4

String getField()

Returns the field this query applies to

5

int[] getPositions()

Returns the relative positions of terms in this phrase.

6

int getSlop()

Return the slop for this PhraseQuery.

7

Term[] getTerms()

Returns the list of terms in this phrase.

8

int hashCode()

Returns a hash code value for this object.

9

Query rewrite(IndexSearcher indexSearcher)

Expert: called to re-write queries into primitive queries.

10

static float termPositionsCost(TermsEnum termsEnum)

Returns an expected cost in simple operations of processing the occurrences of a term in a document that contains the term.

11

String toString(String f)

Prints a user-readable version of this query.

12

void visit(QueryVisitor visitor)

Recurse through the query tree, visiting any child queries.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.Query
  • java.lang.Object

Usage of PhraseQuery

private void searchUsingPhraseQuery(String[] phrases)
   throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();
   PhraseQuery.Builder queryBuilder = new PhraseQuery.Builder();
   queryBuilder.setSlop(0);

   for(String word:phrases) {
      queryBuilder.add(new Term(LuceneConstants.FILE_NAME,word));
   }

   PhraseQuery query = queryBuilder.build();

   //do the search
   TopDocs hits = searcher.search(query);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
   }
}
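With setSlop(0), as used above, the phrase's terms must occur at exactly consecutive positions in the field. As a rough illustration of that zero-slop case only (it ignores slop > 0 and Lucene's position-based scoring; all names here are hypothetical), a phrase match over a token list looks like this:

```java
// Illustration only: zero-slop phrase matching over a token sequence.
// A match requires the phrase's terms to appear in order at consecutive
// positions; any gap or reordering would need a slop greater than 0.
import java.util.Arrays;
import java.util.List;

public class PhraseSketch {
   static boolean matchesPhrase(List<String> tokens, List<String> phrase) {
      // Slide a window of the phrase's length over the token positions.
      for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
         if (tokens.subList(i, i + phrase.size()).equals(phrase)) return true;
      }
      return false;
   }

   public static void main(String[] args) {
      List<String> tokens = Arrays.asList("quick", "brown", "fox");
      System.out.println(matchesPhrase(tokens, Arrays.asList("quick", "brown"))); // true
      System.out.println(matchesPhrase(tokens, Arrays.asList("quick", "fox")));   // false: gap needs slop > 0
   }
}
```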

Example Application

To test search using PhraseQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes created from the raw data and to search them using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {

   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         String[] phrases = new String[]{"record1.txt"};
         tester.searchUsingPhraseQuery(phrases);
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void searchUsingPhraseQuery(String[] phrases)
      throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();

      PhraseQuery.Builder queryBuilder = new PhraseQuery.Builder();
      queryBuilder.setSlop(0);

      for(String word:phrases) {
         queryBuilder.add(new Term(LuceneConstants.FILE_NAME,word));
      }

      PhraseQuery query = queryBuilder.build();

      //do the search
      TopDocs hits = searcher.search(query);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
      }
   }
}

Data & Index Directory Creation

Ten text files, record1.txt through record10.txt, containing names and other details of students, are placed in the directory D:\Lucene\Data. An index directory should be created at D:\Lucene\Index. After running the indexing program from the Lucene - Indexing Process chapter, you can see the list of index files created in that folder.

Running the Program

Once the source, the raw data, the data directory, the index directory and the indexes are ready, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option in the Eclipse IDE or press Ctrl + F11 to compile and run the LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE console −

Output

1 hits documents found. Time :31ms
File: D:\lucene\Data\record1.txt

Lucene - WildCardQuery

WildcardQuery class is used to search documents using wildcards: '*' matches any character sequence (including an empty one), and '?' matches a single character.

Class Declaration

Following is the declaration for org.apache.lucene.search.WildcardQuery class −

public class WildcardQuery
   extends AutomatonQuery
S.No. Field & Description
1

static final char WILDCARD_CHAR

Char equality with support for wildcards

2

static final char WILDCARD_ESCAPE

Escape character

3

static final char WILDCARD_STRING

String equality with support for wildcards

S.No. Constructor & Description
1

WildcardQuery(Term term)

Constructs a query for terms matching term.

2

WildcardQuery(Term term, int determinizeWorkLimit)

Constructs a query for terms matching term.

3

WildcardQuery(Term term, int determinizeWorkLimit, MultiTermQuery.RewriteMethod rewriteMethod)

Constructs a query for terms matching term.

S.No. Method & Description
1

Term getTerm()

Returns the pattern term.

2

static Automaton toAutomaton(Term wildcardquery, int determinizeWorkLimit)

Convert Lucene wildcard syntax into an automaton.

3

String toString(String field)

Prints a user-readable version of this query.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.AutomatonQuery
  • org.apache.lucene.search.MultiTermQuery
  • org.apache.lucene.search.Query
  • java.lang.Object
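
To see what the two wildcards accept, here is a small, self-contained sketch that mimics the matching rules with java.util.regex. This illustrates the semantics only; Lucene itself compiles the pattern into an automaton rather than using java.util.regex:

```java
import java.util.regex.Pattern;

public class WildcardDemo {

   // Rough translation of the wildcard syntax into a java.util.regex pattern:
   // '*' matches any character sequence, '?' matches exactly one character.
   static boolean wildcardMatches(String pattern, String term) {
      StringBuilder regex = new StringBuilder();
      for (char c : pattern.toCharArray()) {
         if (c == '*')      regex.append(".*");
         else if (c == '?') regex.append(".");
         else               regex.append(Pattern.quote(String.valueOf(c)));
      }
      return Pattern.matches(regex.toString(), term);
   }

   public static void main(String[] args) {
      System.out.println(wildcardMatches("record1*", "record1.txt"));    // true
      System.out.println(wildcardMatches("record1*", "record10.txt"));   // true
      System.out.println(wildcardMatches("record?.txt", "record10.txt")); // false: '?' is one char
   }
}
```

This is why the search for "record1*" later in this chapter finds both record1.txt and record10.txt.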

Usage of WildCardQuery

private void searchUsingWildCardQuery(String searchQuery) 
   throws IOException, ParseException { 
   searcher = new Searcher(indexDir); 
   long startTime = System.currentTimeMillis(); 

   //create a term to search file name 
   Term term = new Term(LuceneConstants.FILE_NAME, searchQuery); 
   //create the wildcard query object 
   Query query = new WildcardQuery(term); 
   //do the search 
   TopDocs hits = searcher.search(query); 
   long endTime = System.currentTimeMillis(); 

   System.out.println(hits.totalHits + 
      " documents found. Time :" + (endTime - startTime) + "ms"); 

   for(ScoreDoc scoreDoc : hits.scoreDocs) { 
      Document doc = searcher.getDocument(scoreDoc); 
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH)); 
   } 
} 

Example Application

To test search using WildcardQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes made on the raw data and to search that data using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.searchUsingWildCardQuery("record1*");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void searchUsingWildCardQuery(String searchQuery) 
      throws IOException, ParseException { 
      searcher = new Searcher(indexDir); 
      long startTime = System.currentTimeMillis(); 

      //create a term to search file name 
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery); 
      //create the wildcard query object 
      Query query = new WildcardQuery(term); 
      //do the search 
      TopDocs hits = searcher.search(query); 
      long endTime = System.currentTimeMillis(); 

      System.out.println(hits.totalHits + 
         " documents found. Time :" + (endTime - startTime) + "ms"); 

      for(ScoreDoc scoreDoc : hits.scoreDocs) { 
         Document doc = searcher.getDocument(scoreDoc); 
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH)); 
      }
   } 
}

Data & Index Directory Creation

I've used 10 text files, record1.txt through record10.txt, containing names and other details of students, and put them in the directory D:\Lucene\Data. An index directory should be created at D:\Lucene\Index. After running the indexing program from the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.

Running the Program

Once you are done creating the source, the raw data, the data directory, the index directory and the indexes, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

2 hits documents found. Time :70ms
File: D:\lucene\Data\record1.txt
File: D:\lucene\Data\record10.txt

Lucene - FuzzyQuery

The FuzzyQuery class is used to search documents using a fuzzy, that is, approximate, match based on the Levenshtein edit-distance algorithm.

Class Declaration

Following is the declaration for org.apache.lucene.search.FuzzyQuery class −

public class FuzzyQuery
   extends MultiTermQuery
S.No. Field & Description
1

static final int defaultMaxEdits

The default maximum number of edits allowed (2).

2

static final int defaultMaxExpansions

The default maximum number of terms the query expands to (50).

3

static final int defaultPrefixLength

The default length of the common non-fuzzy prefix (0).

4

static final boolean defaultTranspositions

Whether transpositions count as a single edit by default (true).

S.No. Constructor & Description
1

FuzzyQuery(Term term)

Calls FuzzyQuery(term, defaultMaxEdits).

2

FuzzyQuery(Term term, int maxEdits)

Calls FuzzyQuery(term, maxEdits, defaultPrefixLength).

3

FuzzyQuery(Term term, int maxEdits, int prefixLength)

Calls FuzzyQuery(term, maxEdits, prefixLength, defaultMaxExpansions, defaultTranspositions).

4

FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions)

Calls FuzzyQuery(term, maxEdits, prefixLength, maxExpansions, transpositions, defaultRewriteMethod(maxExpansions)).

5

FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions, MultiTermQuery.RewriteMethod rewriteMethod)

Create a new FuzzyQuery that will match terms with an edit distance of at most maxEdits to term.

S.No. Method & Description
1

static MultiTermQuery.RewriteMethod defaultRewriteMethod(int maxExpansions)

Creates a default top-terms blended frequency scoring rewrite with the given max expansions.

2

boolean equals(Object obj)

Override and implement query instance equivalence properly in a subclass.

3

static int floatToEdits(float minimumSimilarity, int termLen)

Helper function to convert from "minimumSimilarity" fractions to raw edit distances.

4

CompiledAutomaton getAutomata()

Returns the compiled automata used to match terms.

5

static CompiledAutomaton getFuzzyAutomaton(String term, int maxEdits, int prefixLength, boolean transpositions)

Returns the CompiledAutomaton internally used by FuzzyQuery to match terms.

6

int getMaxEdits()

Returns the maximum number of edits allowed.

7

int getPrefixLength()

Returns the non-fuzzy prefix length.

8

Term getTerm()

Returns the pattern term.

9

protected TermsEnum getTermsEnum(Terms terms, AttributeSource atts)

Construct the enumeration to be used, expanding the pattern term.

10

boolean getTranspositions()

Returns true if transpositions should be treated as a primitive edit operation.

11

int hashCode()

Override and implement query hash code properly in a subclass.

12

String toString(String field)

Prints a query to a string, with field assumed to be the default field and omitted.

13

void visit(QueryVisitor visitor)

Recurse through the query tree, visiting any child queries.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.MultiTermQuery
  • org.apache.lucene.search.Query
  • java.lang.Object
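
The edit distance underlying FuzzyQuery can be illustrated with a plain-Java Levenshtein implementation. This is a sketch of the concept only; Lucene matches terms with precompiled Levenshtein automata, not a dynamic-programming table. With the default maximum of 2 edits, a search for cord3.txt matches record3.txt, since the two differ by exactly two insertions:

```java
public class EditDistanceDemo {

   // Classic dynamic-programming Levenshtein distance
   // (insertions, deletions and substitutions each cost 1).
   static int editDistance(String a, String b) {
      int[][] d = new int[a.length() + 1][b.length() + 1];
      for (int i = 0; i <= a.length(); i++) d[i][0] = i;
      for (int j = 0; j <= b.length(); j++) d[0][j] = j;
      for (int i = 1; i <= a.length(); i++) {
         for (int j = 1; j <= b.length(); j++) {
            int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
            d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                  d[i - 1][j - 1] + cost);
         }
      }
      return d[a.length()][b.length()];
   }

   public static void main(String[] args) {
      // "cord3.txt" needs two insertions ("re") to become "record3.txt"
      System.out.println(editDistance("cord3.txt", "record3.txt")); // prints 2
   }
}
```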

Usage of FuzzyQuery

private void searchUsingFuzzyQuery(String searchQuery)
   throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();
   //create a term to search file name
   Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
   //create the fuzzy query object
   Query query = new FuzzyQuery(term);
   //do the search
   TopDocs hits = searcher.search(query);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.print("Score: "+ scoreDoc.score + " ");
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
   }
} 

Example Application

To test search using FuzzyQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes made on the raw data and to search that data using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {

   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.searchUsingFuzzyQuery("cord3.txt");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }
   private void searchUsingFuzzyQuery(String searchQuery)
      throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();

      //create a term to search file name
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
      //create the fuzzy query object
      Query query = new FuzzyQuery(term);
      //do the search
      TopDocs hits = searcher.search(query);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.print("Score: "+ scoreDoc.score + " ");
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
      }
   }
}

Data & Index Directory Creation

I've used 10 text files, record1.txt through record10.txt, containing names and other details of students, and put them in the directory D:\Lucene\Data. An index directory should be created at D:\Lucene\Index. After running the indexing program from the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.

Running the Program

Once you are done creating the source, the raw data, the data directory, the index directory and the indexes, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

1 hits documents found. Time :89ms
Score: 0.73515606 File: D:\lucene\Data\record3.txt

Lucene - MatchAllDocsQuery

The MatchAllDocsQuery class, as the name suggests, matches all documents.

Class Declaration

Following is the declaration for org.apache.lucene.search.MatchAllDocsQuery class −

public class MatchAllDocsQuery
   extends Query
S.No. Constructor & Description
1

MatchAllDocsQuery()

Default Constructor

S.No. Method & Description
1

Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)

Expert: Constructs an appropriate Weight implementation for this query.

2

boolean equals(Object obj)

Override and implement query instance equivalence properly in a subclass.

3

int hashCode()

Override and implement query hash code properly in a subclass.

4

String toString(String field)

Prints a query to a string, with field assumed to be the default field and omitted.

5

void visit(QueryVisitor visitor)

Recurse through the query tree, visiting any child queries.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.Query
  • java.lang.Object

Usage of MatchAllDocsQuery

private void searchUsingMatchAllDocsQuery()
   throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();
      
   //create the match-all-docs query object
   Query query = new MatchAllDocsQuery();
   //do the search
   TopDocs hits = searcher.search(query);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.print("Score: "+ scoreDoc.score + " ");
      System.out.println("Doc ID: " + scoreDoc.doc);
   }
} 

Example Application

To test search using MatchAllDocsQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes made on the raw data and to search that data using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.searchUsingMatchAllDocsQuery();
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }
   private void searchUsingMatchAllDocsQuery()
      throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();
      
      //create the match-all-docs query object
      Query query = new MatchAllDocsQuery();
      //do the search
      TopDocs hits = searcher.search(query);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.print("Score: "+ scoreDoc.score + " ");
         System.out.println("Doc ID: " + scoreDoc.doc);
      }
   }
}

Data & Index Directory Creation

I've used 10 text files, record1.txt through record10.txt, containing names and other details of students, and put them in the directory D:\Lucene\Data. An index directory should be created at D:\Lucene\Index. After running the indexing program from the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.

Running the Program

Once you are done creating the source, the raw data, the data directory, the index directory and the indexes, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

10 hits documents found. Time :12ms
Score: 1.0 Doc ID: 1
Score: 1.0 Doc ID: 2
Score: 1.0 Doc ID: 3
Score: 1.0 Doc ID: 4
Score: 1.0 Doc ID: 5
Score: 1.0 Doc ID: 6
Score: 1.0 Doc ID: 7
Score: 1.0 Doc ID: 8
Score: 1.0 Doc ID: 9
Score: 1.0 Doc ID: 10

Lucene - MatchNoDocsQuery

The MatchNoDocsQuery class, as the name suggests, matches no documents.

Class Declaration

Following is the declaration for org.apache.lucene.search.MatchNoDocsQuery class −

public class MatchNoDocsQuery
   extends Query
S.No. Constructor & Description
1

MatchNoDocsQuery()

Default Constructor

2

MatchNoDocsQuery(String reason)

Provides a reason explaining why this query was used.

S.No. Method & Description
1

Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)

Expert: Constructs an appropriate Weight implementation for this query.

2

boolean equals(Object obj)

Override and implement query instance equivalence properly in a subclass.

3

int hashCode()

Override and implement query hash code properly in a subclass.

4

String toString(String field)

Prints a query to a string, with field assumed to be the default field and omitted.

5

void visit(QueryVisitor visitor)

Recurse through the query tree, visiting any child queries.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.Query
  • java.lang.Object

Usage of MatchNoDocsQuery

private void searchUsingMatchNoDocsQuery()
   throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();
      
   //create the match-no-docs query object
   Query query = new MatchNoDocsQuery();
   //do the search
   TopDocs hits = searcher.search(query);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.print("Score: "+ scoreDoc.score + " ");
      System.out.println("Doc ID: " + scoreDoc.doc);
   }
}

Example Application

To test search using MatchNoDocsQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes made on the raw data and to search that data using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.searchUsingMatchNoDocsQuery();
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }
   private void searchUsingMatchNoDocsQuery()
      throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();
      
      //create the match-no-docs query object
      Query query = new MatchNoDocsQuery();
      //do the search
      TopDocs hits = searcher.search(query);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.print("Score: "+ scoreDoc.score + " ");
         System.out.println("Doc ID: " + scoreDoc.doc);
      }
   }
}

Data & Index Directory Creation

I've used 10 text files, record1.txt through record10.txt, containing names and other details of students, and put them in the directory D:\Lucene\Data. An index directory should be created at D:\Lucene\Index. After running the indexing program from the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.

Running the Program

Once you are done creating the source, the raw data, the data directory, the index directory and the indexes, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

0 hits documents found. Time :9ms

Lucene - RegexpQuery

The RegexpQuery class represents a regular-expression-based query. Comparisons are quite fast, as the expression is compiled into an automaton.

Class Declaration

Following is the declaration for org.apache.lucene.search.RegexpQuery class −

public class RegexpQuery
   extends AutomatonQuery
S.No. Field & Description
1

static final AutomatonProvider DEFAULT_PROVIDER

A provider that provides no named automata.

S.No. Constructor & Description
1

RegexpQuery(Term term)

Constructs a query for terms matching term.

2

RegexpQuery(Term term, int flags)

Constructs a query for terms matching term.

3

RegexpQuery(Term term, int flags, int determinizeWorkLimit)

Constructs a query for terms matching term.

4

RegexpQuery(Term term, int syntaxFlags, int matchFlags, int determinizeWorkLimit)

Constructs a query for terms matching term.

5

RegexpQuery(Term term, int syntaxFlags, int matchFlags, AutomatonProvider provider, int determinizeWorkLimit, MultiTermQuery.RewriteMethod rewriteMethod)

Constructs a query for terms matching term.

6

RegexpQuery(Term term, int syntaxFlags, int matchFlags, AutomatonProvider provider, int determinizeWorkLimit, MultiTermQuery.RewriteMethod rewriteMethod, boolean doDeterminization)

Constructs a query for terms matching term.

7

RegexpQuery(Term term, int syntaxFlags, AutomatonProvider provider, int determinizeWorkLimit)

Constructs a query for terms matching term.

S.No. Method & Description
1

Term getRegexp()

Returns the regexp of this query wrapped in a Term.

2

String toString(String field)

Prints a user-readable version of this query.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.search.AutomatonQuery
  • org.apache.lucene.search.MultiTermQuery
  • org.apache.lucene.search.Query
  • java.lang.Object

Usage of RegexpQuery

private void searchUsingRegexpQuery(String searchQuery) 
   throws IOException, ParseException { 
   searcher = new Searcher(indexDir); 
   long startTime = System.currentTimeMillis(); 

   //create a term to search file name 
   Term term = new Term(LuceneConstants.FILE_NAME, searchQuery); 
   //create the regexp query object 
   Query query = new RegexpQuery(term); 
   //do the search 
   TopDocs hits = searcher.search(query); 
   long endTime = System.currentTimeMillis(); 

   System.out.println(hits.totalHits + 
      " documents found. Time :" + (endTime - startTime) + "ms"); 

   for(ScoreDoc scoreDoc : hits.scoreDocs) { 
      Document doc = searcher.getDocument(scoreDoc); 
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH)); 
   } 
} 

Example Application

To test search using RegexpQuery, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

This class is used to read the indexes made on raw data and searches data using the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;

public class Searcher {

   IndexSearcher indexSearcher;
   QueryBuilder queryBuilder;
   Query query;

   public Searcher(String indexDirectoryPath) 
      throws IOException {
      DirectoryReader indexDirectory = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirectoryPath)));
      indexSearcher = new IndexSearcher(indexDirectory);
      StandardAnalyzer analyzer = new StandardAnalyzer();
      queryBuilder = new QueryBuilder(analyzer);
   }

   public TopDocs search(String searchQuery) 
      throws IOException, ParseException {
      query = queryBuilder.createPhraseQuery(LuceneConstants.CONTENTS, searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
      return indexSearcher.storedFields().document(scoreDoc.doc);	
   }
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.RegexpQuery;

public class LuceneTester {
	
   String indexDir = "D:\\Lucene\\Index";
   String dataDir = "D:\\Lucene\\Data";
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.searchUsingRegexpQuery("record1*.txt");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void searchUsingRegexpQuery(String searchQuery) 
      throws IOException, ParseException { 
      searcher = new Searcher(indexDir); 
      long startTime = System.currentTimeMillis(); 

      //create a term to search file name 
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery); 
      //create the regexp query object 
      Query query = new RegexpQuery(term); 
      //do the search 
      TopDocs hits = searcher.search(query); 
      long endTime = System.currentTimeMillis(); 

      System.out.println(hits.totalHits + 
         " documents found. Time :" + (endTime - startTime) + "ms"); 

      for(ScoreDoc scoreDoc : hits.scoreDocs) { 
         Document doc = searcher.getDocument(scoreDoc); 
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH)); 
      }
   } 
}

Data & Index Directory Creation

We have used 10 text files, record1.txt to record10.txt, containing names and other details of the students, and put them in the directory D:\Lucene\Data. An index directory should be created as D:\Lucene\Index. After running the indexing program from the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or use Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in Eclipse IDE's console −

Output

1 hits documents found. Time :53ms
File: D:\lucene\Data\record1.txt

Lucene - Analysis

In one of our previous chapters, we have seen that Lucene uses IndexWriter to analyze the Document(s) using the Analyzer and then creates, opens, or edits indexes as required. In this chapter, we are going to discuss the various types of Analyzer objects and other relevant objects that are used during the analysis process. Understanding the analysis process and how analyzers work will give you great insight into how Lucene indexes documents.

Important Analyzers

Following is the list of objects that we'll discuss in due course.

S.No. Class & Description
1 WhitespaceAnalyzer

This analyzer splits the text in a document based on whitespace.

2 SimpleAnalyzer

This analyzer splits the text in a document based on non-letter characters and puts the text in lowercase.

3 StopAnalyzer

This analyzer works just like SimpleAnalyzer and additionally removes common words like 'a', 'an', 'the', etc.

4 StandardAnalyzer

This is the most sophisticated analyzer and is capable of handling names, email addresses, etc. It lowercases each token and removes common words and punctuations, if any.

5 KeywordAnalyzer

This analyzer treats entire stream as a token. It is best suited for identifiers, zip codes, product names etc.

6 CustomAnalyzer

We can create our own custom analyzer as per custom requirements using CustomAnalyzer.builder() method.

7 EnglishAnalyzer

Analyzer for English language.

8 FrenchAnalyzer

Analyzer for French language.

9 SpanishAnalyzer

Analyzer for Spanish language.

Lucene - WhitespaceAnalyzer

WhitespaceAnalyzer splits the text in a document based on whitespace.
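The effect can be approximated with plain Java: splitting on runs of whitespace yields the same tokens WhitespaceAnalyzer emits for the sample sentence used later in this chapter. This is only an illustrative sketch (it skips Lucene's TokenStream machinery); note that case and punctuation are left untouched.

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceSplitDemo {
    // Approximates WhitespaceAnalyzer: tokens are maximal runs of
    // non-whitespace characters; no lowercasing, no punctuation removal.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "Lucene is simple yet powerful java based search library.";
        // The trailing '.' stays attached to the last token, and
        // "Lucene" keeps its capital L.
        System.out.println(tokenize(text));
    }
}
```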

Class Declaration

Following is the declaration for org.apache.lucene.analysis.core.WhitespaceAnalyzer class −

public final class WhitespaceAnalyzer
   extends Analyzer
S.No. Constructor & Description
1

WhitespaceAnalyzer()

Creates a new WhitespaceAnalyzer with a maximum token length of 255 chars.

2

WhitespaceAnalyzer(int maxTokenLength)

Creates a new WhitespaceAnalyzer with a custom maximum token length.

S.No. Method & Description
1

protected Analyzer.TokenStreamComponents createComponents(String fieldName)

Creates a new Analyzer.TokenStreamComponents instance for this analyzer.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.Analyzer
  • java.lang.Object

Usage of WhitespaceAnalyzer

private void displayTokenUsingWhitespaceAnalyzer() throws IOException {
   String text 
      = "Lucene is simple yet powerful java based search library.";
   Analyzer analyzer = new WhitespaceAnalyzer();
   TokenStream tokenStream = analyzer.tokenStream(
   LuceneConstants.CONTENTS, new StringReader(text));
   CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
   tokenStream.reset();
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.toString() + "] ");
   }
   analyzer.close();
}

Example Application

To test search using WhitespaceAnalyzer, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester;

      tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingWhitespaceAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingWhitespaceAnalyzer() throws IOException {
      String text 
         = "Lucene is simple yet powerful java based search library.";
      Analyzer analyzer = new WhitespaceAnalyzer();
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.toString() + "] ");
      }
      analyzer.close();
   }
}

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can proceed by compiling and running your program. To do this, keep the LuceneTester.Java file tab active and use either the Run option available in the Eclipse IDE or use Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in Eclipse IDE's console −

Output

[Lucene] [is] [simple] [yet] [powerful] [java] [based] [search] [library.]

Lucene - SimpleAnalyzer

SimpleAnalyzer splits the text in a document based on non-letter characters and converts the tokens to lowercase.
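A plain-Java sketch of this behavior: split on non-letter characters, drop empty fragments, and lowercase each token. (The real SimpleAnalyzer uses Unicode letter classification; this illustration restricts itself to ASCII letters.)

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SimpleSplitDemo {
    // Approximates SimpleAnalyzer: split on any non-letter character,
    // discard empty fragments, lowercase each token.
    // Assumption: ASCII letters only, for illustration.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.split("[^a-zA-Z]+"))
                     .filter(t -> !t.isEmpty())
                     .map(String::toLowerCase)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Lucene is simple yet powerful java based search library.";
        // Unlike the whitespace split, the trailing '.' is gone and
        // every token is lowercased.
        System.out.println(tokenize(text));
    }
}
```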

Class Declaration

Following is the declaration for org.apache.lucene.analysis.core.SimpleAnalyzer class −

public final class SimpleAnalyzer
   extends Analyzer
S.No. Constructor & Description
1

SimpleAnalyzer()

Creates a new SimpleAnalyzer.

S.No. Method & Description
1

protected Analyzer.TokenStreamComponents createComponents(String fieldName)

Creates a new Analyzer.TokenStreamComponents instance for this analyzer.

2

protected TokenStream normalize(String fieldName, TokenStream in)

Wraps the given TokenStream in order to apply normalization filters.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.Analyzer
  • java.lang.Object

Usage of SimpleAnalyzer

private void displayTokenUsingSimpleAnalyzer() throws IOException {
   String text 
      = "Lucene is simple yet powerful java based search library.";
   Analyzer analyzer = new SimpleAnalyzer();
   TokenStream tokenStream = analyzer.tokenStream(
   LuceneConstants.CONTENTS, new StringReader(text));
   CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
   tokenStream.reset();
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.toString() + "] ");
   }
   analyzer.close();
}

Example Application

To test search using SimpleAnalyzer, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester;

      tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingSimpleAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingSimpleAnalyzer() throws IOException {
      String text 
         = "Lucene is simple yet powerful java based search library.";
      Analyzer analyzer = new SimpleAnalyzer();
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.toString() + "] ");
      }
      analyzer.close();
   }
}

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or use Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in Eclipse IDE's console −

Output

[lucene] [is] [simple] [yet] [powerful] [java] [based] [search] [library]

Lucene - StopAnalyzer

StopAnalyzer works like SimpleAnalyzer and additionally removes common words like 'a', 'an', 'the', etc.
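In plain Java, this is a SimpleAnalyzer-style split followed by a stop-word filter. The sketch below uses the same stop-word set as the example application in this chapter (and, like the SimpleAnalyzer sketch, restricts itself to ASCII letters for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StopWordsDemo {
    // Approximates StopAnalyzer: split on non-letters, lowercase,
    // then drop any token present in the stop-word set.
    static List<String> tokenize(String text, Set<String> stopWords) {
        return Arrays.stream(text.split("[^a-zA-Z]+"))
                     .filter(t -> !t.isEmpty())
                     .map(String::toLowerCase)
                     .filter(t -> !stopWords.contains(t))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "The Lucene is a simple yet powerful java based search library.";
        // "The" and "a" are removed; "is" survives because it is not
        // in this stop-word set.
        System.out.println(tokenize(text, Set.of("a", "an", "the")));
    }
}
```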

Class Declaration

Following is the declaration for org.apache.lucene.analysis.core.StopAnalyzer class −

public final class StopAnalyzer
   extends Analyzer
S.No. Constructor & Description
1

StopAnalyzer(Reader stopwords)

Builds an analyzer with the stop words from the given reader.

2

StopAnalyzer(Path stopwords)

Builds an analyzer with the stop words from the file at the given path.

3

StopAnalyzer(CharArraySet stopWords)

Builds an analyzer with the stop words from the given set.

S.No. Method & Description
1

protected Analyzer.TokenStreamComponents createComponents(String fieldName)

Creates a new Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader.

2

protected TokenStream normalize(String fieldName, TokenStream in)

Wraps the given TokenStream in order to apply normalization filters.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.StopwordAnalyzerBase
  • org.apache.lucene.analysis.Analyzer
  • java.lang.Object

Usage of StopAnalyzer

private void displayTokenUsingStopAnalyzer() throws IOException {
   String text = "The Lucene is a simple yet powerful java based search library.";

   Set<String> stopWords = new HashSet<>();
   stopWords.add("a");
   stopWords.add("an");
   stopWords.add("the");

   Analyzer analyzer = new StopAnalyzer(CharArraySet.copy(stopWords));
   TokenStream tokenStream = analyzer.tokenStream(
   LuceneConstants.CONTENTS, new StringReader(text));
   CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
   tokenStream.reset();
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.toString() + "] ");
   }
   analyzer.close();
}

Example Application

To test search using StopAnalyzer, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester;

      tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingStopAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingStopAnalyzer() throws IOException {
      String text 
         = "The Lucene is a simple yet powerful java based search library.";
      
      Set<String> stopWords = new HashSet<>();
      stopWords.add("a");
      stopWords.add("an");
      stopWords.add("the");
      
      Analyzer analyzer = new StopAnalyzer(CharArraySet.copy(stopWords));
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.toString() + "] ");
      }
      analyzer.close();
   }
}

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or use Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in Eclipse IDE's console −

Output

[lucene] [is] [simple] [yet] [powerful] [java] [based] [search] [library] 

Lucene - StandardAnalyzer

StandardAnalyzer is the most sophisticated analyzer and is capable of handling names, email addresses, etc. It lowercases each token and removes punctuation; common words are removed when the analyzer is built with a stop-word set.

Class Declaration

Following is the declaration for org.apache.lucene.analysis.standard.StandardAnalyzer class −

public final class StandardAnalyzer
   extends StopwordAnalyzerBase
S.No. Field & Description
1

static final int DEFAULT_MAX_TOKEN_LENGTH

Default maximum allowed token length.

S.No. Constructor & Description
1

StandardAnalyzer()

Builds an analyzer with no stop words.

2

StandardAnalyzer(Reader stopwords)

Builds an analyzer with the stop words from the given reader.

3

StandardAnalyzer(CharArraySet stopWords)

Builds an analyzer with the given stop words.

S.No. Method & Description
1

protected Analyzer.TokenStreamComponents createComponents(String fieldName)

Creates a new Analyzer.TokenStreamComponents instance for this analyzer.

2

int getMaxTokenLength()

Returns the current maximum token length.

3

protected TokenStream normalize(String fieldName, TokenStream in)

Wrap the given TokenStream in order to apply normalization filters.

4

void setMaxTokenLength(int length)

Set the max allowed token length.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.StopwordAnalyzerBase
  • org.apache.lucene.analysis.Analyzer
  • java.lang.Object

Usage of StandardAnalyzer

private void displayTokenUsingStandardAnalyzer() throws IOException {
   String text 
      = "Lucene is simple yet powerful java based search library.";
   Analyzer analyzer = new StandardAnalyzer();
   TokenStream tokenStream = analyzer.tokenStream(
   LuceneConstants.CONTENTS, new StringReader(text));
   CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
   tokenStream.reset();
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.toString() + "] ");
   }
   analyzer.close();
}

Example Application

To test search using StandardAnalyzer, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester;

      tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingStandardAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingStandardAnalyzer() throws IOException {
      String text 
         = "Lucene is simple yet powerful java based search library.";
      Analyzer analyzer = new StandardAnalyzer();
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.toString() + "] ");
      }
      analyzer.close();
   }
}

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or use Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in Eclipse IDE's console −

Output

[lucene] [is] [simple] [yet] [powerful] [java] [based] [search] [library]

Lucene - KeywordAnalyzer

KeywordAnalyzer treats the entire text stream as a single token. It is best suited for exact-match fields such as identifiers, zip codes, product names, etc.
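The "tokenization" here is trivial, which is exactly the point: the field value is indexed verbatim, so only an exact query can match it. A plain-Java sketch:

```java
import java.util.List;

public class KeywordTokenDemo {
    // Approximates KeywordAnalyzer: the whole input becomes one token,
    // unchanged. No splitting, no lowercasing.
    static List<String> tokenize(String text) {
        return List.of(text);
    }

    public static void main(String[] args) {
        // An identifier like a zip code stays intact; a whitespace or
        // standard analyzer would have split it into pieces.
        System.out.println(tokenize("NY 10001-0001"));
    }
}
```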

Class Declaration

Following is the declaration for org.apache.lucene.analysis.core.KeywordAnalyzer class −

public final class KeywordAnalyzer
   extends Analyzer
S.No. Constructor & Description
1

KeywordAnalyzer()

Builds an analyzer.

S.No. Method & Description
1

protected Analyzer.TokenStreamComponents createComponents(String fieldName)

Creates a new Analyzer.TokenStreamComponents.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.Analyzer
  • java.lang.Object

Usage of KeywordAnalyzer

private void displayTokenUsingKeywordAnalyzer() throws IOException {
   String text = "Lucene is simple yet powerful java based search library.";

   Analyzer analyzer = new KeywordAnalyzer();
   TokenStream tokenStream = analyzer.tokenStream(
   LuceneConstants.CONTENTS, new StringReader(text));
   CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
   tokenStream.reset();
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.toString() + "] ");
   }
   analyzer.close();
}

Example Application

To test search using KeywordAnalyzer, let us create a test Lucene application.

Step Description
1

Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process.

2

Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and Build the application to make sure business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the searching capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester;

      tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingKeywordAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingKeywordAnalyzer() throws IOException {
      String text 
         = "Lucene is simple yet powerful java based search library.";
      
      Analyzer analyzer = new KeywordAnalyzer();
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.toString() + "] ");
      }
      analyzer.close();
   }
}

Running the Program

Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or use Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in Eclipse IDE's console −

Output

[Lucene is simple yet powerful java based search library.] 

Lucene - CustomAnalyzer

We can create our own custom analyzer as per custom requirements using CustomAnalyzer.builder() method.
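The builder assembles a pipeline: a tokenizer followed by a chain of token filters applied in order. The usage section below builds standard tokenization, then lowercasing, then stop-word removal, then capitalization. A plain-Java sketch of that same pipeline (the stop-word set here is illustrative; the real "stop" filter factory uses its configured list):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class FilterChainDemo {
    // Sketch of the CustomAnalyzer.builder() chain used below:
    // tokenize -> lowercase -> remove stop words -> capitalize.
    // Assumption: ASCII-letter tokenization and a hand-picked stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        return Arrays.stream(text.split("[^a-zA-Z]+"))
                     .filter(t -> !t.isEmpty())
                     .map(String::toLowerCase)
                     .filter(t -> !stopWords.contains(t))
                     .map(t -> Character.toUpperCase(t.charAt(0)) + t.substring(1))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Lucene is simple yet powerful java based search library.";
        System.out.println(analyze(text, Set.of("is", "a", "an", "the")));
    }
}
```

Because each filter only sees the output of the previous one, the order of `addTokenFilter` calls matters: lowercasing before stop-word removal ensures "The" is recognized as "the".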

Class Declaration

Following is the declaration for org.apache.lucene.analysis.custom.CustomAnalyzer class −

public final class CustomAnalyzer
   extends Analyzer
S.No. Method & Description
1

static CustomAnalyzer.Builder builder()

Returns a builder for custom analyzers that loads all resources from Lucene's classloader.

2

static CustomAnalyzer.Builder builder(Path configDir)

Returns a builder for custom analyzers that loads all resources from the given file system base directory.

3

static CustomAnalyzer.Builder builder(ResourceLoader loader)

Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.

4

protected Analyzer.TokenStreamComponents createComponents(String fieldName)

Creates a new Analyzer.TokenStreamComponents instance for this analyzer.

5

List<CharFilterFactory> getCharFilterFactories()

Returns the list of char filters that are used in this analyzer.

6

int getOffsetGap(String fieldName)

Returns the offset gap between tokens of fields with the given field name.

7

int getPositionIncrementGap(String fieldName)

Returns the position increment gap between two consecutive values of the same field.

8

List<TokenFilterFactory> getTokenFilterFactories()

Returns the list of token filters that are used in this analyzer.

9

TokenizerFactory getTokenizerFactory()

Returns the tokenizer that is used in this analyzer.

10

protected Reader initReader(String fieldName, Reader reader)

Wraps the reader with the configured char filters.

11

protected Reader initReaderForNormalization(String fieldName, Reader reader)

Wraps the given Reader with the char filters that also apply when normalizing query terms.

12

protected TokenStream normalize(String fieldName, TokenStream in)

Applies the normalization-aware token filters to the given stream; used when normalizing terms for multi-term queries.

13

String toString()

String representation of the analyzer.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.Analyzer
  • java.lang.Object
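
The builder also accepts parameterized factories: a factory name may be followed by key/value pairs. The following sketch is an illustration (it assumes the standard factory names "whitespace", "lowercase" and "length" registered by Lucene's analysis-common module); the "length" filter here keeps only tokens between 4 and 20 characters long −

```java
package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ParameterizedAnalyzerDemo {

   public static void main(String[] args) throws IOException {
      // Factory names may be followed by key/value parameters.
      // "length" maps to LengthFilterFactory; min/max bound the token length.
      Analyzer analyzer = CustomAnalyzer.builder()
         .withTokenizer("whitespace")
         .addTokenFilter("lowercase")
         .addTokenFilter("length", "min", "4", "max", "20")
         .build();
      TokenStream tokenStream = analyzer.tokenStream(
         "contents", new StringReader("Lucene is simple yet powerful"));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while (tokenStream.incrementToken()) {
         System.out.print("[" + term + "] ");
      }
      tokenStream.end();
      tokenStream.close();
      analyzer.close();
   }
}
```

Here "is" (2 characters) and "yet" (3 characters) fall below the configured minimum and are dropped from the token stream.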

Usage of CustomAnalyzer

private void displayTokenUsingCustomAnalyzer() throws IOException {
   String text 
      = "Lucene is simple yet powerful java based search library.";

   Analyzer analyzer = CustomAnalyzer.builder()
      .withTokenizer("standard")
      .addTokenFilter("lowercase")
      .addTokenFilter("stop")
      .addTokenFilter("capitalization")
      .build();
   TokenStream tokenStream = analyzer.tokenStream(
      LuceneConstants.CONTENTS, new StringReader(text));
   CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
   tokenStream.reset();
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.toString() + "] ");
   }
   tokenStream.end();
   tokenStream.close();
   analyzer.close();
}

Example Application

To test tokenization using CustomAnalyzer, let us create a test Lucene application.

Step Description
1

Create a project with the name LuceneFirstApplication under the package com.tutorialspoint.lucene, as explained in the Lucene - First Application chapter. You can also reuse the project created in that chapter as such for this chapter.

2

Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the analysis (tokenization) capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingCustomAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingCustomAnalyzer() throws IOException {
      String text 
         = "Lucene is simple yet powerful java based search library.";
      
      Analyzer analyzer = CustomAnalyzer.builder()
         .withTokenizer("standard")
         .addTokenFilter("lowercase")
         .addTokenFilter("stop")
         .addTokenFilter("capitalization")
         .build();
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.toString() + "] ");
      }
      tokenStream.end();
      tokenStream.close();
      analyzer.close();
   }
}

Running the Program

Once you are done with the creation of the source, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

[Lucene] [Simple] [Yet] [Powerful] [Java] [Based] [Search] [Library] 
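
Note that the order of token filters matters: each filter consumes the output of the one before it. As a hypothetical illustration using the same registered factory names, moving "stop" before "lowercase" lets an upper-case "IS" slip past the stop filter, whose default English stop list is lower-case and is matched case-sensitively −

```java
package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class FilterOrderDemo {

   public static void main(String[] args) throws IOException {
      // "stop" runs before "lowercase" here, so the stop filter compares
      // the raw token "IS" against its lower-case stop list and keeps it.
      Analyzer analyzer = CustomAnalyzer.builder()
         .withTokenizer("standard")
         .addTokenFilter("stop")
         .addTokenFilter("lowercase")
         .build();
      TokenStream tokenStream = analyzer.tokenStream(
         "contents", new StringReader("Lucene IS powerful"));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while (tokenStream.incrementToken()) {
         System.out.print("[" + term + "] ");
      }
      tokenStream.end();
      tokenStream.close();
      analyzer.close();
   }
}
```

With "lowercase" first, as in the example above, "is" would be removed instead.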

Lucene - EnglishAnalyzer

EnglishAnalyzer is an analyzer specific to the English language.

Class Declaration

Following is the declaration for org.apache.lucene.analysis.en.EnglishAnalyzer class −

public final class EnglishAnalyzer
   extends StopwordAnalyzerBase
S.No. Field & Description
1

static final CharArraySet ENGLISH_STOP_WORDS_SET

An unmodifiable set containing some common English words that are not usually useful for searching.

S.No. Constructor & Description
1

EnglishAnalyzer()

Builds an analyzer with the default stop words: getDefaultStopSet().

2

EnglishAnalyzer(CharArraySet stopwords)

Builds an analyzer with the given stop words.

3

EnglishAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet)

Builds an analyzer with the given stop words and a stem exclusion set; terms in the exclusion set are not stemmed.

S.No. Method & Description
1

protected Analyzer.TokenStreamComponents createComponents(String fieldName)

Creates an Analyzer.TokenStreamComponents which tokenizes all the text in the provided Reader.

2

static CharArraySet getDefaultStopSet()

Returns an unmodifiable instance of the default stop words set.

3

protected TokenStream normalize(String fieldName, TokenStream in)

Applies normalization filters (such as lower-casing) to the given token stream; used when normalizing terms for multi-term queries.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.StopwordAnalyzerBase
  • org.apache.lucene.analysis.Analyzer
  • java.lang.Object
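
The third constructor's stem exclusion set is useful when certain terms must survive stemming intact. The following is a minimal sketch (the choice of excluded word is illustrative) that protects "library" from the stemmer while the default stop words still apply −

```java
package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemExclusionDemo {

   public static void main(String[] args) throws IOException {
      // Terms in the exclusion set bypass the stemmer (ignoreCase = true).
      CharArraySet exclusions = new CharArraySet(Arrays.asList("library"), true);
      Analyzer analyzer = new EnglishAnalyzer(
         EnglishAnalyzer.getDefaultStopSet(), exclusions);
      TokenStream tokenStream = analyzer.tokenStream(
         "contents", new StringReader("a powerful search library"));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while (tokenStream.incrementToken()) {
         System.out.print("[" + term + "] ");
      }
      tokenStream.end();
      tokenStream.close();
      analyzer.close();
   }
}
```

With the exclusion in place, "library" is emitted as-is, while "powerful" is still stemmed to "power".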

Usage of EnglishAnalyzer

private void displayTokenUsingEnglishAnalyzer() throws IOException {
   String text = "Lucene is simple yet powerful java based search library.";
   Analyzer analyzer = new EnglishAnalyzer();
   TokenStream tokenStream = analyzer.tokenStream(
      LuceneConstants.CONTENTS, new StringReader(text));
   CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
   tokenStream.reset();
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.toString() + "] ");
   }
   tokenStream.end();
   tokenStream.close();
   analyzer.close();
}

Example Application

To test tokenization using EnglishAnalyzer, let us create a test Lucene application.

Step Description
1

Create a project with the name LuceneFirstApplication under the package com.tutorialspoint.lucene, as explained in the Lucene - First Application chapter. You can also reuse the project created in that chapter as such for this chapter.

2

Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the analysis (tokenization) capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingEnglishAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingEnglishAnalyzer() throws IOException {
      String text = "Lucene is simple yet powerful java based search library.";
      Analyzer analyzer = new EnglishAnalyzer();
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.toString() + "] ");
      }
      tokenStream.end();
      tokenStream.close();
      analyzer.close();
   }
}

Running the Program

Once you are done with the creation of the source, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

[lucen] [simpl] [yet] [power] [java] [base] [search] [librari] 

Lucene - FrenchAnalyzer

FrenchAnalyzer is an analyzer specific to the French language.

Class Declaration

Following is the declaration for org.apache.lucene.analysis.fr.FrenchAnalyzer class −

public final class FrenchAnalyzer
   extends StopwordAnalyzerBase
S.No. Field & Description
1

static final CharArraySet DEFAULT_ARTICLES

Default set of articles for ElisionFilter.

2

static final String DEFAULT_STOPWORD_FILE

File containing default French stopwords.

S.No. Constructor & Description
1

FrenchAnalyzer()

Builds an analyzer with the default stop words: getDefaultStopSet().

2

FrenchAnalyzer(CharArraySet stopwords)

Builds an analyzer with the given stop words.

3

FrenchAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet)

Builds an analyzer with the given stop words and a stem exclusion set; terms in the exclusion set are not stemmed.

S.No. Method & Description
1

protected Analyzer.TokenStreamComponents createComponents(String fieldName)

Creates an Analyzer.TokenStreamComponents which tokenizes all the text in the provided Reader.

2

static CharArraySet getDefaultStopSet()

Returns an unmodifiable instance of the default stop words set.

3

protected TokenStream normalize(String fieldName, TokenStream in)

Applies normalization filters (such as lower-casing) to the given token stream; used when normalizing terms for multi-term queries.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.StopwordAnalyzerBase
  • org.apache.lucene.analysis.Analyzer
  • java.lang.Object
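
FrenchAnalyzer runs an ElisionFilter driven by DEFAULT_ARTICLES, which strips elided articles such as l' and d' before stop word removal and stemming. A minimal sketch (the French sample text is illustrative) −

```java
package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ElisionDemo {

   public static void main(String[] args) throws IOException {
      Analyzer analyzer = new FrenchAnalyzer();
      // The elided article "l'" is stripped by the ElisionFilter, so
      // "l'index" is analyzed starting from "index", not "l'index".
      TokenStream tokenStream = analyzer.tokenStream(
         "contents", new StringReader("l'index de Lucene"));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while (tokenStream.incrementToken()) {
         System.out.print("[" + term + "] ");
      }
      tokenStream.end();
      tokenStream.close();
      analyzer.close();
   }
}
```

The stop word "de" is removed as well, so only the content-bearing tokens survive.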

Usage of FrenchAnalyzer

private void displayTokenUsingFrenchAnalyzer() throws IOException {
   String text = "Lucene est une bibliothèque de recherche simple mais puissante basée sur Java.";
   Analyzer analyzer = new FrenchAnalyzer();
   TokenStream tokenStream = analyzer.tokenStream(
      LuceneConstants.CONTENTS, new StringReader(text));
   CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
   tokenStream.reset();
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.toString() + "] ");
   }
   tokenStream.end();
   tokenStream.close();
   analyzer.close();
}

Example Application

To test tokenization using FrenchAnalyzer, let us create a test Lucene application.

Step Description
1

Create a project with the name LuceneFirstApplication under the package com.tutorialspoint.lucene, as explained in the Lucene - First Application chapter. You can also reuse the project created in that chapter as such for this chapter.

2

Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the analysis (tokenization) capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingFrenchAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingFrenchAnalyzer() throws IOException {
      String text = "Lucene est une bibliothèque de recherche simple mais puissante basée sur Java.";
      Analyzer analyzer = new FrenchAnalyzer();
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.toString() + "] ");
      }
      tokenStream.end();
      tokenStream.close();
      analyzer.close();
   }
}

Running the Program

Once you are done with the creation of the source, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

[lucen] [est] [bibliothequ] [recherch] [simpl] [puisant] [base] [java]

Lucene - SpanishAnalyzer

SpanishAnalyzer is an analyzer specific to the Spanish language.

Class Declaration

Following is the declaration for org.apache.lucene.analysis.es.SpanishAnalyzer class −

public final class SpanishAnalyzer
   extends StopwordAnalyzerBase
S.No. Field & Description
1

static final String DEFAULT_STOPWORD_FILE

File containing default Spanish stopwords.

S.No. Constructor & Description
1

SpanishAnalyzer()

Builds an analyzer with the default stop words: getDefaultStopSet().

2

SpanishAnalyzer(CharArraySet stopwords)

Builds an analyzer with the given stop words.

3

SpanishAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet)

Builds an analyzer with the given stop words and a stem exclusion set; terms in the exclusion set are not stemmed.

S.No. Method & Description
1

protected Analyzer.TokenStreamComponents createComponents(String fieldName)

Creates an Analyzer.TokenStreamComponents which tokenizes all the text in the provided Reader.

2

static CharArraySet getDefaultStopSet()

Returns an unmodifiable instance of the default stop words set.

3

protected TokenStream normalize(String fieldName, TokenStream in)

Applies normalization filters (such as lower-casing) to the given token stream; used when normalizing terms for multi-term queries.

Methods Inherited

This class inherits methods from the following classes −

  • org.apache.lucene.analysis.StopwordAnalyzerBase
  • org.apache.lucene.analysis.Analyzer
  • java.lang.Object
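
Passing your own CharArraySet to the second constructor replaces the default stop set entirely, so default stop words such as "es" and "una" are no longer removed. A minimal sketch (the single stop word chosen here is illustrative) −

```java
package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.es.SpanishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CustomStopWordsDemo {

   public static void main(String[] args) throws IOException {
      // A custom stop set REPLACES the default one; here only "lucene"
      // is treated as a stop word, so "es" and "una" pass through.
      CharArraySet stopWords = new CharArraySet(Arrays.asList("lucene"), true);
      Analyzer analyzer = new SpanishAnalyzer(stopWords);
      TokenStream tokenStream = analyzer.tokenStream(
         "contents", new StringReader("Lucene es una biblioteca"));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while (tokenStream.incrementToken()) {
         System.out.print("[" + term + "] ");
      }
      tokenStream.end();
      tokenStream.close();
      analyzer.close();
   }
}
```

To extend rather than replace the defaults, copy getDefaultStopSet() into a new CharArraySet and add your own entries before constructing the analyzer.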

Usage of SpanishAnalyzer

private void displayTokenUsingSpanishAnalyzer() throws IOException {
   String text = "Lucene es una biblioteca de búsqueda basada en Java sencilla pero potente.";
   Analyzer analyzer = new SpanishAnalyzer();
   TokenStream tokenStream = analyzer.tokenStream(
      LuceneConstants.CONTENTS, new StringReader(text));
   CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
   tokenStream.reset();
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.toString() + "] ");
   }
   tokenStream.end();
   tokenStream.close();
   analyzer.close();
}

Example Application

To test tokenization using SpanishAnalyzer, let us create a test Lucene application.

Step Description
1

Create a project with the name LuceneFirstApplication under the package com.tutorialspoint.lucene, as explained in the Lucene - First Application chapter. You can also reuse the project created in that chapter as such for this chapter.

2

Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.

3

Create LuceneTester.java as mentioned below.

4

Clean and build the application to make sure the business logic is working as per the requirements.

LuceneConstants.java

This class is used to provide various constants to be used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the analysis (tokenization) capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.es.SpanishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingSpanishAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingSpanishAnalyzer() throws IOException {
      String text = "Lucene es una biblioteca de búsqueda basada en Java sencilla pero potente.";
      Analyzer analyzer = new SpanishAnalyzer();
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
      tokenStream.reset();
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.toString() + "] ");
      }
      tokenStream.end();
      tokenStream.close();
      analyzer.close();
   }
}

Running the Program

Once you are done with the creation of the source, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console −

Output

[lucen] [bibliotec] [busqued] [basad] [java] [sencill] [potent] 