Apache Solr - Terminology


In this chapter, we define the terms that come up most often when working with Solr.

General Terminology

The following is a list of general terms that are used across all types of Solr setups −

  • Instance − Just like a Tomcat instance or a Jetty instance, this term refers to a Solr application server running inside a JVM. Each instance points to a Solr home directory, and one or more cores can be configured to run in each instance.

  • Core − A core is a single index together with its configuration. When your application needs multiple indexes, you can run multiple cores in one instance instead of running multiple instances with one core each.

  • Home − The term $SOLR_HOME refers to the home directory which has all the information regarding the cores and their indexes, configurations, and dependencies.

  • Shard − In distributed environments, the data is partitioned across multiple Solr instances, and each chunk of data is called a shard. A shard contains a subset of the whole index.
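
To make the idea of partitioning concrete, here is a minimal sketch of how document IDs could be spread across shards so that each shard holds a subset of the whole index. The function names `shard_for` and `partition` are hypothetical, and CRC32 is only a stand-in hash for illustration; Solr's own document routers work differently.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number in [0, num_shards)."""
    # Assumption: CRC32 is used purely for illustration, not Solr's routing.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(doc_ids, num_shards):
    """Group document IDs by the shard they are assigned to."""
    shards = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards
```

Calling `partition(["doc1", "doc2", "doc3"], 2)` returns a dictionary with one list of IDs per shard. Every document lands in exactly one shard, and the same ID always maps to the same shard.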

SolrCloud Terminology

In an earlier chapter, we discussed how to install Apache Solr in standalone mode. Note that Solr can also run in distributed mode. In the older master-slave pattern, the index is created on the master server and replicated to one or more slave servers. In SolrCloud (the cloud environment), a collection is instead split into shards, each shard has one or more replicas, and ZooKeeper coordinates the cluster.

The key terms associated with SolrCloud are as follows −

  • Node − In SolrCloud, each single instance of Solr is regarded as a node.

  • Cluster − All the nodes of the environment combined together make a cluster.

  • Collection − A collection is a complete logical index hosted by the cluster.

  • Shard − A shard is a portion of the collection, and each shard has one or more replicas of its part of the index.

  • Replica − A replica is a copy of a shard that runs on a node as a Solr core.

  • Leader − The leader is also a replica of a shard. It receives the update requests for that shard and distributes them to the remaining replicas.

  • ZooKeeper − It is an Apache project that SolrCloud uses for centralized configuration and coordination, to manage the cluster and to elect leaders.
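
The relationships between these terms can be sketched as a small data model. This is only an illustration of the vocabulary, not Solr's internal representation; the class names, the `example_collection` helper, and names such as `products` and `node1` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Replica:
    core_name: str           # hypothetical name of the core backing this copy
    node: str                # the node (Solr instance) hosting the copy
    is_leader: bool = False  # exactly one replica per shard is the leader


@dataclass
class Shard:
    name: str
    replicas: List[Replica] = field(default_factory=list)

    def leader(self) -> Replica:
        """Return the single replica marked as leader of this shard."""
        leaders = [r for r in self.replicas if r.is_leader]
        if len(leaders) != 1:
            raise ValueError(f"{self.name} must have exactly one leader")
        return leaders[0]


@dataclass
class Collection:
    name: str
    shards: List[Shard] = field(default_factory=list)

    def nodes(self) -> Set[str]:
        """Return every node that hosts at least one replica."""
        return {r.node for s in self.shards for r in s.replicas}


def example_collection() -> Collection:
    """Build a two-shard collection spread over two nodes."""
    return Collection("products", [
        Shard("shard1", [
            Replica("products_shard1_r1", "node1", is_leader=True),
            Replica("products_shard1_r2", "node2"),
        ]),
        Shard("shard2", [
            Replica("products_shard2_r1", "node2", is_leader=True),
            Replica("products_shard2_r2", "node1"),
        ]),
    ])
```

In this sketch the cluster is simply the set returned by `nodes()`. Each node hosts one replica of every shard, so the collection stays available if one node goes down.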

Configuration Files

The main configuration files in Apache Solr are as follows −

  • solr.xml − This file in the $SOLR_HOME directory holds instance-wide settings, including SolrCloud-related information. Older versions of Solr also used it to list the cores to load, whereas newer versions discover cores through core.properties files.

  • solrconfig.xml − This file contains the core-specific configuration for request handling and response formatting, as well as indexing, memory management, and commits.

  • schema.xml − This file defines the schema of the index, including its fields and field types.

  • core.properties − This file contains the configuration specific to a core. Solr reads it during core discovery, as it contains the name of the core and the path of the data directory. It can be placed in any directory, which is then treated as the core directory.
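
The idea of core discovery can be sketched as a directory walk that treats any directory containing a core.properties file as a core directory. This is a simplified illustration, not Solr's actual implementation. The helper names `parse_properties` and `discover_cores` are hypothetical, and the sketch assumes the property keys `name` and `dataDir`, falling back to the directory name and a `data` subdirectory when they are missing.

```python
import os


def parse_properties(path):
    """Read simple key=value pairs, skipping blank lines and comments."""
    props = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith(("#", "!")):
                continue
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props


def discover_cores(solr_home):
    """Return a mapping of core name to data directory path."""
    cores = {}
    for dirpath, dirnames, filenames in os.walk(solr_home):
        if "core.properties" in filenames:
            props = parse_properties(os.path.join(dirpath, "core.properties"))
            # Assumed defaults: directory name as core name, "data" as dataDir.
            name = props.get("name", os.path.basename(dirpath))
            data_dir = props.get("dataDir", "data")
            cores[name] = os.path.join(dirpath, data_dir)
            dirnames[:] = []  # a core directory is not searched any further
    return cores
```

For example, if `$SOLR_HOME/core_a/core.properties` contains the line `name=products`, then `discover_cores` reports a core named `products` whose data directory is `$SOLR_HOME/core_a/data`.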