SolrCloud Shards and Indexing

When your collection is too large for one node, you can break it up and store it in sections by creating multiple shards.

A Shard is a logical partition of the collection, containing a subset of documents from the collection, such that every document in a collection is contained in exactly one shard. Which shard contains each document in a collection depends on the overall sharding strategy for that collection.

For example, you might have a collection where the "country" field of each document determines which shard it is part of, so documents from the same country are co-located. A different collection might simply use a "hash" on the uniqueKey of each document to determine its Shard.

SolrCloud can distribute both indexing and queries automatically, and ZooKeeper provides failover and load balancing. In addition to replication and the splitting of an index into shards, SolrCloud can automatically route documents to specific shards according to a sharding strategy. Every shard can also have multiple replicas for additional robustness.

Leaders and Replicas

SolrCloud does not use fixed, manually configured leader and follower nodes. Instead, every shard consists of at least one physical replica, exactly one of which is a leader. Leaders are automatically elected, initially on a first-come-first-served basis, and then based on the leader election process described in the ZooKeeper documentation.

If a leader goes down, one of the other replicas is automatically elected as the new leader.

When a document is sent to a Solr node for indexing, the system first determines which Shard that document belongs to, and then which node is currently hosting the leader for that shard. The document is then forwarded to the current leader for indexing, and the leader forwards the update to all of the other replicas.

Types of Replicas

By default, all replicas are eligible to become leaders if their leader goes down. However, this comes at a cost: if all replicas could become a leader at any time, every replica must be in sync with its leader at all times. New documents added to the leader must be routed to the replicas, and each replica must do a commit. If a replica goes down, or is temporarily unavailable, and then rejoins the cluster, recovery may be slow if it has missed a large number of updates.

These issues are not a problem for most users. However, some use cases would perform better if the replicas behaved a bit more like the former model, either by not syncing in real-time or by not being eligible to become leaders at all.

Solr accomplishes this by allowing you to set the replica type when creating a new collection or when adding a replica. The available types are:

NRT: This is the default. An NRT replica (NRT = NearRealTime) maintains a transaction log and writes new documents to its indexes locally. Any replica of this type is eligible to become a leader. Traditionally, this was the only type supported by Solr.
TLOG: This type of replica maintains a transaction log but does not index document changes locally. This type helps speed up indexing since no commits need to occur in the replicas. When this type of replica needs to update its index, it does so by replicating the index from the leader. This type of replica is also eligible to become a shard leader; it would do so by first processing its transaction log. If it does become a leader, it will behave the same as if it were an NRT replica.
PULL: This type of replica neither maintains a transaction log nor indexes document changes locally. It only replicates the index from the shard leader. It is not eligible to become a shard leader and doesn’t participate in shard leader election at all.

If you do not specify the type of replica when it is created, it will be of type NRT.

Combining Replica Types in a Cluster

There are three combinations of replica types that are recommended:

All NRT replicas
All TLOG replicas
TLOG replicas with PULL replicas

All NRT Replicas

Use this for small to medium clusters, or even big clusters where the update (index) throughput is not too high. NRT is the only type of replica that supports soft-commits, so also use this combination when NearRealTime is needed.

All TLOG Replicas

Use this combination if NearRealTime is not needed and the number of replicas per shard is high, but you still want all replicas to be able to handle update requests.

TLOG Replicas with PULL Replicas

Use this combination if NearRealTime is not needed, the number of replicas per shard is high, and you want to increase availability of search queries over document updates even if that means temporarily serving outdated results.

Other Combinations of Replica Types

Other combinations of replica types are not recommended. If more than one replica in the shard is writing its own index instead of replicating from an NRT replica, a leader election can cause all replicas of the shard to become out of sync with the leader, and all would have to replicate the full index.

Recovery with PULL Replicas

There are a few recovery-related scenarios to consider when using PULL replicas:

If a PULL replica cannot sync to the leader because the leader is down, or due to network partitioning, replication will not occur. However, the PULL replica will continue to serve queries. Once it can connect to the leader again, replication will resume.
If a PULL replica cannot connect to ZooKeeper, it will stop replicating because it can no longer know with confidence which replica to treat as the leader. The PULL replica will also be removed from the cluster status, and distributed queries will not be routed to it from other replicas in the cluster (or from SolrJ).
If a PULL replica dies or is unreachable for any other reason, it will no longer be query-able. When it rejoins the cluster, it will first attempt to recover from the current leader, and only when that is complete will it be ready to serve queries again.

It is important to realize that when PULL replicas join (or re-join) a cluster, they will not be query-able until they do an initial recovery from the current leader.

This means that if a Solr node hosting an existing PULL replica is started (or restarted) at a moment when there is no active leader for that shard, either because all leader-eligible replicas are currently offline or because the leader-eligible replicas are not yet active due to replaying their own transaction logs, then that PULL replica will not be query-able until the leader election is complete.

This behavior differs from that of other existing PULL replicas that were already active and serving queries before the current leader election. Those PULL replicas will continue to be query-able using the last index they fetched from the last known leader.

This behavior can be customized with an expert-level replica property named skipLeaderRecovery. If this property is set to true on a PULL replica, then this replica will skip its initial RECOVERING phase on node start (or restart), and immediately begin serving queries using its local index (which it will update through normal periodic replication from the leader, if and when a leader is available).

Queries with Preferred Replica Types

By default all replicas serve queries. See the section shards.preference Parameter for details on how to indicate preferred replica types for queries.

Document Routing

Solr offers the ability to specify the router implementation used by a collection by specifying the router.name parameter when creating your collection.

If you use the compositeId router (the default), you can send documents with a prefix in the document ID which will be used to calculate the hash Solr uses to determine the shard a document is sent to for indexing. The prefix can be anything you’d like it to be (it doesn’t have to be the shard name, for example), but it must be consistent so Solr behaves consistently.

For example, if you want to co-locate documents for a customer, you could use the customer name or ID as the prefix. If your customer is "IBM", for example, with a document with the ID "12345", you would insert the prefix into the document id field: "IBM!12345". The exclamation mark ('!') is critical here, as it distinguishes the prefix used to determine which shard to direct the document to.

Then at query time, you include the prefix(es) in your query with the _route_ parameter (e.g., q=solr&_route_=IBM!) to direct queries to specific shards. In some situations, this may improve query performance because it avoids the network latency of querying all the shards.

The compositeId router supports prefixes containing up to 2 levels of routing. For example, a prefix routing first by region, then by customer: "USA!IBM!12345".
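
As a rough illustration of the prefixes described above, the documents below are written in Solr's XML update format. The sketch assumes the collection's uniqueKey field is named id. The name_s field and its values are hypothetical and appear only to make the documents look realistic.

```xml
<add>
  <!-- One-level prefix: routed by customer "IBM" -->
  <doc>
    <field name="id">IBM!12345</field>
    <!-- name_s is a hypothetical field, not required for routing -->
    <field name="name_s">Example document for IBM</field>
  </doc>
  <!-- Two-level prefix: routed first by region "USA", then by customer "IBM" -->
  <doc>
    <field name="id">USA!IBM!12345</field>
    <field name="name_s">Example document for IBM in the USA</field>
  </doc>
</add>
```

Only the id value influences routing here; the router reads the prefix from the ID itself, so no extra routing field is needed with compositeId.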

Another use case could be if the customer "IBM" has a lot of documents and you want to spread them across multiple shards. The syntax for such a use case would be: shard_key/num!document_id where the /num is the number of bits from the shard key to use in the composite hash.

So IBM/3!12345 will take 3 bits from the shard key and 29 bits from the unique doc id, spreading the tenant over 1/8th of the shards in the collection. Likewise, if the num value were 2, it would spread the documents across 1/4th of the shards. At query time, you include the prefix(es) along with the number of bits in your query with the _route_ parameter (e.g., q=solr&_route_=IBM/3!) to direct queries to specific shards.
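
The arithmetic behind these fractions can be written out as follows. This is a sketch that assumes the 32-bit composite hash implied by the 3 + 29 split above and shards that cover the hash range evenly; n stands for the /num value.

```latex
\[
\text{bits taken from the document ID} = 32 - n,
\qquad
\text{fraction of the hash range used by one shard key} = \frac{1}{2^{n}}
\]
\[
n = 3:\ \frac{1}{2^{3}} = \frac{1}{8},
\qquad
n = 2:\ \frac{1}{2^{2}} = \frac{1}{4}
\]
```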

If you do not want to influence how documents are stored, you don’t need to specify a prefix in your document ID.

If you created the collection and defined the "implicit" router at the time of creation, you can additionally define a router.field parameter to use a field from each document to identify a shard where the document belongs. If the field specified is missing in the document, then the document will be rejected. You could also use the _route_ parameter to name a specific shard.
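
A minimal sketch of a document indexed into such a collection might look like the following. It assumes the collection was created with the implicit router, that router.field was set to a hypothetical field named target_shard, and that a shard with the hypothetical name shard_europe exists; with the implicit router, the value of that field is understood to name the shard directly.

```xml
<add>
  <doc>
    <field name="id">12345</field>
    <!-- target_shard is the hypothetical router.field; its value names the shard -->
    <field name="target_shard">shard_europe</field>
  </doc>
</add>
```

A document that omits the target_shard field would be rejected, as described above.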

Shard Splitting

When you create a collection in SolrCloud, you decide on the initial number of shards to be used. But it can be difficult to know in advance the number of shards that you need, particularly when organizational requirements can change at a moment’s notice, and the cost of finding out later that you chose wrong can be high, involving creating new cores and reindexing all of your data.

The ability to split shards is in the Collections API. It currently allows splitting a shard into two pieces. The existing shard is left as-is, so the split action effectively makes two copies of the data as new shards. You can delete the old shard at a later time when you’re ready.

More details on how to use shard splitting are in the section on the Collections API’s SPLITSHARD command.

Ignoring Commits from Client Applications in SolrCloud

In most cases, when running in SolrCloud mode, indexing client applications should not send explicit commit requests. Rather, you should configure autoCommit with openSearcher=false, and configure autoSoftCommit to make recent updates visible in search requests. This ensures that auto commits occur on a regular schedule in the cluster.
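
One way this could look in solrconfig.xml is sketched below. The intervals (15 seconds for hard commits, 1 second for soft commits) are illustrative assumptions, not recommendations; choose values that suit your indexing rate and visibility requirements.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes updates to stable storage, but does not open a new searcher -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes recent updates visible to searches -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Both maxTime values are in milliseconds. If your solrconfig.xml already has an updateHandler element, add the autoCommit and autoSoftCommit elements to it rather than defining a second one.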

Using autoSoftCommit or commitWithin requires the client app to embrace the realities of "eventual consistency". Solr will make documents searchable at roughly the same time across replicas of a collection, but there are no hard guarantees. Consequently, in rare cases, a document may appear in one search but not in a search issued immediately afterward, if the second search is routed to a different replica. Also, documents added in a particular order (even in the same batch) might become searchable out of the order of submission when there is sharding. The document will become visible on all replicas of a shard after the next autoCommit or commitWithin interval expires.
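
For clients that post documents in Solr's XML update format, commitWithin can be requested on the add message itself instead of sending a separate commit. The sketch below assumes a uniqueKey field named id; the 10-second interval is an arbitrary example value.

```xml
<!-- Ask Solr to commit this document within about 10 seconds (value in milliseconds) -->
<add commitWithin="10000">
  <doc>
    <field name="id">12345</field>
  </doc>
</add>
```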

To enforce a policy where client applications should not send explicit commits, you should update all client applications that index data into SolrCloud. However, that is not always feasible, so Solr provides the IgnoreCommitOptimizeUpdateProcessorFactory, which allows you to ignore explicit commit and/or optimize requests from client applications without having to refactor your client application code.
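
For context, an explicit commit of the kind this processor is meant to ignore can arrive as an XML update message like the one sketched here. This is only one of several ways a client can issue a commit, and the waitSearcher attribute is shown purely as an example.

```xml
<!-- An explicit hard commit sent by a client as an XML update message -->
<commit waitSearcher="true"/>
```

An explicit optimize request is sent the same way, using an optimize element in place of commit.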

To activate this request processor you’ll need to add the following to your solrconfig.xml:

<updateRequestProcessorChain name="ignore-commit-from-client" default="true">
  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
    <int name="statusCode">200</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

As shown in the example above, the processor will return 200 to the client but will ignore the commit or optimize request. Notice that you need to wire-in the implicit processors needed by SolrCloud as well, since this custom chain is taking the place of the default chain.

In the following example, the processor will raise an exception with a 403 code with a customized error message:

<updateRequestProcessorChain name="ignore-commit-from-client" default="true">
  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
    <int name="statusCode">403</int>
    <str name="responseMessage">Thou shall not issue a commit!</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Lastly, you can also configure it to ignore only optimize requests and let commits pass through:

<updateRequestProcessorChain name="ignore-optimize-only-from-client-403">
  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
    <str name="responseMessage">Thou shall not issue an optimize, but commits are OK!</str>
    <bool name="ignoreOptimizeOnly">true</bool>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>