Getting Started with SolrCloud
SolrCloud is designed to provide a highly available, fault tolerant environment for distributing your indexed content and query requests across multiple servers.
It’s a system in which data is organized into multiple pieces, or shards, that can be hosted on multiple machines, with replicas providing redundancy for both scalability and fault tolerance, and a ZooKeeper server that helps manage the overall structure so that both indexing and search requests can be routed properly.
This section explains SolrCloud and its inner workings in detail, but before you dive in, it’s best to have an idea of what it is you’re trying to accomplish.
This page provides a simple tutorial for starting Solr in SolrCloud mode, so you can begin to get a sense of how shards interact with each other during indexing and when serving queries. To that end, we’ll use simple examples that configure SolrCloud on a single machine. This is obviously not a real production environment, which would include several servers or virtual machines. In a real production environment, you’ll also use the real machine names instead of "localhost", which we’ve used here.
In this section you will learn how to start a SolrCloud cluster using startup scripts and a specific configset.
This tutorial assumes that you’re already familiar with the basics of using Solr. If you need a refresher, please see the Getting Started section to get a grounding in Solr concepts. If you load documents as part of that exercise, you should start over with a fresh Solr installation for these SolrCloud tutorials.
SolrCloud Example
Interactive Startup
The bin/solr script makes it easy to get started with SolrCloud, as it walks you through the process of launching Solr nodes in SolrCloud mode and adding a collection. To get started, simply run:
bin/solr -e cloud
This starts an interactive session to walk you through the steps of setting up a simple SolrCloud cluster with embedded ZooKeeper.
The script starts by asking you how many Solr nodes you want to run in your local cluster, with the default being 2.
Welcome to the SolrCloud example!
This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]
The script supports starting up to 4 nodes, but we recommend using the default of 2 when starting out. These nodes will each exist on a single machine, but will use different ports to mimic operation on different servers.
Next, the script will prompt you for the port to bind each of the Solr nodes to, such as:
Please enter the port for node1 [8983]
Choose any available port for each node; the default is 8983 for the first node and 7574 for the second. The script will start each node in order and show you the command it uses to start the server, such as:
solr start -cloud -s example/cloud/node1/solr -p 8983
The first node will also start an embedded ZooKeeper server bound to port 9983. The Solr home for the first node is in example/cloud/node1/solr, as indicated by the -s option.
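The start commands follow a regular pattern, which the sketch below prints without executing anything. It assumes that the embedded ZooKeeper port is the first node's Solr port plus 1000 (consistent with 8983 and 9983 above) and that every node after the first joins the cluster through -z, as in the restart examples later on this page. The helper name start_cmd is hypothetical and is not part of bin/solr.

```shell
#!/bin/sh
# Print (not run) the start command for node N of a local example cluster.
# start_cmd is a hypothetical helper used only for illustration.
start_cmd() {
  node=$1        # node number: 1, 2, ...
  port=$2        # Solr port for this node
  first_port=$3  # Solr port of node1, which hosts the embedded ZooKeeper
  cmd="bin/solr start -cloud -s example/cloud/node${node}/solr -p ${port}"
  if [ "$node" -gt 1 ]; then
    # Assumption: embedded ZooKeeper listens on node1's Solr port + 1000.
    cmd="${cmd} -z localhost:$((first_port + 1000))"
  fi
  echo "$cmd"
}

start_cmd 1 8983 8983
start_cmd 2 7574 8983
```

If you pick a different port for the first node, the ZooKeeper address passed to the other nodes changes with it under this assumption.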
After starting up all nodes in the cluster, the script prompts you for the name of the collection to create:
Please provide a name for your new collection: [gettingstarted]
The suggested default is "gettingstarted" but you might want to choose a name more appropriate for your specific search application.
Next, the script prompts you for the number of shards to distribute the collection across. Sharding is covered in more detail later on, so if you’re unsure, we suggest using the default of 2 so that you can see how a collection is distributed across multiple nodes in a SolrCloud cluster.
Next, the script will prompt you for the number of replicas to create for each shard. Replication is covered in more detail later in the guide, so if you’re unsure, then use the default of 2 so that you can see how replication is handled in SolrCloud.
Lastly, the script will prompt you for the name of a configuration directory for your collection. You can choose _default or sample_techproducts_configs. The configuration directories are pulled from server/solr/configsets/ so you can review them beforehand if you wish. The _default configuration is useful when you’re still designing a schema for your documents and need some flexibility as you experiment with Solr, since it has schemaless functionality. However, after creating your collection, you can disable the schemaless functionality, either to lock down the schema (so that documents indexed afterwards will not alter it) or to configure the schema yourself. This can be done with either of the following commands, assuming your collection name is mycollection. The first uses the V1 API path and the second uses the V2 API path:
curl http://host:8983/solr/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
curl http://host:8983/api/collections/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
At this point, you should have a new collection created in your local SolrCloud cluster. To verify this, you can run the status command:
bin/solr status
If you encounter any errors during this process, check the Solr log files in example/cloud/node1/logs and example/cloud/node2/logs.
You can see how your collection is deployed across the cluster by visiting the cloud panel in the Solr Admin UI: http://localhost:8983/solr/#/~cloud. Solr also provides a way to perform basic diagnostics for a collection using the healthcheck command:
bin/solr healthcheck -c gettingstarted
The healthcheck command gathers basic information about each replica in a collection, such as number of docs, current status (active, down, etc.), and address (where the replica lives in the cluster).
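When scripting these checks, it can help to wait until a node answers HTTP requests before running the healthcheck. The following is a minimal sketch under stated assumptions: wait_for_solr and the PROBE variable are hypothetical names, and the URL in the usage comment assumes the node's system info endpoint is reachable at /solr/admin/info/system on the default port.

```shell
#!/bin/sh
# wait_for_solr is a hypothetical helper: poll a URL until it answers
# or the attempts run out. PROBE can be overridden, e.g. for testing.
PROBE=${PROBE:-"curl -fsS -o /dev/null"}

wait_for_solr() {
  url=$1            # URL to probe
  attempts=${2:-30} # maximum number of probes
  delay=${3:-1}     # seconds to sleep between probes
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # $PROBE is intentionally unquoted so its options are split into words.
    if $PROBE "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (commented out so that loading this sketch has no side effects):
# wait_for_solr http://localhost:8983/solr/admin/info/system \
#   && bin/solr healthcheck -c gettingstarted
```

The function returns a non-zero status if the node never responds, so a calling script can stop early and point you to the log files instead of running the healthcheck against a node that is down.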
Documents can now be added to SolrCloud using the Post Tool.
To stop Solr in SolrCloud mode, use the bin/solr script and issue the stop command, as in:
bin/solr stop -all
Starting with -noprompt
You can also get SolrCloud started with all the defaults instead of the interactive session using the following command:
bin/solr -e cloud -noprompt
Restarting Nodes
You can restart your SolrCloud nodes using the bin/solr script. For instance, to restart node1 running on port 8983 (with an embedded ZooKeeper server), you would do:
bin/solr restart -c -p 8983 -s example/cloud/node1/solr
To restart node2 running on port 7574, you can do:
bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr
Notice that you need to specify the ZooKeeper address (-z localhost:9983) when starting node2 so that it can join the cluster with node1.
Adding a Node to a Cluster
Adding a node to an existing cluster is a bit advanced and involves a little more understanding of Solr. Once you start up a SolrCloud cluster using the startup scripts, you can add a new node to it by:
mkdir <solr.home for new Solr node>
cp <existing solr.xml path> <new solr.home>
bin/solr start -cloud -s <new solr.home> -p <port num> -z <zk hosts string>
Notice that the above requires you to create a Solr home directory. You either need to copy solr.xml to the solr_home directory, or keep it centrally in ZooKeeper at /solr.xml.
Example (with directory structure) that adds a node to an example started with "bin/solr -e cloud":
mkdir -p example/cloud/node3/solr
cp server/solr/solr.xml example/cloud/node3/solr
bin/solr start -cloud -s example/cloud/node3/solr -p 8987 -z localhost:9983
The previous command will start another Solr node on port 8987 with Solr home set to example/cloud/node3/solr. The new node will write its log files to example/cloud/node3/logs.
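If you add nodes repeatedly while experimenting, the three steps can be generated from a node number and a port. The sketch below only prints the commands. The helper name add_node_cmds is hypothetical, and the directory layout and default ZooKeeper address are taken from the example above.

```shell
#!/bin/sh
# add_node_cmds is a hypothetical helper that prints the three steps for
# adding node N to the local example cluster. It executes nothing itself.
add_node_cmds() {
  node=$1                  # node number, e.g. 3
  port=$2                  # Solr port for the new node
  zk=${3:-localhost:9983}  # ZooKeeper address of the example cluster
  home="example/cloud/node${node}/solr"
  echo "mkdir -p ${home}"
  echo "cp server/solr/solr.xml ${home}"
  echo "bin/solr start -cloud -s ${home} -p ${port} -z ${zk}"
}

add_node_cmds 3 8987
```

Review the printed commands first. To run them, pipe the output to a shell from the Solr installation directory, for example add_node_cmds 3 8987 | sh.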
Once you’re comfortable with how the SolrCloud example works, we recommend using the process described in Taking Solr to Production for setting up SolrCloud nodes in production.