Setting Up an External ZooKeeper Ensemble

Although Solr comes bundled with Apache ZooKeeper, you are strongly encouraged to use an external ZooKeeper setup in production.

While using Solr’s embedded ZooKeeper instance is fine for getting started, you shouldn’t use this in production because it does not provide any failover: if the Solr instance that hosts ZooKeeper shuts down, ZooKeeper is also shut down. Any shards or Solr instances that rely on it will not be able to communicate with it or each other.

The solution to this problem is to set up an external ZooKeeper ensemble, which is a number of servers running ZooKeeper that communicate with each other to coordinate the activities of the cluster.

How Many ZooKeeper Nodes?

The first question to answer is the number of ZooKeeper nodes you will run in your ensemble.

When planning how many ZooKeeper nodes to configure, keep in mind that the main principle for a ZooKeeper ensemble is maintaining a majority of servers to serve requests. This majority is called a quorum.

"For a ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines."

ZooKeeper Administrator's Guide, http://zookeeper.apache.org/doc/r3.4.14/zookeeperAdmin.html

To properly maintain a quorum, it’s highly recommended to have an odd number of ZooKeeper servers in your ensemble, so a majority is maintained.

To explain why, consider this scenario: if you have two ZooKeeper nodes and one goes down, only 50% of the servers are available. Since this is not a majority, ZooKeeper will no longer serve requests.

However, if you have three ZooKeeper nodes and one goes down, you have 66% of your servers available and ZooKeeper will continue normally while you repair the one down node. If you have 5 nodes, you could continue operating with two down nodes if necessary.
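The arithmetic behind these scenarios can be sketched in a few lines of shell. The function names quorum_size and tolerated_failures are illustrative only and are not part of ZooKeeper or Solr:

```shell
#!/bin/sh
# quorum_size N: smallest strict majority of an N-server ensemble.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: servers that can fail while a majority remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 4 5; do
  echo "$n servers: quorum=$(quorum_size "$n"), tolerated failures=$(tolerated_failures "$n")"
done
```

Run as written, the loop shows that 4 servers tolerate only one failure, the same as 3, which is why adding a server to reach an even count buys no extra fault tolerance.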

It’s not generally recommended to go above 5 nodes. While it may seem that more nodes provide greater fault-tolerance and availability, in practice it becomes less efficient because of the amount of inter-node coordination that occurs. Unless you have a truly massive Solr cluster (on the scale of 1,000s of nodes), stick to 3 as a general rule, or maybe 5 if you have a larger cluster.

More information on ZooKeeper clusters is available from the ZooKeeper documentation at http://zookeeper.apache.org/doc/r3.4.14/zookeeperAdmin.html#sc_zkMulitServerSetup.

Download Apache ZooKeeper

The first step in setting up Apache ZooKeeper is, of course, to download the software. It’s available from http://zookeeper.apache.org/releases.html.

Solr currently uses Apache ZooKeeper v3.4.14.

When using an external ZooKeeper ensemble, you will need to keep your local installation up-to-date with the latest version distributed with Solr. Since it is a stand-alone application in this scenario, it does not get upgraded as part of a standard Solr upgrade.

ZooKeeper Installation

Installation consists of extracting the files into a specific target directory where you’d like to have ZooKeeper store its internal data. The actual directory itself doesn’t matter, as long as you know where it is.

The command to unpack the ZooKeeper package is:

tar xvf zookeeper-3.4.14.tar.gz

This location is the <ZOOKEEPER_HOME> for ZooKeeper on this server.

Installing and unpacking ZooKeeper must be repeated on each server where ZooKeeper will be run.

Configuration for a ZooKeeper Ensemble

After installation, we’ll first take a look at the basic configuration for ZooKeeper, then specific parameters for configuring each node to be part of an ensemble.

Initial Configuration

To configure your ZooKeeper instance, create a file named <ZOOKEEPER_HOME>/conf/zoo.cfg. A sample configuration file is included in your ZooKeeper installation, as conf/zoo_sample.cfg. You can edit and rename that file instead of creating it new if you prefer.

The file should have the following information to start:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

The parameters are as follows:

tickTime
Part of what ZooKeeper does is determine which servers are up and running at any given time, and the minimum session time out is defined as two "ticks". The tickTime parameter specifies in milliseconds how long each tick should be.
dataDir
This is the directory in which ZooKeeper will store data about the cluster. This directory must be empty before starting ZooKeeper for the first time.
clientPort
This is the port on which Solr will access ZooKeeper.

These are the basic parameters that need to be in use on each ZooKeeper node, so this file must be copied to or created on each node.
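Because several ZooKeeper time limits are expressed in ticks, it can help to convert them to milliseconds when reading a configuration. The following is a minimal sketch; the helper name ticks_to_ms is hypothetical and exists only for this illustration:

```shell
#!/bin/sh
# ticks_to_ms TICK_TIME_MS TICKS: convert a number of ticks to milliseconds.
ticks_to_ms() {
  echo $(( $1 * $2 ))
}

tick_time=2000   # tickTime from the zoo.cfg example above
echo "minimum session timeout: $(ticks_to_ms "$tick_time" 2) ms"
```

With tickTime=2000, the two-tick minimum session timeout works out to 4000 milliseconds. The same multiplication applies to any other setting measured in ticks.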

Next we’ll customize this configuration to work within an ensemble.

Ensemble Configuration

To complete configuration for an ensemble, we need to set additional parameters so each node knows who it is in the ensemble and where every other node is.

Each of the examples below assumes you are installing ZooKeeper on different servers with different hostnames.

Once complete, your zoo.cfg file might look like this:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

autopurge.snapRetainCount=3
autopurge.purgeInterval=1

We’ve added these parameters to the three we had already:

initLimit
Amount of time, in ticks, to allow followers to connect and sync to a leader. In this case, you have 5 ticks, each of which is 2000 milliseconds long, so the server will wait as long as 10 seconds to connect and sync with the leader.
syncLimit
Amount of time, in ticks, to allow followers to sync with ZooKeeper. If followers fall too far behind a leader, they will be dropped.
server.X

These are the server IDs (the X part), hostnames (or IP addresses) and ports for all servers in the ensemble. The IDs differentiate each node of the ensemble, and allow each node to know where each of the other nodes is located. The ports can be any ports you choose; ZooKeeper’s default ports are 2888:3888.

Since we’ve assigned server IDs to specific hosts/ports, we must also define which server in the list this node is. We do this with a myid file stored in the data directory (defined by the dataDir parameter). The myid file contains only the server ID.

In the case of the configuration example above, on the first node you would create the file /var/lib/zookeeper/myid with the content "1" (without quotes), as in this example:

1

autopurge.snapRetainCount

The number of snapshots and corresponding transaction logs to retain when purging old snapshots and transaction logs.

ZooKeeper automatically keeps a transaction log and writes to it as changes are made. A snapshot of the current state is taken periodically, and this snapshot supersedes transaction logs older than the snapshot. However, ZooKeeper never cleans up either the old snapshots or the old transaction logs; over time they will silently fill available disk space on each server.

To avoid this, set the autopurge.snapRetainCount and autopurge.purgeInterval parameters to enable an automatic clean up (purge) to occur at regular intervals. The autopurge.snapRetainCount parameter will keep the set number of snapshots and transaction logs when a clean up occurs. This parameter can be configured higher than 3, but cannot be set lower than 3.

autopurge.purgeInterval
The time in hours between purge tasks. The default for this parameter is 0, so it must be set to 1 or higher to enable automatic clean up of snapshots and transaction logs. Setting it as high as 24, for once a day, is acceptable if preferred.
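Because the server.X lines must be identical on every node, one option is to generate them from a single list of hostnames rather than typing them by hand. This is only a sketch: the function name ensemble_lines is hypothetical, and it assumes every server uses the 2888 and 3888 ports from the example above.

```shell
#!/bin/sh
# ensemble_lines HOST...: print one server.N line per host, numbering from 1.
# Assumes every server uses peer ports 2888 and 3888, as in the example.
ensemble_lines() {
  id=1
  for host in "$@"; do
    echo "server.$id=$host:2888:3888"
    id=$((id + 1))
  done
}

ensemble_lines zoo1 zoo2 zoo3
```

The output can be appended to zoo.cfg on each node. The order of the hostnames matters, since a host's position determines the server ID that its myid file must contain.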

We’ll repeat this configuration on each node.

On the second node, update the <ZOOKEEPER_HOME>/conf/zoo.cfg file so it matches the content on node 1 (particularly the server hosts and ports):

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

autopurge.snapRetainCount=3
autopurge.purgeInterval=1

On the second node, create a myid file with the contents "2", and put it in the /var/lib/zookeeper directory.

2

On the third node, update the <ZOOKEEPER_HOME>/conf/zoo.cfg file so it matches the content on nodes 1 and 2 (particularly the server hosts and ports):

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

autopurge.snapRetainCount=3
autopurge.purgeInterval=1

And create the myid file in the /var/lib/zookeeper directory:

3

Repeat this for servers 4 and 5 if you are creating a 5-node ensemble (a rare case).
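Creating the myid file on each node could be scripted along these lines. This is a minimal sketch; write_myid is a hypothetical helper, and the data directory passed to it is assumed to match the dataDir value in zoo.cfg:

```shell
#!/bin/sh
# write_myid DATA_DIR SERVER_ID: create DATA_DIR/myid containing only the ID.
write_myid() {
  data_dir=$1
  server_id=$2
  mkdir -p "$data_dir" || return 1
  printf '%s\n' "$server_id" > "$data_dir/myid"
}

# Example, run on the node listed as server.2 in zoo.cfg:
#   write_myid /var/lib/zookeeper 2
```

The ID written on a node must match the number in that node's server.X line, so each node gets a different value.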

ZooKeeper Environment Configuration

To ease troubleshooting in case of problems with the ensemble later, it’s recommended to run ZooKeeper with logging enabled and with proper JVM garbage collection (GC) settings.

  1. Create a file named zookeeper-env.sh and put it in the <ZOOKEEPER_HOME>/conf directory (the same place you put zoo.cfg). This file will need to exist on each server of the ensemble.
  2. Add the following settings to the file:

    ZOO_LOG_DIR="/path/for/log/files"
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
    
    SERVER_JVMFLAGS="-Xms2048m -Xmx2048m -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:$ZOO_LOG_DIR/zookeeper_gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M"

    The property ZOO_LOG_DIR defines the location on the server where ZooKeeper will print its logs. ZOO_LOG4J_PROP sets the logging level and log appenders.

    With SERVER_JVMFLAGS, we’ve defined several parameters for garbage collection and logging GC-related events. One of the system parameters is -Xloggc:$ZOO_LOG_DIR/zookeeper_gc.log, which will put the garbage collection logs in the same directory we’ve defined for ZooKeeper logs, in a file named zookeeper_gc.log.

  3. Review the default settings in <ZOOKEEPER_HOME>/conf/log4j.properties, especially the log4j.appender.ROLLINGFILE.MaxFileSize parameter. This sets the size at which log files will be rolled over, and by default it is 10MB.
  4. Copy zookeeper-env.sh and any changes to log4j.properties to each server in the ensemble.

The above instructions are for Linux servers only. The default zkServer.sh script includes support for a zookeeper-env.sh file but the Windows version of the script, zkServer.cmd, does not. To make the same configuration on a Windows server, the changes would need to be made directly in zkServer.cmd.

At this point, you are ready to start your ZooKeeper ensemble.

More Information about ZooKeeper

ZooKeeper provides a great deal of power through additional configurations, but delving into them is beyond the scope of Solr’s documentation. For more information, see the ZooKeeper documentation.

Starting and Stopping ZooKeeper

Start ZooKeeper

To start the ensemble, use the <ZOOKEEPER_HOME>/bin/zkServer.sh or zkServer.cmd script, as with this command:

Linux OS
zkServer.sh start
Windows OS
zkServer.cmd start

This command needs to be run on each server that will run ZooKeeper.

You should see the ZooKeeper logs in the directory you configured for them. However, immediately after startup, you may not see the zookeeper_gc.log yet, as it likely will not appear until garbage collection has happened the first time.
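Besides checking the logs, the zkServer.sh script accepts a status command that reports the role of the local server. The sketch below extracts that role; zk_mode is a hypothetical helper, and it assumes the status output contains a line beginning with "Mode:", which may differ between ZooKeeper versions:

```shell
#!/bin/sh
# zk_mode: read `zkServer.sh status` output on stdin and print the value
# of its "Mode:" line (for example leader, follower, or standalone).
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Example, run on each ZooKeeper server (ZOOKEEPER_HOME is assumed to be set):
#   "$ZOOKEEPER_HOME/bin/zkServer.sh" status 2>&1 | zk_mode
```

In a working ensemble you would expect exactly one server to report itself as the leader and the others as followers.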

Shut Down ZooKeeper

To shut down ZooKeeper, use the same zkServer.sh or zkServer.cmd script on each server with the "stop" command:

Linux OS
zkServer.sh stop
Windows OS
zkServer.cmd stop

Solr Configuration

When starting Solr, you must provide an address for ZooKeeper or Solr won’t know how to use it. This can be done in two ways: by defining the connect string, a list of servers where ZooKeeper is running, at every startup on every node of the Solr cluster, or by editing Solr’s include file as a permanent system parameter. Both approaches are described below.

When referring to the location of ZooKeeper within Solr, it’s best to use the addresses of all the servers in the ensemble. If one happens to be down, Solr will automatically be able to send its request to another server in the list.

Using a chroot

If your ensemble is or will be shared among other systems besides Solr, you should consider defining application-specific znodes, or a hierarchical namespace that will only include Solr’s files.

Once you create a znode for each application, you add its name, also called a chroot, to the end of your connect string whenever you tell Solr where to access ZooKeeper.

Creating a chroot is done with a bin/solr command:

bin/solr zk mkroot /solr -z zk1:2181,zk2:2181,zk3:2181

See the section Create a znode for more examples of this command.

Once the znode is created, it behaves in a similar way to a directory on a filesystem: the data stored by Solr in ZooKeeper is nested beneath the main data directory and won’t be mixed with data from another system or process that uses the same ZooKeeper ensemble.

Using the -z Parameter with bin/solr

Pointing Solr at the ZooKeeper ensemble you’ve created is a simple matter of using the -z parameter when using the bin/solr script.

For example, to point the Solr instance to the ZooKeeper you’ve started on port 2181 on three servers with chroot /solr (see Using a chroot above), this is what you’d need to do:

bin/solr start -e cloud -z zk1:2181,zk2:2181,zk3:2181/solr
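Note that the chroot appears only once, at the very end of the connect string, not after each host. If you assemble the string in a script, a helper along these lines keeps that format consistent. The function name zk_connect_string is hypothetical, and the sketch assumes all servers listen on the same client port:

```shell
#!/bin/sh
# zk_connect_string PORT CHROOT HOST...: join hosts as host:port pairs
# separated by commas, then append the chroot once at the end.
zk_connect_string() {
  port=$1
  chroot=$2
  shift 2
  out=""
  for host in "$@"; do
    out="${out:+$out,}$host:$port"
  done
  echo "$out$chroot"
}

zk_connect_string 2181 /solr zk1 zk2 zk3
```

The result can be passed to bin/solr with the -z parameter. Pass an empty string as the chroot argument if you are not using one.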

Updating Solr’s Include Files

If you update Solr’s include file (solr.in.sh or solr.in.cmd), which overrides defaults used with bin/solr, you will not have to use the -z parameter with bin/solr commands.

Linux: solr.in.sh

The section to look for will be commented out:

# Set the ZooKeeper connection string if using an external ZooKeeper ensemble
# e.g. host1:2181,host2:2181/chroot
# Leave empty if not using SolrCloud
#ZK_HOST=""

Remove the comment marks at the start of the line and enter the ZooKeeper connect string:

# Set the ZooKeeper connection string if using an external ZooKeeper ensemble
# e.g. host1:2181,host2:2181/chroot
# Leave empty if not using SolrCloud
ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"
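If you manage several Solr nodes, the same edit could be scripted instead of made by hand. The following is a sketch only: set_zk_host is a hypothetical helper, and it assumes the include file contains a single ZK_HOST line, either commented out or already set.

```shell
#!/bin/sh
# set_zk_host FILE CONNECT_STRING: uncomment (if needed) and set ZK_HOST
# in a solr.in.sh-style file. Writes through a temporary file so the
# original file keeps its permissions.
set_zk_host() {
  file=$1
  connect=$2
  tmp=$(mktemp) || return 1
  sed "s|^#\{0,1\} *ZK_HOST=.*|ZK_HOST=\"$connect\"|" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example (the path to solr.in.sh depends on your installation):
#   set_zk_host bin/solr.in.sh "zk1:2181,zk2:2181,zk3:2181/solr"
```

Comment lines above the setting are left untouched because the pattern only matches lines where ZK_HOST= follows an optional comment mark.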

Windows: solr.in.cmd

The section to look for will be commented out:

REM Set the ZooKeeper connection string if using an external ZooKeeper ensemble
REM e.g. host1:2181,host2:2181/chroot
REM Leave empty if not using SolrCloud
REM set ZK_HOST=

Remove the comment marks at the start of the line and enter the ZooKeeper connect string:

REM Set the ZooKeeper connection string if using an external ZooKeeper ensemble
REM e.g. host1:2181,host2:2181/chroot
REM Leave empty if not using SolrCloud
set ZK_HOST=zk1:2181,zk2:2181,zk3:2181/solr

Now you will not have to enter the connection string when starting Solr.

Increasing the File Size Limit

ZooKeeper is designed to hold small files, on the order of kilobytes. By default, ZooKeeper’s file size limit is 1MB. Attempting to write or read files larger than this will cause errors.

Some Solr features, e.g., text analysis synonyms, LTR, and OpenNLP named entity recognition, require configuration resources that can be larger than the default limit. ZooKeeper can be configured, via Java system property jute.maxbuffer, to increase this limit. Note that this configuration, which is required both for ZooKeeper server(s) and for all clients that connect to the server(s), must be the same everywhere it is specified.

Configuring jute.maxbuffer on ZooKeeper Nodes

jute.maxbuffer must be configured on each external ZooKeeper node. This can be achieved in any of the following ways; note though that only the first option works on Windows:

  1. In <ZOOKEEPER_HOME>/conf/zoo.cfg, e.g., to increase the file size limit to one byte less than 10MB, add this line:

    jute.maxbuffer=0x9fffff
  2. In <ZOOKEEPER_HOME>/conf/zookeeper-env.sh, e.g., to increase the file size limit to 50MB (50,000,000 bytes), add this line:

    JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=50000000"
  3. In <ZOOKEEPER_HOME>/bin/zkServer.sh, add a JVMFLAGS environment variable assignment near the top of the script, e.g., to increase the file size limit to 5MB (5,000,000 bytes):

    JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=5000000"
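The jute.maxbuffer values above are byte counts, written in either hexadecimal or decimal. Since the setting must be the same on servers and clients, it can be useful to normalize a value to decimal before comparing. This is a small sketch using shell arithmetic; the helper names mib_to_bytes and to_decimal are hypothetical:

```shell
#!/bin/sh
# mib_to_bytes N: N mebibytes (N x 1024 x 1024) expressed in bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# to_decimal VALUE: normalize a decimal or 0x-prefixed hex value to decimal.
to_decimal() {
  echo $(( $1 ))
}

echo "0x9fffff = $(to_decimal 0x9fffff) bytes"
echo "10 MiB   = $(mib_to_bytes 10) bytes"
echo "50 MiB   = $(mib_to_bytes 50) bytes"
```

The first two lines of output differ by exactly one byte (10485759 versus 10485760). The third shows that a decimal value such as 50000000 is somewhat smaller than 50 x 1024 x 1024 bytes (52428800), so take care to compare like with like.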

Configuring jute.maxbuffer for ZooKeeper Clients

The bin/solr script invokes Java programs that act as ZooKeeper clients. When you use Solr’s bundled ZooKeeper server instead of setting up an external ZooKeeper ensemble, the configuration described below will also configure the ZooKeeper server.

Add the setting to the SOLR_OPTS environment variable in Solr’s include file (bin/solr.in.sh or solr.in.cmd):

Linux: solr.in.sh

The section to look for will start:

# Anything you add to the SOLR_OPTS variable will be included in the java
# start command line as-is, in ADDITION to other options. If you specify the
# -a option on start script, those options will be appended as well. Examples:

Add the following line to increase the file size limit to 2MB:

SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=0x200000"

Windows: solr.in.cmd

The section to look for will start:

REM Anything you add to the SOLR_OPTS variable will be included in the java
REM start command line as-is, in ADDITION to other options. If you specify the
REM -a option on start script, those options will be appended as well. Examples:

Add the following line to increase the file size limit to 2MB:

set SOLR_OPTS=%SOLR_OPTS% -Djute.maxbuffer=0x200000

Securing the ZooKeeper Connection

You may also want to secure the communication between ZooKeeper and Solr.

To set up ACL protection of znodes, see the section ZooKeeper Access Control.