Solr Docker FAQ

How do I persist Solr data and config?

Solr’s Docker image is pre-configured with the container path /var/solr/ as a volume. This means that all index data, log files, and other variable data are persisted on the Docker host, even if you remove the container instance.

How can I mount a host directory as a data volume?

By default, Solr’s volume is persisted in Docker’s default storage location on the host. On Linux systems this is /var/lib/docker/volumes. This is the recommended way to store Solr’s data. You can also bind-mount a host folder instead:

docker run --rm -p 8983:8983 -v $(pwd)/myData:/var/solr/ solr:9-slim

But this depends on the host operating system and may run into various kinds of file system permission issues.
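One common source of those permission issues on a Linux host is that the host folder is not writable by the user Solr runs as inside the container. The following is a minimal sketch of preparing the folder first. It assumes the container's solr user has UID and GID 8983 (the image default, but worth verifying for your tag), and the helper names run and prepare_bind_mount are hypothetical. Commands are printed rather than executed unless RUN=1 is set, so the sketch can be reviewed before use.

```shell
#!/bin/sh
# Print each command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Prepare a host directory and start Solr with it bind-mounted at /var/solr.
# Assumption: the container's solr user is UID/GID 8983.
prepare_bind_mount() {
    data_dir=$1
    run mkdir -p "$data_dir"
    # Give the container user ownership so Solr can write to the mount.
    run sudo chown 8983:8983 "$data_dir"
    run docker run --rm -p 8983:8983 -v "$data_dir:/var/solr" solr:9-slim
}

prepare_bind_mount "$(pwd)/myData"
```

On hosts where Docker runs inside a virtual machine (for example Docker Desktop), ownership mapping works differently and the chown step may be unnecessary.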

Can I use volumes with SOLR_HOME?

While you could redefine SOLR_HOME inside the container, we recommend that you use the existing SOLR_HOME defined at /var/solr/ instead (see above). You can give the volume a meaningful name instead of the auto-generated hash, for example solrData:

docker run --rm -p 8983:8983 -v solrData:/var/solr solr:9-slim
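The named volume can also be created ahead of time and inspected afterwards to see where Docker stores it. This is a minimal sketch mounting the volume at the image's existing /var/solr path. The helper names run and named_volume_demo are hypothetical, and commands are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Print each command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

named_volume_demo() {
    # Create the volume explicitly (docker run would also create it on demand).
    run docker volume create solrData
    # Mount it at the path the image already uses for variable data.
    run docker run --rm -p 8983:8983 -v solrData:/var/solr solr:9-slim
    # Show where the volume lives on the host.
    run docker volume inspect --format '{{ .Mountpoint }}' solrData
}

named_volume_demo
```

Because the volume is named, it survives removal of the container and can be reattached to a new one with the same -v argument.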

Can I run ZooKeeper and Solr clusters under Docker?

At the network level the ZooKeeper nodes need to be able to talk to each other, and the Solr nodes need to be able to talk to the ZooKeeper nodes and to each other. At the application level, different nodes need to be able to identify and locate each other. In ZooKeeper that is done with a configuration file that lists hostnames or IP addresses for each node. In Solr that is done with a parameter that specifies a host or IP address, which is then stored in ZooKeeper.

In typical clusters, those hostnames/IP addresses are pre-defined and remain static through the lifetime of the cluster. In Docker, inter-container communication and multi-host networking can be facilitated by Docker Networks. But, crucially, Docker does not normally guarantee that IP addresses of containers remain static during the lifetime of a container. In non-networked Docker, the IP address seems to change every time you stop/start. In a networked Docker, containers can lose their IP address in certain sequences of starting/stopping, unless you take steps to prevent that.

IP changes cause problems:

  • If you use hardcoded IP addresses in configuration, and the addresses of your containers change after a stop/start, then your cluster will stop working and may corrupt itself.

  • If you use hostnames in configuration, and the addresses of your containers change, then you might run into problems with cached hostname lookups.

  • And if you use hostnames there is another problem: the names are not defined until the respective container is running. So when, for example, the first ZooKeeper node starts up, it will attempt a hostname lookup for the other nodes, and that will fail. This is especially a problem for ZooKeeper 3.4.6; later versions are better at recovering.

Docker 1.10 introduced an --ip configuration option that allows you to specify an IP address for a container. It also has an --ip-range option that allows you to specify the range that other containers get addresses from. Used together, they let you implement static addresses. See Solr & ZooKeeper with Docker Networking for more information.
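As an illustration of combining the two options, the sketch below creates a user-defined network whose dynamically assigned addresses come from the upper half of the subnet, then pins one ZooKeeper and one Solr container to fixed addresses in the lower half. Everything specific here is an assumption for illustration: the network name netzksolr, the subnet, the addresses, the container names, the use of the stock zookeeper image in standalone mode, and passing ZK_HOST and SOLR_HOST as environment variables. Commands are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Print each command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

static_ip_cluster() {
    zk_ip=192.168.22.10      # hypothetical fixed address for ZooKeeper
    solr_ip=192.168.22.20    # hypothetical fixed address for Solr

    # Dynamic addresses are drawn from .128-.255, leaving the lower half
    # of the subnet free for the static --ip assignments below.
    run docker network create --subnet 192.168.22.0/24 \
        --ip-range 192.168.22.128/25 netzksolr

    run docker run -d --name zk1 --network netzksolr --ip "$zk_ip" zookeeper

    # SOLR_HOST is the address Solr registers in ZooKeeper for itself.
    run docker run -d --name solr1 --network netzksolr --ip "$solr_ip" \
        -p 8983:8983 -e ZK_HOST="$zk_ip:2181" -e SOLR_HOST="$solr_ip" solr:9-slim
}

static_ip_cluster
```

A production ensemble would use several ZooKeeper nodes, each with its own fixed address listed in the others' configuration; the single node here only keeps the sketch short.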

How can I run ZooKeeper and Solr with Docker Compose?

How can I get rid of "shared memory" warnings on Solr startup?

When starting the Docker image you typically see a log line like this:

OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 1)

If your setup can run without huge pages, or you do not require them, the least-friction way to remove this warning is to disable large paging in the JVM via the environment variable:

SOLR_OPTS=-XX:-UseLargePages

In your Solr Admin UI, you will see listed under the JVM args both the original -XX:+UseLargePages set by the GC_TUNE environment variable and further down the list the overriding -XX:-UseLargePages argument.
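With Docker, the variable can be passed to the container using the -e option of docker run. A minimal sketch follows; the helper names run and start_solr_without_large_pages are hypothetical, and the command is printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Print each command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

start_solr_without_large_pages() {
    # -e sets SOLR_OPTS inside the container; the later -XX:-UseLargePages
    # overrides the earlier -XX:+UseLargePages on the JVM command line.
    run docker run --rm -p 8983:8983 \
        -e SOLR_OPTS=-XX:-UseLargePages solr:9-slim
}

start_solr_without_large_pages
```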

I’m confused about the different invocations of Solr — help?

The different invocations of the Solr docker image can look confusing, because the name of the image is "solr" and the Solr command is also "solr", and the image interprets various arguments in special ways. Let’s illustrate the various invocations:

To run an arbitrary command in the image:

docker run -it solr date

Here "solr" is the name of the image, and "date" is the command. This does not invoke any Solr functionality.

To run the Solr server:

docker run -it solr

Here "solr" is the name of the image, and there is no specific command, so the image defaults to running the "solr" command with "-f" to run it in the foreground.

To run the Solr server with extra arguments:

docker run -it solr -h myhostname

This is the same as the previous one, but an additional argument is passed. The image will run the "solr" command with "-f -h myhostname".

To run solr as an arbitrary command:

docker run -it solr solr zk --help

Here the first "solr" is the image name, and the second "solr" is the "solr" command. The image runs the command exactly as specified; no "-f" is implicitly added. The container will print help text, and exit.

If you find this visually confusing, it might be helpful to use more specific image tags, and specific command paths. For example:

docker run -it solr bin/solr -f -h myhostname

Finally, the Solr docker image offers several commands that do some work before then invoking the Solr server, like "solr-precreate" and "solr-demo". See the README.md for usage. These are implemented by the docker-entrypoint.sh script, and must be passed as the first argument to the image. For example:

docker run -it solr solr-demo

It’s important to understand an implementation detail here. The Dockerfile uses solr-foreground as the CMD, and docker-entrypoint.sh implements that by running "solr -f". So these two are equivalent:

docker run -it solr
docker run -it solr solr-foreground

whereas:

docker run -it solr solr -f

is slightly different: the "solr" there is a generic command, not treated in any special way by docker-entrypoint.sh. In particular, this means that the docker-entrypoint-initdb.d mechanism is not applied. So, if you want to use docker-entrypoint-initdb.d, then you must use one of the other two invocations. You also need to keep that in mind when you want to invoke solr from the bash command. For example, this does NOT run docker-entrypoint-initdb.d scripts:

docker run -it -v $PWD/set-heap.sh:/docker-entrypoint-initdb.d/set-heap.sh \
    solr bash -c "echo hello; solr -f"

but this does:

docker run -it -v $PWD/set-heap.sh:/docker-entrypoint-initdb.d/set-heap.sh \
    solr bash -c "echo hello; /opt/docker-solr/scripts/docker-entrypoint.sh solr-foreground"