Solr in Docker
Getting started with the Docker image
Instructions below apply to solr:8.0.0
and above.
See the Docker Hub page for a full list of tags and architectures available.
Available images
Two docker images are provided for each release, full and slim. These correspond to the two binary distributions that are produced for each Solr release. The docker images for these distributions can be found at:
Full distribution |
|
Slim distribution |
|
Please refer to the Available Solr Packages section for more information on each distribution.
Running Solr with host-mounted directories
Typically users first want to run a single standalone Solr server in a container, with a single core for data, while storing data in a local directory. This is a convenient mechanism for developers, and could be used for single-server production hosts too.
mkdir solrdata
docker run -d -v "$PWD/solrdata:/var/solr" -p 8983:8983 --name my_solr solr solr-precreate gettingstarted
Then with a web browser go to http://localhost:8983/
to see Solr’s Admin UI (adjust the hostname for your docker host).
In the UI, click on "Core Admin" and should now see the "gettingstarted" core.
Next load some of the example data that is included in the container:
docker exec -it my_solr post -c gettingstarted example/exampledocs/manufacturers.xml
In the UI, find the "Core selector" popup menu and select the "gettingstarted" core, then select the "Query" menu item.
This gives you a default search for :
which returns all docs.
Hit the "Execute Query" button, and you should see a few docs with data.
Congratulations!
Docker-Compose
You can use Docker Compose to run a single standalone server or a multi-node cluster.
And you could use Docker Volumes instead of host-mounted directories.
For example, with a docker-compose.yml
containing the following:
version: '3'
services:
solr:
image: solr
ports:
- "8983:8983"
volumes:
- data:/var/solr
command:
- solr-precreate
- gettingstarted
volumes:
data:
you can simply run:
docker compose up -d
Below is an example compose file that starts a Solr Cloud cluster with Zookeeper. By
creating a Docker network, Solr can reach the zookeeper container with the internal
name zoo
.:
version: '3'
services:
solr:
image: solr:9-slim
ports:
- "8983:8983"
networks: [search]
environment:
ZK_HOST: "zoo:2181"
depends_on: [zoo]
zoo:
image: zookeeper:3.9
networks: [search]
environment:
ZOO_4LW_COMMANDS_WHITELIST: "mntr,conf,ruok"
networks:
search:
driver: bridge
How the Image Works
The container contains an installation of Solr, as installed by the service installation script.
This stores the Solr distribution in /opt/solr
, and configures Solr to use /var/solr
to store data and logs, using the /etc/default/solr
file for configuration.
If you want to persist the data, mount a volume or directory on /var/solr
.
Solr expects some files and directories in /var/solr
; if you use your own directory or volume you can either pre-populate them, or let Solr docker copy them for you.
If you want to use custom configuration, mount it in the appropriate place.
See below for examples.
The Solr docker distribution adds scripts in /opt/solr/docker/scripts
to make it easier to use under Docker, for example to create cores on container startup.
Creating Cores
When Solr runs in standalone mode, you create "cores" to store data. On a non-Docker Solr, you would run the server in the background, then use the Solr control script to create cores and load data. With Solr docker you have various options.
Manually
The first is exactly the same: start Solr running in a container, then execute the control script manually in the same container:
docker run -d -p 8983:8983 --name my_solr solr
docker exec -it my_solr solr create_core -c gettingstarted
This is not very convenient for users, and makes it harder to turn it into configuration for Docker Compose and orchestration tools like Kubernetes.
Using solr-precreate Command
So, typically you will use the solr-precreate
command which prepares the specified core and then runs Solr:
docker run -d -p 8983:8983 --name my_solr solr solr-precreate gettingstarted
The solr-precreate
command takes an optional extra argument to specify a configset directory below /opt/solr/server/solr/configsets/
or you can specify a full path to a custom configset inside the container:
docker run -d -p 8983:8983 --name my_solr -v $PWD/config/solr:/my_core_config/conf solr:8 solr-precreate my_core /my_core_config
N.B. When specifying the full path to the configset, the actual core configuration should be located inside that directory in the conf
directory.
See Configsets for details.
Using solr-create Command
The third option is to use the solr-create
command.
This runs a Solr in the background in the container, then uses the Solr control script to create the core, then stops the Solr server and restarts it in the foreground.
This method is less popular because the double Solr run can be confusing.
docker run -d -p 8983:8983 --name my_solr solr solr-create -c gettingstarted
Creating Collections
In a "SolrCloud" cluster you create "collections" to store data; and again you have several options for creating a core.
These examples assume you’re running a docker compose cluster.
The first way to create a collection is to go to the Solr Admin UI, select "Collections" from the left-hand side navigation menu, then press the "Add Collection" button, give it a name, select the _default
config set, then press the "Add Collection" button.
The second way is through the Solr control script on one of the containers:
docker exec solr1 solr create -c gettingstarted2
The third way is to use a separate container:
docker run -e SOLR_HOST=solr1 --network docs_solr solr solr create_collection -c gettingstarted3 -p 8983
The fourth way is to use the remote API, from the host or from one of the containers, or some new container on the same network (adjust the hostname accordingly):
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=gettingstarted3&numShards=1&collection.configName=_default'
If you want to use a custom configuration for your collection, you first need to upload it, and then refer to it by name when you create the collection.
You can use the bin/solr zk
command or the Configsets API.
Loading Your Own Data
There are several ways to load data; let’s look at the most common ones.
The most common first deployment is to run Solr standalone (not in a cluster), on a workstation or server, where you have local data you wish to load. One way of doing that is using a separate container, with a mounted volume containing the data, using the host network so you can connect to the mapped port:
# start Solr. Listens on localhost:8983
docker run --name my_solr -p 8983:8983 solr solr-precreate books
# get data
mkdir mydata
wget -O mydata/books.csv https://raw.githubusercontent.com/apache/solr/main/solr/example/exampledocs/books.csv
docker run --rm -v "$PWD/mydata:/mydata" --network=host solr post -c books /mydata/books.csv
The same works if you use the example docker compose cluster, or you can just start your loading container in the same network:
docker run -e SOLR_HOST=solr1 --network=mycluster_solr solr solr create_collection -c books -p 8983
docker run --rm -v "$PWD/mydata:/mydata" --network=mycluster_solr solr post -c books /mydata/books.csv -host solr1
Alternatively, you can make the data available on a volume at Solr start time, and then load it from docker exec
or a custom start script.
solr.in.sh Configuration
In Solr it is common to configure settings in solr.in.sh, as documented in the section Environment Overrides Include File.
The solr.in.sh
file can be found in /etc/default
:
docker run solr cat /etc/default/solr.in.sh
It has various commented-out values, which you can override when running the container, like:
docker run -d -p 8983:8983 -e SOLR_HEAP=800m solr
You can also mount your own config file. Do not modify the values that are set at the end of the file.
Extending the Image
The Solr docker image has an extension mechanism.
At run time, before starting Solr, the container will execute scripts
in the /docker-entrypoint-initdb.d/
directory.
You can add your own scripts there either by using mounted volumes
or by using a custom Dockerfile.
These scripts can for example copy a core directory with pre-loaded data for continuous
integration testing, or modify the Solr configuration.
Here is a simple example.
With a custom.sh
script like:
#!/bin/bash
set -e
echo "this is running inside the container before Solr starts"
you can run:
$ docker run --name solr_custom1 -d -v $PWD/custom.sh:/docker-entrypoint-initdb.d/custom.sh solr
$ sleep 5
$ docker logs solr_custom1 | head
/opt/docker-solr/scripts/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/set-heap.sh
this is running inside the container before Solr starts
Starting Solr on port 8983 from /opt/solr/server
With this extension mechanism it can be useful to see the shell commands that are being executed by the docker-entrypoint.sh
script in the docker log.
To do that, set an environment variable using Docker’s -e VERBOSE=yes
.
Instead of using this mechanism, you can of course create your own script that does setup and then call solr-foreground
, mount that script into the container, and execute it as a command when running the container.
Other ways of extending the image are to create custom Docker images that inherit from this one.
Debugging with jattach
The jcmd
, jmap
jstack
tools can be useful for debugging Solr inside the container.
These tools are not included with the JRE, but this image includes the jattach utility which lets you do much of the same.
Usage: jattach <pid> <cmd> [args ...] Commands: load : load agent library properties : print system properties agentProperties : print agent properties datadump : show heap and thread summary threaddump : dump all stack traces (like jstack) dumpheap : dump heap (like jmap) inspectheap : heap histogram (like jmap -histo) setflag : modify manageable VM flag printflag : print VM flag jcmd : execute jcmd command
Example commands to do a thread dump and get heap info for PID 10
:
jattach 10 threaddump
jattach 10 jcmd GC.heap_info
Running under tini
The Solr docker image runs Solr under tini, to make signal handling work better; in particular, this allows you to kill -9
the JVM.
If you run docker run --init
, or use init: true
in docker-compose.yml
, or have added --init
to dockerd
, docker will start its tini
and docker-solr will notice it is not PID 1, and just exec
Solr.
If you do not run with --init
, then the docker entrypoint script detects that it is running as PID 1, and will start the tini
present in the docker-solr image, and run Solr under that.
If you really do not want to run tini
, and just run Solr as PID 1 instead, then you can set the TINI=no
environment variable.
Out of Memory Handling
Please refer to the Out of Memory Handling Section for more information. The Docker image no-longer has custom logic for OOMs.
History
The Docker-Solr project was started in 2015 by Martijn Koster in the docker-solr repository. In 2019 maintainership and copyright was transferred to the Apache Lucene/Solr project, and in 2020 the project was migrated to live within the Solr project. Many thanks to Martijn for all your contributions over the years!