Making and Restoring Backups
If you are worried about data loss, and of course you should be, you need a way to back up your Solr indexes so that you can recover quickly in case of catastrophic failure.
Solr provides two approaches to backing up and restoring Solr cores or collections, depending on how you are running Solr. If you run in SolrCloud mode, you will use the Collections API. If you run Solr in standalone mode, you will use the replication handler.
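Backups (and snapshots) capture only data that has been hard committed. Changes that have only been soft committed may be visible in search results but are not included in subsequent backups. Likewise, changes that were hard committed without opening a new searcher may be included in subsequent backups even though they are not yet visible in search results.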
SolrCloud Backups
Support for backups when running SolrCloud is provided with the Collections API. This allows the backups to be generated across multiple shards, and restored to the same number of shards and replicas as the original collection.
SolrCloud Backup/Restore requires a shared file system mounted at the same path on all nodes, or HDFS.
Four different API commands are supported:
action=BACKUP
: This command backs up Solr indexes and configurations. More information is available in the section Backup Collection.
action=RESTORE
: This command restores Solr indexes and configurations. More information is available in the section Restore Collection.
action=LISTBACKUP
: This command lists the backup points available at a specified location, displaying metadata for each. More information is available in the section List Backups.
action=DELETEBACKUP
: This command allows deletion of backup files or whole backups. More information is available in the section Delete Backups.
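As a rough sketch of how one of these commands might be issued, the snippet below builds a Collections API BACKUP request URL. The base URL, collection name, backup name, and location are illustrative assumptions, and only the name, collection, and location parameters are shown; the Backup Collection section remains the authority on the full parameter list.

```python
from urllib.parse import urlencode


def collections_backup_url(base_url, collection, backup_name, location):
    """Build a Collections API BACKUP URL (all argument values are illustrative)."""
    params = {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "location": location,  # must be reachable from every node (shared path or HDFS)
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, collection, and mount point.
    print(collections_backup_url(
        "http://localhost:8983/solr", "techproducts", "nightly", "/mnt/solr_backups"))
```

The same helper shape would apply to RESTORE, LISTBACKUP, and DELETEBACKUP by changing the action value and parameters.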
Standalone Mode Backups
Backup and restoration use Solr’s replication handler. Out of the box, Solr includes implicit support for replication, so this API can be used without further setup. Configuration of the replication handler can, however, be customized by defining your own replication handler in solrconfig.xml. For details on configuring the replication handler, see the section Configuring the ReplicationHandler.
Backup API
The backup API requires sending a command to the /replication handler to back up the system.
You can trigger a back-up with an HTTP command like this (replace "gettingstarted" with the name of the core you are working with):
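A minimal sketch of such a request, assuming Solr is listening on localhost:8983 and that the command is passed as the command=backup query parameter to the core's /replication handler. The helper name and the example values are hypothetical; optional parameters from this section (such as name or numberToKeep) can be passed as keyword arguments.

```python
from urllib.parse import urlencode


def backup_url(core, base_url="http://localhost:8983/solr", **params):
    """Build a backup request URL for the given core's /replication handler."""
    query = {"command": "backup", **params}
    return f"{base_url}/{core}/replication?{urlencode(query)}"


if __name__ == "__main__":
    # Plain backup of the "gettingstarted" core.
    print(backup_url("gettingstarted"))
    # Named backup; "nightly" is an illustrative name.
    print(backup_url("gettingstarted", name="nightly"))
```

Sending an HTTP GET to the resulting URL (for example with urllib.request.urlopen or curl) would start the backup.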
The backup command is an asynchronous call, and it will represent data from the latest index commit point. All indexing and search operations will continue to be executed against the index as usual.
Only one backup call can be made against a core at any point in time. While a backup operation is in progress, subsequent calls for restoring will throw an exception.
The backup request can also take the following additional parameters:
location
- The path where the backup will be created. If the path is not absolute then the backup path will be relative to Solr’s instance directory.
name
- The snapshot will be created in a directory called snapshot.<name>. If a name is not specified then the directory name will have the following format: snapshot.<yyyyMMddHHmmssSSS>.
numberToKeep
- The number of backups to keep. If maxNumberOfBackups has been specified on the replication handler in solrconfig.xml, maxNumberOfBackups is always used and attempts to use numberToKeep will cause an error. Also, this parameter is not taken into consideration if the backup name is specified. More information about maxNumberOfBackups can be found in the section Configuring the ReplicationHandler.
repository
- The name of the repository to be used for the backup. If no repository is specified then the local filesystem repository will be used automatically.
commitName
- The name of the commit which was used while taking a snapshot using the CREATESNAPSHOT command.
Backup Status
The backup operation can be monitored to see if it has completed by sending the details command to the /replication handler, as in this example:
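A minimal sketch, again assuming Solr on localhost:8983 and a hypothetical helper name. It builds the details request and includes a deliberately crude failure check that only looks for the snapShootException key anywhere in the raw response text, since the exact response layout is not described here.

```python
from urllib.parse import urlencode


def details_url(core, base_url="http://localhost:8983/solr"):
    """Build a details request URL for the core's /replication handler."""
    return f"{base_url}/{core}/replication?{urlencode({'command': 'details'})}"


def backup_failed(response_text):
    """Crude check: treat any mention of snapShootException as a failed backup."""
    return "snapShootException" in response_text


if __name__ == "__main__":
    print(details_url("gettingstarted"))
```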
If the backup failed, a snapShootException will be sent in the response.
Restore API
Restoring the backup requires sending the restore command to the /replication handler, followed by the name of the backup to restore.
You can restore from a backup with a command like this:
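A minimal sketch of a restore request, assuming Solr on localhost:8983, a core named "gettingstarted", and a hypothetical backup name. The helper name is illustrative; name and location correspond to the restore parameters described in this section.

```python
from urllib.parse import urlencode


def restore_url(core, name=None, location=None, base_url="http://localhost:8983/solr"):
    """Build a restore request URL; name and location are optional."""
    query = {"command": "restore"}
    if name is not None:
        query["name"] = name
    if location is not None:
        query["location"] = location
    return f"{base_url}/{core}/replication?{urlencode(query)}"


if __name__ == "__main__":
    # "nightly" is an illustrative backup name.
    print(restore_url("gettingstarted", name="nightly"))
```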
This will restore the named index snapshot into the current core. Searches will start reflecting the snapshot data once the restore is complete.
The restore request can take these additional parameters:
location
- The location of the backup snapshot file. If not specified, it looks for backups in Solr’s data directory.
name
- The name of the backup index snapshot to be restored. If the name is not provided, it looks for backups with the snapshot.<timestamp> format in the location directory and picks the one with the latest timestamp.
repository
- The name of the repository to be used for the backup. If no repository is specified then the local filesystem repository will be used automatically.
The restore command is an asynchronous call. Once the restore is complete, the core will reflect the data of the backed-up index that was restored.
Only one restore call can be made against a core at any point in time. While a restore operation is in progress, subsequent calls for restoring will throw an exception.
Restore Status API
You can also check the status of a restore operation by sending the restorestatus command to the /replication handler, as in this example:
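A minimal sketch, assuming Solr on localhost:8983 and hypothetical helper names. The second function only classifies a status string using the three values listed in this section, so a polling loop could use it to decide when to stop.

```python
from urllib.parse import urlencode


def restore_status_url(core, base_url="http://localhost:8983/solr"):
    """Build a restorestatus request URL for the core's /replication handler."""
    return f"{base_url}/{core}/replication?{urlencode({'command': 'restorestatus'})}"


def restore_finished(status):
    """True once the restore has ended, whether it succeeded or failed."""
    return status in ("success", "failed")


if __name__ == "__main__":
    print(restore_status_url("gettingstarted"))
```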
The status value can be "In Progress", "success" or "failed". If it failed then an "exception" will also be sent in the response.
Create Snapshot API
The snapshot functionality is different from the backup functionality as the index files aren’t copied anywhere. The index files are snapshotted in the same index directory and can be referenced while taking backups.
You can trigger a snapshot command with an HTTP command like this (replace "techproducts" with the name of the core you are working with):
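A minimal sketch of such a request, assuming Solr on localhost:8983 and that snapshot commands are sent as an action parameter to the /admin/cores endpoint. The helper name and the commit name "commit1" are illustrative.

```python
from urllib.parse import urlencode


def create_snapshot_url(core, commit_name, base_url="http://localhost:8983/solr"):
    """Build a CREATESNAPSHOT request URL for the given core."""
    params = {"action": "CREATESNAPSHOT", "core": core, "commitName": commit_name}
    return f"{base_url}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    print(create_snapshot_url("techproducts", "commit1"))
```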
The CREATESNAPSHOT request parameters are:
commitName
- The name to store the snapshot as.
core
- The name of the core to perform the snapshot on.
async
- Request ID to track this action which will be processed asynchronously.
List Snapshot API
The LISTSNAPSHOTS command lists all snapshots taken for a particular core.
You can trigger a list snapshot command with an HTTP command like this (replace "techproducts" with the name of the core you are working with):
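A minimal sketch, assuming Solr on localhost:8983 and the /admin/cores endpoint. The helper name and the request ID value are hypothetical; the optional async argument maps to the async parameter described in this section.

```python
from urllib.parse import urlencode


def list_snapshots_url(core, async_id=None, base_url="http://localhost:8983/solr"):
    """Build a LISTSNAPSHOTS request URL, optionally with an async request ID."""
    params = {"action": "LISTSNAPSHOTS", "core": core}
    if async_id is not None:
        params["async"] = async_id
    return f"{base_url}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    print(list_snapshots_url("techproducts"))
```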
The list snapshot request parameters are:
core
- The name of the core whose snapshots we want to list.
async
- Request ID to track this action which will be processed asynchronously.
Delete Snapshot API
The DELETESNAPSHOT command deletes a snapshot for a particular core.
You can trigger a delete snapshot with an HTTP command like this (replace "techproducts" with the name of the core you are working with):
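A minimal sketch, assuming Solr on localhost:8983 and the /admin/cores endpoint. The helper name and the commit name "commit1" are illustrative.

```python
from urllib.parse import urlencode


def delete_snapshot_url(core, commit_name, base_url="http://localhost:8983/solr"):
    """Build a DELETESNAPSHOT request URL for a named snapshot of the given core."""
    params = {"action": "DELETESNAPSHOT", "core": core, "commitName": commit_name}
    return f"{base_url}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    print(delete_snapshot_url("techproducts", "commit1"))
```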
The delete snapshot request parameters are:
commitName
- Specify the commit name to be deleted.
core
- The name of the core whose snapshot we want to delete.
async
- Request ID to track this action which will be processed asynchronously.
Backup/Restore Storage Repositories
Solr provides a repository abstraction to allow users to back up and restore their data to a variety of different storage systems.
For example, a Solr cluster running on a local filesystem (e.g., EXT3) can store backup data on the same disk, on a remote network-mounted drive, in HDFS, or even in some popular "cloud storage" providers, depending on the 'repository' implementation chosen.
Solr offers three different repository implementations out of the box (LocalFileSystemRepository, HdfsBackupRepository, and GCSBackupRepository), and allows users to create plugins for their own storage systems as needed.
Users can define any number of repositories in their solr.xml file.
The backup and restore APIs described above allow users to select which of these definitions they want to use at runtime via the repository parameter.
When no repository parameter is specified, the local filesystem repository is used as a default.
Repositories are defined by a <repository> tag nested under a <backup> parent tag.
All <repository> tags must have a name attribute (defines the identifier that users can reference later to select this repository) and a class attribute (containing the full Java classname that implements the repository).
They may also have a boolean default attribute, which may be true on at most one repository definition.
Any children under the <repository> tag are passed as additional configuration to the repository, allowing repositories to read their own implementation-specific configuration.
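The rules above can be illustrated with a small checker. This is only a sketch for sanity-checking a <backup> fragment before placing it in solr.xml, not how Solr itself validates configuration; the function name and the sample repository names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative <backup> fragment with two repository definitions.
SAMPLE_BACKUP_XML = """
<backup>
  <repository name="local_repo" class="org.apache.solr.core.backup.repository.LocalFileSystemRepository" default="true">
    <str name="location">/solr/backup_data</str>
  </repository>
  <repository name="gcs_backup" class="org.apache.solr.gcs.GCSBackupRepository" default="false"/>
</backup>
"""


def repository_names(backup_xml):
    """Return repository names, enforcing the attribute rules described above."""
    root = ET.fromstring(backup_xml)
    names, defaults = [], 0
    for repo in root.findall("repository"):
        if "name" not in repo.attrib or "class" not in repo.attrib:
            raise ValueError("every <repository> needs name and class attributes")
        names.append(repo.get("name"))
        if repo.get("default", "false").lower() == "true":
            defaults += 1
    if defaults > 1:
        raise ValueError("at most one repository may be marked default")
    return names


if __name__ == "__main__":
    print(repository_names(SAMPLE_BACKUP_XML))
```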
Information on each of the repository implementations provided with Solr is provided below.
LocalFileSystemRepository
LocalFileSystemRepository stores and retrieves backup files anywhere on the accessible filesystem. Files can be stored on "local" disk, or on network-mounted drives that appear local to the filesystem.
SolrCloud administrators looking to use LocalFileSystemRepository in tandem with network drives should be careful to make the drive available at the same location on each Solr node. Strictly speaking, the mount only needs to be present on the node doing the backup (or restore), and on the node currently serving as the "Overseer". However, since the "Overseer" role often moves from node to node in a cluster, it is generally recommended that backup drives be added to all nodes uniformly.
A LocalFileSystemRepository instance is used as a default by any backup and restore commands that don’t explicitly provide a repository parameter or have a default specified in solr.xml.
LocalFileSystemRepository accepts the following configuration options:
location
- A valid file path (accessible to Solr locally) to use for backup storage and retrieval. Used as a fallback when users don’t provide a location parameter in their Backup or Restore API commands.
An example configuration using this property can be found below.
<backup>
  <repository name="local_repo" class="org.apache.solr.core.backup.repository.LocalFileSystemRepository">
    <str name="location">/solr/backup_data</str>
  </repository>
</backup>
HdfsBackupRepository
Stores and retrieves backup files from HDFS directories.
HdfsBackupRepository is deprecated and may be removed or relocated in a subsequent version of Solr.
HdfsBackupRepository accepts the following configuration options:
solr.hdfs.buffer.size
- The size, in bytes, of the buffer used to transfer data to and from HDFS. Defaults to 4096 (4KB). Better throughput is often attainable with a larger buffer, where memory allows.
solr.hdfs.home
- Required. An HDFS URI in the format hdfs://<host>:<port>/<hdfsBaseFilePath> that points Solr to the HDFS cluster to store (or retrieve) backup files on.
solr.hdfs.permissions.umask-mode
- A permission umask used when creating files in HDFS.
location
- A valid directory path on the HDFS cluster to use for backup storage and retrieval. Used as a fallback when users don’t provide a location parameter in their Backup or Restore API commands.
An example configuration using these properties can be found below:
<backup>
  <repository name="hdfs" class="org.apache.solr.core.backup.repository.HdfsBackupRepository" default="false">
    <str name="solr.hdfs.home">hdfs://some_hdfs_host:1234/solr/backup/data</str>
    <int name="solr.hdfs.buffer.size">8192</int>
    <str name="solr.hdfs.permissions.umask-mode">0022</str>
    <str name="location">/default/hdfs/backup/location</str>
  </repository>
</backup>
GCSBackupRepository
Stores and retrieves backup files in a Google Cloud Storage ("GCS") bucket.
GCSBackupRepository accepts the following options for overall configuration:
gcsBucket
- The GCS bucket to read and write all backup files to. If not specified, GCSBackupRepository will use the value of the GCS_BUCKET environment variable. If both values are absent, the value solrBackupsBucket will be used as a default.
gcsCredentialPath
- A path on the local filesystem (accessible by Solr) to a Google Cloud service account key file. If not specified, GCSBackupRepository will use the value of the GCS_CREDENTIAL_PATH environment variable. If both values are absent, an error will be thrown as GCS requires credentials for most usage.
location
- A valid "directory" path in the given GCS bucket to use for backup storage and retrieval. (GCS uses a flat storage model, but Solr’s backup functionality names blobs in a way that approximates hierarchical directory storage.) Used as a fallback when users don’t provide a location parameter in their Backup or Restore API commands.
In addition to these properties for overall configuration, GCSBackupRepository gives users detailed control over the client used to communicate with GCS. These properties are unlikely to interest most users, but may be valuable for those looking to micromanage performance or subject to a flaky network.
GCSBackupRepository accepts the following advanced client-configuration options:
gcsWriteBufferSizeBytes
- The buffer size, in bytes, to use when sending data to GCS. 16777216 bytes (i.e., 16 MB) is used by default if not specified.
gcsReadBufferSizeBytes
- The buffer size, in bytes, to use when copying data from GCS. 2097152 bytes (i.e., 2 MB) is used by default if not specified.
gcsClientHttpConnectTimeoutMillis
- The connection timeout, in milliseconds, for all HTTP requests made by the GCS client. "0" may be used to request an infinite timeout. A negative integer, or not specifying a value at all, will result in a value of 20000 (or 20 seconds).
gcsClientHttpReadTimeoutMillis
- The read timeout, in milliseconds, for reading data on an established connection. "0" may be used to request an infinite timeout. A negative integer, or not specifying a value at all, will result in a value of 20000 (or 20 seconds).
gcsClientMaxRetries
- The maximum number of times to retry an operation upon failure. The GCS client will retry operations until this value is reached, or the time spent across all attempts exceeds gcsClientMaxRequestTimeoutMillis. "0" may be used to specify that no retries should be done. If not specified, this value defaults to 10.
gcsClientMaxRequestTimeoutMillis
- The maximum amount of time to spend on all retries of an operation that has failed. The GCS client will retry operations until either this timeout has been reached, or until gcsClientMaxRetries attempts have failed. If not specified, the value 300000 (5 minutes) is used by default.
gcsClientHttpInitialRetryDelayMillis
- The time, in milliseconds, to delay before the first retry of an HTTP request that has failed. This value also factors into subsequent retries; see the gcsClientHttpRetryDelayMultiplier description below for more information. If gcsClientMaxRetries is 0, this property is ignored as no retries are attempted. If not specified, the value 1000 (1 second) is used by default.
gcsClientHttpRetryDelayMultiplier
- A floating-point multiplier used to scale the delay between each successive retry of a failed HTTP request. The greater this number is, the more quickly the retry delay compounds and scales. Under the covers, the GCS client uses an exponential backoff strategy between retries, governed by the formula: $\text{gcsClientHttpInitialRetryDelayMillis} \times \text{gcsClientHttpRetryDelayMultiplier}^{(\text{retryNum}-1)}$. The first retry will have a delay of $\text{gcsClientHttpInitialRetryDelayMillis}$, the second a delay of $\text{gcsClientHttpInitialRetryDelayMillis} \times \text{gcsClientHttpRetryDelayMultiplier}$, the third a delay of $\text{gcsClientHttpInitialRetryDelayMillis} \times \text{gcsClientHttpRetryDelayMultiplier}^{2}$, and so on. If not specified, the value 1.0 is used by default, ensuring that gcsClientHttpInitialRetryDelayMillis is used between each retry attempt.
gcsClientHttpMaxRetryDelayMillis
- The maximum delay, in milliseconds, between retry attempts on a failed HTTP request. This is commonly used to cap the exponential growth in retry-delay that occurs over multiple attempts. See the gcsClientHttpRetryDelayMultiplier description above for more information on how each delay is calculated when not subject to this maximum. If not specified, the value 30000 (30 seconds) is used by default.
gcsClientRpcInitialTimeoutMillis
- The time, in milliseconds, to wait on an RPC request before timing out. This value also factors into subsequent retries; see the gcsClientRpcTimeoutMultiplier description below for more information. If gcsClientMaxRetries is 0, this property is ignored as no retries are attempted. If not specified, the value 10000 (10 seconds) is used by default.
gcsClientRpcTimeoutMultiplier
- A floating-point multiplier used to scale the timeout on each successive attempt of a failed RPC request. The greater this number is, the more quickly the timeout compounds and scales. Under the covers, the GCS client uses an exponential backoff strategy for RPC timeouts, governed by the formula: $\text{gcsClientRpcInitialTimeoutMillis} \times \text{gcsClientRpcTimeoutMultiplier}^{(\text{retryNum}-1)}$. The first retry will have a timeout of $\text{gcsClientRpcInitialTimeoutMillis}$, the second a timeout of $\text{gcsClientRpcInitialTimeoutMillis} \times \text{gcsClientRpcTimeoutMultiplier}$, the third a timeout of $\text{gcsClientRpcInitialTimeoutMillis} \times \text{gcsClientRpcTimeoutMultiplier}^{2}$, and so on. If not specified, the value 1.0 is used by default, ensuring that gcsClientRpcInitialTimeoutMillis is used on each RPC attempt.
gcsClientRpcMaxTimeoutMillis
- The maximum timeout, in milliseconds, for retry attempts of a failed RPC request. This is commonly used to cap the exponential growth in timeout that occurs over multiple attempts. See the gcsClientRpcTimeoutMultiplier description above for more information on how each timeout is calculated when not subject to this maximum. If not specified, the value 30000 (30 seconds) is used by default.
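To make the backoff formula concrete, the sketch below computes the schedule of retry delays for a given initial delay, multiplier, and maximum. It assumes the maximum is applied as a simple cap on each computed delay and ignores any jitter the GCS client may add, so the numbers are illustrative rather than a description of the client's exact behavior. The function name is hypothetical; the same arithmetic applies to the RPC timeout properties.

```python
def retry_delays(initial_ms, multiplier, max_ms, retries):
    """Delay before each retry: initial * multiplier^(retryNum - 1), capped at max_ms."""
    return [
        min(initial_ms * multiplier ** (retry_num - 1), max_ms)
        for retry_num in range(1, retries + 1)
    ]


if __name__ == "__main__":
    # Illustrative values: 1500 ms initial delay, 1.5 multiplier, 10000 ms cap.
    for retry_num, delay in enumerate(retry_delays(1500, 1.5, 10000, 6), start=1):
        print(f"retry {retry_num}: wait {delay} ms")
```

With the default multiplier of 1.0, every entry in the schedule equals the initial delay, matching the default behavior described above.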
An example configuration using the overall and GCS-client properties can be seen below:
<backup>
  <repository name="gcs_backup" class="org.apache.solr.gcs.GCSBackupRepository" default="false">
    <str name="gcsBucket">solrBackups</str>
    <str name="gcsCredentialPath">/local/path/to/credential/file</str>
    <str name="location">/default/gcs/backup/location</str>
    <int name="gcsClientMaxRetries">5</int>
    <int name="gcsClientHttpInitialRetryDelayMillis">1500</int>
    <double name="gcsClientHttpRetryDelayMultiplier">1.5</double>
    <int name="gcsClientHttpMaxRetryDelayMillis">10000</int>
  </repository>
</backup>