The Source and Target configurations differ when the data centers are in separate clusters. "Cluster" here means separate ZooKeeper ensembles controlling disjoint Solr instances. Whether these data centers are physically separated is immaterial to this discussion.
As described in the section CDCR Architecture, two approaches are supported: uni-directional updates and bi-directional updates.
All CDCR configuration is done in the solrconfig.xml file. Because this is a per-collection configuration file, CDCR must be configured separately for each collection.
Uni-Directional Updates
Source Configuration
Here is a sample of a Source configuration file, a section in solrconfig.xml. The presence of the <replica> section causes CDCR to use this cluster as the Source, so it should not be present in the Target collections. Details about each setting follow the two examples. The Source example has buffering disabled; the default is enabled:
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="replica">
<str name="zkHost">10.240.18.211:2181,10.240.18.212:2181</str>
<!--
If you have chrooted your Solr information at the target you must include the chroot, for example:
<str name="zkHost">10.240.18.211:2181,10.240.18.212:2181/solr</str>
-->
<str name="source">collection1</str>
<str name="target">collection1</str>
</lst>
<lst name="replicator">
<str name="threadPoolSize">8</str>
<str name="schedule">1000</str>
<str name="batchSize">128</str>
</lst>
<lst name="updateLogSynchronizer">
<str name="schedule">1000</str>
</lst>
</requestHandler>
<!-- Modify the <updateLog> section of your existing <updateHandler>
in your config as below -->
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog class="solr.CdcrUpdateLog">
<str name="dir">${solr.ulog.dir:}</str>
<!--Any parameters from the original <updateLog> section -->
</updateLog>
<!-- Other configuration options such as autoCommit should still be present -->
</updateHandler>
Target Configuration
Here is a typical Target configuration.
The Target instance must configure an update processor chain that is specific to CDCR. The update processor chain must include the CdcrUpdateProcessorFactory. The task of this processor is to ensure that the version numbers attached to update requests coming from a CDCR Source SolrCloud are reused and not overwritten by the Target. A properly configured Target looks similar to this:
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<!-- recommended for Target clusters -->
<lst name="buffer">
<str name="defaultState">disabled</str>
</lst>
</requestHandler>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">cdcr-processor-chain</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
<processor class="solr.CdcrUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
<!-- Modify the <updateLog> section of your existing <updateHandler> in your
config as below -->
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog class="solr.CdcrUpdateLog">
<str name="dir">${solr.ulog.dir:}</str>
<!--Any parameters from the original <updateLog> section -->
</updateLog>
<!-- Other configuration options such as autoCommit should still be present -->
</updateHandler>
Bi-Directional Updates
The configurations in Cluster 1 and Cluster 2 are identical, except for the respective zkHost string specified in each cluster’s solrconfig.xml.
Either Cluster 1 or Cluster 2 can act as Source or Target at any given point in time, but a cluster cannot be both Source and Target at the same time.
Cluster 1 Configuration
Here is a sample of a Cluster 1 configuration file, a section in solrconfig.xml. The Cluster 2 zkHost string is specified in the CdcrRequestHandler declaration:
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">cdcr-processor-chain</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
<processor class="solr.CdcrUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="replica">
<str name="zkHost">10.240.19.241:2181,10.240.19.242:2181</str>
<!--
If you have chrooted your Solr information at the target you must include the chroot, for example:
<str name="zkHost">10.240.19.241:2181,10.240.19.242:2181/solr</str>
-->
<str name="source">collection1</str>
<str name="target">collection1</str>
</lst>
<lst name="replicator">
<str name="threadPoolSize">8</str>
<str name="schedule">1000</str>
<str name="batchSize">128</str>
</lst>
<lst name="updateLogSynchronizer">
<str name="schedule">1000</str>
</lst>
</requestHandler>
<!-- Modify the <updateLog> section of your existing <updateHandler>
in your config as below -->
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog class="solr.CdcrUpdateLog">
<str name="dir">${solr.ulog.dir:}</str>
<!--Any parameters from the original <updateLog> section -->
</updateLog>
</updateHandler>
Cluster 2 Configuration
The configuration of Cluster 2 is identical to the configuration of Cluster 1, except that the Cluster 1 zkHost string is specified in the CdcrRequestHandler definition:
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">cdcr-processor-chain</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
<processor class="solr.CdcrUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="replica">
<str name="zkHost">10.250.18.211:2181,10.250.18.212:2181</str>
<!--
If you have chrooted your Solr information at the target you must include the chroot, for example:
<str name="zkHost">10.250.18.211:2181,10.250.18.212:2181/solr</str>
-->
<str name="source">collection1</str>
<str name="target">collection1</str>
</lst>
<lst name="replicator">
<str name="threadPoolSize">8</str>
<str name="schedule">1000</str>
<str name="batchSize">128</str>
</lst>
<lst name="updateLogSynchronizer">
<str name="schedule">1000</str>
</lst>
</requestHandler>
<!-- Modify the <updateLog> section of your existing <updateHandler>
in your config as below -->
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog class="solr.CdcrUpdateLog">
<str name="dir">${solr.ulog.dir:}</str>
<!--Any parameters from the original <updateLog> section -->
</updateLog>
</updateHandler>
CDCR Configuration Parameters
The configuration details, defaults and options are as follows:
The Replica Element
CDCR can be configured to forward update requests to one or more Target collections. A Target collection is defined with a “replica” list as follows:
zkHost - The host address for ZooKeeper of the Target SolrCloud. Usually this is a comma-separated list of addresses to each node in the Target ZooKeeper ensemble. This parameter is required.
source - The name of the collection on the Source SolrCloud to be replicated. This parameter is required.
target - The name of the collection on the Target SolrCloud to which updates will be forwarded. This parameter is required.
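Because updates can be forwarded to more than one Target collection, a Source might declare one “replica” list per Target. The fragment below is a minimal sketch under that assumption. The second ZooKeeper ensemble, its /solr chroot, and the collection name collection1_dr are hypothetical placeholders, not values from a real deployment.

```xml
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <!-- First Target cluster: addresses reused from the Source example -->
  <lst name="replica">
    <str name="zkHost">10.240.18.211:2181,10.240.18.212:2181</str>
    <str name="source">collection1</str>
    <str name="target">collection1</str>
  </lst>
  <!-- Second Target cluster (hypothetical ensemble, chrooted at /solr) -->
  <lst name="replica">
    <str name="zkHost">10.240.20.11:2181,10.240.20.12:2181/solr</str>
    <str name="source">collection1</str>
    <str name="target">collection1_dr</str>
  </lst>
</requestHandler>
```

All three parameters appear in every “replica” list because each of them is required.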
The Replicator Element
The CDCR Replicator is the component in charge of forwarding updates to the replicas. The replicator will monitor the update logs of the Source collection and will forward any new updates to the Target collection.
The replicator uses a fixed thread pool to forward updates to multiple replicas in parallel. If more than one replica is configured, one thread will forward a batch of updates from one replica at a time in a round-robin fashion. The replicator can be configured with a “replicator” list as follows:
threadPoolSize - The number of threads to use for forwarding updates. One thread per replica is recommended. The default is 2.
schedule - The delay in milliseconds for monitoring the update log(s). The default is 10.
batchSize - The number of updates to send in one batch. The optimal size depends on the size of the documents. Large batches of large documents can increase your memory usage significantly. The default is 128.
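As an illustration, the fragment below spells out the documented defaults explicitly. It is only a sketch and assumes a Source that forwards to two Target collections, so that the default pool of two threads matches the one-thread-per-replica recommendation. The element belongs inside the /cdcr request handler.

```xml
<!-- Place inside <requestHandler name="/cdcr" class="solr.CdcrRequestHandler"> -->
<lst name="replicator">
  <!-- Two threads: one per Target replica (assumes two "replica" lists) -->
  <str name="threadPoolSize">2</str>
  <!-- Check the update log for new updates every 10 ms (the default) -->
  <str name="schedule">10</str>
  <!-- Forward at most 128 updates per batch (the default) -->
  <str name="batchSize">128</str>
</lst>
```

With large documents, a smaller batchSize may be worth trying first, since large batches of large documents increase memory usage.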
The updateLogSynchronizer Element
Expert: Non-leader nodes need to synchronize their update logs with their leader node from time to time in order to clean deprecated transaction log files. By default, such a synchronization process is performed every minute. The schedule of the synchronization can be modified with an “updateLogSynchronizer” list as follows:
If the updateLogSynchronizer element is omitted from the Source cluster, transaction logs may accumulate on non-leaders.
schedule - The delay in milliseconds for synchronizing the update logs. The default is 60000.
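For reference, the following sketch declares the element with the documented default written out, which corresponds to one synchronization per minute. It is meant only as a starting point for tuning and belongs inside the /cdcr request handler.

```xml
<!-- Place inside <requestHandler name="/cdcr" class="solr.CdcrRequestHandler"> -->
<lst name="updateLogSynchronizer">
  <!-- Synchronize non-leader update logs every 60000 ms (the default) -->
  <str name="schedule">60000</str>
</lst>
```

The sample configurations earlier in this section use 1000 instead, which synchronizes once per second.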
The Buffer Element
When buffering updates, the update logs will store all the updates indefinitely. It is best to disable buffering on both the Source and Target clusters during normal operation, because the update logs grow without limit while buffering is enabled. Enabling buffering is intended for special maintenance periods. Buffering can be disabled at startup with a “buffer” list and the parameter “defaultState” as follows:
defaultState - The state of the buffer at startup. The default is enabled.
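For instance, a Source handler that starts with buffering disabled might be declared as in the sketch below. The ZooKeeper addresses and collection names are reused from the Source example earlier in this section and are placeholders only.

```xml
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">10.240.18.211:2181,10.240.18.212:2181</str>
    <str name="source">collection1</str>
    <str name="target">collection1</str>
  </lst>
  <!-- Start with the buffer disabled so update logs do not grow without limit -->
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>
```

Setting the state here avoids relying on the default, which is enabled.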
Buffering should be enabled only for maintenance windows
Buffering is designed to augment maintenance windows. The following points should be kept in mind:
Initial Startup
Uni-Directional Approach
This is a general approach for initializing CDCR in a production environment. It’s based upon an approach taken by the initial working installation of CDCR and generously contributed to illustrate a "real world" scenario.
-
CDCR is used to keep a remote disaster-recovery instance available for production backup.
-
This example has 26 clouds with 200 million assets per cloud (15 GB indexes). Total document count is over 4.8 billion.
-
Source and Target clouds were synched in 2-3 hour maintenance windows to establish the base index for the Targets.
-
As usual, it is good to start small. Sync a single cloud and monitor for a period of time before doing the others. You may need to adjust your settings several times before finding the right balance.
-
Before starting, stop or pause the indexers. This is best done during a small maintenance window.
-
Stop the SolrCloud instances at the Source.
-
Upload the modified solrconfig.xml to ZooKeeper on both Source and Target as appropriate; see the examples above.
-
Sync the index directories from the Source collection to the Target collection, across the corresponding shard nodes. rsync works well for this. For example, if there are two shards on collection1 with 2 replicas for each shard, copy the corresponding index directories from:
shard1replica1Source to shard1replica1Target
shard1replica2Source to shard1replica2Target
shard2replica1Source to shard2replica1Target
shard2replica2Source to shard2replica2Target
-
Start ZooKeeper on the Target (DR).
-
Start SolrCloud on the Target (DR).
-
Start ZooKeeper on the Source.
-
Start SolrCloud on the Source. As a general rule, the Target (DR) should be started before the Source.
-
Activate CDCR on the Source instance using the CDCR API:
http://host:port/solr/<collection_name>/cdcr?action=START
There is no need to run the /cdcr?action=START command on the Target.
-
Disable the buffer on the Target and Source:
http://host:port/solr/<collection_name>/cdcr?action=DISABLEBUFFER
-
Re-enable indexing.
Bi-Directional Approach
When using the bi-directional approach, it is highly recommended to enable CDCR on the collections in both clusters before any indexing has taken place.
Building on the same example used for the uni-directional approach, let’s walk through the necessary steps:
-
Before you begin, stop or pause any indexing processes. This is best done during a small maintenance window.
-
Stop the SolrCloud instances in both Cluster 1 and Cluster 2.
-
Upload the modified solrconfig.xml to ZooKeeper on both Cluster 1 and Cluster 2 as appropriate; see the examples above in the section Bi-Directional Updates.
-
If documents were indexed prior to this exercise, sync the index directories from the Cluster 1 collection to the Cluster 2 collection, across the corresponding shard nodes, or vice versa. The rsync utility works well for this if it’s available on your server. Make sure that it is the updated index that is copied across. For example, if there are 2 shards on collection 'cluster1' (the updated collection) with 2 replicas for each shard, copy the corresponding index directories from:
shard1replica1cluster1 to shard1replica1cluster2
shard1replica2cluster1 to shard1replica2cluster2
shard2replica1cluster1 to shard2replica1cluster2
shard2replica2cluster1 to shard2replica2cluster2
-
Start ZooKeeper on Cluster 1.
-
Start ZooKeeper on Cluster 2.
-
Start SolrCloud on Cluster 1.
-
Start SolrCloud on Cluster 2.
-
If not present, create respective collections in both Cluster 1 and Cluster 2.
-
Activate CDCR on both the Cluster 1 and Cluster 2 instances using the CDCR API:
http://host:port/solr/<collection_name>/cdcr?action=START
-
Disable the buffer on Cluster 1 and Cluster 2:
http://host:port/solr/<collection_name>/cdcr?action=DISABLEBUFFER
-
Re-enable indexing.
ZooKeeper Settings
With CDCR, the Target ZooKeepers will have connections from the Target clouds and the Source clouds. You may need to increase the maxClientCnxns setting in zoo.cfg.
## Set the number of connections allowed per client to 800.
## maxClientCnxns=0 means no limit.
maxClientCnxns=800