Cross Data Center Replication (CDCR)
Cross Data Center Replication (CDCR) allows you to create multiple SolrCloud data centers and keep them in sync.
CDCR is deprecated
This feature (in its current form) is deprecated and will be removed in 9.0. Anyone currently using CDCR should consider migrating away from it. There are several open issues which make CDCR complex to maintain and generally unstable. The simplest alternative to CDCR is to manually maintain two Solr implementations in separate data centers and manage them as completely separate installations. While this may require more initial setup, it will be more stable in the long run. There are other alternatives, and the Solr community is working to identify the best recommended replacement in time for 9.0. |
What is CDCR?
The SolrCloud architecture is designed to support Near Real Time (NRT) searches on a Solr collection that usual consists of multiple nodes in a single data center. CDCR augments this model by forwarding updates from a Solr collection in one data center to a parallel Solr collection in another data center where the network latencies are greater than the SolrCloud model was designed to accommodate.
For more information about CDCR, see the following sections:
CDCR Architecture: A detailed overview of how CDCR works.
CDCR Configuration: How to set up and initialize CDCR for your cluster.
CDCR Operations: Information on monitoring CDCR and upgrading your cluster when using CDCR.
CDCR API: Reference for the CDCR API.
CDCR Glossary
For the purposes of discussing CDCR, the following terminology is used. If you are already familiar with SolrCloud, many of these terms will already be familiar to you.
- Node
- A JVM instance running Solr; a server.
- Cluster
- A set of Solr nodes managed as a single unit by a ZooKeeper ensemble hosting one or more Collections.
- Data Center
- A group of networked servers hosting a Solr cluster. For CDCR, the terms Cluster and Data Center are interchangeable as we assume that each Solr cluster is hosted in a different group of networked servers.
- Shard
- A sub-index of a single logical collection. This may be spread across multiple nodes of the cluster. Each shard can have 1-N replicas.
- Leader
- Each shard has replica identified as its leader. All the writes for documents belonging to a shard are routed through the leader.
- Replica
- A copy of a shard for use in failover or load balancing. Replicas comprising a shard can either be leaders or non-leaders.
- Follower
- A convenience term for a replica that is not the leader of a shard.
- Collection
- A logical index, consisting of one or more shards. A cluster can have multiple collections.
- Update
- An operation that changes the collection’s index in any way. This could be adding a new document, deleting documents or changing a document.
- Update Log(s)
- An append-only log of write operations maintained by each node.