Combining Distribution and Replication
When your index is too large for a single machine and you have a query volume that single shards cannot keep up with, it’s time to replicate each shard in your distributed search setup.
The idea is to combine distributed search with replication. As shown in the figure below, a combined distributed-replication configuration features a leader server for each shard and then 1-n followers that are replicated from the leader. As in a standard replicated configuration, the leader server handles updates and optimizations without adversely affecting query handling performance.
Query requests should be load balanced across each of the shard followers. This gives you both increased query handling capacity and fail-over backup if a server goes down.
None of the leader shards in this configuration know about each other. You index to each leader, the index is replicated to each follower, and then searches are distributed across the followers, using one follower from each leader/follower shard.
For high availability you can use a load balancer to set up a virtual IP for each shard’s set of followers. If you are new to load balancing, HAProxy (http://haproxy.1wt.eu/) is a good open source software load-balancer. If a follower server goes down, a good load-balancer will detect the failure using some technique (generally a heartbeat system), and forward all requests to the remaining live followers that served with the failed follower. A single virtual IP should then be set up so that requests can hit a single IP, and get load balanced to each of the virtual IPs for the search followers.
With this configuration you will have a fully load balanced, search-side fault-tolerant system (Solr does not yet support fault-tolerant indexing). Incoming searches will be handed off to one of the functioning followers, then the follower will distribute the search request across a follower for each of the shards in your configuration. The follower will issue a request to each of the virtual IPs for each shard, and the load balancer will choose one of the available followers. Finally, the results will be combined into a single results set and returned. If any of the followers go down, they will be taken out of rotation and the remaining followers will be used. If a shard leader goes down, searches can still be served from the followers until you have corrected the problem and put the leader back into production.