SolrCloud Autoscaling Fault Tolerance
The autoscaling framework uses a few strategies to ensure it’s able to still trigger actions in the event of unexpected changes to the system.
Node Added or Lost Markers
Since triggers execute on the node that runs the Overseer, should the Overseer node go down the
event would be lost because there would be no mechanism to generate it. Similarly, if a node has
been added before the Overseer leader change was completed, the
nodeAdded event would not be
For this reason Solr implements additional mechanisms to ensure that these events are generated reliably.
With standard SolrCloud behavior, when a node joins a cluster its presence is marked as an ephemeral ZooKeeper path in the
/live_nodes/<nodeName> ZooKeeper directory. Now an ephemeral path is also created under
When a new instance of Overseer leader is started it will run the
nodeAdded trigger (if it’s configured)
and discover the presence of this ZooKeeper path, at which point it will remove it and generate a
When a node leaves the cluster, up to three remaining nodes will try to create a persistent ZooKeeper path
/autoscaling/nodeLost/<nodeName> and eventually one of them succeeds. When a new instance of Overseer leader
is started it will run the
nodeLost trigger (if it’s configured) and discover the presence of this ZooKeeper
path, at which point it will remove it and generate a
Trigger State Checkpointing
Triggers generate events based on their internal state. If the Overseer leader goes down while the trigger is about to generate a new event, it’s likely that the event would be lost because a new trigger instance running on the new Overseer leader would start from a clean slate.
For this reason, after each time a trigger is executed its internal state is persisted to ZooKeeper, and on Overseer start its internal state is restored.
Trigger Event Queues
Autoscaling framework limits the rate at which events are processed using several different mechanisms. One is the locking mechanism that prevents concurrent processing of events, and another is a single-threaded executor that runs trigger actions.
This means that the processing of an event may take significant time, and during this time it’s possible that the Overseer may go down. In order to avoid losing events that were already generated but not yet fully processed, events are queued before processing is started.
Separate ZooKeeper queues are created for each trigger, and events produced by triggers are put on these per-trigger queues. When a new Overseer leader is started it will first check these queues and process events accumulated there, and only then it will continue to run triggers normally. Queued events that fail processing during this "replay" stage are discarded.