Simulated environment for autoscaling.
- Use the actual unchanged autoscaling code for cluster state monitoring and autoscaling plan execution.
- Support testing large clusters (> 100 nodes).
- Support fast testing using accelerated time (e.g. 100x faster).
- Support enough of other Solr functionality for the test results to be meaningful.
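The accelerated-time goal can be illustrated with a minimal sketch. The AcceleratedClock class below is hypothetical and is not part of the framework; it only shows how a time multiplier might map simulated durations onto shorter real ones.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical clock that runs `multiplier` times faster than wall-clock time. */
final class AcceleratedClock {
    private final double multiplier;
    private final long startNanos;

    AcceleratedClock(double multiplier) {
        if (multiplier <= 0) {
            throw new IllegalArgumentException("multiplier must be positive");
        }
        this.multiplier = multiplier;
        this.startNanos = System.nanoTime();
    }

    /** Simulated nanoseconds elapsed since this clock was created. */
    long simulatedNanos() {
        return (long) ((System.nanoTime() - startNanos) * multiplier);
    }

    /** Converts a simulated duration into the real number of nanoseconds to wait. */
    long toRealNanos(long simulatedDuration, TimeUnit unit) {
        return (long) (unit.toNanos(simulatedDuration) / multiplier);
    }

    /** Sleeps for the given simulated duration, which takes 1/multiplier of real time. */
    void sleep(long simulatedDuration, TimeUnit unit) throws InterruptedException {
        TimeUnit.NANOSECONDS.sleep(toRealNanos(simulatedDuration, unit));
    }
}

final class AcceleratedClockDemo {
    public static void main(String[] args) throws InterruptedException {
        AcceleratedClock clock = new AcceleratedClock(100.0);
        // A 10-second simulated wait takes about 100 ms of real time at 100x.
        clock.sleep(10, TimeUnit.SECONDS);
        System.out.println("simulated ms elapsed: "
            + TimeUnit.NANOSECONDS.toMillis(clock.simulatedNanos()));
    }
}
```

With a multiplier of 100, a trigger configured to wait 10 simulated seconds fires after roughly 100 ms of real time, which is what makes long-running autoscaling scenarios practical to test.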
This implementation of
uses the following simulated components:
SimCloudManager also maintains an up-to-date /live_nodes in SimDistribStateManager, provides a SolrClient instance for use in tests,
and provides several convenience methods for setting up simulated clusters, populating node and replica metrics, collecting
autoscaling-related event history, collecting autoscaling event statistics, etc.
SimCloudManager runs actual
so that it
uses real trigger and trigger action implementations, as well as real event scheduling and processing code.
It also provides methods for simulating Overseer leader change.
Another important part of SimCloudManager is a request handler that processes common autoscaling
and collection admin requests. Autoscaling requests are processed by an instance of
(and result in changes in respective
data stored in
admin commands are simulated, i.e. they don't use actual
due to the complex dependencies on real components.
This component maintains collection and replica states:
- Simulates delays between a request and the actual cluster state changes.
- Marks replicas as down when a node goes down (optionally preserving the replica metrics in order to simulate a node coming back), and keeps track of per-node cores and disk space.
- Runs a shard leader election look-alike on collection state updates.
- Maintains up-to-date /clusterstate.json and /clusterprops.json in SimDistribStateManager (which in turn notifies Watcher-s about collection updates).
Currently, for simplicity, it uses the old single /clusterstate.json format for representing ClusterState.
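The replica bookkeeping described above can be sketched with a toy model. ToyReplica and ToyShard are hypothetical names used only for illustration; the sketch assumes a deliberately simple election rule (keep an active leader, otherwise promote the first active replica) and does not reproduce the framework's actual logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical replica record: name, hosting node, and state flags. */
final class ToyReplica {
    final String name;
    final String node;
    boolean active = true;
    boolean leader = false;

    ToyReplica(String name, String node) {
        this.name = name;
        this.node = node;
    }
}

/** Hypothetical shard that re-runs a leader election look-alike on every state update. */
final class ToyShard {
    private final List<ToyReplica> replicas = new ArrayList<>();

    void add(ToyReplica replica) {
        replicas.add(replica);
        electLeader();
    }

    /** Marks every replica hosted on the node as down, then re-runs the election. */
    void nodeDown(String node) {
        for (ToyReplica r : replicas) {
            if (r.node.equals(node)) {
                r.active = false;
                r.leader = false;
            }
        }
        electLeader();
    }

    /** Keeps the current leader if it is still active, else promotes the first active replica. */
    private void electLeader() {
        for (ToyReplica r : replicas) {
            if (r.leader && r.active) {
                return;
            }
        }
        for (ToyReplica r : replicas) {
            if (r.active) {
                r.leader = true;
                return;
            }
        }
    }

    /** Returns the current leader, or null when no replica is active. */
    ToyReplica leader() {
        for (ToyReplica r : replicas) {
            if (r.leader) {
                return r;
            }
        }
        return null;
    }
}
```

Adding two replicas on different nodes and then calling nodeDown on the leader's node moves leadership to the surviving replica; taking down both nodes leaves the shard without a leader.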
This component maintains node metrics. When a simulated cluster is set up using e.g.
method, each simulated node is initialized with some basic metrics that are expected by the autoscaling
framework, such as node name, fake system load average, heap usage and disk usage.
The number of cores and disk space metrics may be used in autoscaling calculations, so they are
tracked and adjusted by
to the currently active replicas located on each node.
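As a rough illustration of keeping the core count and disk space metrics in step with replica placement, consider the sketch below. ToyNodeMetrics and its method names are hypothetical, and the assumption that each replica consumes a fixed amount of disk is a simplification made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-node metrics store tracking core count and free disk space. */
final class ToyNodeMetrics {
    private final Map<String, Integer> coreCounts = new HashMap<>();
    private final Map<String, Double> freeDiskGb = new HashMap<>();

    /** Registers a node with no cores and the given amount of free disk. */
    void addNode(String node, double diskGb) {
        coreCounts.put(node, 0);
        freeDiskGb.put(node, diskGb);
    }

    /** A replica placed on the node adds one core and consumes disk space. */
    void replicaAdded(String node, double indexSizeGb) {
        adjust(node, 1, -indexSizeGb);
    }

    /** A replica removed from the node frees its core and its disk space. */
    void replicaRemoved(String node, double indexSizeGb) {
        adjust(node, -1, indexSizeGb);
    }

    private void adjust(String node, int coreDelta, double diskDeltaGb) {
        if (!coreCounts.containsKey(node)) {
            throw new IllegalArgumentException("unknown node: " + node);
        }
        coreCounts.merge(node, coreDelta, Integer::sum);
        freeDiskGb.merge(node, diskDeltaGb, Double::sum);
    }

    int coreCount(String node) {
        return coreCounts.get(node);
    }

    double freeDiskGb(String node) {
        return freeDiskGb.get(node);
    }
}
```

Keeping both values in one adjustment step means a policy calculation that reads them never sees a core count that disagrees with the disk usage for the same node.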
Limitations of the simulation framework
Currently the simulation framework is limited to testing the core autoscaling API in a single JVM.
Using it for other purposes would require extensive modifications in Solr and in the framework code.
Specifically, the framework supports testing the following autoscaling components:
- OverseerTriggerThread and components that it uses.
- Autoscaling config, triggers, trigger listeners, ScheduledTriggers, trigger event queues, ComputePlanAction / ExecutePlanAction, etc.
Overseer and CollectionsHandler Cmd implementations are NOT used, so they cannot be properly tested; only some of their functionality is simulated.
Other SolrCloud components make too many direct references to ZkStateReader, make direct HTTP requests, or rely on too many other components and so require much more complex functionality. They may be refactored later, but the effort may be too high.
The simulation framework definitely does not support the following functionality:
- Solr searching and indexing
- Any component that uses ZkController (e.g. CoreContainer)
- Any component that uses ShardHandler (e.g. CollectionsHandler Cmd-s)