Simulated environment for autoscaling.

Goals

Use the actual unchanged autoscaling code for cluster state monitoring and autoscaling plan execution.
Support testing large clusters (> 100 nodes).
Support fast testing using accelerated time (e.g. 100x faster).
Support enough of other Solr functionality for the test results to be meaningful.
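
The accelerated-time goal can be illustrated with a small, self-contained sketch. It is not the Solr implementation: `AcceleratedClock` and its methods are hypothetical names chosen for this example, and it only assumes that simulated time is wall-clock time scaled by a constant multiplier.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Solr: a clock whose time advances
// `multiplier` times faster than the wall clock.
final class AcceleratedClock {
  private final double multiplier;
  private final long startNs;

  AcceleratedClock(double multiplier) {
    if (multiplier <= 0) {
      throw new IllegalArgumentException("multiplier must be positive");
    }
    this.multiplier = multiplier;
    this.startNs = System.nanoTime();
  }

  // Simulated nanoseconds elapsed since this clock was created.
  long simulatedNanos() {
    return (long) ((System.nanoTime() - startNs) * multiplier);
  }

  // Real nanoseconds a thread has to wait for a simulated duration to pass.
  long realNanosFor(long simulatedDuration, TimeUnit unit) {
    return (long) (unit.toNanos(simulatedDuration) / multiplier);
  }

  // Blocks the caller for the given simulated duration.
  void sleep(long simulatedDuration, TimeUnit unit) throws InterruptedException {
    TimeUnit.NANOSECONDS.sleep(realNanosFor(simulatedDuration, unit));
  }
}
```

With a multiplier of 100, a trigger that waits for 1 simulated second blocks the test for roughly 10 real milliseconds.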

Simulated SolrCloudManager - `SimCloudManager`

This implementation of SolrCloudManager uses the following simulated components:

SimDistribStateManager - in-memory ZK look-alike, with support for Watcher-s, ephemeral and sequential nodes.
SimClusterStateProvider - manages collection, replica infos, states and replica metrics.
SimNodeStateProvider - manages node metrics.
GenericDistributedQueue - DistributedQueue that uses SimDistribStateManager.
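
As a rough illustration of what an in-memory ZK look-alike involves, the sketch below keeps node data in a map, fires one-shot watchers on updates, and creates sequential nodes with a zero-padded counter. `TinyStateStore` and all of its methods are hypothetical and do not reflect the SimDistribStateManager API; ephemeral nodes, versions and the node hierarchy are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch for illustration; not the SimDistribStateManager API.
final class TinyStateStore {
  private final Map<String, byte[]> data = new ConcurrentHashMap<>();
  private final Map<String, List<Consumer<String>>> watchers = new ConcurrentHashMap<>();
  private final AtomicInteger sequence = new AtomicInteger();

  // Registers a one-shot watcher that receives the path on the next update.
  void watch(String path, Consumer<String> watcher) {
    watchers.computeIfAbsent(path, p -> new CopyOnWriteArrayList<>()).add(watcher);
  }

  // Stores the value, then fires and discards the watchers set on the path.
  void setData(String path, byte[] value) {
    data.put(path, value.clone());
    List<Consumer<String>> fired = watchers.remove(path);
    if (fired != null) {
      fired.forEach(w -> w.accept(path));
    }
  }

  // Creates a node whose name is the prefix plus a zero-padded counter.
  // A single store-wide counter is a simplification made for this sketch.
  String createSequential(String prefix, byte[] value) {
    String path = prefix + String.format("%010d", sequence.getAndIncrement());
    setData(path, value);
    return path;
  }

  // Returns a copy of the stored value, or null if the path does not exist.
  byte[] getData(String path) {
    byte[] value = data.get(path);
    return value == null ? null : value.clone();
  }
}
```

A queue built on such a store could rely on the ordering of sequential node names to return elements in insertion order.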

SimCloudManager also maintains an up-to-date /live_nodes in SimDistribStateManager, provides a SolrClient instance for use in tests, and provides several convenience methods for setting up simulated clusters, populating node and replica metrics, collecting autoscaling-related event history, collecting autoscaling event statistics, etc.

SimCloudManager runs the actual OverseerTriggerThread so that it uses real trigger and trigger action implementations, as well as real event scheduling and processing code. It also provides methods for simulating an Overseer leader change.

An important part of SimCloudManager is a request handler that processes common autoscaling and collection admin requests. Autoscaling requests are processed by an instance of AutoScalingHandler (and result in changes to the respective data stored in SimDistribStateManager). Collection admin commands are simulated, i.e. they don't use the actual CollectionsHandler because of its complex dependencies on real components.

`SimClusterStateProvider`

This component maintains collection and replica states:

Simulates delays between a request and the actual cluster state changes.
Marks replicas as down when a node goes down (optionally preserving the replica metrics in order to simulate a node coming back), and keeps track of per-node cores and disk space.
Runs a shard leader election look-alike on collection state updates.
Maintains up-to-date /clusterstate.json and /clusterprops.json in SimDistribStateManager (which in turn notifies Watcher-s about collection updates). Currently for simplicity it uses the old single /clusterstate.json format for representing ClusterState.
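
The node-down handling and the leader election look-alike described in the list above can be sketched as follows. The types are hypothetical and are not taken from SimClusterStateProvider. The election policy (keep an active leader, otherwise promote the first active replica) is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; not the SimClusterStateProvider API.
final class ReplicaSketch {
  final String name;
  final String node;
  boolean active = true;
  boolean leader = false;

  ReplicaSketch(String name, String node) {
    this.name = name;
    this.node = node;
  }
}

final class ShardSketch {
  private final List<ReplicaSketch> replicas = new ArrayList<>();

  // Adds an active replica and re-runs the election.
  void addReplica(String name, String node) {
    replicas.add(new ReplicaSketch(name, node));
    electLeader();
  }

  // Marks every replica hosted on the node as down, then re-runs the election.
  void nodeDown(String node) {
    for (ReplicaSketch r : replicas) {
      if (r.node.equals(node)) {
        r.active = false;
        r.leader = false;
      }
    }
    electLeader();
  }

  // Keeps the current leader if it is still active; otherwise promotes the
  // first active replica (assumed policy, chosen for simplicity).
  void electLeader() {
    for (ReplicaSketch r : replicas) {
      if (r.leader && r.active) {
        return;
      }
    }
    for (ReplicaSketch r : replicas) {
      if (r.active) {
        r.leader = true;
        return;
      }
    }
  }

  // Returns the leader's name, or null when no replica is active.
  String leaderName() {
    for (ReplicaSketch r : replicas) {
      if (r.leader) {
        return r.name;
      }
    }
    return null;
  }
}
```

Running the election on every state update keeps the sketch simple: callers never have to remember to trigger it after a node is lost.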

`SimNodeStateProvider`

This component maintains node metrics. When a simulated cluster is set up using e.g. the SimCloudManager.createCluster(int, org.apache.solr.common.util.TimeSource) method, each simulated node is initialized with some basic metrics that are expected by the autoscaling framework, such as node name, fake system load average, heap usage and disk usage. The number of cores and disk space metrics may be used in autoscaling calculations, so they are tracked and adjusted by SimClusterStateProvider according to the currently active replicas located on each node.
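
The bookkeeping of per-node cores and disk space can be pictured with the following minimal sketch. `NodeMetricsSketch` is a hypothetical class, and the metric keys `cores` and `freedisk` as well as the gigabyte units are illustrative assumptions rather than a statement about SimNodeStateProvider.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for illustration; not the SimNodeStateProvider API.
final class NodeMetricsSketch {
  private final Map<String, Map<String, Double>> metrics = new HashMap<>();

  // Registers a node with no cores and the given free disk space.
  void addNode(String node, double freeDiskGb) {
    Map<String, Double> values = new HashMap<>();
    values.put("cores", 0.0);
    values.put("freedisk", freeDiskGb);
    metrics.put(node, values);
  }

  // A new replica adds one core and consumes disk space on its node.
  void replicaAdded(String node, double indexSizeGb) {
    adjust(node, 1, -indexSizeGb);
  }

  // Removing a replica frees its core and its disk space.
  void replicaRemoved(String node, double indexSizeGb) {
    adjust(node, -1, indexSizeGb);
  }

  // Returns a metric value; fails for unknown nodes or metric keys.
  double get(String node, String key) {
    Double value = valuesFor(node).get(key);
    if (value == null) {
      throw new IllegalArgumentException("unknown metric: " + key);
    }
    return value;
  }

  private void adjust(String node, int coresDelta, double diskDelta) {
    Map<String, Double> values = valuesFor(node);
    values.merge("cores", (double) coresDelta, Double::sum);
    values.merge("freedisk", diskDelta, Double::sum);
  }

  private Map<String, Double> valuesFor(String node) {
    Map<String, Double> values = metrics.get(node);
    if (values == null) {
      throw new IllegalArgumentException("unknown node: " + node);
    }
    return values;
  }
}
```

Keeping both values in one place means a policy that looks at core counts or free disk sees numbers that are consistent with the simulated replica placement.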

Limitations of the simulation framework

Currently the simulation framework is limited to testing the core autoscaling API in a single JVM. Using it for other purposes would require extensive modifications in Solr and in the framework code. Specifically, the framework supports testing the following autoscaling components:

OverseerTriggerThread and components that it uses.
Autoscaling config, triggers, trigger listeners, ScheduledTriggers, trigger event queues, ComputePlanAction / ExecutePlanAction, etc.

Overseer and CollectionsHandler Cmd implementations are NOT used, so they cannot be properly tested; some of their functionality is simulated. Other SolrCloud components make too many direct references to ZkStateReader, issue direct HTTP requests, or rely on too many other components and require much more complex functionality. They may be refactored later, but the effort may be too high. The simulation framework does not support the following functionality:

Solr searching and indexing
Any component that uses ZkController (e.g. CoreContainer)
Any component that uses ShardHandler (e.g. CollectionsHandler Cmd-s)

Interface Summary
Interface	Description
ActionError	Interface that helps simulating action errors.

Class Summary
Class	Description
GenericDistributedQueue	A distributed queue that uses `DistribStateManager` as the underlying distributed store.
GenericDistributedQueueFactory	Factory for `GenericDistributedQueue`.
LiveNodesSet	This class represents a set of live nodes and allows adding listeners to track their state.
SimCloudManager	Simulated `SolrCloudManager`.
SimClusterStateProvider	Simulated `ClusterStateProvider`.
SimDistribStateManager	Simulated `DistribStateManager` that keeps all data locally in a static structure.
SimDistribStateManager.Node
SimDistributedQueueFactory	Simulated `DistributedQueueFactory` that keeps all data in memory.
SimDistributedQueueFactory.SimDistributedQueue
SimNodeStateProvider	Simulated `NodeStateProvider`.

Package org.apache.solr.cloud.autoscaling.sim
