SolrCloud Autoscaling Trigger Actions

Autoscaling is deprecated

The autoscaling framework in its current form is deprecated and will be removed in Solr 9.0.

A new design for this feature is currently under development in SOLR-14613 with a goal for release with Solr 9.0.

TriggerAction implementations process events generated by triggers in order to ensure the cluster’s health and good use of resources.

Currently two implementations are provided: ComputePlanAction and ExecutePlanAction.

Compute Plan Action

The ComputePlanAction uses the policy and preferences to calculate the optimal set of Collection API commands which can re-balance the cluster in response to trigger events.

The following parameters are configurable:

collections

A comma-separated list of collection names, or a selector on collection properties that can be used to filter collections for which the plan is computed.

If a non-empty list or selector is specified then the computed operations will only calculate collection operations that affect matched collections and ignore any other collection operations for collections not listed here. This does not affect non-collection operations.

A collection selector is of the form collections: {key1: value1, key2: value2, …​} where the key can be any collection property such as name, policy, numShards, etc. The value must match exactly and all specified properties must match for a collection to match.

A collection selector is useful in a cluster where collections are added and removed frequently and where selecting only collections that use a specific autoscaling policy is useful.

Example configurations:

{
 "set-trigger" : {
  "name" : "node_added_trigger",
  "event" : "nodeAdded",
  "waitFor" : "1s",
  "enabled" : true,
  "actions" : [
   {
    "name" : "compute_plan",
    "class" : "solr.ComputePlanAction",
    "collections" : "test1,test2"
   },
   {
    "name" : "execute_plan",
    "class" : "solr.ExecutePlanAction"
   }
  ]
 }
}

In this example only collections test1 and test2 will be potentially replicated / moved to an added node, other collections will be ignored even if they cause policy violations.

{
 "set-trigger" : {
  "name" : "node_added_trigger",
  "event" : "nodeAdded",
  "waitFor" : "1s",
  "enabled" : true,
  "actions" : [
   {
    "name" : "compute_plan",
    "class" : "solr.ComputePlanAction",
    "collections" : {"policy": "my_policy"}
   },
   {
    "name" : "execute_plan",
    "class" : "solr.ExecutePlanAction"
   }
  ]
 }
}

In this example only collections which use the my_policy as their autoscaling policy will be potentially replicated / moved to an added node, other collections will be ignored even if they cause policy violations.

{
 "set-trigger" : {
  "name" : "node_added_trigger",
  "event" : "nodeAdded",
  "waitFor" : "1s",
  "enabled" : true,
  "actions" : [
   {
    "name" : "compute_plan",
    "class" : "solr.ComputePlanAction",
    "collections" : {"policy": "my_policy", "numShards" :  "4"}
   },
   {
    "name" : "execute_plan",
    "class" : "solr.ExecutePlanAction"
   }
  ]
 }
}

In this example only collections which use the my_policy as their autoscaling policy and that have numShards equal to 4 will be potentially replicated / moved to an added node, other collections will be ignored even if they cause policy violations.

Execute Plan Action

The ExecutePlanAction executes the Collection API commands emitted by the ComputePlanAction against the cluster using SolrJ. It executes the commands serially, waiting for each of them to succeed before continuing with the next one.

Currently, it has the following configurable parameters:

taskTimeoutSeconds
Default value of this parameter is 120 seconds. This value defines how long the action will wait for a command to complete its execution. If a timeout is reached while the command is still running then the command status is provisionally considered a success but a warning is logged, unless taskTimeoutFail is set to true.
taskTimeoutFail
Boolean with a default value of false. If this value is true then a timeout in command processing will be marked as failure and an exception will be thrown.

If the Overseer node fails while ExecutePlanAction is running, then the new Overseer node will run the chain of actions for the same event again after waiting for any running Collection API operations belonging to the event to complete.

Please see SolrCloud Autoscaling Fault Tolerance for more details on fault tolerance within the autoscaling framework.