Circuit Breakers
Solr’s circuit breaker infrastructure can prevent actions that would push a node beyond its capacity or bring it down. Circuit breakers aim to ensure a higher quality of service by accepting only the request load that the current resource configuration can serve.
When To Use Circuit Breakers
Circuit breakers should be used when the user wishes to trade request throughput for higher Solr stability. When circuit breakers are enabled, a node under heavy load may reject requests with HTTP error code 429 'Too Many Requests'.
It is up to the client to handle this error and potentially build retry logic as this should be a transient situation.
In a request to a sharded collection, the circuit breaker is only checked on the node handling the initial request, not for inter-node requests. It is therefore recommended to load balance client requests across Solr nodes to avoid hotspots.
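As an illustration of the client-side handling described above, the following Python sketch rotates requests across several Solr nodes and treats HTTP 429 as a retryable rejection. The function name `send_with_failover`, the `send` callable, and the node URLs are hypothetical and not part of any Solr client library; `send` stands in for whatever HTTP client the application uses.

```python
import itertools
import time


def send_with_failover(nodes, send, max_attempts=6, pause_seconds=0.5, sleep=time.sleep):
    """Try nodes in round-robin order until one does not answer 429.

    nodes: list of base URLs (hypothetical values, supplied by the caller).
    send:  callable taking a node URL and returning (status_code, body).
    """
    if not nodes:
        raise ValueError("at least one node is required")
    rotation = itertools.cycle(nodes)
    for _ in range(max_attempts):
        node = next(rotation)
        status, body = send(node)
        if status != 429:
            # Anything other than 429 is returned to the caller unchanged.
            return node, status, body
        # 429 means a circuit breaker tripped on this node; pause, then move on.
        sleep(pause_seconds)
    raise RuntimeError(f"request rejected with HTTP 429 on all {max_attempts} attempts")
```

Passing `send` and `sleep` as parameters keeps the sketch independent of any particular HTTP library and makes it easy to exercise without a running cluster.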
Circuit Breaker Configurations
All circuit breaker configurations are listed as independent <circuitBreaker> entries in solrconfig.xml as shown below.
A circuit breaker can register itself to trip for query requests and/or update requests. By default only search requests are affected. A user may register multiple circuit breakers of the same type with different thresholds for each request type.
Currently Supported Circuit Breakers
The legacy configuration syntax using |
JVM Heap Usage
This circuit breaker tracks JVM heap memory usage and rejects incoming requests with a 429 error code if the heap usage exceeds a configured percentage of maximum heap allocated to the JVM (-Xmx). The main configuration for this circuit breaker is controlling the threshold percentage at which the breaker will trip.
To enable and configure the JVM heap usage based circuit breaker, add the following:
<circuitBreaker class="org.apache.solr.util.circuitbreaker.MemoryCircuitBreaker">
<double name="threshold">75</double>
</circuitBreaker>
The threshold is defined as a percentage of the max heap allocated to the JVM.
For the circuit breaker configuration, a value of "0" maps to 0% usage and a value of "100" maps to 100% usage.
It does not logically make sense to have a threshold below 50% or above 95% of the max heap allocated to the JVM. Hence, the range of valid values for this parameter is [50, 95], both inclusive.
Consider the following example:
JVM has been allocated a maximum heap of 5GB (-Xmx) and threshold is set to 75.
In this scenario, the heap usage at which the circuit breaker will trip is 3.75GB.
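The arithmetic behind this example can be checked with a few lines of Python. This is only a worked calculation, not Solr's implementation; the helper name `heap_trip_point_bytes` is hypothetical, and the 5GB heap is treated as 5 GiB for the purpose of the sketch.

```python
GIB = 1024 ** 3  # bytes in one gibibyte (assumption: -Xmx5g style units)


def heap_trip_point_bytes(max_heap_bytes, threshold_percent):
    """Return the heap usage, in bytes, at which the breaker would trip."""
    # The documented valid range for the threshold is [50, 95], inclusive.
    if not 50 <= threshold_percent <= 95:
        raise ValueError("threshold must be within [50, 95]")
    return max_heap_bytes * threshold_percent / 100


trip_point = heap_trip_point_bytes(5 * GIB, 75)
print(trip_point / GIB)  # prints 3.75
```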
System CPU Usage Circuit Breaker
This circuit breaker tracks system CPU usage and triggers if the recent CPU usage exceeds a configurable threshold.
This is tracked with the JMX metric OperatingSystemMXBean.getSystemCpuLoad(), which measures the recent CPU usage for the whole system. This metric is provided by the com.sun.management package, which is not implemented on all JVMs. If the metric is not available, the circuit breaker will be disabled and will log an error message. In that case, the System Load Average Circuit Breaker can be used as an alternative.
To enable and configure the CPU utilization based circuit breaker:
<circuitBreaker class="org.apache.solr.util.circuitbreaker.CPUCircuitBreaker">
<double name="threshold">75</double>
</circuitBreaker>
The triggering threshold is defined in percent CPU usage. A value of "0" maps to 0% usage and a value of "100" maps to 100% usage. The example above will trip when the CPU usage is equal to or greater than 75%.
System Load Average Circuit Breaker
This circuit breaker tracks system load average and triggers if the recent load average exceeds a configurable threshold.
This is tracked with the JMX metric OperatingSystemMXBean.getSystemLoadAverage(), which measures the recent load average for the whole system. A "load average" is the number of processes using or waiting for a CPU, usually averaged over one minute. Some systems include processes waiting on IO in the load average. Check the documentation for your system and JVM to understand this metric. For more information, see the Wikipedia page for Load.
To enable and configure the load average based circuit breaker:
<circuitBreaker class="org.apache.solr.util.circuitbreaker.LoadAverageCircuitBreaker">
<double name="threshold">8.0</double>
</circuitBreaker>
The triggering threshold is a floating point number matching load average. The example circuit breaker above will trip when the load average is equal to or greater than 8.0.
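When choosing a threshold, it can help to look at the load average a host actually reports. The Python sketch below reads the operating system's one-minute load average and applies the "equal to or greater than" rule described above. It is an assumption-laden aid, not part of Solr: the helper names are hypothetical, and the value Python reads from the OS may differ from what the JVM reports on a given platform.

```python
import os


def read_one_minute_load():
    """Return the host's 1-minute load average, or None if unavailable."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        # os.getloadavg() does not exist on Windows and can fail elsewhere.
        return None


def would_trip(load_average, threshold):
    """Mirror the rule stated above: trip at or above the threshold."""
    return load_average >= threshold


def load_per_core(load_average, cores):
    """Express the load relative to the number of CPU cores."""
    return load_average / cores


current = read_one_minute_load()
if current is not None:
    print("1-minute load:", current, "would trip at 8.0:", would_trip(current, 8.0))
```

Dividing by the core count gives a rough sense of scale: a load average of 8.0 on an 8-core host corresponds to about 1.0 per core.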
Advanced example
In this example we will prevent update requests above 80% CPU load, and prevent query requests above 95% CPU load. Supported request types are query and update.
This would prevent expensive bulk updates from impacting search. Note also the support for the short-form class name.
<config>
<circuitBreaker class="solr.CPUCircuitBreaker">
<double name="threshold">80</double>
<arr name="requestTypes">
<str>update</str>
</arr>
</circuitBreaker>
<circuitBreaker class="solr.CPUCircuitBreaker">
<double name="threshold">95</double>
<arr name="requestTypes">
<str>query</str>
</arr>
</circuitBreaker>
</config>
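The effect of the two breakers above can be pictured with a toy model in Python. This is a sketch of the described behavior only, not Solr's implementation; the function name `rejected_request_types` and the tuple representation of a breaker are hypothetical.

```python
def rejected_request_types(cpu_percent, breakers):
    """Return the set of request types rejected at the given CPU usage.

    breakers: list of (threshold_percent, request_types) pairs.
    A breaker trips when usage is equal to or greater than its threshold.
    """
    rejected = set()
    for threshold, request_types in breakers:
        if cpu_percent >= threshold:
            rejected.update(request_types)
    return rejected


# Same thresholds as the configuration above.
BREAKERS = [(80, ["update"]), (95, ["query"])]

print(rejected_request_types(50, BREAKERS))  # prints set()
print(rejected_request_types(85, BREAKERS))  # prints {'update'}
```

Between 80% and 95% CPU usage only updates are rejected, so searches keep being served while bulk indexing is held back.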
Performance Considerations
While JVM or CPU circuit breakers do not add any noticeable overhead per request, having too many circuit breakers checked for a single request can cause a performance overhead.
In addition, it is a good practice to exponentially back off while retrying requests on a busy node. See the Wikipedia page for Exponential Backoff.
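A minimal Python sketch of an exponential backoff schedule is shown here for illustration. The function name `backoff_delays` and all default values are assumptions chosen for the example, not recommendations from Solr; the optional jitter randomizes each delay so that many clients do not retry at the same instant.

```python
import random


def backoff_delays(attempts=5, base_seconds=0.5, factor=2.0,
                   max_seconds=30.0, jitter=False, rng=random.random):
    """Return the list of delays (in seconds) to wait before each retry."""
    delays = []
    for attempt in range(attempts):
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(max_seconds, base_seconds * factor ** attempt)
        if jitter:
            # Scale by a random factor in [0, 1) to spread out retries.
            delay *= rng()
        delays.append(delay)
    return delays


print(backoff_delays())  # prints [0.5, 1.0, 2.0, 4.0, 8.0]
```

A client would sleep for each delay in turn after receiving a 429 response and give up once the list is exhausted.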