Rule-based Replica Placement
When Solr needs to assign nodes to collections, it can either assign them randomly or use a set of nodes that the user specifies for creating the replicas.
With very large clusters, it is hard to specify exact node names, and doing so still does not give you fine-grained control over how nodes are chosen for a shard. The user should be in complete control of where the nodes are allocated for each collection, shard, and replica. This helps to optimally allocate hardware resources across the cluster.
Rule-based replica assignment allows the creation of rules to determine the placement of replicas in the cluster. In the future, this feature will help to automatically add or remove replicas when systems go down, or when higher throughput is required. This enables a more hands-off approach to administration of the cluster.
This feature is used in the following instances:
- Collection creation
- Shard creation
- Replica creation
- Shard splitting
Common Use Cases
There are several situations where this functionality may be used. A few of the rules that could be implemented are listed below:
- Don’t assign more than 1 replica of this collection to a host.
- Assign all replicas to nodes with more than 100GB of free disk space, or assign replicas where there is more disk space.
- Do not assign any replica on a given host because I want to run an overseer there.
- Assign only one replica of a shard in a rack.
- Assign replicas to nodes hosting fewer than 5 cores.
- Assign replicas to nodes hosting the fewest cores.
Rule Conditions
A rule is a set of conditions that a node must satisfy before a replica core can be created there.
There are three possible conditions.
- shard: this is the name of a shard or a wildcard (* means for all shards). If shard is not specified, then the rule applies to the entire collection.
- replica: this can be a number or a wildcard (* means any number, zero to infinity).
- tag: this is an attribute of a node in the cluster that can be used in a rule, e.g., “freedisk”, “cores”, “rack”, “dc”, etc. The tag name can be a custom string. If creating a custom tag, a snitch is responsible for providing tags and values. The section Snitches below describes how to add a custom tag, and lists the pre-defined tags (cores, freedisk, host, port, node, role, ip_1 through ip_4, and sysprop).
Rule Operators
A condition can have one of the following operators to set the parameters for the rule.
- equals (no operator required): tag:x means the tag value must be equal to ‘x’.
- greater than (>): tag:>x means the tag value must be greater than ‘x’. x must be a number.
- less than (<): tag:<x means the tag value must be less than ‘x’. x must be a number.
- not equal (!): tag:!x means the tag value MUST NOT be equal to ‘x’. The equals check is performed on the String value.
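The four operators can be illustrated with a minimal Python sketch. This is not Solr's implementation; the function names parse_operand and matches are hypothetical, and the sketch assumes numeric tag values for > and < and a string comparison for equals and not-equal, as described above.

```python
def parse_operand(condition_value):
    """Split a condition value such as '>100' into (operator, operand).

    A value with no leading operator is treated as an equals check.
    """
    if condition_value[:1] in (">", "<", "!"):
        return condition_value[0], condition_value[1:]
    return "=", condition_value


def matches(condition_value, tag_value):
    """Return True if tag_value satisfies the condition value."""
    operator, operand = parse_operand(condition_value)
    if operator == ">":
        return float(tag_value) > float(operand)
    if operator == "<":
        return float(tag_value) < float(operand)
    if operator == "!":
        # Equality checks compare string values.
        return str(tag_value) != operand
    return str(tag_value) == operand
```

For instance, matches(">100", 250) is true for a node reporting 250 in the tag, while matches("!192.45.67.3", "192.45.67.3") is false.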
Fuzzy Operator (~)
This can be used as a suffix to any condition. Solr first tries to satisfy the rule strictly. If it can’t find enough nodes to match the criterion, it tries to find the next best match, which may not satisfy the criterion. For example, if we have a rule such as freedisk:>200~, Solr will try to assign replicas of this collection on nodes with more than 200GB of free disk space. If that is not possible, the node which has the most free disk space will be chosen instead.
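The strict-then-fallback behaviour of a rule like freedisk:>200~ can be sketched as follows. This is an illustrative approximation only, not Solr's code; the function name, the node names, and the dictionary of free disk space per node are all hypothetical.

```python
def pick_with_fuzzy_freedisk(freedisk_gb, threshold_gb, needed):
    """Pick `needed` nodes for a rule of the form freedisk:>threshold~.

    freedisk_gb maps a (hypothetical) node name to its free disk space in GB.
    """
    # Rank every node by free disk space, largest first.
    ranked = sorted(freedisk_gb, key=freedisk_gb.get, reverse=True)
    # First try to satisfy the rule strictly.
    strict = [node for node in ranked if freedisk_gb[node] > threshold_gb]
    if len(strict) >= needed:
        return strict[:needed]
    # Not enough strict matches: fall back to the best available nodes.
    return ranked[:needed]
```

With nodes reporting 50GB and 120GB free and a threshold of 200, no node matches strictly, so the sketch returns the node with 120GB.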
Choosing Among Equals
Nodes are sorted before assignment, and the rules determine the sort order. This ensures that even if many nodes match the rules, the best nodes are picked for node assignment. For example, if there is a rule such as freedisk:>20, nodes are sorted on disk space in descending order and the node with the most disk space is picked first. Or, if the rule is cores:<5, nodes are sorted by number of cores in ascending order and the node with the fewest cores is picked first.
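A small sketch of this ordering, assuming each node is represented as a dictionary of tag values (the function name, node names, and values are hypothetical, and this is not Solr's actual sorting code):

```python
def rank_nodes(nodes, tag, operator, threshold):
    """Filter nodes by a numeric rule, then order the best candidates first.

    '>' rules prefer larger tag values; '<' rules prefer smaller ones.
    """
    if operator == ">":
        eligible = [n for n in nodes if n[tag] > threshold]
        return sorted(eligible, key=lambda n: n[tag], reverse=True)
    if operator == "<":
        eligible = [n for n in nodes if n[tag] < threshold]
        return sorted(eligible, key=lambda n: n[tag])
    raise ValueError(f"unsupported operator: {operator!r}")


nodes = [
    {"name": "a", "freedisk": 30, "cores": 4},
    {"name": "b", "freedisk": 80, "cores": 1},
    {"name": "c", "freedisk": 10, "cores": 7},
]
by_disk = [n["name"] for n in rank_nodes(nodes, "freedisk", ">", 20)]
by_cores = [n["name"] for n in rank_nodes(nodes, "cores", "<", 5)]
```

In both cases node b comes first: it has the most free disk space among nodes above 20GB and the fewest cores among nodes below 5 cores.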
Rules for New Shards
The rules are persisted along with collection state. So, when a new replica is created, the system will assign replicas satisfying the rules. When a new shard is created as a result of using the Collection API’s CREATESHARD command, ensure that you have created rules specific for that shard name. Rules can be altered using the MODIFYCOLLECTION command. However, it is not required to do so if the rules do not specify explicit shard names. For example, a rule such as shard:shard1,replica:*,ip_3:168 will not apply to any new shard created. But if your rule is replica:*,ip_3:168, then it will apply to any new shard created.
The same is applicable to shard splitting. Shard splitting is treated exactly the same way as shard creation. Even though shard1_1 and shard1_2 may be created from shard1, the rules treat them as distinct, unrelated shards.
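To make the shard-name matching concrete, here is a hedged sketch that checks whether a rule string would apply to a given shard. The helper rule_applies is hypothetical and assumes, as described in this section, that a rule with no shard condition (or a wildcard) is not tied to any shard name, while an explicit shard name must match exactly.

```python
def rule_applies(rule, shard_name):
    """Return True if the rule string is relevant to the named shard."""
    # Split "shard:shard1,replica:*,ip_3:168" into {"shard": "shard1", ...}.
    conditions = dict(part.split(":", 1) for part in rule.split(","))
    target = conditions.get("shard")
    if target is None or target in ("*", "**"):
        # No explicit shard name: the rule covers new shards too.
        return True
    # Explicit names are compared literally, so sub-shards do not inherit.
    return target == shard_name
```

Under these assumptions, rule_applies("shard:shard1,replica:*,ip_3:168", "shard1_1") is false, which mirrors the point that shards produced by a split are treated as unrelated to their parent.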
Snitches
Tag values come from a plugin called a Snitch. If there is a tag named ‘rack’ in a rule, there must be a Snitch which provides the value for ‘rack’ for each node in the cluster. A snitch implements the Snitch interface. Solr provides a default snitch which supplies the following tags:
- cores: Number of cores in the node
- freedisk: Disk space available in the node
- host: host name of the node
- port: port of the node
- node: node name
- role: The role of the node. The only supported role is 'overseer'
- ip_1, ip_2, ip_3, ip_4: These are IP fragments for each node. For example, in a host with IP 192.168.1.2, ip_1 = 2, ip_2 = 1, ip_3 = 168, and ip_4 = 192.
- sysprop.{PROPERTY_NAME}: These are values available from system properties. sysprop.key means a value that is passed to the node as -Dkey=keyValue during the node startup. It is possible to use rules like sysprop.key:expectedVal,shard:*
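The numbering of the IP fragments runs from the last octet to the first, which is easy to get backwards. A short sketch (the helper name ip_fragments is hypothetical) reproduces the example above:

```python
def ip_fragments(ip_address):
    """Map an IPv4 address to the ip_1..ip_4 tags, counting from the last octet."""
    octets = ip_address.split(".")
    return {
        f"ip_{position}": octet
        for position, octet in enumerate(reversed(octets), start=1)
    }


fragments = ip_fragments("192.168.1.2")
# fragments == {"ip_1": "2", "ip_2": "1", "ip_3": "168", "ip_4": "192"}
```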
How Snitches are Configured
It is possible to use one or more snitches for a set of rules. If the rules only need tags from the default snitch, it need not be explicitly configured. Otherwise, a snitch is specified in the following form:
snitch=class:fqn.ClassName,key1:val1,key2:val2,key3:val3
How Tag Values are Collected
- Identify the set of tags in the rules
- Create instances of Snitches specified. The default snitch is always created.
- Ask each Snitch if it can provide values for any of the tags. If even one tag does not have a snitch, the assignment fails.
- After identifying the Snitches, they provide the tag values for each node in the cluster.
- If the value for a tag is not obtained for a given node, it cannot participate in the assignment.
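The steps above can be sketched roughly as follows. This is a simplified model, not the Snitch interface itself: each snitch is represented here as a hypothetical dictionary with a set of tags it can provide and a lookup callable, and all names are assumptions made for illustration.

```python
RESERVED_CONDITIONS = {"shard", "replica"}


def tags_in_rules(rules):
    """Step 1: collect the tag names used in the rule strings."""
    tags = set()
    for rule in rules:
        for part in rule.split(","):
            name = part.split(":", 1)[0]
            if name not in RESERVED_CONDITIONS:
                tags.add(name)
    return tags


def collect_tag_values(rules, snitches, nodes):
    """Resolve a provider for every tag, then gather values per node."""
    wanted = tags_in_rules(rules)
    providers = {}
    for tag in wanted:
        for snitch in snitches:
            if tag in snitch["tags"]:
                providers[tag] = snitch
                break
        else:
            # Even one tag without a snitch makes the assignment fail.
            raise ValueError(f"no snitch provides tag {tag!r}")
    values = {}
    for node in nodes:
        node_values = {tag: providers[tag]["lookup"](node, tag) for tag in wanted}
        # A node with a missing tag value cannot participate.
        if all(value is not None for value in node_values.values()):
            values[node] = node_values
    return values
```

A caller would pass the rule strings, the list of snitch descriptions (with the default snitch always included), and the node names; nodes for which any lookup returns None are left out of the result.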
Replica Placement Examples
Keep less than 2 replicas (at most 1 replica) of this collection on any node
For this rule, we define the replica condition with operators for "less than 2", and use a pre-defined tag named node to define nodes with any name.
replica:<2,node:*
// this is equivalent to replica:<2,node:*,shard:**. We can omit shard:** because ** is the default value of shard
For a given shard, keep less than 2 replicas on any node
For this rule, we use the shard condition to define any shard, the replica condition with operators for "less than 2", and finally a pre-defined tag named node to define nodes with any name.
shard:*,replica:<2,node:*
Assign all replicas in shard1 to rack 730
This rule limits the shard condition to 'shard1', but any number of replicas. We’re also referencing a custom tag named rack. Before defining this rule, we will need to configure a custom Snitch which provides values for the tag rack.
shard:shard1,replica:*,rack:730
In this case, the default value of replica is *, or all replicas. It can be omitted and the rule will be reduced to:
shard:shard1,rack:730
Create replicas in nodes with less than 5 cores only
This rule uses the replica condition to define any number of replicas, but adds a pre-defined tag named cores and uses operators for "less than 5".
replica:*,cores:<5
Again, we can simplify this to use the default value for replica, like so:
cores:<5
Do not create any replicas in host 192.45.67.3
This rule uses only the pre-defined tag host to define an IP address where replicas should not be placed.
host:!192.45.67.3
Defining Rules
Rules are specified per collection during collection creation as request parameters. It is possible to specify multiple ‘rule’ and ‘snitch’ parameters as in this example:
snitch=class:EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:<2,dc:dc3
These rules are persisted in clusterstate.json in ZooKeeper and are available throughout the lifetime of the collection. This enables the system to perform any future node allocation without direct user interaction. The rules added during collection creation can be modified later using the MODIFYCOLLECTION API.
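Because rule and snitch are repeated request parameters, the query string needs care with URL encoding. The sketch below builds a collection-creation URL with Python's standard library; the base URL, the collection name, and the helper create_collection_url are assumptions for illustration, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor,
                          rules, snitch=None):
    """Build a Collections API CREATE URL carrying one 'rule' parameter per rule."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
    ]
    if snitch is not None:
        params.append(("snitch", snitch))
    # A list of pairs lets the same key ('rule') appear more than once.
    params.extend(("rule", rule) for rule in rules)
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = create_collection_url(
    "http://localhost:8983/solr",  # hypothetical Solr base URL
    "mycollection",                # hypothetical collection name
    num_shards=2,
    replication_factor=2,
    rules=["shard:*,replica:1,dc:dc1", "shard:*,replica:<2,dc:dc3"],
    snitch="class:EC2Snitch",
)
```

Characters such as ':', ',', '*', and '<' are percent-encoded by urlencode, so the rules arrive at the server unchanged after decoding.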