Class AffinityPlacementFactory

  • All Implemented Interfaces:
    ConfigurablePlugin<AffinityPlacementConfig>, PlacementPluginFactory<AffinityPlacementConfig>

    public class AffinityPlacementFactory
    extends Object
    implements PlacementPluginFactory<AffinityPlacementConfig>
    This factory is instantiated by the plugin configuration from its class name. Using it is the only way to create instances of AffinityPlacementFactory.AffinityPlacementPlugin.

    In order to configure this plugin for placement decisions, the following curl command (or an equivalent) has to be executed once the cluster is already running; it sets the appropriate configuration stored in ZooKeeper. Replace localhost:8983 with the IP address and port of one of your servers.

    
     curl -X POST -H 'Content-type:application/json' -d '{
     "add": {
       "name": ".placement-plugin",
       "class": "org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory",
       "config": {
         "minimalFreeDiskGB": 10,
         "prioritizedFreeDiskGB": 50
       }
     }
     }' http://localhost:8983/api/cluster/plugin
     
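    The two keys under "config" correspond to fields of AffinityPlacementConfig. As a rough illustration of their intent (a sketch only, not the actual Solr class; the exact threshold semantics shown here are an assumption): nodes below the minimal threshold are excluded from placement, while nodes below the prioritized threshold are merely deprioritized.

```java
// Illustrative sketch only -- NOT the real AffinityPlacementConfig class.
// The interpretation of the two thresholds below is an assumption.
public record AffinityConfigSketch(long minimalFreeDiskGB, long prioritizedFreeDiskGB) {

    /** A node with less free disk than minimalFreeDiskGB is not considered at all. */
    public boolean accepts(long freeDiskGB) {
        return freeDiskGB >= minimalFreeDiskGB;
    }

    /** A node with less free disk than prioritizedFreeDiskGB is pushed toward the
     *  end of the candidate list rather than rejected outright. */
    public boolean prioritized(long freeDiskGB) {
        return freeDiskGB >= prioritizedFreeDiskGB;
    }
}
```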

    In order to delete the placement-plugin section (and fall back to either Legacy or rule-based placement if configured for a collection), execute:

    
     curl -X POST -H 'Content-type:application/json' -d '{
     "remove" : ".placement-plugin"
     }' http://localhost:8983/api/cluster/plugin
     

    AffinityPlacementFactory.AffinityPlacementPlugin implements replica placement in a way that replicates the past Autoscaling config defined here.

    This specification does the following:

    Spread replicas of each shard as evenly as possible across multiple availability zones (given by a sys prop), assign replicas to specific kinds of nodes based on replica type (another sys prop), and avoid placing more than one replica of a shard on the same node.
    Only after these constraints are satisfied are cores per node and disk usage minimized.
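    This two-tier decision (hard constraints first, soft minimization second) can be sketched as a filter followed by a sort. This is an illustrative simplification; Node, its fields, and the helper below are hypothetical stand-ins, not Solr classes:

```java
import java.util.*;

// Sketch of "hard constraints filter candidates, soft preferences only order them".
// All names here are hypothetical stand-ins, not the actual Solr types.
public class ConstraintSketch {

    record Node(String name, Set<String> replicaTypes, int coreCount, long freeDiskGB) {}

    /** Hard constraints eliminate nodes outright; only the survivors are ordered
     *  by the soft preferences (fewer cores first, then more free disk). */
    static List<Node> candidates(List<Node> nodes, Set<String> nodesWithShard, String replicaType) {
        return nodes.stream()
            .filter(n -> n.replicaTypes().contains(replicaType)) // replica-type sys prop
            .filter(n -> !nodesWithShard.contains(n.name()))     // at most one replica per shard per node
            .sorted(Comparator.comparingInt(Node::coreCount)
                .thenComparing(Comparator.comparingLong(Node::freeDiskGB).reversed()))
            .toList();
    }
}
```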

    Overall strategy of this plugin:

    • The set of nodes in the cluster is obtained and transformed into three independent (possibly overlapping) sets of nodes, one accepting each of the three replica types.
    • For each shard for which replicas must be placed, and then for each replica type to place (starting with NRT, then TLOG, then PULL):
      • The set of candidate nodes corresponding to the replica type is used, and nodes that already have a replica (of any type) for that shard are removed from it.
      • If there are not enough nodes, an error is thrown (this is checked further down during processing).
      • The number of (already existing) replicas of the current type on each Availability Zone is collected.
      • Separate the set of available nodes into as many subsets (some possibly empty) as there are Availability Zones defined for the candidate nodes.
      • In each AZ node subset, sort the nodes by increasing total core count, possibly with a condition that pushes nodes with low disk space to the end of the list? Or a weighted combination of the relative importance of these two factors? Some randomization? Marking nodes without enough disk space as unavailable? These and other aspects are likely to be experimented with once the plugin is tested or observed running in production; don't expect the initial code drop(s) to do all of that.
      • Iterate over the number of replicas to place (for the current replica type for the current shard):
        • Based on the number of replicas per AZ collected previously, pick the non-empty set of nodes having the lowest number of replicas. Then pick the first node in that set. That's the node on which the replica is placed. Remove the node from the set of available nodes for the given AZ and increase the number of replicas placed on that AZ.
      • During this process, the number of cores on each node is tracked so that placement decisions take one another into account and not all shards decide to put their replicas on the same nodes (they might, though, if these are the least loaded nodes).
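    The AZ-balancing selection loop described above can be sketched roughly as follows. This is an illustrative simplification under assumed types; Node, az, and coreCount are hypothetical stand-ins, not the actual AffinityPlacementPlugin code:

```java
import java.util.*;

// Sketch of the per-replica selection loop: always pick from the AZ with the
// fewest replicas so far, and within an AZ from the node with the fewest cores.
// Names and types are hypothetical stand-ins, not the real Solr classes.
public class PlacementSketch {

    record Node(String name, String az, int coreCount) {}

    static List<Node> place(List<Node> candidates, int count) {
        // Group candidate nodes per availability zone, each AZ sorted by core count.
        Map<String, Deque<Node>> byAz = new HashMap<>();
        candidates.stream()
            .sorted(Comparator.comparingInt(Node::coreCount))
            .forEach(n -> byAz.computeIfAbsent(n.az(), az -> new ArrayDeque<>()).addLast(n));

        Map<String, Integer> replicasPerAz = new HashMap<>();
        byAz.keySet().forEach(az -> replicasPerAz.put(az, 0));

        List<Node> placed = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Pick the non-empty AZ with the fewest replicas placed so far.
            String az = byAz.entrySet().stream()
                .filter(e -> !e.getValue().isEmpty())
                .min(Comparator.comparingInt(e -> replicasPerAz.get(e.getKey())))
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("not enough nodes"));
            // First node in that AZ's sorted deque has the fewest cores; removing it
            // ensures no second replica of this shard lands on the same node.
            placed.add(byAz.get(az).pollFirst());
            replicasPerAz.merge(az, 1, Integer::sum);
        }
        return placed;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("n1", "az1", 10), new Node("n2", "az1", 2),
            new Node("n3", "az2", 5));
        System.out.println(place(nodes, 3)); // n2 and n3 are placed before n1
    }
}
```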

    This code is a realistic placement computation, based on a few assumptions. It is written so that it is relatively easy to adapt to (somewhat) different assumptions. Additional configuration options could also be introduced to allow configuration-based option selection.