Colocating Collections
Solr provides a way to colocate a collection with another so that cross-collection joins are always possible.
The colocation guarantee applies to all future collection operations, whether made via the Collections API or by Autoscaling actions.
A collection may only be colocated with exactly one withCollection. However, arbitrarily many collections may be linked to the same withCollection.
Create a Colocated Collection
The Create Collection API supports a parameter named withCollection which can be used to specify a collection with which the replicas of the newly created collection should be colocated. See Create Collection API.
/admin/collections?action=CREATE&name=techproducts&numShards=1&replicationFactor=2&withCollection=tech_categories
In the above example, every replica of the techproducts collection will be placed on a node that hosts at least one replica of the tech_categories collection.
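As a rough illustration, the request above could be assembled programmatically. This is a minimal sketch: the base URL http://localhost:8983/solr and the helper name create_colocated_url are assumptions for the example, not part of Solr.

```python
from urllib.parse import urlencode

# Assumed address of a local Solr node; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def create_colocated_url(name, with_collection, num_shards=1,
                         replication_factor=2, base=SOLR_BASE):
    """Build a Collections API CREATE URL that sets withCollection.

    Hypothetical helper: it only builds the URL and sends nothing.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "withCollection": with_collection,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


url = create_colocated_url("techproducts", "tech_categories")
print(url)
```

The helper only builds the URL; sending it (for example with urllib.request.urlopen) is left to the caller so the sketch has no side effects.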
Colocating Existing Collections
When both collections already exist, the Modify Collection API can be used to set the withCollection parameter so that the two collections are linked. This will not trigger changes to the cluster automatically, because moving a large number of replicas immediately might destabilize the system. Instead, consult the Suggestions UI page for the operations that can be performed to change the cluster manually.
Example:
/admin/collections?action=MODIFYCOLLECTION&collection=techproducts&withCollection=tech_categories
Deleting Colocated Collections
Deleting a collection which has been linked to another will fail unless the link itself is deleted first by using the Modify Collection API to un-set the withCollection attribute.
Example:
/admin/collections?action=MODIFYCOLLECTION&collection=techproducts&withCollection=
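The ordering described above (remove the link, then delete) might be sketched as follows. The base URL and the helper name unlink_then_delete_urls are assumptions made for illustration. Which of the two collections is then deleted is left as a parameter, since the text only says that the link must be removed first.

```python
from urllib.parse import urlencode

# Assumed address of a local Solr node; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def unlink_then_delete_urls(linked, to_delete=None, base=SOLR_BASE):
    """Return the two Collections API URLs in the order they should be called.

    linked    -- collection whose withCollection attribute is un-set
    to_delete -- collection to delete afterwards (defaults to `linked`)
    Hypothetical helper: it only builds URLs and sends nothing.
    """
    unlink = urlencode({
        "action": "MODIFYCOLLECTION",
        "collection": linked,
        "withCollection": "",  # empty value un-sets the attribute
    })
    delete = urlencode({"action": "DELETE", "name": to_delete or linked})
    return [
        f"{base}/admin/collections?{unlink}",
        f"{base}/admin/collections?{delete}",
    ]


for step in unlink_then_delete_urls("techproducts"):
    print(step)
```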
Limitations and Caveats
The collection being used as the withCollection must have one shard only and that shard should be named shard1. Note that when using the default router, the shard name is always set to shard1 but special care must be taken to name the shard as shard1 when using the implicit router.
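A pre-flight check for this constraint could look like the sketch below. It assumes a dictionary shaped like a cluster status response (cluster, then collections, then the collection name, then shards); the sample data and the function name can_be_with_collection are made up for the example.

```python
def can_be_with_collection(cluster_status, collection):
    """Return True if `collection` has exactly one shard, named shard1.

    `cluster_status` is assumed to be a dict laid out as
    cluster -> collections -> <name> -> shards -> <shard name>.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return list(shards) == ["shard1"]


# Hypothetical sample data for illustration only.
sample_status = {
    "cluster": {
        "collections": {
            "tech_categories": {"shards": {"shard1": {}}},
            "implicit_example": {"shards": {"east": {}}},
            "two_shards": {"shards": {"shard1": {}, "shard2": {}}},
        }
    }
}

print(can_be_with_collection(sample_status, "tech_categories"))
print(can_be_with_collection(sample_status, "implicit_example"))
```

The second call illustrates the implicit-router case, where a single shard with a custom name does not satisfy the requirement.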
If new replicas of the withCollection have to be added to maintain the colocation guarantees, the new replicas will be of type NRT only. Automatically creating replicas of TLOG or PULL types is not supported.
If replicas have to be moved from one node to another, perhaps in response to a node lost trigger, the target nodes will be chosen by preferring nodes that already have a replica of the withCollection so that the number of moves is minimized. However, this also means that unless there are Autoscaling policy violations, Solr will continue to move such replicas to already loaded nodes instead of preferring empty nodes. Therefore, it is advised to have policy rules that prevent such overloading, e.g., by setting the maximum number of cores per node to a fixed value.
Example:
{'cores' : '<8', 'node' : '#ANY'}
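For illustration, a rule like the one above could be generated and wrapped in a request body. This is only a sketch: the function name cores_limit_policy is hypothetical, and the set-cluster-policy wrapper is assumed here to be the Autoscaling API command that accepts such rules. Check the Autoscaling documentation for your Solr version before sending it anywhere.

```python
import json


def cores_limit_policy(fewer_than):
    """Build a payload with one rule: every node holds fewer than N cores.

    The "set-cluster-policy" key is an assumption about the Autoscaling
    API command name; the rule itself mirrors the example above.
    """
    rule = {"cores": f"<{fewer_than}", "node": "#ANY"}
    return {"set-cluster-policy": [rule]}


payload = cores_limit_policy(8)
print(json.dumps(payload))
```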
The colocation guarantee is one-way only: a collection 'X' colocated with 'Y' will always have one or more replicas of 'Y' on any node that has a replica of 'X', but the reverse is not true. There may be nodes which have one or more replicas of 'Y' but no replicas of 'X'. Such replicas of 'Y' will not be considered a violation of colocation rules and will not be cleaned up automatically.
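The one-way nature of the guarantee can be made concrete with a small check over a node layout. The node names, the layout, and the function colocation_violations are hypothetical. The sketch models only the rule as stated above, not Solr's actual implementation.

```python
def colocation_violations(node_to_collections, x, y):
    """Return the nodes that host a replica of x but no replica of y.

    Nodes hosting only y are deliberately not reported, because the
    guarantee runs from x to y and not the other way around.
    """
    return sorted(
        node
        for node, collections in node_to_collections.items()
        if x in collections and y not in collections
    )


# Hypothetical layout for illustration only.
layout = {
    "node1": {"techproducts", "tech_categories"},  # satisfies the rule
    "node2": {"tech_categories"},                  # Y without X: allowed
    "node3": {"techproducts"},                     # X without Y: violation
}

print(colocation_violations(layout, "techproducts", "tech_categories"))
```

In this layout only node3 is reported; node2 holds a replica of tech_categories without techproducts, which the rule permits.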