Result Grouping
Result Grouping groups documents with a common field value into groups and returns the top documents for each group.
For example, if you searched for "DVD" on an electronic retailer’s e-commerce site, you might be returned three categories such as "TV and Video", "Movies", and "Computers" with three results per category. In this case, the query term "DVD" appeared in all three categories, so Solr groups them together in order to increase relevancy for the user.
Prefer Collapse & Expand instead
Solr’s Collapse and Expand feature is newer and mostly overlaps with Result Grouping. There are features unique to both, and they have different performance characteristics. That said, in most cases Collapse and Expand is preferable to Result Grouping. |
Result Grouping is separate from Faceting. Though it is conceptually similar, faceting returns all relevant results and allows the user to refine the results based on the facet category. For example, if you search for "shoes" on a footwear retailer’s e-commerce site, Solr would return all results for that query term, along with selectable facets such as "size," "color," "brand," and so on.
You can however combine grouping with faceting. Grouped faceting supports facet.field
and facet.range
but currently doesn’t support date and pivot faceting. The facet counts are computed based on the first group.field
parameter, and other group.field
parameters are ignored.
Grouped faceting differs from non grouped facets (sum of all facets) == (total of products with that property)
as shown in the following example:
Object 1
- name: Phaser 4620a
- ppm: 62
- product_range: 6
Object 2
- name: Phaser 4620i
- ppm: 65
- product_range: 6
Object 3
- name: ML6512
- ppm: 62
- product_range: 7
If you ask Solr to group these documents by "product_range", then the total amount of groups is 2, but the facets for ppm are 2 for 62 and 1 for 65.
Grouping Parameters
Result Grouping takes the following request parameters. Any number of these request parameters can be included in a single request:
group
- If
true
, query results will be grouped. group.field
- The name of the field by which to group results. The field must be single-valued, and either be indexed or a field type that has a value source and works in a function query, such as
ExternalFileField
. It must also be a string-based field, such asStrField
orTextField
group.func
Group based on the unique values of a function query.
This option does not work with distributed searches. group.query
- Return a single group of documents that match the given query.
rows
- The number of groups to return. The default value is
10
. start
- Specifies an initial offset for the list of groups.
group.limit
- Specifies the number of results to return for each group. The default value is
1
. group.offset
- Specifies an initial offset for the document list of each group.
sort
- Specifies how Solr sorts the groups relative to each other. For example,
sort=popularity desc
will cause the groups to be sorted according to the highest popularity document in each group. The default value isscore desc
. group.sort
- Specifies how Solr sorts documents within each group. The default behavior if
group.sort
is not specified is to use the same effective value as thesort
parameter. group.format
- If this parameter is set to
simple
, the grouped documents are presented in a single flat list, and thestart
androws
parameters affect the numbers of documents instead of groups. An alternate value for this parameter isgrouped
. group.main
- If
true
, the result of the first field grouping command is used as the main result list in the response, usinggroup.format=simple
. group.ngroups
If
true
, Solr includes the number of groups that have matched the query in the results. The default value is false.See below for Distributed Result Grouping Caveats when using sharded indexes.
group.truncate
- If
true
, facet counts are based on the most relevant document of each group matching the query. The default value isfalse
. group.facet
Determines whether to compute grouped facets for the field facets specified in facet.field parameters. Grouped facets are computed based on the first specified group. As with normal field faceting, fields shouldn’t be tokenized (otherwise counts are computed for each token). Grouped faceting supports single and multivalued fields. Default is
false
.There can be a heavy performance cost to this option. See below for Distributed Result Grouping Caveats when using sharded indexes.
group.cache.percent
Setting this parameter to a number greater than 0 enables caching for result grouping. Result Grouping executes two searches; this option caches the second search. The default value is
0
. The maximum value is100
.Testing has shown that group caching only improves search time with Boolean, wildcard, and fuzzy queries. For simple queries like term or "match all" queries, group caching degrades performance.
Any number of group commands (e.g., group.field
, group.func
, group.query
, etc.) may be specified in a single request.
Grouping Examples
All of the following sample queries work with Solr’s “bin/solr -e techproducts” example.
Grouping Results by Field
In this example, we will group results based on the manu_exact
field, which specifies the manufacturer of the items in the sample dataset.
http://localhost:8983/solr/techproducts/select?fl=id,name&q=solr+memory&group=true&group.field=manu_exact
{
"..."
"grouped":{
"manu_exact":{
"matches":6,
"groups":[{
"groupValue":"Apache Software Foundation",
"doclist":{"numFound":1,"start":0,"docs":[
{
"id":"SOLR1000",
"name":"Solr, the Enterprise Search Server"}]
}},
{
"groupValue":"Corsair Microsystems Inc.",
"doclist":{"numFound":2,"start":0,"docs":[
{
"id":"VS1GB400C3",
"name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"}]
}},
{
"groupValue":"A-DATA Technology Inc.",
"doclist":{"numFound":1,"start":0,"docs":[
{
"id":"VDBDB1A16",
"name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"}]
}},
{
"groupValue":"Canon Inc.",
"doclist":{"numFound":1,"start":0,"docs":[
{
"id":"0579B002",
"name":"Canon PIXMA MP500 All-In-One Photo Printer"}]
}},
{
"groupValue":"ASUS Computer Inc.",
"doclist":{"numFound":1,"start":0,"docs":[
{
"id":"EN7800GTX/2DHTV/256M",
"name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
}
}]}}}
The response indicates that there are six total matches for our query. For each of the five unique values of group.field
, Solr returns a docList
for that groupValue
such that the numFound
indicates the total number of documents in that group, and the top documents are returned according to the implicit default group.limit=1
and group.sort=score desc
parameters. The resulting groups are then sorted by the score of the top document within each group based on the implicit sort=score desc
, and the number of groups returned is limited to the implicit rows=10
.
We can run the same query with the request parameter group.main=true
. This will format the results as a single flat document list. This flat format does not include as much information as the normal result grouping query results – notably the numFound
in each group – but it may be easier for existing Solr clients to parse.
http://localhost:8983/solr/techproducts/select?fl=id,name,manufacturer&q=solr+memory&group=true&group.field=manu_exact&group.main=true
{
"responseHeader":{
"status":0,
"QTime":1,
"params":{
"fl":"id,name,manufacturer",
"indent":"true",
"q":"solr memory",
"group.field":"manu_exact",
"group.main":"true",
"group":"true"}},
"grouped":{},
"response":{"numFound":6,"start":0,"docs":[
{
"id":"SOLR1000",
"name":"Solr, the Enterprise Search Server"},
{
"id":"VS1GB400C3",
"name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"},
{
"id":"VDBDB1A16",
"name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"},
{
"id":"0579B002",
"name":"Canon PIXMA MP500 All-In-One Photo Printer"},
{
"id":"EN7800GTX/2DHTV/256M",
"name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
}
}
Grouping by Query
In this example, we will use the group.query
parameter to find the top three results for "memory" in two different price ranges: 0.00 to 99.99, and over 100.
http://localhost:8983/solr/techproducts/select?indent=true&fl=name,price&q=memory&group=true&group.query=price:[0+TO+99.99]&group.query=price:[100+TO+*]&group.limit=3
{
"responseHeader":{
"status":0,
"QTime":42,
"params":{
"fl":"name,price",
"indent":"true",
"q":"memory",
"group.limit":"3",
"group.query":["price:[0 TO 99.99]",
"price:[100 TO *]"],
"group":"true"}},
"grouped":{
"price:[0 TO 99.99]":{
"matches":5,
"doclist":{"numFound":1,"start":0,"docs":[
{
"name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
"price":74.99}]
}},
"price:[100 TO *]":{
"matches":5,
"doclist":{"numFound":3,"start":0,"docs":[
{
"name":"CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
"price":185.0},
{
"name":"Canon PIXMA MP500 All-In-One Photo Printer",
"price":179.99},
{
"name":"ASUS Extreme N7800GTX/2DHTV (256 MB)",
"price":479.95}]
}
}
}
}
In this case, Solr found five matches for "memory," but only returns four results grouped by price. This is because one result for "memory" did not have a price assigned to it.
Distributed Result Grouping Caveats
Grouping is supported for distributed searches, with some caveats:
- Currently
group.func
is is not supported in any distributed searches group.ngroups
andgroup.facet
require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. Document routing via composite keys can be a useful solution in many situations.