Faceting
Faceting is the arrangement of search results into categories based on indexed terms.
Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found for each term. Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for.
See also JSON Facet API for an alternative approach to this.
General Facet Parameters
There are two general parameters for controlling faceting.
facet
-
Required
Default:
false
If set to
true
, this parameter enables facet counts in the query response. If set tofalse
, a blank or missing value, this parameter disables faceting. None of the other parameters listed below will have any effect unless this parameter is set totrue
. facet.query
-
Optional
Default: none
Specify an arbitrary query in the Lucene default syntax to generate a facet count.
By default, Solr’s faceting feature automatically determines the unique terms for a field and returns a count for each of those terms. Using
facet.query
, you can override this default behavior and select exactly which terms or expressions you would like to see counted. In a typical implementation of faceting, you will specify a number offacet.query
parameters. This parameter can be particularly useful for numeric-range-based facets or prefix-based facets.You can set the
facet.query
parameter multiple times to indicate that multiple queries should be used as separate facet constraints.To use facet queries in a syntax other than the default syntax, prefix the facet query with the name of the query notation. For example, to use the hypothetical
myfunc
query parser, you could set thefacet.query
parameter like so:facet.query={!myfunc}name~fred
Field-Value Faceting Parameters
Several parameters can be used to trigger faceting based on the indexed terms in a field.
When using these parameters, it is important to remember that "term" is a very specific concept in Lucene: it relates to the literal field/value pairs that are indexed after any analysis occurs. For text fields that include stemming, lowercasing, or word splitting, the resulting terms may not be what you expect.
If you want Solr to perform both analysis (for searching) and faceting on the full literal strings, use the copyField
directive in your Schema to create two versions of the field: one Text and one String.
The Text field should have indexed="true" docValues="false"
if used for searching but not faceting and the String field should have indexed="false" docValues="true"
if used for faceting but not searching.
(For more information about the copyField
directive, see Copy Fields.)
Unless otherwise specified, all of the parameters below can be specified on a per-field basis with the syntax of f.<fieldname>.facet.<parameter>
facet.field
-
Optional
Default: none
Identifies a field that should be treated as a facet. It iterates over each Term in the field and generate a facet count using that Term as the constraint. This parameter can be specified multiple times in a query to select multiple facet fields.
If you do not set this parameter to at least one field in the schema, none of the other parameters described in this section will have any effect. facet.prefix
-
Optional
Default: none
Limits the terms on which to facet to those starting with the given string prefix. This does not limit the query in any way, only the facets that would be returned in response to the query.
facet.contains
-
Optional
Default: none
Limits the terms on which to facet to those containing the given substring. This does not limit the query in any way, only the facets that would be returned in response to the query.
facet.contains.ignoreCase
-
Optional
Default: none
If set to
true
, causes case to be ignored when matching thefacet.contains
substring against candidate facet terms. facet.matches
-
Optional
Default: none
Only returns facet buckets for the terms that match this regular expression.
facet.sort
-
Optional
Default: none
The ordering of the facet field terms. There are two options:
count
-
Return the terms sorted by highest count first.
index
-
Return the terms sorted lexicographically. For terms in the ASCII range, this will be alphabetically sorted.
The default is
count
iffacet.limit
is greater than 0, otherwise, the default isindex
. Note that the default logic is changed when Limiting Facet with Certain Terms.
facet.limit
-
Optional
Default:
100
The number of facet counts returned. A negative value means that Solr will return all counts.
facet.offset
-
Optional
Default:
0
An offset into the list of facets returned to allow paging.
facet.mincount
-
Optional
Default:
0
The minimum count required for a facet field to be included in the response. If a field’s counts are below the minimum, the field’s facet is not returned.
facet.missing
-
Optional
Default:
false
If set to
true
, this parameter indicates that, in addition to the Term-based constraints of a facet field, a count of all results that match the query but which have no facet value for the field should be computed and returned in the response. facet.method
-
Optional
Default:
fc
Selects the type of algorithm or method to use when faceting a field.
The following methods are available.
enum
-
Enumerates all terms in a field, calculating the set intersection of documents that match the term with documents that match the query.
This method is recommended for faceting multi-valued fields that have only a few distinct values. The average number of values per document does not matter.
For example, faceting on a field with U.S. States such as
Alabama, Alaska, … Wyoming
would lead to fifty cached filters which would be used over and over again. ThefilterCache
should be large enough to hold all the cached filters. fc
-
Calculates facet counts by iterating over documents that match the query and summing the terms that appear in each document.
This is currently implemented using an
UnInvertedField
cache if the field either is multi-valued or is tokenized (according toFieldType.isTokened()
). Each document is looked up in the cache to see what terms/values it contains, and a tally is incremented for each value.This method is excellent for situations where the number of indexed values for the field is high, but the number of values per document is low. For multi-valued fields, a hybrid approach is used that uses term filters from the
filterCache
for terms that match many documents. The lettersfc
stand for field cache. fcs
-
Per-segment field faceting for single-valued string fields. Enable with
facet.method=fcs
and control the number of threads used with thethreads
local parameter. This parameter allows faceting to be faster in the presence of rapid index changes.
The default value is
fc
(except for fields using theBoolField
field type and whenfacet.exists=true
is requested) since it tends to use less memory and is faster when a field has many unique terms in the index. facet.enum.cache.minDf
-
Optional
Default:
0
Indicates the minimum document frequency (the number of documents matching a term) for which the filterCache should be used when determining the constraint count for that term. This is only used with the
facet.method=enum
method of faceting.A value greater than zero decreases the filterCache’s memory usage, but increases the time required for the query to be processed. If you are faceting on a field with a very large number of terms, and you wish to decrease memory usage, try setting this parameter to a value between
25
and50
, and run a few tests. Then, optimize the parameter setting as necessary.The default value is
0
, causing the filterCache to be used for all terms in the field. facet.exists
-
Optional
Default: none
To cap facet counts by 1, specify
facet.exists=true
. This parameter can be used withfacet.method=enum
or when it’s omitted. It can be used only on non-trie fields (such as strings). It may speed up facet counting on large indices and/or high-cardinality facet values. facet.excludeTerms
-
Optional
Default: none
Removes terms from facet counts but keeps them in the index.
facet.overrequest.count
andfacet.overrequest.ratio
-
Optional
Default: see description
In some situations, the accuracy in selecting the "top" constraints returned for a facet in a distributed Solr query can be improved by "over-requesting" the number of desired constraints (i.e.,
facet.limit
) from each of the individual shards. In these situations, each shard is by default asked for the top10 + (1.5 * facet.limit)
constraints.Depending on how your docs are partitioned across your shards and what
facet.limit
value you used, you may find it advantageous to increase or decrease the amount of over-requesting Solr does. This can be achieved by setting thefacet.overrequest.count
(defaults to10
) andfacet.overrequest.ratio
(defaults to1.5
) parameters. facet.threads
-
Optional
Default: 0
The maximum number of parallel threads used to load the underlying fields used in faceting.
Omitting this parameter or specifying the thread count as
0
will not spawn any threads, and only the main request thread will be used. Specifying a negative number of threads will create up toInteger.MAX_VALUE
threads.
Range Faceting
You can use Range Faceting on any date field or any numeric field that supports range queries. This is particularly useful for stitching together a series of range queries (as facet by query) for things like prices.
facet.range
-
Required
Default: none
The field for which Solr should create range facets. For example:
facet.range=price&facet.range=age
facet.range=lastModified_dt
facet.range.start
-
Required
Default: none
The lower bound of the ranges. You can specify this parameter on a per field basis with the syntax of
f.<fieldname>.facet.range.start
. For example:f.price.facet.range.start=0.0&f.age.facet.range.start=10
f.lastModified_dt.facet.range.start=NOW/DAY-30DAYS
facet.range.end
-
Required
Default: none
The upper bound of the ranges. You can specify this parameter on a per field basis with the syntax of
f.<fieldname>.facet.range.end
. For example:f.price.facet.range.end=1000.0&f.age.facet.range.start=99
f.lastModified_dt.facet.range.end=NOW/DAY+30DAYS
facet.range.gap
-
Required
Default: none
The span of each range expressed as a value to be added to the lower bound. For date fields, this should be expressed using the
DateMathParser
syntax (such as,facet.range.gap=%2B1DAY … '+1DAY'
).You can specify this parameter on a per-field basis with the syntax of
f.<fieldname>.facet.range.gap
. For example:f.price.facet.range.gap=100&f.age.facet.range.gap=10
f.lastModified_dt.facet.range.gap=+1DAY
facet.range.hardend
-
Optional
Default:
false
How to handle cases where the
facet.range.gap
does not divide evenly betweenfacet.range.start
andfacet.range.end
.If
true
, the last range constraint will have thefacet.range.end
value as an upper bound. Iffalse
, the last range will have the smallest possible upper bound greater thenfacet.range.end
so the range is the exact width of the specified range gap.This parameter can be specified on a per field basis with the syntax
f.<fieldname>.facet.range.hardend
. facet.range.include
-
Optional
Default: see description
By default, the ranges used to compute range faceting between
facet.range.start
andfacet.range.end
are inclusive of their lower bounds and exclusive of the upper bounds. The "before" range defined with thefacet.range.other
parameter is exclusive and the "after" range is inclusive. This default, equivalent to "lower" below, will not result in double counting at the boundaries. You can use thefacet.range.include
parameter to modify this behavior using the following options:-
lower
: All gap-based ranges include their lower bound. -
upper
: All gap-based ranges include their upper bound. -
edge
: The first and last gap ranges include their edge bounds (lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified. -
outer
: The "before" and "after" ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries. -
all
: Includes all options:lower
,upper
,edge
, andouter
.
You can specify this parameter on a per field basis with the syntax of
f.<fieldname>.facet.range.include
, and you can specify it multiple times to indicate multiple choices. -
To ensure you avoid double-counting, do not choose both lower and upper , do not choose outer , and do not choose all .
|
facet.range.other
-
Optional
Default: none
In addition to the counts for each range constraint between
facet.range.start
andfacet.range.end
, counts will also be computed for these options:-
before
: All records with field values lower than lower bound of the first range. -
after
: All records with field values greater than the upper bound of the last range. -
between
: All records with field values between the start and end bounds of all ranges. -
none
: Do not compute any counts. -
all
: Compute counts for before, between, and after.
This parameter can be specified on a per field basis with the syntax of
f.<fieldname>.facet.range.other
. In addition to theall
option, this parameter can be specified multiple times to indicate multiple choices, butnone
will override all other options. -
facet.range.method
-
Optional
Default:
filter
Selects the type of algorithm or method to use for range faceting. Both methods produce the same results, but performance may vary.
- filter
-
Generates the ranges based on other facet.range parameters, and for each of them executes a filter that later intersects with the main query resultset to get the count. It will make use of the filterCache, so it will benefit of a cache large enough to contain all ranges.
- dv
-
Iterates the documents that match the main query, and for each of them finds the correct range for the value. This method will make use of DocValues (if enabled for the field) or fieldCache. The
dv
method is not supported for field type DateRangeField or when using group.facets.
Date Ranges & Time Zones
Range faceting on date fields is a common situation where the For more information, see the examples in the section Date Formatting and Date Math. |
Pivot (Decision Tree) Faceting
Pivoting is a summarization tool that lets you automatically sort, count, total or average data stored in a table. The results are typically displayed in a second table showing the summarized data. Pivot faceting lets you create a summary table of the results from a faceting documents by multiple fields.
Another way to look at it is that the query produces a Decision Tree, in that Solr tells you "for facet A, the constraints/counts are X/N, Y/M, etc. If you were to constrain A by X, then the constraint counts for B would be S/P, T/Q, etc." In other words, it tells you in advance what the "next" set of facet results would be for a field if you apply a constraint from the current facet results.
facet.pivot
-
Optional
Default: none
The fields to use for the pivot. Multiple
facet.pivot
values will create multiple "facet_pivot" sections in the response. Separate each list of fields with a comma. facet.pivot.mincount
-
Optional
Default:
1
The minimum number of documents that need to match in order for the facet to be included in results.
Using the “bin/solr start -e techproducts” example, A query URL like this one will return the data below, with the pivot faceting results found in the section "facet_pivot":
http://localhost:8983/solr/techproducts/select?q=*:*&facet.pivot=cat,popularity,inStock &facet.pivot=popularity,cat&facet=true&facet.field=cat&facet.limit=5&rows=0&facet.pivot.mincount=2
{ "facet_counts":{ "facet_queries":{}, "facet_fields":{ "cat":[ "electronics",14, "currency",4, "memory",3, "connector",2, "graphics card",2]}, "facet_dates":{}, "facet_ranges":{}, "facet_pivot":{ "cat,popularity,inStock":[{ "field":"cat", "value":"electronics", "count":14, "pivot":[{ "field":"popularity", "value":6, "count":5, "pivot":[{ "field":"inStock", "value":true, "count":5}]}] }]}}}
Combining Stats Component With Pivots
In addition to some of the general local params supported by other types of faceting, a stats
local params can be used with facet.pivot
to refer to stats.field
instances (by tag) that you would like to have computed for each Pivot Constraint.
In the example below, two different (overlapping) sets of statistics are computed for each of the facet.pivot result hierarchies:
stats=true
stats.field={!tag=piv1,piv2 min=true max=true}price
stats.field={!tag=piv2 mean=true}popularity
facet=true
facet.pivot={!stats=piv1}cat,inStock
facet.pivot={!stats=piv2}manu,inStock
Results:
{"facet_pivot":{
"cat,inStock":[{
"field":"cat",
"value":"electronics",
"count":12,
"pivot":[{
"field":"inStock",
"value":true,
"count":8,
"stats":{
"stats_fields":{
"price":{
"min":74.98999786376953,
"max":399.0}}}},
{
"field":"inStock",
"value":false,
"count":4,
"stats":{
"stats_fields":{
"price":{
"min":11.5,
"max":649.989990234375}}}}],
"stats":{
"stats_fields":{
"price":{
"min":11.5,
"max":649.989990234375}}}},
{
"field":"cat",
"value":"currency",
"count":4,
"pivot":[{
"field":"inStock",
"value":true,
"count":4,
"stats":{
"stats_fields":{
"price":{
"..."
"manu,inStock":[{
"field":"manu",
"value":"inc",
"count":8,
"pivot":[{
"field":"inStock",
"value":true,
"count":7,
"stats":{
"stats_fields":{
"price":{
"min":74.98999786376953,
"max":2199.0},
"popularity":{
"mean":5.857142857142857}}}},
{
"field":"inStock",
"value":false,
"count":1,
"stats":{
"stats_fields":{
"price":{
"min":479.95001220703125,
"max":479.95001220703125},
"popularity":{
"mean":7.0}}}}],
"..."}]}}}}]}]}}
Combining Facet Queries And Facet Ranges With Pivot Facets
A query
local parameter can be used with facet.pivot
to refer to facet.query
instances (by tag) that should be computed for each pivot constraint.
Similarly, a range
local parameter can be used with facet.pivot
to refer to facet.range
instances.
In the example below, two query facets are computed for h of the facet.pivot
result hierarchies:
facet=true
facet.query={!tag=q1}manufacturedate_dt:[2006-01-01T00:00:00Z TO NOW]
facet.query={!tag=q1}price:[0 TO 100]
facet.pivot={!query=q1}cat,inStock
{"facet_counts": {
"facet_queries": {
"{!tag=q1}manufacturedate_dt:[2006-01-01T00:00:00Z TO NOW]": 9,
"{!tag=q1}price:[0 TO 100]": 7
},
"facet_fields": {},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {},
"facet_heatmaps": {},
"facet_pivot": {
"cat,inStock": [
{
"field": "cat",
"value": "electronics",
"count": 12,
"queries": {
"{!tag=q1}manufacturedate_dt:[2006-01-01T00:00:00Z TO NOW]": 9,
"{!tag=q1}price:[0 TO 100]": 4
},
"pivot": [
{
"field": "inStock",
"value": true,
"count": 8,
"queries": {
"{!tag=q1}manufacturedate_dt:[2006-01-01T00:00:00Z TO NOW]": 6,
"{!tag=q1}price:[0 TO 100]": 2
}
},
"..."]}]}}}
In a similar way, in the example below, two range facets are computed for each of the facet.pivot
result hierarchies:
facet=true
facet.range={!tag=r1}manufacturedate_dt
facet.range.start=2006-01-01T00:00:00Z
facet.range.end=NOW/YEAR
facet.range.gap=+1YEAR
facet.pivot={!range=r1}cat,inStock
{"facet_counts":{
"facet_queries":{},
"facet_fields":{},
"facet_dates":{},
"facet_ranges":{
"manufacturedate_dt":{
"counts":[
"2006-01-01T00:00:00Z",9,
"2007-01-01T00:00:00Z",0,
"2008-01-01T00:00:00Z",0,
"2009-01-01T00:00:00Z",0,
"2010-01-01T00:00:00Z",0,
"2011-01-01T00:00:00Z",0,
"2012-01-01T00:00:00Z",0,
"2013-01-01T00:00:00Z",0,
"2014-01-01T00:00:00Z",0],
"gap":"+1YEAR",
"start":"2006-01-01T00:00:00Z",
"end":"2015-01-01T00:00:00Z"}},
"facet_intervals":{},
"facet_heatmaps":{},
"facet_pivot":{
"cat,inStock":[{
"field":"cat",
"value":"electronics",
"count":12,
"ranges":{
"manufacturedate_dt":{
"counts":[
"2006-01-01T00:00:00Z",9,
"2007-01-01T00:00:00Z",0,
"2008-01-01T00:00:00Z",0,
"2009-01-01T00:00:00Z",0,
"2010-01-01T00:00:00Z",0,
"2011-01-01T00:00:00Z",0,
"2012-01-01T00:00:00Z",0,
"2013-01-01T00:00:00Z",0,
"2014-01-01T00:00:00Z",0],
"gap":"+1YEAR",
"start":"2006-01-01T00:00:00Z",
"end":"2015-01-01T00:00:00Z"}},
"pivot":[{
"field":"inStock",
"value":true,
"count":8,
"ranges":{
"manufacturedate_dt":{
"counts":[
"2006-01-01T00:00:00Z",6,
"2007-01-01T00:00:00Z",0,
"2008-01-01T00:00:00Z",0,
"2009-01-01T00:00:00Z",0,
"2010-01-01T00:00:00Z",0,
"2011-01-01T00:00:00Z",0,
"2012-01-01T00:00:00Z",0,
"2013-01-01T00:00:00Z",0,
"2014-01-01T00:00:00Z",0],
"gap":"+1YEAR",
"start":"2006-01-01T00:00:00Z",
"end":"2015-01-01T00:00:00Z"}}},
"..."]}]}}}
Additional Pivot Parameters
Although facet.pivot.mincount
deviates in name from the facet.mincount
parameter used by field faceting, many of the faceting parameters described above can also be used with pivot faceting:
-
facet.limit
-
facet.offset
-
facet.sort
-
facet.overrequest.count
-
facet.overrequest.ratio
Interval Faceting
Another supported form of faceting is interval faceting. This sounds similar to range faceting, but the functionality is really closer to doing facet queries with range queries. Interval faceting allows you to set variable intervals and count the number of documents that have values within those intervals in the specified field.
Even though the same functionality can be achieved by using a facet query with range queries, the implementation of these two methods is very different and will provide different performance depending on the context.
If you are concerned about the performance of your searches you should test with both options. Interval faceting tends to be better with multiple intervals for the same fields, while facet query tend to be better in environments where filter cache is more effective (static indexes for example).
This method will use DocValues if they are enabled for the field, will use fieldCache otherwise.
Use these parameters for interval faceting:
facet.interval
-
Optional
Default: none
The field where interval faceting must be applied. It can be used multiple times in the same request to indicate multiple fields.
facet.interval=price&facet.interval=size
facet.interval.set
-
Optional
Default: none
Sets the intervals for the field. It can be specified multiple times to indicate multiple intervals. This parameter is global, which means that it will be used for all fields indicated with
facet.interval
unless there is an override for a specific field. To override this parameter on a specific field you can use:f.<fieldname>.facet.interval.set
, for example:f.price.facet.interval.set=[0,10]&f.price.facet.interval.set=(10,100]
Interval Syntax
Intervals must begin with either '(' or '[', be followed by the start value, then a comma (','), the end value, and finally a closing ')' or ']’.
For example:
-
(1,10) → will include values greater than 1 and lower than 10
-
[1,10) → will include values greater or equal to 1 and lower than 10
-
[1,10] → will include values greater or equal to 1 and lower or equal to 10
The initial and end values cannot be empty.
If the interval needs to be unbounded, the special character *
can be used for both, start and end, limits.
When using this special character, the start syntax options ((
and [
), and end syntax options ()
and ]
) will be treated the same.
[*,*]
will include all documents with a value in the field.
The interval limits may be strings but there is no need to add quotes.
All the text until the comma will be treated as the start limit, and the text after that will be the end limit.
For example: [Buenos Aires,New York]
.
Keep in mind that a string-like comparison will be done to match documents in string intervals (case-sensitive).
The comparator can’t be changed.
Commas, brackets and square brackets can be escaped by using \
in front of them.
Whitespaces before and after the values will be omitted.
The start limit can’t be grater than the end limit.
Equal limits are allowed, this allows you to indicate the specific values that you want to count, like [A,A]
, [B,B]
and [C,Z]
.
Interval faceting supports output key replacement described below.
Output keys can be replaced in both the facet.interval parameter
and in the facet.interval.set parameter
.
For example:
&facet.interval={!key=popularity}some_field
&facet.interval.set={!key=bad}[0,5]
&facet.interval.set={!key=good}[5,*]
&facet=true
Local Params for Faceting
The LocalParams syntax allows overriding global settings. It can also provide a method of adding metadata to other parameter values, much like XML attributes.
Tagging and Excluding Filters
You can tag specific filters and exclude those filters when faceting. This is useful when doing multi-select faceting.
Consider the following example query with faceting:
q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype
Because everything is already constrained by the filter doctype:pdf
, the facet.field=doctype
facet command is currently redundant and will return 0 counts for everything except doctype:pdf
.
To implement a multi-select facet for doctype, a GUI may want to still display the other doctype values and their associated counts, as if the doctype:pdf
constraint had not yet been applied.
For example:
=== Document Type ===
[ ] Word (42)
[x] PDF (96)
[ ] Excel(11)
[ ] HTML (63)
To return counts for doctype values that are currently not selected, tag filters that directly constrain doctype, and exclude those filters when faceting on doctype.
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype
Filter exclusion is supported for all types of facets.
Both the tag
and ex
local params may specify multiple values by separating them with commas.
Changing the Output Key
To change the output key for a faceting command, specify a new name with the key
local parameter.
For example:
facet.field={!ex=dt key=mylabel}doctype
The parameter setting above causes the field facet results for the "doctype" field to be returned using the key "mylabel" rather than "doctype" in the response. This can be helpful when faceting on the same field multiple times with different exclusions.
Limiting Facet with Certain Terms
To limit field facet with certain terms specify them comma separated with terms
local parameter.
Commas and quotes in terms can be escaped with backslash, as in \,
.
In this case facet is calculated on a way similar to facet.method=enum
, but ignores facet.enum.cache.minDf
.
For example:
facet.field={!terms='alfa,betta,with\,with\',with space'}symbol
This local parameter overrides default logic for facet.sort
.
if facet.sort
is omitted, facets are returned in the given terms order that might be changed with index
and count
values.
Note: other parameters might not be fully supported when this parameter is supplied.
Related Topics
See Spatial Search for examples of faceting by distance and generating heatmaps via faceting.
See json.nl for details on the json.nl
parameter for controlling the format for writing out field facet data when using the JSON response writer.