Analytics Component
The Analytics Component allows users to calculate complex statistical aggregations over result sets.
The component enables interacting with data in a variety of ways, both through a diverse set of analytics functions as well as powerful faceting functionality. The standard facets are supported within the analytics component with additions that leverage its analytical capabilities.
Module
This is provided via the analytics
Solr Module that needs to be enabled before use.
Analytics Configuration
Since the Analytics framework is a search component, it must be declared as such and added to the search handler.
For distributed analytics requests over cloud collections, the component uses the AnalyticsHandler
strictly for intershard communication.
The Analytics Handler should not be used by users to submit analytics requests.
Next you need to register the request handler and search component.
Add the following lines to solrconfig.xml
, near the defintions for other request handlers:
<! To handle user requests >
<searchComponent name="analytics" class="org.apache.solr.handler.component.AnalyticsComponent" />
<requestHandler name="/select" class="solr.SearchHandler">
<arr name="lastcomponents">
<str>analytics</str>
</arr>
</requestHandler>
<! For intershard communication during distributed requests >
<requestHandler name="/analytics" class="org.apache.solr.handler.AnalyticsHandler" />
For these changes to take effect, restart Solr or reload the core or collection.
Request Syntax
An Analytics request is passed to Solr with the parameter analytics
in a request sent to the
search handler.
Since the analytics request is sent inside of a search handler request, it will compute results based on the result set determined by the search handler.
For example, this curl command encodes and POSTs a simple analytics request to the search handler:
$ curl databinary 'analytics={
"expressions" : {
"revenue" : "sum(mult(price,quantity))"
}
}'
http://localhost:8983/solr/sales/select?q=*:*&wt=json&rows=0
There are 3 main parts of any analytics request:
 Expressions

A list of calculations to perform over the entire result set. Expressions aggregate the search results into a single value to return. This list is entirely independent of the expressions defined in each of the groupings. Find out more about them in the section Expressions.
 Functions

One or more Variable Functions to be used throughout the rest of the request. These are essentially lambda functions and can be combined in a number of ways. These functions for the expressions defined in
expressions
as well asgroupings
.  Groupings

The list of Groupings to calculate in addition to the expressions. Groupings hold a set of facets and a list of expressions to compute over those facets. The expressions defined in a grouping are only calculated over the facets defined in that grouping.
Optional Parameters
Either the expressions or the groupings parameter must be present in the request, or else there will be no analytics to compute.
The functions parameter is always optional.

{
"functions": {
"sale()": "mult(price,quantity)"
},
"expressions" : {
"max_sale" : "max(sale())",
"med_sale" : "median(sale())"
},
"groupings" : {
"sales" : {
"expressions" : {
"stddev_sale" : "stddev(sale())",
"min_price" : "min(price)",
"max_quantity" : "max(quantity)"
},
"facets" : {
"category" : {
"type" : "value",
"expression" : "fill_missing(category, 'No Category')",
"sort" : {
"criteria" : [
{
"type" : "expression",
"expression" : "min_price",
"direction" : "ascending"
},
{
"type" : "facetvalue",
"direction" : "descending"
}
],
"limit" : 10
}
},
"temps" : {
"type" : "query",
"queries" : {
"hot" : "temp:[90 TO *]",
"cold" : "temp:[* TO 50]"
}
}
}
}
}
}
Expressions
Expressions are the way to request pieces of information from the analytics component. These are the statistical expressions that you want computed and returned in your response.
Constructing an Expression
Expression Components
An expression is built using fields, constants, mapping functions and reduction functions. The ways that these can be defined are described below.
 Sources


Constants: The values defined in the expression. The supported constant types are described in the section Constants.

Fields: Solr fields that are read from the index. The supported fields are listed in the section Supported Field Types.

 Mapping Functions

Mapping functions map values for each Solr Document or Reduction. The provided mapping functions are detailed in the Analytics Mapping Functions.

Unreduced Mapping: Mapping a Field with another Field or Constant returns a value for every Solr Document. Unreduced mapping functions can take fields, constants as well as other unreduced mapping functions as input.

Reduced Mapping: Mapping a Reduction Function with another Reduction Function or Constant returns a single value.

 Reduction Functions

Functions that reduce the values of sources and/or unreduced mapping functions for every Solr Document to a single value. The provided reduction functions are detailed in the Analytics Reduction Functions.
Component Ordering
The expression components must be used in the following order to create valid expressions.

Reduced Mapping Function

Constants

Reduction Function

Sources

Unreduced Mapping Function

Sources

Unreduced Mapping Function



Reduced Mapping Function


Reduction Function
This ordering is based on the following rules:

No reduction function can be an argument of another reduction function. Since all reduction is done together in one step, one reduction function cannot rely on the result of another.

No fields can be left unreduced, since the analytics component cannot return a list of values for an expression (one for every document). Every expression must be reduced to a single value.

Mapping functions are not necessary when creating functions, however as many nested mappings as needed can be used.

Nested mapping functions must be the same type, so either both must be unreduced or both must be reduced. A reduced mapping function cannot take an unreduced mapping function as a parameter and vice versa.
Example Construction
With the above definitions and ordering, an example expression can be broken up into its components:
div(sum(a,fill_missing(b,0)),add(10.5,count(mult(a,c)))))
As a whole, this is a reduced mapping function.
The div
function is a reduced mapping function since it is a provided mapping function and has reduced arguments.
If we break down the expression further:

sum(a,fill_missing(b,0))
: Reduction Function
sum
is a provided reduction function.
a
: Field 
fill_missing(b,0)
: Unreduced Mapping Function
fill_missing
is an unreduced mapping function since it is a provided mapping function and has a field argument.
b
: Field 
0
: Constant



add(10.5,count(mult(a,c)))
: Reduced Mapping Function
add
is a reduced mapping function since it is a provided mapping function and has a reduction function argument.
10.5
: Constant 
count(mult(a,c))
: Reduction Function
count
is a provided reduction function.
mult(a,c)
: Unreduced Mapping Function
mult
is an unreduced mapping function since it is a provided mapping function and has two field arguments.
a
: Field 
c
: Field



Expression Cardinality (MultiValued and SingleValued)
The root of all multivalued expressions are multivalued fields. Singlevalued expressions can be started with constants or singlevalued fields. All singlevalued expressions can be treated as multivalued expressions that contain one value.
Singlevalued expressions and multivalued expressions can be used together in many mapping functions, as well as multivalued expressions being used alone, and many singlevalued expressions being used together. For example:
add(<singlevalued double>, <singlevalued double>, …)

Returns a singlevalued double expression where the value of the values of each expression are added.
add(<singlevalued double>, <multivalued double>)

Returns a multivalued double expression where each value of the second expression is added to the single value of the first expression.
add(<multivalued double>, <singlevalued double>)

Acts the same as the above function.
add(<multivalued double>)

Returns a singlevalued double expression which is the sum of the multiple values of the parameter expression.
Types and Implicit Casting
The new analytics component currently supports the types listed in the below table. These types have oneway implicit casting enabled for the following relationships:
Type  Implicitly Casts To 

Boolean 
String 
Date 
Long, String 
Integer 
Long, Float, Double, String 
Long 
Double, String 
Float 
Double, String 
Double 
String 
String 
none 
An implicit cast means that if a function requires a certain type of value as a parameter, arguments will be automatically converted to that type if it is possible.
For example, concat()
only accepts string parameters and since all types can be implicitly cast to strings, any type is accepted as an argument.
This also goes for dynamically typed functions.
fill_missing()
requires two arguments of the same type.
However, two types that implicitly cast to the same type can also be used.
For example, fill_missing(<long>,<float>)
will be cast to fill_missing(<double>,<double>)
since long cannot be cast to float and float cannot be cast to long implicitly.
There is an ordering to implicit casts, where the more specialized type is ordered ahead of the more general type. Therefore even though both long and float can be implicitly cast to double and string, they will be cast to double. This is because double is a more specialized type than string, which every type can be cast to.
The ordering is the same as their order in the above table.
Cardinality can also be implicitly cast. Singlevalued expressions can always be implicitly cast to multivalued expressions, since all singlevalued expressions are multivalued expressions with one value.
Implicit casting will only occur when an expression will not "compile" without it.
If an expression follows all typing rules initially, no implicit casting will occur.
Certain functions such as string()
, date()
, round()
, floor()
, and ceil()
act as explicit casts, declaring the type that is desired.
However round()
, floor()
and cell()
can return either int or long, depending on the argument type.
Variable Functions
Variable functions are a way to shorten your expressions and make writing analytics queries easier. They are essentially lambda functions defined in a request.
{
"functions" : {
"sale()" : "mult(price,quantity)"
},
"expressions" : {
"max_sale" : "max(sale())",
"med_sale" : "median(sale())"
}
}
In the above request, instead of writing mult(price,quantity)
twice, a function sale()
was defined to abstract this idea.
Then that function was used in the multiple expressions.
Suppose that we want to look at the sales of specific categories:
{
"functions" : {
"clothing_sale()" : "filter(mult(price,quantity),equal(category,'Clothing'))",
"kitchen_sale()" : "filter(mult(price,quantity),equal(category,\"Kitchen\"))"
},
"expressions" : {
"max_clothing_sale" : "max(clothing_sale())"
, "med_clothing_sale" : "median(clothing_sale())"
, "max_kitchen_sale" : "max(kitchen_sale())"
, "med_kitchen_sale" : "median(kitchen_sale())"
}
}
Arguments
Instead of making a function for each category, it would be much easier to use category
as an input to the sale()
function.
An example of this functionality is shown below:
{
"functions" : {
"sale(cat)" : "filter(mult(price,quantity),equal(category,cat))"
},
"expressions" : {
"max_clothing_sale" : "max(sale(\"Clothing\"))"
, "med_clothing_sale" : "median(sale('Clothing'))"
, "max_kitchen_sale" : "max(sale(\"Kitchen\"))"
, "med_kitchen_sale" : "median(sale('Kitchen'))"
}
}
Variable Functions can take any number of arguments and use them in the function expression as if they were a field or constant.
Variable Length Arguments
There are analytics functions that take a variable amount of parameters. Therefore there are use cases where variable functions would need to take a variable amount of parameters.
For example, maybe there are multiple, yet undetermined, number of components to the price of a product.
Functions can take a variable length of parameters if the last parameter is followed by ..
{
"functions" : {
"sale(cat, costs..)" : "filter(mult(add(costs),quantity),equal(category,cat))"
},
"expressions" : {
"max_clothing_sale" : "max(sale('Clothing', material, tariff, tax))"
, "med_clothing_sale" : "median(sale('Clothing', material, tariff, tax))"
, "max_kitchen_sale" : "max(sale('Kitchen', material, construction))"
, "med_kitchen_sale" : "median(sale('Kitchen', material, construction))"
}
}
In the above example a variable length argument is used to encapsulate all of the costs to use for a product.
There is no definite number of arguments requested for the variable length parameter, therefore the clothing expressions can use 3 and the kitchen expressions can use 2.
When the sale()
function is called, costs
is expanded to the arguments given.
Therefore in the above request, inside of the sale
function:

add(costs)
is expanded to both of the following:

add(material, tariff, tax)

add(material, construction)
ForEach Functions
Advanced Functionality
The following function details are for advanced requests. 
Although the above functionality allows for an undefined number of arguments to be passed to a function, it does not allow for interacting with those arguments.
Many times we might want to wrap each argument in additional functions.
For example maybe we want to be able to look at multiple categories at the same time.
So we want to see if category EQUALS x OR category EQUALS y
and so on.
In order to do this we need to use foreach lambda functions, which transform each value of the variable length parameter.
The foreach is started with the :
character after the variable length parameter.
{
"functions" : {
"sale(cats..)" : "filter(mult(price,quantity),or(cats:equal(category,_)))"
},
"expressions" : {
"max_sale_1" : "max(sale('Clothing', 'Kitchen'))"
, "med_sale_1" : "median(sale('Clothing', 'Kitchen'))"
, "max_sale_2" : "max(sale('Electronics', 'Entertainment', 'Travel'))"
, "med_sale_2" : "median(sale('Electronics', 'Entertainment', 'Travel'))"
}
}
In this example, cats:
is the syntax that starts a foreach lambda function over every parameter cats
, and the _
character is used to refer to the value of cats
in each iteration in the foreach.
When sale("Clothing", "Kitchen")
is called, the lambda function equal(category,_)
is applied to both Clothing and Kitchen inside of the or()
function.
Using all of these rules, the expression:
`sale("Clothing","Kitchen")`
is expanded to:
`filter(mult(price,quantity),or(equal(category,"Kitchen"),equal(category,"Clothing")))`
by the expression parser.
Groupings And Facets
Facets, much like in other parts of Solr, allow analytics results to be broken up and grouped by attributes of the data that the expressions are being calculated over.
The currently available facets for use in the analytics component are Value Facets, Pivot Facets, Range Facets and Query Facets. Each facet is required to have a unique name within the grouping it is defined in, and no facet can be defined outside of a grouping.
Groupings allow users to calculate the same grouping of expressions over a set of facets.
Groupings must have both expressions
and facets
given.
{
"functions" : {
"sale()" : "mult(price,quantity)"
},
"groupings" : {
"sales_numbers" : {
"expressions" : {
"max_sale" : "max(sale())",
"med_sale" : "median(sale())"
},
"facets" : {
"<name>" : "< facet request >"
}
}
}
}
{
"analytics_response" : {
"groupings" : {
"sales_numbers" : {
"<name>" : "< facet response >"
}
}
}
}
Facet Sorting
Some Analytics facets allow for complex sorting of their results. The two current sortable facets are Analytic Value Facets and Analytic Pivot Facets.
Parameters
criteria

Required
Default: none
The list of criteria to sort the facet by.
It takes the following parameters:
type

Required
Default: none
The type of sort. There are two possible values: *
expression
: Sort by the value of an expression defined in the same grouping. *facetvalue
: Sort by the stringrepresentation of the facet value. Direction

Optional
Default:
ascending
The direction to sort. The options are
ascending
ordescending
. expression

Optional
Default: none
When
type
isexpression
, the name of an expression defined in the same grouping.
limit

Optional
Default:
1
Limit the number of returned facet values to the top N. The default means there is no limit.
offset

Optional
Default:
0
When a limit is set, skip the top N facet values.
{
"criteria" : [
{
"type" : "expression",
"expression" : "max_sale",
"direction" : "ascending"
},
{
"type" : "facetvalue",
"direction" : "descending"
}
],
"limit" : 10,
"offset" : 5
}
Value Facets
Value facets are used to group documents by the value of a mapping expression applied to each document. Mapping expressions are expressions that do not include a reduction function.
For more information, refer to the Expressions section. For example:

mult(quantity, sum(price, tax))
: breakup documents by the revenue generated. 
fillmissing(state, "N/A")
: breakup documents by state, where N/A is used when the document doesn’t contain a state.
Value facets can be sorted.
Parameters
expression

Required
Default: none
The expression to choose a facet bucket for each document.
sort

Optional
Default: none
A sort for the results of the pivot.
{
"type" : "value",
"expression" : "fillmissing(category,'No Category')",
"sort" : {}
}
[
{ "..." : "..." },
{
"value" : "Electronics",
"results" : {
"max_sale" : 103.75,
"med_sale" : 15.5
}
},
{
"value" : "Kitchen",
"results" : {
"max_sale" : 88.25,
"med_sale" : 11.37
}
},
{ "..." : "..." }
]
This is a replacement for field facets that existed in the original Analytics Component. Field facet functionality is maintained in value facets by using the name of a field as the expression. 
Analytic Pivot Facets
Pivot Facets are used to group documents by the value of multiple mapping expressions applied to each document.
Pivot Facets work much like layers of Value Facets. A list of pivots is required, and the order of the list directly impacts the results returned. The first pivot given will be treated like a normal value facet. The second pivot given will be treated like one value facet for each value of the first pivot. Each of these secondlevel value facets will be limited to the documents in their firstlevel facet bucket. This continues for however many pivots are provided.
Sorting is enabled on a perpivot basis.
This means that if your top pivot has a sort with limit:1
, then only that first value of the facet will be drilled down into.
Sorting in each pivot is independent of the other pivots.
Parameters
pivots

Required
Default: none
The list of pivots to calculate a drilldown facet for. The list is ordered by topmost to bottommost level.
name

Required
Default: none
The name of the pivot.
expression

Required
Default: none
The expression to choose a facet bucket for each document.
sort

Optional
Default: none
A sort for the results of the pivot.
{
"type" : "pivot",
"pivots" : [
{
"name" : "country",
"expression" : "country",
"sort" : {}
},
{
"name" : "state",
"expression" : "fillmissing(state, fillmissing(providence, territory))"
},
{
"name" : "city",
"expression" : "fillmissing(city, 'N/A')",
"sort" : {}
}
]
}
[
{ "..." : "..." },
{
"pivot" : "Country",
"value" : "USA",
"results" : {
"max_sale" : 103.75,
"med_sale" : 15.5
},
"children" : [
{ "..." : "..." },
{
"pivot" : "State",
"value" : "Texas",
"results" : {
"max_sale" : 99.2,
"med_sale" : 20.35
},
"children" : [
{ "..." : "..." },
{
"pivot" : "City",
"value" : "Austin",
"results" : {
"max_sale" : 94.34,
"med_sale" : 17.60
}
},
{ "..." : "..." }
]
},
{ "..." : "..." }
]
},
{ "..." : "..." }
]
Analytics Range Facets
Range Facets are used to group documents by the value of a field into a given set of ranges. The inputs for analytics range facets are identical to those used for Solr range facets. Refer to the section Range Faceting for more information regarding use.
Parameters
field

Required
Default: none
Field to be faceted over.
start

Required
Default: none
The bottom end of the range.
end

Required
Default: none
The top end of the range.
gap

Required
Default: none
A list of range gaps to generate facet buckets. If the buckets do not add up to fit the
start
toend
range, then the lastgap
value will repeated as many times as needed to fill any unused range. hardend

Optional
Default:
false
Whether to cutoff the last facet bucket range at the
end
value if it spills over. include

Optional
Default:
lower
The boundaries to include in the facet buckets. *
lower
 All gapbased ranges include their lower bound. *upper
 All gapbased ranges include their upper bound. *edge
 The first and last gap ranges include their edge bounds (lower for the first one, upper for the last one), even if the corresponding upper/lower option is not specified. *outer
 Thebefore
andafter
ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries. *all
 Includes all options:lower
,upper
,edge
, andouter
. others

Optional
Default:
none
Additional ranges to include in the facet. *
before
 All records with field values lower then lower bound of the first range. *after
 All records with field values greater then the upper bound of the last range. *between
 All records with field values between the lower bound of the first range and the upper bound of the last range. *none
 Include facet buckets for none of the above. *all
 Include facet buckets forbefore
,after
andbetween
.
{
"type" : "range",
"field" : "price",
"start" : "0",
"end" : "100",
"gap" : [
"5",
"10",
"10",
"25"
],
"hardend" : true,
"include" : [
"lower",
"upper"
],
"others" : [
"after",
"between"
]
}
[
{
"value" : "[0 TO 5]",
"results" : {
"max_sale" : 4.75,
"med_sale" : 3.45
}
},
{
"value" : "[5 TO 15]",
"results" : {
"max_sale" : 13.25,
"med_sale" : 10.20
}
},
{
"value" : "[15 TO 25]",
"results" : {
"max_sale" : 22.75,
"med_sale" : 18.50
}
},
{
"value" : "[25 TO 50]",
"results" : {
"max_sale" : 47.55,
"med_sale" : 30.33
}
},
{
"value" : "[50 TO 75]",
"results" : {
"max_sale" : 70.25,
"med_sale" : 64.54
}
},
{ "..." : "..." }
]
Query Facets
Query facets are used to group documents by given set of queries.
Parameters
queries

Required
Default: none
The list of queries to facet by.
{
"type" : "query",
"queries" : {
"high_quantity" : "quantity:[ 5 TO 14 ] AND price:[ 100 TO * ]",
"low_quantity" : "quantity:[ 1 TO 4 ] AND price:[ 100 TO * ]"
}
}
[
{
"value" : "high_quantity",
"results" : {
"max_sale" : 4.75,
"med_sale" : 3.45
}
},
{
"value" : "low_quantity",
"results" : {
"max_sale" : 13.25,
"med_sale" : 10.20
}
}
]