Streaming Expressions

Streaming expressions exposes the capabilities of SolrCloud as composable functions. These functions provide a system for searching, transforming, analyzing, and visualizing data stored in SolrCloud collections.

At a high level there are four main capabilities that will be explored in the documentation:

  • Searching, sampling and aggregating results from Solr.

  • Transforming result sets after they are retrieved from Solr.

  • Analyzing and modeling result sets using probability and statistics and machine learning libraries.

  • Visualizing result sets, aggregations and statistical models of the data.

Stream Language Basics

Streaming expressions are comprised of streaming functions which work with a Solr collection. They emit a stream of tuples (key/value Maps).

Some of the provided streaming functions are designed to work with entire result sets rather than the top N results like normal search. This is supported by the /export handler.

Some streaming functions act as stream sources to originate the stream flow. Other streaming functions act as stream decorators to wrap other stream functions and perform operations on the stream of tuples. Many streams functions can be parallelized across a worker collection. This can be particularly powerful for relational algebra functions.

Streaming Requests and Responses

Solr has a /stream request handler that takes streaming expression requests and returns the tuples as a JSON stream. This request handler is implicitly defined, meaning there is nothing that has to be defined in solrconfig.xml - see Implicit Request Handlers.

The /stream request handler takes one parameter, expr, which is used to specify the streaming expression. For example, this curl command encodes and POSTs a simple search() expression to the /stream handler:

curl --data-urlencode 'expr=search(enron_emails,
                                   q="from:1800flowers*",
                                   fl="from, to",
                                   sort="from asc")' http://localhost:8983/solr/enron_emails/stream

Details of the parameters for each function are included below.

For the above example the /stream handler responded with the following JSON response:

{"result-set":{"docs":[
   {"from":"1800flowers.133139412@s2u2.com","to":"lcampbel@enron.com"},
   {"from":"1800flowers.93690065@s2u2.com","to":"jtholt@ect.enron.com"},
   {"from":"1800flowers.96749439@s2u2.com","to":"alewis@enron.com"},
   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
   {"from":"1800flowers@shop2u.com","to":"ebass@enron.com"},
   {"from":"1800flowers@shop2u.com","to":"lcampbel@enron.com"},
   {"from":"1800flowers@shop2u.com","to":"lcampbel@enron.com"},
   {"from":"1800flowers@shop2u.com","to":"lcampbel@enron.com"},
   {"from":"1800flowers@shop2u.com","to":"ebass@enron.com"},
   {"from":"1800flowers@shop2u.com","to":"ebass@enron.com"},
   {"EOF":true,"RESPONSE_TIME":33}]}
}

Note the last tuple in the above example stream is {"EOF":true,"RESPONSE_TIME":33}. The EOF indicates the end of the stream. To process the JSON response, you’ll need to use a streaming JSON implementation because streaming expressions are designed to return the entire result set which may have millions of records. In your JSON client you’ll need to iterate each doc (tuple) and check for the EOF tuple to determine the end of stream.

Configuration

Timeouts for Streaming Expressions can be configured with the socketTimeout and connTimeout startup parameters.

Elements of the Language

Stream Sources

Stream sources originate streams. There are rich set of searching, sampling and aggregation stream sources to choose from.

A full reference to all available source expressions is available in Stream Source Reference.

Stream Decorators

Stream decorators wrap stream sources and other stream decorators to transform a stream.

A full reference to all available decorator expressions is available in Stream Decorator Reference.

Math Expressions

Math expressions are a vector and matrix math library that can be combined with streaming expressions to perform analysis and build mathematical models of the result sets. From a language standpoint math expressions are a sub-language of streaming expressions that don’t return streams of tuples. Instead, they operate on and return numbers, vectors, matrices and mathematical models. The documentation will show how to combine streaming expressions and math expressions.

The math expressions user guide is available in Streaming Expressions and Math Expressions.

From a language standpoint math expressions are referred to as stream evaluators.

A full reference to all available evaluator expressions is available in Stream Evaluator Reference.

Visualization

Visualization of both streaming expressions and math expressions is done using Apache Zeppelin and the Zeppelin-Solr Interpreter.

Visualizing Streaming expressions and setting up of Apache Zeppelin is documented in Zeppelin-Solr Interpreter.

The Streaming Expressions and Math Expressions has in depth coverage of visualization techniques.

Stream Screen

  • Stream Screen: Submit streaming expressions and see results and parsing explanations.