Learning To Rank

With the Learning To Rank (or LTR for short) module you can configure and run machine-learned ranking models in Solr.

The module also supports feature logging inside Solr. The only thing you need to do outside Solr is train your own ranking model.

Learning to Rank Concepts

Re-Ranking

Re-Ranking allows you to run a simple query for matching documents and then re-rank the top N documents using the scores from a different, more complex query. This page describes the use of LTR complex queries. Information on other rank queries included in the Solr distribution can be found in Query Re-Ranking.
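The idea can be illustrated with a minimal Python sketch. It is not Solr code: the function name rerank_top_n, the document tuples, and the complex_score function are all hypothetical, and the sketch only assumes that the top N documents are re-sorted by the new score while the remaining documents keep their original order.

```python
def rerank_top_n(ranked_docs, rescore, n):
    """Re-rank the first n documents by rescore(doc); keep the rest in place."""
    head, tail = ranked_docs[:n], ranked_docs[n:]
    # sorted() is stable, so documents with equal new scores keep their
    # original relative order.
    reranked_head = sorted(head, key=rescore, reverse=True)
    return reranked_head + tail


# Hypothetical documents: (id, original_score, recency) tuples,
# already ordered by the simple query's score.
docs = [("a", 3.0, 0.1), ("b", 2.5, 0.9), ("c", 2.0, 0.5), ("d", 1.0, 1.0)]


def complex_score(doc):
    """Hypothetical 'more complex' score mixing both signals."""
    _, original_score, recency = doc
    return 0.5 * original_score + 2.0 * recency


top = rerank_top_n(docs, complex_score, n=3)
print([doc_id for doc_id, _, _ in top])  # ['b', 'c', 'a', 'd']
```

Document "d" would score highly under complex_score, but it stays last because it falls outside the top 3 that are re-ranked.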

Learning To Rank Models

In information retrieval systems, Learning to Rank is used to re-rank the top N retrieved documents using trained machine learning models. The hope is that such sophisticated models can make more nuanced ranking decisions than standard ranking functions like TF-IDF or BM25.

Ranking Model

A ranking model computes the scores used to rerank documents. Irrespective of any particular algorithm or implementation, a ranking model’s computation can use three types of inputs:

  • parameters that represent the scoring algorithm

  • features that represent the document being scored

  • features that represent the query for which the document is being scored

Interleaving

Interleaving is an approach to Online Search Quality evaluation that compares two models by interleaving their results in the final ranked list returned to the user.

  • Currently only the Team Draft Interleaving algorithm is supported (and its implementation assumes all results are from the same shard).
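As a rough illustration, the following Python sketch shows the general Team Draft idea as it is commonly described: the two rankings take turns, each contributing its best document that has not been picked yet, with a coin flip whenever both have contributed equally. It is an assumption-laden sketch, not Solr's implementation; the function name and the "A"/"B" team labels are hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Interleave two rankings; return the merged list and who picked each doc."""
    rng = rng or random.Random()
    interleaved = []
    picked_by = {}
    total = len(set(ranking_a) | set(ranking_b))
    count_a = count_b = 0
    while len(interleaved) < total:
        # The team with fewer picks goes next; a coin flip breaks ties.
        a_turn = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        source, team = (ranking_a, "A") if a_turn else (ranking_b, "B")
        doc = next((d for d in source if d not in picked_by), None)
        if doc is None:
            # This team has nothing left to offer, so the other team picks.
            source, team = (ranking_b, "B") if a_turn else (ranking_a, "A")
            doc = next(d for d in source if d not in picked_by)
        picked_by[doc] = team
        interleaved.append(doc)
        if team == "A":
            count_a += 1
        else:
            count_b += 1
    return interleaved, picked_by


merged, picks = team_draft_interleave(
    ["d1", "d2", "d3"], ["d3", "d1", "d4"], random.Random(0)
)
print(merged)
print(picks)
```

Recording which team picked each document is what allows clicks on the interleaved list to be attributed to one model or the other.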

Feature

A feature is a value, a number, that represents some quantity or quality of the document being scored or of the query for which documents are being scored. For example, documents often have a 'recency' quality, and 'number of past purchases' might be a quantity that is passed to Solr as part of the search query.

Normalizer

Some ranking models expect features on a particular scale. A normalizer can be used to translate arbitrary feature values into normalized values e.g., on a 0..1 or 0..100 scale.

Training Models

Feature Engineering

The LTR module includes several feature classes as well as support for custom features. Each feature class’s javadocs contain an example to illustrate use of that class. The process of feature engineering itself is then entirely up to your domain expertise and creativity.

Feature | Class | Example parameters | External Feature Information
field length | FieldLengthFeature | {"field":"title"} | not (yet) supported
field value | FieldValueFeature | {"field":"hits"} | not (yet) supported
original score | OriginalScoreFeature | {} | not applicable
solr query | SolrFeature | {"q":"{!func}recip(ms(NOW,last_modified),3.16e-11,1,1)"} | supported
solr filter query | SolrFeature | {"fq":["{!terms f=category}book"]} | supported
solr query + filter query | SolrFeature | {"q":"{!func}recip(ms(NOW,last_modified),3.16e-11,1,1)", "fq":["{!terms f=category}book"]} | supported
value | ValueFeature | {"value":"${userFromMobile}","required":true} | supported
(custom) | (custom class extending Feature) | |

Normalizer | Class | Example parameters
Identity | IdentityNormalizer | {}
MinMax | MinMaxNormalizer | {"min":"0", "max":"50"}
Standard | StandardNormalizer | {"avg":"42","std":"6"}
(custom) | (custom class extending Normalizer) |
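To make the two built-in scaling normalizers concrete, here is a minimal Python sketch using the example parameters listed above. It assumes the usual definitions (min-max scaling and standard-score scaling); the function names are hypothetical and this is not the Solr code itself.

```python
def min_max_normalize(value, min_value, max_value):
    """Map value onto a 0..1 scale relative to [min_value, max_value]."""
    if max_value == min_value:
        raise ValueError("min and max must differ")
    return (value - min_value) / (max_value - min_value)


def standard_normalize(value, avg, std):
    """Express value as the number of standard deviations from the mean."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (value - avg) / std


# Parameters taken from the example column: min=0, max=50 and avg=42, std=6.
print(min_max_normalize(25.0, 0.0, 50.0))   # 0.5
print(standard_normalize(48.0, 42.0, 6.0))  # 1.0
```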

Feature Logging

The ltr module includes a [features] transformer that calculates and returns feature values for feature extraction purposes. This is especially useful when you do not yet have an actual reranking model.

Feature Selection and Model Training

Feature selection and model training take place offline and outside Solr. The ltr module supports three generalized forms of models as well as custom models. Each model class’s javadocs contain an example to illustrate configuration of that class. Your trained model or models (e.g., different models for different customer geographies) can then be uploaded directly into Solr as JSON files using the provided REST APIs.

General form | Class | Specific examples
Linear | LinearModel | RankSVM, Pranking
Multiple Additive Trees | MultipleAdditiveTreesModel | LambdaMART, Gradient Boosted Regression Trees (GBRT)
Neural Network | NeuralNetworkModel | RankNet
(wrapper) | DefaultWrapperModel | (not applicable)
(custom) | (custom class extending AdapterModel) | (not applicable)
(custom) | (custom class extending LTRScoringModel) | (not applicable)

Module

LTR is provided via the ltr Solr Module, which must be enabled before use.

Installation of LTR

The ltr module requires the modules/ltr/lib/solr-ltr-*.jar JARs.

LTR Configuration

Learning-To-Rank is a module and therefore its plugins must be configured in solrconfig.xml.

Minimum Requirements

  • Enable the ltr module to make the LTR classes available on Solr’s classpath. See Solr Module for more details.

  • Declaration of the ltr query parser.

    <queryParser name="ltr" class="org.apache.solr.ltr.search.LTRQParserPlugin"/>

  • Configuration of the feature values cache.

    <cache name="QUERY_DOC_FV"
           class="solr.search.CaffeineCache"
           size="4096"
           initialSize="2048"
           autowarmCount="4096"
           regenerator="solr.search.NoOpRegenerator" />

  • Declaration of the [features] transformer.

    <transformer name="features" class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
      <str name="fvCacheName">QUERY_DOC_FV</str>
    </transformer>

  • Declaration of the [interleaving] transformer.

    <transformer name="interleaving" class="org.apache.solr.ltr.response.transform.LTRInterleavingTransformerFactory"/>

LTR Lifecycle

Feature Stores

It is recommended that you organise all your features into stores which are akin to namespaces:

  • Features within a store must be named uniquely.

  • Across stores identical or similar features can share the same name.

  • If no store name is specified then the default _DEFAULT_ feature store will be used.

To discover the names of all your feature stores:

http://localhost:8983/solr/techproducts/schema/feature-store

To inspect the content of the commonFeatureStore feature store:

http://localhost:8983/solr/techproducts/schema/feature-store/commonFeatureStore

Models

  • A model uses features from exactly one feature store.

  • If no store is specified then the default _DEFAULT_ feature store will be used.

  • A model need not use all the features defined in a feature store.

  • Multiple models can use the same feature store.

To log the features of currentFeatureStore:

http://localhost:8983/solr/techproducts/query?q=test&fl=id,score,[features store=currentFeatureStore]

To log the features of nextFeatureStore whilst reranking with currentModel, which is based on currentFeatureStore:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=currentModel reRankDocs=100}&fl=id,score,[features store=nextFeatureStore]

To view all models:

http://localhost:8983/solr/techproducts/schema/model-store

To delete the currentModel model:

curl -XDELETE 'http://localhost:8983/solr/techproducts/schema/model-store/currentModel'

A feature store may be deleted only when there are no models using it.

To delete the currentFeatureStore feature store:

curl -XDELETE 'http://localhost:8983/solr/techproducts/schema/feature-store/currentFeatureStore'

Using Large Models

With SolrCloud, large models may fail to upload because of ZooKeeper’s buffer size limit. In this case, DefaultWrapperModel can help by separating the model definition from the uploaded file.

Suppose you want to use a large model stored at /path/to/models/myModel.json through DefaultWrapperModel:

{
  "store" : "largeModelsFeatureStore",
  "name" : "myModel",
  "class" : "...",
  "features" : [
    "..."
  ],
  "params" : {
    "...": "..."
  }
}

First, add the directory to Solr’s resource paths with a <lib/> directive:

  <lib dir="/path/to" regex="models" />

Then, configure DefaultWrapperModel to wrap myModel.json:

{
  "store" : "largeModelsFeatureStore",
  "name" : "myWrapperModel",
  "class" : "org.apache.solr.ltr.model.DefaultWrapperModel",
  "params" : {
    "resource" : "myModel.json"
  }
}

myModel.json is loaded during initialization and can then be used by specifying model=myWrapperModel.

No "features" are configured in myWrapperModel because the features of the wrapped model (myModel) are used. Also note that the "store" configured for the wrapper model must match that of the wrapped model; in this example, both use the feature store called largeModelsFeatureStore.

Note that <lib dir="/path/to/models" regex=".*\.json" /> does not work as expected in this case, because SolrResourceLoader treats the matched resources as JARs when <lib /> points at files.

As an alternative to the above-described DefaultWrapperModel, it is possible to increase ZooKeeper’s file size limit.

Applying Changes

The feature store and the model store are both Managed Resources. Changes made to managed resources are not applied to the active Solr components until the Solr collection (or Solr core in single server mode) is reloaded.

Quick Start with LTR

The "techproducts" example included with Solr is pre-configured to load the plugins required for learning-to-rank from the ltr Solr Module, but they are disabled by default.

To enable the plugins, please specify the solr.ltr.enabled JVM System Property when running the techproducts example:

bin/solr start -e techproducts -Dsolr.modules=ltr -Dsolr.ltr.enabled=true

Uploading Features

To upload features in a /path/myFeatures.json file, please run:

curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'

To view the features you just uploaded please open the following URL in a browser:

http://localhost:8983/solr/techproducts/schema/feature-store/_DEFAULT_
Example: /path/myFeatures.json
[
  {
    "name" : "documentRecency",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "q" : "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)"
    }
  },
  {
    "name" : "isBook",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "fq": ["{!terms f=cat}book"]
    }
  },
  {
    "name" : "originalScore",
    "class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
    "params" : {}
  }
]
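The same upload can also be prepared from Python with the standard library. The sketch below only builds the PUT request and does not send it; the helper name build_feature_upload is hypothetical, while the URL layout and Content-type header mirror the curl command above.

```python
import json
import urllib.request


def build_feature_upload(solr_url, collection, features):
    """Build (but do not send) a PUT request that uploads feature definitions."""
    url = f"{solr_url}/solr/{collection}/schema/feature-store"
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-type": "application/json"},
    )


features = [
    {
        "name": "originalScore",
        "class": "org.apache.solr.ltr.feature.OriginalScoreFeature",
        "params": {},
    },
]
request = build_feature_upload("http://localhost:8983", "techproducts", features)
print(request.get_method(), request.full_url)

# To actually send it against a running Solr instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```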

Logging Features

To log features as part of a query, add [features] to the fl parameter, for example:

http://localhost:8983/solr/techproducts/query?q=test&fl=id,score,[features]

The output will include feature values as a comma-separated list, resembling the output shown here:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"test",
      "fl":"id,score,[features]"}},
  "response":{"numFound":2,"start":0,"maxScore":1.959392,"docs":[
      {
        "id":"GB18030TEST",
        "score":1.959392,
        "[features]":"documentRecency=0.020893794,isBook=0.0,originalScore=1.959392"},
      {
        "id":"UTF8TEST",
        "score":1.5513437,
        "[features]":"documentRecency=0.020893794,isBook=0.0,originalScore=1.5513437"}]
  }}
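When collecting training data, the logged string usually needs to be turned back into numbers. A minimal Python sketch follows, assuming the dense name=value,name=value layout shown above; the function name and the two delimiter arguments are hypothetical and should be adjusted if your transformer is configured with different separators.

```python
def parse_feature_vector(logged, feature_separator=",", key_value_delimiter="="):
    """Parse a logged '[features]' string into a {name: float} dictionary."""
    vector = {}
    if not logged:
        return vector
    for pair in logged.split(feature_separator):
        name, _, raw_value = pair.partition(key_value_delimiter)
        vector[name] = float(raw_value)
    return vector


# One document copied from the sample response above.
doc = {
    "id": "GB18030TEST",
    "score": 1.959392,
    "[features]": "documentRecency=0.020893794,isBook=0.0,originalScore=1.959392",
}
print(parse_feature_vector(doc["[features]"]))
```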

Feature Logging Parameters

The feature logger transformer accepts the parameters described below. Examples on how to use them can be found in the LTR Examples section below.

store

  • No re-ranking: optional, default _DEFAULT_

  • Re-ranking: optional, default is the model feature store

This parameter specifies the feature store to use for logging features.

In a reranking query, the default feature store used is the model feature store (e.g. a bare [features] with no store specified).

logAll

  • No re-ranking: default true

  • Re-ranking, logger and model have the same feature store: default false

  • Re-ranking, logger and model have different feature stores: default true

This parameter specifies the features to log.

If set to true, all the features from the feature store are printed.

If set to false, only the features used by the model are printed.

When no re-ranking query is passed, only logAll=true is supported. Passing false will cause a Solr exception.

In a logging scenario where a re-ranking query is passed, if the logger store is different from the model store, only logAll=true is supported. Passing false will cause a Solr exception.

format

Optional

Default: dense

This parameter specifies the format to use for logging features. The supported values are: dense and sparse.

You can change the default behavior to be sparse, putting <str name="defaultFormat">sparse</str> in the feature logger transformer declaration in solrconfig.xml as follows:

<transformer name="features" class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
  <str name="fvCacheName">QUERY_DOC_FV</str>
  <str name="defaultFormat">sparse</str>
  <str name="csvKeyValueDelimiter">:</str>
  <str name="csvFeatureSeparator"> </str>
</transformer>

Uploading a Model

To upload the model in a /path/myModel.json file, please run:

curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' --data-binary "@/path/myModel.json" -H 'Content-type:application/json'

To view the model you just uploaded please open the following URL in a browser:

http://localhost:8983/solr/techproducts/schema/model-store
Example: /path/myModel.json
{
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "name" : "myModel",
  "features" : [
    { "name" : "documentRecency" },
    { "name" : "isBook" },
    { "name" : "originalScore" }
  ],
  "params" : {
    "weights" : {
      "documentRecency" : 1.0,
      "isBook" : 0.1,
      "originalScore" : 0.5
    }
  }
}

Running a Rerank Query

To rerank the results of a query, add the rq parameter to your search, for example:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel reRankDocs=100}&fl=id,score

The addition of the rq parameter does not change the structure of the search response; only the order and scores of the reranked documents can change.

To obtain the feature values computed during reranking, add [features] to the fl parameter, for example:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel reRankDocs=100}&fl=id,score,[features]

The output will include feature values as a comma-separated list, resembling the output shown here:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"test",
      "fl":"id,score,[features]",
      "rq":"{!ltr model=myModel reRankDocs=100}"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
      {
        "id":"GB18030TEST",
        "score":1.0005897,
        "[features]":"documentRecency=0.020893792,isBook=0.0,originalScore=1.959392"},
      {
        "id":"UTF8TEST",
        "score":0.79656565,
        "[features]":"documentRecency=0.020893792,isBook=0.0,originalScore=1.5513437"}]
  }}
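The reranked scores above can be checked by hand. The following Python sketch assumes that a linear model scores a document as the weighted sum of its feature values, using the weights from myModel.json and the feature values from the sample response; the function name is hypothetical.

```python
import math


def linear_score(weights, feature_values):
    """Weighted sum of feature values, one weight per feature name."""
    return sum(weight * feature_values[name] for name, weight in weights.items())


# Weights from the myModel.json example.
weights = {"documentRecency": 1.0, "isBook": 0.1, "originalScore": 0.5}

# Feature values copied from the sample response above.
first_doc = {"documentRecency": 0.020893792, "isBook": 0.0, "originalScore": 1.959392}
second_doc = {"documentRecency": 0.020893792, "isBook": 0.0, "originalScore": 1.5513437}

first_score = linear_score(weights, first_doc)
second_score = linear_score(weights, second_doc)

# Compare against the "score" fields in the response, allowing for rounding.
print(math.isclose(first_score, 1.0005897, abs_tol=1e-6))    # True
print(math.isclose(second_score, 0.79656565, abs_tol=1e-6))  # True
```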

Running a Rerank Query and Query Limits

Apache Solr lets you define Query Limits (Time Allowed, Cpu Allowed) to interrupt particularly expensive queries.

If a query limit is exceeded while reranking, the rescoring is aborted and fully reverted.

The original ranked list is returned and the response is marked with the responseHeader 'partialResults'. Details of which limit was exceeded are returned in the responseHeader 'partialResultsDetails'.

See Partial Results Parameter for more details on how to handle partial results.

Running a Rerank Query Interleaving Two Models

To rerank the results of a query by interleaving two models (myModelA, myModelB), add the rq parameter to your search and pass both models, for example:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score

To obtain the model that interleaving picked for a search result, computed during reranking, add [interleaving] to the fl parameter, for example:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score,[interleaving]

The output will include the model picked for each search result, resembling the output shown here:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"test",
      "fl":"id,score,[interleaving]",
      "rq":"{!ltr model=myModelA model=myModelB reRankDocs=100}"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
      {
        "id":"GB18030TEST",
        "score":1.0005897,
        "[interleaving]":"myModelB"},
      {
        "id":"UTF8TEST",
        "score":0.79656565,
        "[interleaving]":"myModelA"}]
  }}

Running a Rerank Query Interleaving a Model with the Original Ranking

When approaching Search Quality Evaluation with interleaving, it may be useful to compare a model with the original ranking. To rerank the results of a query by interleaving a model with the original ranking, add the rq parameter to your search, passing the special built-in _OriginalRanking_ model identifier as one model and your comparison model as the other, for example:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=_OriginalRanking_ model=myModel reRankDocs=100}&fl=id,score

The addition of the rq parameter does not change the structure of the search response; only the order and scores of the reranked documents can change.

To obtain the model that interleaving picked for a search result, computed during reranking, add [interleaving] to the fl parameter, for example:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=_OriginalRanking_ model=myModel reRankDocs=100}&fl=id,score,[interleaving]

The output will include the model picked for each search result, resembling the output shown here:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"test",
      "fl":"id,score,[interleaving]",
      "rq":"{!ltr model=_OriginalRanking_ model=myModel reRankDocs=100}"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
      {
        "id":"GB18030TEST",
        "score":1.0005897,
        "[interleaving]":"_OriginalRanking_"},
      {
        "id":"UTF8TEST",
        "score":0.79656565,
        "[interleaving]":"myModel"}]
  }}

Running a Rerank Query with Interleaving Passing a Specific Algorithm

To rerank the results of a query, interleaving two models using a specific algorithm, add the interleavingAlgorithm local parameter to the ltr query parser, for example:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100 interleavingAlgorithm=TeamDraft}&fl=id,score

Currently, the only (and default) algorithm supported is 'TeamDraft'.

External Feature Information

The ValueFeature and SolrFeature classes support the use of external feature information, efi for short.

Uploading Features

To upload features in a /path/myEfiFeatures.json file, please run:

curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' --data-binary "@/path/myEfiFeatures.json" -H 'Content-type:application/json'

To view the features you just uploaded please open the following URL in a browser:

http://localhost:8983/solr/techproducts/schema/feature-store/myEfiFeatureStore
Example: /path/myEfiFeatures.json
[
  {
    "store" : "myEfiFeatureStore",
    "name" : "isPreferredManufacturer",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : { "fq" : [ "{!field f=manu}${preferredManufacturer}" ] }
  },
  {
    "store" : "myEfiFeatureStore",
    "name" : "userAnswerValue",
    "class" : "org.apache.solr.ltr.feature.ValueFeature",
    "params" : { "value" : "${answer:42}" }
  },
  {
    "store" : "myEfiFeatureStore",
    "name" : "userFromMobileValue",
    "class" : "org.apache.solr.ltr.feature.ValueFeature",
    "params" : { "value" : "${fromMobile}", "required" : true }
  },
  {
    "store" : "myEfiFeatureStore",
    "name" : "userTextCat",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : { "q" : "{!field f=cat}${text}" }
  }
]

Logging Features

To log myEfiFeatureStore features as part of a query, add efi.* parameters to the [features] part of the fl parameter, for example:

http://localhost:8983/solr/techproducts/query?q=test&fl=id,cat,manu,score,[features store=myEfiFeatureStore efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=1]
http://localhost:8983/solr/techproducts/query?q=test&fl=id,cat,manu,score,[features store=myEfiFeatureStore efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=0 efi.answer=13]
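When such URLs are built programmatically, the brackets, spaces and equals signs inside the fl parameter need URL encoding. Below is a minimal Python sketch using only the standard library; the helper names features_transformer and logging_query_url are hypothetical, and the sketch assumes efi values that contain no spaces or quotes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def features_transformer(store=None, efi=None):
    """Build the '[features ...]' part of the fl parameter."""
    parts = ["features"]
    if store:
        parts.append(f"store={store}")
    for name, value in (efi or {}).items():
        parts.append(f"efi.{name}={value}")
    return "[" + " ".join(parts) + "]"


def logging_query_url(base_url, query, fields, transformer):
    """Build a URL-encoded feature logging query."""
    params = {"q": query, "fl": ",".join(fields + [transformer])}
    return f"{base_url}?{urlencode(params)}"


transformer = features_transformer(
    store="myEfiFeatureStore",
    efi={"text": "test", "preferredManufacturer": "Apache", "fromMobile": 1},
)
url = logging_query_url(
    "http://localhost:8983/solr/techproducts/query",
    "test",
    ["id", "cat", "manu", "score"],
    transformer,
)
print(url)

# Decoding the query string recovers the fl value as it appears in the
# first example URL above.
print(parse_qs(urlsplit(url).query)["fl"][0])
```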

Uploading a Model

To upload the model in a /path/myEfiModel.json file, please run:

curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' --data-binary "@/path/myEfiModel.json" -H 'Content-type:application/json'

To view the model you just uploaded please open the following URL in a browser:

http://localhost:8983/solr/techproducts/schema/model-store
Example: /path/myEfiModel.json
{
  "store" : "myEfiFeatureStore",
  "name" : "myEfiModel",
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "features" : [
    { "name" : "isPreferredManufacturer" },
    { "name" : "userAnswerValue" },
    { "name" : "userFromMobileValue" },
    { "name" : "userTextCat" }
  ],
  "params" : {
    "weights" : {
      "isPreferredManufacturer" : 0.2,
      "userAnswerValue" : 1.0,
      "userFromMobileValue" : 1.0,
      "userTextCat" : 0.1
    }
  }
}

Running a Rerank Query

To obtain the feature values computed during reranking, add [features] to the fl parameter and efi.* parameters to the rq parameter, for example:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myEfiModel efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=1}&fl=id,cat,manu,score,[features]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myEfiModel efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=0 efi.answer=13}&fl=id,cat,manu,score,[features]

Notice the absence of efi.* parameters in the [features] part of the fl parameter.

Logging Features While Reranking

To log the features of myEfiFeatureStore while still reranking with myModel:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel}&fl=id,cat,manu,score,[features store=myEfiFeatureStore efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=1]

Notice the absence of efi.* parameters in the rq parameter (because myModel does not use efi features) and the presence of efi.* parameters in the [features] part of the fl parameter (because myEfiFeatureStore contains efi features).

Training Example

Example training data and a demo train_and_upload_demo_model.py script can be found in the solr/modules/ltr/example folder in the Apache Solr Git repository (mirrored on github.com). This example folder is not shipped in the Solr binary release.

Advanced Options

LTRThreadModule

A thread module can be configured for the query parser and/or the transformer to parallelize the creation of feature weights. For details, please refer to the LTRThreadModule javadocs.

Models handling features' null values

This feature is available only for MultipleAdditiveTreesModel.

In some scenarios a null value for a feature has a different meaning than a zero value. Some models are trained to distinguish the two (e.g. https://xgboost.readthedocs.io/en/stable/faq.html#how-to-deal-with-missing-values). To support this, Solr provides an additional missing branch parameter.

This defines the branch to follow when the corresponding feature value is null. With the default configuration a null and a zero value have the same meaning.

To handle null values, the myFeatures.json file needs to be modified. A defaultValue parameter with a NaN value needs to be added to each feature that can assume a null value.

Example: /path/myFeatures.json
[
  {
    "name": "matchedTitle",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!terms f=title}${user_query}"
    }
  },
  {
    "name": "productReviewScore",
    "class": "org.apache.solr.ltr.feature.FieldValueFeature",
    "params": {
      "field": "product_review_score",
      "defaultValue": "NaN"
    }
  }
]

Also, the model configuration needs two additional parameters:

  • isNullSameAsZero needs to be defined in the model params and set to false;

  • the missing parameter needs to be added to each branch where the corresponding feature supports null values. Its value must be either left or right.

Example: /path/myModel.json
{
  "class":"org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
  "name":"multipleadditivetreesmodel",
  "features":[
    { "name": "matchedTitle"},
    { "name": "productReviewScore"}
  ],
  "params":{
    "isNullSameAsZero": "false",
    "trees": [
      {
        "weight" : "1f",
        "root": {
          "feature": "matchedTitle",
          "threshold": "0.5f",
          "left" : {
            "value" : "-100"
          },
          "right": {
            "feature" : "productReviewScore",
            "threshold": "0f",
            "missing": "left",
            "left" : {
              "value" : "50"
            },
            "right" : {
              "value" : "65"
            }
          }
        }
      }
    ]
  }
}
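The effect of the missing parameter can be traced with a small Python sketch of tree scoring. This is an illustration under stated assumptions, not Solr's implementation: it assumes that a value less than or equal to the threshold follows the left branch, that a NaN value follows the branch named by missing, and that the model score is the weighted sum of the leaf values reached. The function names are hypothetical, and the tree from the example is written with plain numbers instead of the string values used in the JSON.

```python
import math


def score_node(node, features):
    """Walk one tree node and return the value of the leaf that is reached."""
    if "value" in node:
        return node["value"]
    value = features[node["feature"]]
    if math.isnan(value):
        # Null (NaN) feature value: follow the branch named by "missing".
        # The sketch requires "missing" on any node whose feature may be NaN.
        branch = node["missing"]
    else:
        branch = "left" if value <= node["threshold"] else "right"
    return score_node(node[branch], features)


def score_trees(trees, features):
    """Weighted sum of the leaf values reached in each tree."""
    return sum(tree["weight"] * score_node(tree["root"], features) for tree in trees)


trees = [{
    "weight": 1.0,
    "root": {
        "feature": "matchedTitle", "threshold": 0.5,
        "left": {"value": -100.0},
        "right": {
            "feature": "productReviewScore", "threshold": 0.0, "missing": "left",
            "left": {"value": 50.0},
            "right": {"value": 65.0},
        },
    },
}]

print(score_trees(trees, {"matchedTitle": 1.0, "productReviewScore": math.nan}))  # 50.0
print(score_trees(trees, {"matchedTitle": 1.0, "productReviewScore": 4.5}))       # 65.0
print(score_trees(trees, {"matchedTitle": 0.0, "productReviewScore": 4.5}))       # -100.0
```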

When isNullSameAsZero is false for your model, the logged feature vector changes:

  • dense format: all feature values are shown, including default values, which can be zero or null.

  • sparse format: only non-default values are shown.

For example, given the features defined above with values matchedTitle=0 and productReviewScore=0, the sparse format returns productReviewScore:0. For matchedTitle, 0 is the default value, so it is not returned; for productReviewScore, the default value is NaN, so 0 is not the default and is returned.
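The sparse rule can be expressed as a short Python sketch: a feature is dropped only when its value equals that feature's default, with NaN treated as equal to NaN for this purpose. The function names are hypothetical and the sketch only models the filtering, not the output formatting.

```python
import math


def is_default(value, default):
    """True when value equals the feature's default (NaN matches NaN)."""
    if math.isnan(default):
        return math.isnan(value)
    return value == default


def sparse_vector(values, defaults):
    """Keep only the features whose value differs from their default."""
    return {
        name: value
        for name, value in values.items()
        if not is_default(value, defaults[name])
    }


# matchedTitle keeps the usual default of 0, while productReviewScore
# declares "defaultValue": "NaN" as in the feature file above.
defaults = {"matchedTitle": 0.0, "productReviewScore": math.nan}
values = {"matchedTitle": 0.0, "productReviewScore": 0.0}

print(sparse_vector(values, defaults))  # {'productReviewScore': 0.0}
```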

Implementation and Contributions

How does Solr Learning-To-Rank work under the hood?

Please refer to the ltr javadocs for an implementation overview.

How could I write additional models and/or features?

Contributions for further models, features, normalizers and interleaving algorithms are welcome.

LTR Examples

One Feature Store, Multiple Ranking Models

  • leftModel and rightModel both use features from commonFeatureStore; the only difference between the two models is the weights attached to each feature.

  • Conventions used:

    • commonFeatureStore.json file contains features for the commonFeatureStore feature store

    • leftModel.json file contains the model named leftModel

    • rightModel.json file contains the model named rightModel

    • The model’s features and weights are sorted alphabetically by name. This makes it easy to see the commonalities and differences between the two models.

    • The store’s features are sorted alphabetically by name. This makes it easy to look up features used in the models.

Example: /path/commonFeatureStore.json
[
  {
    "store" : "commonFeatureStore",
    "name" : "documentRecency",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "q" : "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)"
    }
  },
  {
    "store" : "commonFeatureStore",
    "name" : "isBook",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "fq": [ "{!terms f=category}book" ]
    }
  },
  {
    "store" : "commonFeatureStore",
    "name" : "originalScore",
    "class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
    "params" : {}
  }
]
Example: /path/leftModel.json
{
  "store" : "commonFeatureStore",
  "name" : "leftModel",
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "features" : [
    { "name" : "documentRecency" },
    { "name" : "isBook" },
    { "name" : "originalScore" }
  ],
  "params" : {
    "weights" : {
      "documentRecency" : 0.1,
      "isBook" : 1.0,
      "originalScore" : 0.5
    }
  }
}
Example: /path/rightModel.json
{
  "store" : "commonFeatureStore",
  "name" : "rightModel",
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "features" : [
    { "name" : "documentRecency" },
    { "name" : "isBook" },
    { "name" : "originalScore" }
  ],
  "params" : {
    "weights" : {
      "documentRecency" : 1.0,
      "isBook" : 0.1,
      "originalScore" : 0.5
    }
  }
}

Model Evolution

  • linearModel201701 uses features from featureStore201701

  • treesModel201702 uses features from featureStore201702

  • linearModel201701 and treesModel201702 and their feature stores can co-exist whilst both are needed.

  • When linearModel201701 has been deleted then featureStore201701 can also be deleted.

  • Conventions used:

    • <store>.json file contains features for the <store> feature store

    • <model>.json file contains the model named <model>

    • a 'generation' id (e.g., YYYYMM year-month) is part of the feature store and model names

    • The model’s features and weights are sorted alphabetically by name. This makes it easy to see the commonalities and differences between the two models.

    • The store’s features are sorted alphabetically by name. This makes it easy to see the commonalities and differences between the two feature stores.

Example: /path/featureStore201701.json
[
  {
    "store" : "featureStore201701",
    "name" : "documentRecency",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "q" : "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)"
    }
  },
  {
    "store" : "featureStore201701",
    "name" : "isBook",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "fq": [ "{!terms f=category}book" ]
    }
  },
  {
    "store" : "featureStore201701",
    "name" : "originalScore",
    "class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
    "params" : {}
  }
]
Example: /path/linearModel201701.json
{
  "store" : "featureStore201701",
  "name" : "linearModel201701",
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "features" : [
    { "name" : "documentRecency" },
    { "name" : "isBook" },
    { "name" : "originalScore" }
  ],
  "params" : {
    "weights" : {
      "documentRecency" : 0.1,
      "isBook" : 1.0,
      "originalScore" : 0.5
    }
  }
}
Example: /path/featureStore201702.json
[
  {
    "store" : "featureStore201702",
    "name" : "isBook",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "fq": [ "{!terms f=category}book" ]
    }
  },
  {
    "store" : "featureStore201702",
    "name" : "originalScore",
    "class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
    "params" : {}
  }
]
Example: /path/treesModel201702.json
{
  "store" : "featureStore201702",
  "name" : "treesModel201702",
  "class" : "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
  "features" : [
    { "name" : "isBook" },
    { "name" : "originalScore" }
  ],
  "params" : {
    "trees" : [
      {
        "weight" : "1",
        "root" : {
          "feature" : "isBook",
          "threshold" : "0.5",
          "left" : { "value" : "-100" },
          "right" : {
            "feature" : "originalScore",
            "threshold" : "10.0",
            "left" : { "value" : "50" },
            "right" : { "value" : "75" }
          }
        }
      },
      {
        "weight" : "2",
        "root" : {
          "value" : "-10"
        }
      }
    ]
  }
}

Features Logging

logAll parameter

Suppose you have a complete feature store like this:

Example: /path/completeFeaturesStore.json
[
  {
    "store" : "completeFeaturesStore",
    "name" : "documentRecency",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "q" : "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)"
    }
  },
  {
    "store" : "completeFeaturesStore",
    "name" : "isBook",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "fq": ["{!terms f=cat}book"]
    }
  },
  {
    "store" : "completeFeaturesStore",
    "name" : "originalScore",
    "class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
    "params" : {}
  }
]

Also suppose you have a simple linear model that uses just two of the features in completeFeaturesStore:

Example: /path/linearModel.json
{
  "store" : "completeFeaturesStore",
  "name" : "linearModel",
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "features" : [
    { "name" : "isBook" },
    { "name" : "originalScore" }
  ],
  "params" : {
    "weights" : {
      "isBook" : 1.0,
      "originalScore" : 0.5
    }
  }
}
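
As a rough illustration of what the weights in linearModel mean, the sketch below computes a weighted sum of feature values. The feature values are made up for the example, linear_score is a hypothetical helper name, and treating a missing feature as 0.0 is an assumption of this sketch.

```python
# The "weights" parameter of linearModel.
WEIGHTS = {"isBook": 1.0, "originalScore": 0.5}


def linear_score(weights, features):
    """Weighted sum of feature values; missing features count as 0.0 (assumption)."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


# Hypothetical feature values: 1.0 * 1.0 + 0.5 * 2.0
print(linear_score(WEIGHTS, {"isBook": 1.0, "originalScore": 2.0}))
```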

A logging and re-ranking query that sets neither the store nor the logAll parameter prints only the features used by the model (defaults: store is the model's store and logAll=false).

The query:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=linearModel reRankDocs=100}&fl=id,score,[features]

The output:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"test",
      "fl":"id,score,[features]",
      "rq":"{!ltr model=linearModel reRankDocs=100}"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
      {
        "id":"GB18030TEST",
        "score":1.0005897,
        "[features]":"isBook=0.0,originalScore=1.959392"},
      {
        "id":"UTF8TEST",
        "score":0.79656565,
        "[features]":"isBook=0.0,originalScore=1.5513437"}]
  }}
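
The rq and fl values in the query above contain braces, brackets and spaces, so a client usually has to URL-encode them. The sketch below builds the same request URL with Python's standard library. The helper name ltr_query_url is hypothetical, and the host, port and collection are taken from the example query.

```python
from urllib.parse import parse_qs, urlencode

BASE = "http://localhost:8983/solr/techproducts/query"


def ltr_query_url(q, model, rerank_docs=100, fl="id,score,[features]"):
    """Build a logging + re-ranking request URL with encoded parameters."""
    params = {
        "q": q,
        "rq": f"{{!ltr model={model} reRankDocs={rerank_docs}}}",
        "fl": fl,
    }
    return f"{BASE}?{urlencode(params)}"


url = ltr_query_url("test", "linearModel")
print(url)
# Decoding the query string gives back the original rq value.
print(parse_qs(url.split("?", 1)[1])["rq"][0])
```

Passing a different fl value, such as "id,score,[features logAll=true]", reuses the same helper for the other logging variants.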

A logging and re-ranking query that omits the store parameter and sets logAll=true prints all the features from the model's store.

The query:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=linearModel reRankDocs=100}&fl=id,score,[features logAll=true]

The output:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"test",
      "fl":"id,score,[features logAll=true]",
      "rq":"{!ltr model=linearModel reRankDocs=100}"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
      {
        "id":"GB18030TEST",
        "score":1.0005897,
        "[features]":"documentRecency=0.020893792,isBook=0.0,originalScore=1.959392"},
      {
        "id":"UTF8TEST",
        "score":0.79656565,
        "[features]":"documentRecency=0.020893792,isBook=0.0,originalScore=1.5513437"}]
  }}

Suppose you also have a different feature store such as:

Example: /path/differentFeaturesStore.json
[
  {
    "store": "differentFeaturesStore",
    "name": "valueFeature1",
    "class": "org.apache.solr.ltr.feature.FieldValueFeature",
    "params": {
        "field": "field1"
    }
  },
  {
    "store": "differentFeaturesStore",
    "name": "valueFeature2",
    "class": "org.apache.solr.ltr.feature.FieldValueFeature",
    "params": {
        "field": "field2"
    }
  }
]

A logging and re-ranking query that sets the store parameter to a feature store different from the model's store, without setting logAll, prints all the features from the selected feature store (default in this case: logAll=true).

The query:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=linearModel reRankDocs=100}&fl=id,score,[features store=differentFeaturesStore]

The output:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"test",
      "fl":"id,score,[features store=differentFeaturesStore]",
      "rq":"{!ltr model=linearModel reRankDocs=100}"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
      {
        "id":"GB18030TEST",
        "score":1.0005897,
        "[features]":"valueFeature1=0.1,valueFeature2=2.0"},
      {
        "id":"UTF8TEST",
        "score":0.79656565,
        "[features]":"valueFeature1=1.3,valueFeature2=4.0"}]
  }}
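
The three cases described above can be summarized as a small decision function. This is only a sketch of the documented defaults, not Solr's implementation: features_to_log is a hypothetical name, treating a store equal to the model's store like an omitted store is an assumption, and the combination of a different store with logAll=false is not described above, so the sketch refuses to guess.

```python
STORES = {
    "completeFeaturesStore": ["documentRecency", "isBook", "originalScore"],
    "differentFeaturesStore": ["valueFeature1", "valueFeature2"],
}
MODEL_STORE = "completeFeaturesStore"
MODEL_FEATURES = ["isBook", "originalScore"]


def features_to_log(store=None, log_all=None):
    """Return the feature names logged in the three documented cases."""
    if store is None or store == MODEL_STORE:
        # Default here is logAll=false: only the model's features are logged.
        return list(STORES[MODEL_STORE]) if log_all else list(MODEL_FEATURES)
    if log_all is None or log_all:
        # A different store defaults to logAll=true: log the whole store.
        return list(STORES[store])
    raise ValueError("different store with logAll=false is not covered here")


print(features_to_log())                                # model features only
print(features_to_log(log_all=True))                    # whole model store
print(features_to_log(store="differentFeaturesStore"))  # whole selected store
```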

format parameter

Suppose you have a feature store such as:

Example: /path/myFeaturesStore.json
[
  {
    "store": "myFeaturesStore",
    "name": "featureA",
    "class": "org.apache.solr.ltr.feature.FieldValueFeature",
    "params": {
        "field": "field1"
    }
  },
  {
    "store": "myFeaturesStore",
    "name": "featureB",
    "class": "org.apache.solr.ltr.feature.FieldValueFeature",
    "params": {
        "field": "field2"
    }
  },
  {
    "store": "myFeaturesStore",
    "name": "featureC",
    "class": "org.apache.solr.ltr.feature.FieldValueFeature",
    "params": {
        "field": "field3"
    }
  }
]

To return dense CSV values such as featureA=0.1,featureB=0.2,featureC=0.0, where zero-valued features are included, pass the format=dense parameter to the feature logger transformer:

http://localhost:8983/solr/techproducts/query?q=test&fl=id,score,[features store=myFeaturesStore format=dense]

The output:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"test",
      "fl":"id,score,[features store=myFeaturesStore format=dense]"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
      {
        "id":"GB18030TEST",
        "score":1.0005897,
        "[features]":"featureA=0.1,featureB=0.2,featureC=0.0"},
      {
        "id":"UTF8TEST",
        "score":0.79656565,
        "[features]":"featureA=1.3,featureB=0.0,featureC=2.1"}]
  }}

To return sparse CSV values such as featureA=0.1,featureB=0.2, where zero-valued features are omitted, pass the format=sparse parameter to the feature logger transformer:

http://localhost:8983/solr/techproducts/query?q=test&fl=id,score,[features store=myFeaturesStore format=sparse]

The output:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"test",
      "fl":"id,score,[features store=myFeaturesStore format=sparse]"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
      {
        "id":"GB18030TEST",
        "score":1.0005897,
        "[features]":"featureA=0.1,featureB=0.2"},
      {
        "id":"UTF8TEST",
        "score":0.79656565,
        "[features]":"featureA=1.3,featureC=2.1"}]
  }}
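
Training tools often expect a fixed-length vector per document, so sparse output usually has to be expanded on the client side. The sketch below parses a [features] string and fills in the omitted features. It assumes, as in the example output above, that an omitted feature has the value 0.0; parse_features and to_dense are hypothetical helper names.

```python
def parse_features(csv):
    """Turn 'featureA=0.1,featureB=0.2' into {'featureA': 0.1, 'featureB': 0.2}."""
    if not csv:
        return {}
    pairs = (item.split("=", 1) for item in csv.split(","))
    return {name: float(value) for name, value in pairs}


def to_dense(csv, feature_names, default=0.0):
    """Return one value per feature name, using default for omitted features."""
    sparse = parse_features(csv)
    return [sparse.get(name, default) for name in feature_names]


# Feature order taken from myFeaturesStore.
NAMES = ["featureA", "featureB", "featureC"]
print(to_dense("featureA=0.1,featureB=0.2", NAMES))
print(to_dense("featureA=1.3,featureC=2.1", NAMES))
```

The order of NAMES determines the column order of the resulting vectors, so it should match the order used when the model was trained.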