Suggester

The SuggestComponent in Solr provides users with automatic suggestions for query terms.

You can use this to implement a powerful auto-suggest feature in your search application.

Although it is possible to use the Spell Checking functionality to power autosuggest behavior, Solr has a dedicated SuggestComponent designed for this functionality.

This approach utilizes Lucene’s Suggester implementation and supports all of the lookup implementations available in Lucene.

The main features of this Suggester are:

  • Lookup implementation pluggability

  • Term dictionary pluggability, giving you the flexibility to choose the dictionary implementation

  • Distributed support

The solrconfig.xml found in Solr’s "techproducts" example already has a suggest search component and a /suggest request handler configured. You can use that as the basis for your configuration, or create it from scratch, as detailed below. For more on search components, see the section Request Handlers and Search Components.

Adding the Suggest Search Component

The first step is to add a search component to solrconfig.xml and tell it to use the SuggestComponent. Here is some sample code that could be used.

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

Suggester Search Component Parameters

The Suggester search component takes several configuration parameters.

The choice of the lookup implementation (lookupImpl, how terms are found in the suggestion dictionary) and the dictionary implementation (dictionaryImpl, how terms are stored in the suggestion dictionary) will dictate some of the parameters required.

Below are the main parameters that can be used no matter what lookup or dictionary implementation is used. In the following sections additional parameters are provided for each implementation.

searchComponent name

Required

Default: none

Arbitrary name for the overall search component definition.

searchComponent class

Required

Default: none

The Suggester class. This should be defined as solr.SuggestComponent.

name

Required

Default: none

A symbolic name for this suggester. You can refer to this name in the URL parameters and in the SearchHandler configuration. It is possible to have multiples of these in one solrconfig.xml file. In the example configuration above, this is the name referenced in the line: <str name="name">mySuggester</str>.

lookupImpl

Optional

Default: JaspellLookupFactory

Lookup implementation. There are several possible implementations, described in the section Lookup Implementations.

dictionaryImpl

Optional

Default: see description

The dictionary implementation to use. There are several possible implementations, described in the section Dictionary Implementations.

If not set, the default dictionary implementation is HighFrequencyDictionaryFactory. However, if a sourceLocation is used, the dictionary implementation will be FileDictionaryFactory.

field

Optional

Default: none

A field from the index to use as the basis of suggestion terms. If sourceLocation is empty (meaning any dictionary implementation other than FileDictionaryFactory), then terms from this field in the index will be used.

To be used as the basis for a suggestion, the field must be stored. You may want to use copyField rules to create a special 'suggest' field composed of terms from other fields in documents.

You usually want a minimal amount of analysis on the field (no stemming, no synonyms, etc.), so an option is to create a field type in your schema that only uses basic tokenizers or filters. An example of such a field type is shown here:

<fieldType class="solr.TextField" name="textSuggest" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This minimal analysis is not required, however, if you want more analysis to occur on terms.

If you use AnalyzingLookupFactory as your lookupImpl, you can also define the field type rules to use for index-time and query-time analysis.
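As a minimal sketch of the copyField approach described above, the schema fragment below collects terms from two source fields into a dedicated stored field that uses the textSuggest field type. The field name suggest_text and the choice of name and cat as source fields are assumptions made for illustration only.

```xml
<!-- Hypothetical dedicated suggestion field; it is stored so the suggester can use it -->
<field name="suggest_text" type="textSuggest" indexed="true" stored="true" multiValued="true"/>

<!-- Copy terms from the fields you want to suggest from (source field names are assumptions) -->
<copyField source="name" dest="suggest_text"/>
<copyField source="cat" dest="suggest_text"/>
```

With a field like this in place, the suggester's field parameter would be set to suggest_text.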

sourceLocation

Optional

Default: see description

The path to the dictionary file if using the FileDictionaryFactory. If this value is empty, the main index will be used as a source of terms and weights.

storeDir

Optional

Default: none

The location to store the dictionary file.

buildOnCommit and buildOnOptimize

Optional

Default: false

If true, the lookup data structure will be rebuilt after each soft commit. If false, the lookup data will be built only when requested by the URL parameter suggest.build=true. Use buildOnCommit to rebuild the dictionary with every soft commit, or buildOnOptimize to build the dictionary only when the index is optimized.

Some lookup implementations may take a long time to build, especially with large indexes. In such cases, using buildOnCommit or buildOnOptimize, particularly with a high frequency of soft commits, is not recommended. Instead, build the suggester at a lower frequency by manually issuing requests with suggest.build=true.

buildOnStartup

Optional

Default: false

If true, then the lookup data structure will be built when Solr starts or when the core is reloaded. If this parameter is not specified, the suggester will check if the lookup data structure is present on disk and build it if not found.

Setting this to true could lead to Solr taking longer to load (or reload) cores while the suggester data structure is built, which can sometimes take a long time. It’s usually preferred to leave this set to false and build suggesters manually with suggest.build=true.
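The sketch below shows one way the build parameters described above might be combined for a large index: no rebuild on commit or on startup, and a rebuild only when the index is optimized. The suggester name and field are reused from the earlier sample. The combination of values is an illustrative assumption, not a recommendation for every deployment.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <!-- Avoid rebuilding on every soft commit or at core load -->
    <str name="buildOnCommit">false</str>
    <str name="buildOnStartup">false</str>
    <!-- Rebuild only when the index is optimized -->
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>
```

Between optimizes, the suggester can still be rebuilt on demand by sending a request with suggest.build=true.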

Lookup Implementations

The lookupImpl parameter defines the algorithms used to look up terms in the suggest index. There are several possible implementations to choose from, and some require additional parameters to be configured.

AnalyzingLookupFactory

A lookup that first analyzes the incoming text and adds the analyzed form to a weighted FST, and then does the same thing at lookup time.

This implementation uses the following additional properties:

suggestAnalyzerFieldType

Required

Default: none

The field type to use for the query-time and build-time term suggestion analysis.

exactMatchFirst

Optional

Default: true

If true, exact suggestions are returned first, even if they are prefixes of other strings in the FST that have larger weights.

preserveSep

Optional

Default: true

If true, then a separator between tokens is preserved. This means that suggestions are sensitive to tokenization (e.g., "baseball" is different from "base ball").

preservePositionIncrements

Optional

Default: false

If true, the suggester will preserve position increments. This means that gaps left by token filters (for example, when StopFilter removes a stopword) are respected when building the suggester.
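As an illustration of the AnalyzingLookupFactory properties above, a suggester entry might look like the following sketch. The suggester name analyzingSuggester, the name field, and the text_general field type are assumptions; substitute a field and field type from your own schema. The entry belongs inside a searchComponent element whose class is solr.SuggestComponent.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name, field, and field type -->
  <str name="name">analyzingSuggester</str>
  <str name="lookupImpl">AnalyzingLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">name</str>
  <!-- Required: field type used for build-time and query-time analysis -->
  <str name="suggestAnalyzerFieldType">text_general</str>
  <!-- The remaining values restate the documented defaults -->
  <str name="exactMatchFirst">true</str>
  <str name="preserveSep">true</str>
  <str name="preservePositionIncrements">false</str>
</lst>
```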

FuzzyLookupFactory

This is a suggester which is an extension of the AnalyzingSuggester but is fuzzy in nature. The similarity is measured by the Levenshtein algorithm.

This implementation uses the following additional properties:

exactMatchFirst

Optional

Default: true

If true, exact suggestions are returned first, even if they are prefixes of other strings in the FST that have larger weights.

preserveSep

Optional

Default: true

If true, then a separator between tokens is preserved. This means that suggestions are sensitive to tokenization (e.g., "baseball" is different from "base ball").

maxSurfaceFormsPerAnalyzedForm

Optional

Default: 256

The maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms we discard the lowest weighted ones.

maxGraphExpansions

Optional

Default: -1

When building the FST ("index-time"), we add each path through the tokenstream graph as an individual entry. This places an upper bound on how many expansions will be added for a single suggestion. The default of -1 means no limit.

preservePositionIncrements

Optional

Default: false

If true, the suggester will preserve position increments. This means that gaps left by token filters (for example, when StopFilter removes a stopword) are respected when building the suggester.

maxEdits

Optional

Default: 1

The maximum number of string edits allowed. Solr’s hard limit is 2.

transpositions

Optional

Default: true

If true, transpositions should be treated as a primitive edit operation.

nonFuzzyPrefix

Optional

Default: 1

The length of the common non-fuzzy prefix that must match a suggestion exactly.

minFuzzyLength

Optional

Default: 3

The minimum length of the query before any string edits are allowed.

unicodeAware

Optional

Default: false

If true, the maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters will be measured in Unicode code points (actual letters) instead of bytes.
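The fuzzy-matching properties above could be combined as in the sketch below, which permits up to two edits and measures lengths in Unicode code points. The suggester name fuzzySuggester and the chosen values are assumptions for illustration. The entry belongs inside a searchComponent element whose class is solr.SuggestComponent.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name -->
  <str name="name">fuzzySuggester</str>
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">cat</str>
  <str name="suggestAnalyzerFieldType">string</str>
  <!-- Allow up to two edits (the documented hard limit) -->
  <str name="maxEdits">2</str>
  <!-- Count a swap of adjacent characters as a single edit -->
  <str name="transpositions">true</str>
  <!-- The first character must match exactly; no edits for queries shorter than 3 -->
  <str name="nonFuzzyPrefix">1</str>
  <str name="minFuzzyLength">3</str>
  <!-- Measure the lengths above in Unicode code points rather than bytes -->
  <str name="unicodeAware">true</str>
</lst>
```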

AnalyzingInfixLookupFactory

Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. This uses a Lucene index for its dictionary.

This implementation uses the following additional properties.

indexPath

Optional

Default: see description

When using AnalyzingInfixSuggester you can provide your own path where the index will get built. The default is analyzingInfixSuggesterIndexDir and will be created in your collection’s data/ directory.

minPrefixChars

Optional

Default: 4

Minimum number of leading characters before PrefixQuery is used. Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).

allTermsRequired

Optional

Default: true

If true, all terms will be required.

highlight

Optional

Default: true

Highlight suggest terms.

This implementation supports Context Filtering.
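A sketch of an AnalyzingInfixLookupFactory suggester using the properties above is shown here. The suggester name infixSuggester, the index directory name, the name field, and the text_general field type are assumptions chosen for illustration. The entry belongs inside a searchComponent element whose class is solr.SuggestComponent.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name, field, and field type -->
  <str name="name">infixSuggester</str>
  <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">name</str>
  <str name="weightField">price</str>
  <str name="suggestAnalyzerFieldType">text_general</str>
  <!-- Hypothetical directory name for the suggester's own Lucene index -->
  <str name="indexPath">infix_suggester_index</str>
  <!-- The remaining values restate the documented defaults -->
  <str name="minPrefixChars">4</str>
  <str name="allTermsRequired">true</str>
  <str name="highlight">true</str>
</lst>
```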

BlendedInfixLookupFactory

An extension of the AnalyzingInfixSuggester which provides additional functionality to weight prefix matches across the matched documents. It scores higher if a hit is closer to the start of the suggestion.

This implementation uses the following additional properties:

blenderType

Optional

Default: position_linear

Used to calculate weight coefficient using the position of the first matching word. Available options are:

  • position_linear: Matches to the start will be given a higher score.

    weightFieldValue * (1 - 0.10*position)

  • position_reciprocal: Matches to the start will be given a higher score. The score of matches positioned far from the start of the suggestion decays faster than linear.

    weightFieldValue / (1 + position)

  • position_exponential_reciprocal: Matches to the start will be given a higher score. The score of matches positioned far from the start of the suggestion decays faster than reciprocal.

    weightFieldValue / pow(1 + position,exponent)

    When using this blender type, an additional parameter is available:

    • exponent: Controls how fast the score will decrease. The default is 2.0.

numFactor

Optional

Default: 10

The factor by which the number of requested results is multiplied to determine how many elements are searched; the final results are pruned from this larger set.

indexPath

Optional

Default: see description

When using BlendedInfixSuggester you can provide your own path where the index will get built. The default directory name is blendedInfixSuggesterIndexDir and will be created in your collection’s data directory.

minPrefixChars

Optional

Default: 4

Minimum number of leading characters before PrefixQuery is used. Prefixes shorter than this are indexed as character ngrams, which increases index size but makes lookups faster.

This implementation supports Context Filtering.
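To make the blender options more concrete, the sketch below configures a BlendedInfixLookupFactory suggester with the exponential reciprocal blender. The suggester name blendedSuggester, the name field, and the text_general field type are assumptions made for illustration. The entry belongs inside a searchComponent element whose class is solr.SuggestComponent.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name, field, and field type -->
  <str name="name">blendedSuggester</str>
  <str name="lookupImpl">BlendedInfixLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">name</str>
  <str name="weightField">price</str>
  <str name="suggestAnalyzerFieldType">text_general</str>
  <!-- Score decays as weightFieldValue / pow(1 + position, exponent) -->
  <str name="blenderType">position_exponential_reciprocal</str>
  <str name="exponent">2.0</str>
  <!-- Restates the documented default -->
  <str name="numFactor">10</str>
</lst>
```

Applying the formulas given above to a weight field value of 100 and a first match at position 2: position_linear gives 100 * (1 - 0.10*2) = 80, position_reciprocal gives 100 / 3 ≈ 33.3, and position_exponential_reciprocal with an exponent of 2.0 gives 100 / 9 ≈ 11.1.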

FreeTextLookupFactory

It looks at the last tokens plus the prefix of whatever final token the user is typing, if present, to predict the most likely next token. The number of previous tokens that need to be considered can also be specified. This suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.

This implementation uses the following additional properties:

suggestFreeTextAnalyzerFieldType

Required

Default: none

The field type used at "query-time" and "build-time" to analyze suggestions.

ngrams

Optional

Default: 2

The maximum number of tokens out of which shingles will be made for the dictionary. Increasing this means that more than the previous 2 tokens are taken into consideration when making suggestions.
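A FreeTextLookupFactory suggester could be declared roughly as follows. The suggester name freeTextSuggester, the name field, and the text_general field type are assumptions; the ngrams value of 3 is chosen only to show a setting above the default. The entry belongs inside a searchComponent element whose class is solr.SuggestComponent.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name, field, and field type -->
  <str name="name">freeTextSuggester</str>
  <str name="lookupImpl">FreeTextLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">name</str>
  <!-- Required: field type used to analyze suggestions at build time and query time -->
  <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
  <!-- Illustrative value above the default of 2 -->
  <str name="ngrams">3</str>
</lst>
```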

FSTLookupFactory

An automaton-based lookup. This implementation is slower to build, but provides the lowest memory cost. We recommend using this implementation unless you need more sophisticated matching results, in which case you should use the Jaspell implementation.

This implementation uses the following additional properties:

exactMatchFirst

Optional

Default: true

If true, exact suggestions are returned first, even if they are prefixes of other strings in the FST that have larger weights.

weightBuckets

Optional

Default: none

The number of separate buckets for weights which the suggester will use while building its dictionary.
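The following sketch shows an FSTLookupFactory suggester with both properties set. The suggester name fstSuggester and the weightBuckets value of 10 are assumptions chosen for illustration, not documented defaults. The entry belongs inside a searchComponent element whose class is solr.SuggestComponent.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name -->
  <str name="name">fstSuggester</str>
  <str name="lookupImpl">FSTLookupFactory</str>
  <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
  <str name="field">cat</str>
  <str name="exactMatchFirst">true</str>
  <!-- Illustrative bucket count; weights are grouped into this many buckets at build time -->
  <str name="weightBuckets">10</str>
</lst>
```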

TSTLookupFactory

A simple compact ternary trie based lookup.

WFSTLookupFactory

A weighted automaton representation which is an alternative to FSTLookup for more fine-grained ranking. WFSTLookup does not use buckets, but instead a shortest path algorithm.

Note that it expects weights to be whole numbers. If weight is missing it’s assumed to be 1.0. Weights affect the sorting of matching suggestions when spellcheck.onlyMorePopular=true is selected: weights are treated as "popularity" score, with higher weights preferred over suggestions with lower weights.

JaspellLookupFactory

A more complex lookup based on a ternary trie from the JaSpell project. Use this implementation if you need more sophisticated matching results.

Dictionary Implementations

The dictionary implementations define how terms are stored. There are several options, and multiple dictionaries can be used in a single request if necessary.

DocumentDictionaryFactory

A dictionary with terms, weights, and an optional payload taken from the index.

This dictionary implementation takes the following parameters in addition to parameters described for the Suggester generally and for the lookup implementation:

weightField

Optional

Default: none

A field that is stored or a numeric DocValue field.

payloadField

Optional

Default: none

The payloadField should be a field that is stored.

contextField

Optional

Default: none

Field to be used for Context Filtering. Note that only some lookup implementations support filtering.

DocumentExpressionDictionaryFactory

This dictionary implementation is the same as the DocumentDictionaryFactory but allows users to specify an arbitrary expression in the weightExpression tag.

This dictionary implementation takes the following parameters in addition to parameters described for the Suggester generally and for the lookup implementation:

payloadField

Optional

Default: none

The payloadField should be a field that is stored.

weightExpression

Required

Default: none

An arbitrary expression used for scoring the suggestions. The fields used must be numeric fields.

contextField

Optional

Default: none

Field to be used for Context Filtering. Note that only some lookup implementations support filtering.

HighFrequencyDictionaryFactory

This dictionary implementation allows adding a threshold to prune out less frequent terms in cases where very common terms may overwhelm other terms.

This dictionary implementation takes one parameter in addition to parameters described for the Suggester generally and for the lookup implementation:

threshold

Optional

Default: 0

A value between 0 and 1 representing the minimum fraction of the total documents where a term should appear in order to be added to the lookup dictionary.
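As a sketch of how the threshold might be set, the suggester below only adds terms that appear in at least 1% of documents. The suggester name frequentTermsSuggester and the value 0.01 are assumptions for illustration. The entry belongs inside a searchComponent element whose class is solr.SuggestComponent.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name -->
  <str name="name">frequentTermsSuggester</str>
  <str name="lookupImpl">FSTLookupFactory</str>
  <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
  <str name="field">cat</str>
  <!-- Terms must appear in at least 1% of documents to enter the dictionary -->
  <str name="threshold">0.01</str>
</lst>
```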

FileDictionaryFactory

This dictionary implementation allows using an external file that contains suggest entries. Weights and payloads can also be used.

If using a dictionary file, it should be a plain text file in UTF-8 encoding. You can use both single terms and phrases in the dictionary file. If adding weights or payloads, those should be separated from terms using the delimiter defined with the fieldDelimiter property (the default is \t, the tab representation). If using payloads, the first line in the file must specify a payload.

This dictionary implementation takes one parameter in addition to parameters described for the Suggester generally and for the lookup implementation:

fieldDelimiter

Optional

Default: \t

Specifies the delimiter to be used separating the entries, weights and payloads. The default is tab (\t).

Example File
acquire
accidentally	2.0
accommodate	3.0
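A suggester backed by a file such as the one above might be configured as in the sketch below. The suggester name fileSuggester and the file name suggestions.txt are hypothetical. fieldDelimiter is left unset, so the default tab delimiter applies. The entry belongs inside a searchComponent element whose class is solr.SuggestComponent.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name -->
  <str name="name">fileSuggester</str>
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="dictionaryImpl">FileDictionaryFactory</str>
  <!-- Hypothetical path to the UTF-8 dictionary file -->
  <str name="sourceLocation">suggestions.txt</str>
  <str name="suggestAnalyzerFieldType">string</str>
</lst>
```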

Multiple Dictionaries

It is possible to include multiple dictionaryImpl definitions in a single SuggestComponent definition.

To do this, simply define separate suggesters, as in this example:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
  <lst name="suggester">
    <str name="name">altSuggester</str>
    <str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="field">product_name</str>
    <str name="weightExpression">((price * 2) + ln(popularity))</str>
    <str name="sortField">weight</str>
    <str name="sortField">price</str>
    <str name="storeDir">suggest_fuzzy_doc_expr_dict</str>
    <str name="suggestAnalyzerFieldType">text_en</str>
  </lst>
</searchComponent>

When using these Suggesters in a query, you would define multiple suggest.dictionary parameters in the request, referring to the names given for each Suggester in the search component definition. The response will include the terms in sections for each Suggester. See the Example Usages section below for an example request and response.

Adding the Suggest Request Handler

After adding the search component, a request handler must be added to solrconfig.xml. This request handler works the same as any other request handler, and allows you to configure default parameters for serving suggestion requests. The request handler definition must incorporate the "suggest" search component defined previously.

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Suggest Request Handler Parameters

The following parameters allow you to set defaults for the Suggest request handler:

suggest

Optional

Default: false

This parameter should always be true, because we always want to run the Suggester for queries submitted to this handler.

suggest.dictionary

Required

Default: none

The name of the dictionary component configured in the search component. It can be set in the request handler, or sent as a parameter at query time.

suggest.q

Optional

Default: none

The query to use for suggestion lookups. If not provided, the q parameter is used.

suggest.count

Optional

Default: 1

Specifies the number of suggestions for Solr to return.

suggest.cfq

Optional

Default: none

A context filter query used to filter suggestions based on the context field, if supported by the suggester.

Context filtering is currently only supported by AnalyzingInfixLookupFactory and BlendedInfixLookupFactory, and only when backed by a Document*Dictionary. All other implementations will return unfiltered matches as if filtering was not requested.

suggest.build

Optional

Default: false

If true, it will build the suggester index. This is likely useful only for initial requests; you would probably not want to build the dictionary on every request, particularly in a production system. If you would like to keep your dictionary up to date, you should use the buildOnCommit or buildOnOptimize parameter for the search component.

suggest.reload

Optional

Default: false

If true, it will reload the suggester index.

suggest.buildAll

Optional

Default: false

If true, it will build all suggester indexes.

suggest.reloadAll

Optional

Default: false

If true, it will reload all suggester indexes.

These properties can also be overridden at query time, or not set in the request handler at all and always sent at query time.
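For instance, a handler that sets a default dictionary so that clients only need to send suggest.q might look like the sketch below. It assumes a suggester named mySuggester is defined in the suggest search component; the suggest.count value of 5 is arbitrary.

```xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <!-- Default dictionary; a request can still override it with its own suggest.dictionary -->
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```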

Example Usages

Get Suggestions with Weights

This is a basic suggestion using a single dictionary and a single Solr core.

Example query:

http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=elec

In this example, we’ve simply requested the string 'elec' with the suggest.q parameter and requested that the suggestion dictionary be built with suggest.build (note, however, that you would likely not want to build the index on every query - instead you should use buildOnCommit or buildOnOptimize if you have regularly changing documents).

Example response:

{
  "responseHeader": {
    "status": 0,
    "QTime": 35
  },
  "command": "build",
  "suggest": {
    "mySuggester": {
      "elec": {
        "numFound": 3,
        "suggestions": [
          {
            "term": "electronics and computer1",
            "weight": 2199,
            "payload": ""
          },
          {
            "term": "electronics",
            "weight": 649,
            "payload": ""
          },
          {
            "term": "electronics and stuff2",
            "weight": 279,
            "payload": ""
          }
        ]
      }
    }
  }
}

Using Multiple Dictionaries

If you have defined multiple dictionaries, you can use them in queries.

Example query:

http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.dictionary=altSuggester&suggest.q=elec

In this example we have sent the string 'elec' as the suggest.q parameter and named two suggest.dictionary definitions to be used.

Example response:

{
  "responseHeader": {
    "status": 0,
    "QTime": 3
  },
  "suggest": {
    "mySuggester": {
      "elec": {
        "numFound": 1,
        "suggestions": [
          {
            "term": "electronics and computer1",
            "weight": 100,
            "payload": ""
          }
        ]
      }
    },
    "altSuggester": {
      "elec": {
        "numFound": 1,
        "suggestions": [
          {
            "term": "electronics and computer1",
            "weight": 10,
            "payload": ""
          }
        ]
      }
    }
  }
}

Context Filtering

Context filtering lets you filter suggestions by a separate context field, such as category, department or any other token. The AnalyzingInfixLookupFactory and BlendedInfixLookupFactory currently support this feature, when backed by DocumentDictionaryFactory.

Add contextField to your suggester configuration. This example will suggest names and allow filtering by category:

solrconfig.xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">name</str>
    <str name="weightField">price</str>
    <str name="contextField">cat</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

Example context filtering suggest query:

http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=c&suggest.cfq=memory

The suggester will only bring back suggestions for products tagged with 'cat=memory'.