The Extended DisMax (eDismax) Query Parser
The Extended DisMax (eDisMax) query parser is an improved version of the DisMax query parser.
In addition to supporting all the DisMax query parser parameters, Extended Dismax:
- supports Solr’s standard query parser syntax such as (non-exhaustive list):
- boolean operators such as AND (+, &&), OR (||), NOT (-).
- optionally treats lowercase "and" and "or" as "AND" and "OR" in Lucene syntax mode
- optionally allows embedded queries using other query parsers or functions
- includes improved smart partial escaping in the case of syntax errors; fielded queries, +/-, and phrase queries are still supported in this mode.
- improves proximity boosting by using word shingles; you do not need the query to match all words in the document before proximity boosting is applied.
- includes advanced stopword handling: stopwords are not required in the mandatory part of the query but are still used in the proximity boosting part. If a query consists of all stopwords, such as "to be or not to be", then all words are required.
- includes improved boost function: in Extended DisMax, the
boost
function is a multiplier rather than an addend, improving your boost results; the additive boost functions of DisMax (bf
andbq
) are also supported. - supports pure negative nested queries: queries such as
+foo (-foo)
will match all documents. - lets you specify which fields the end user is allowed to query, and to disallow direct fielded searches.
Extended DisMax Parameters
In addition to all the DisMax parameters, Extended DisMax includes these query parameters:
sow
- Split on whitespace. If set to
true
, text analysis is invoked separately for each individual whitespace-separated term. The default isfalse
; whitespace-separated term sequences will be provided to text analysis in one shot, enabling proper function of analysis filters that operate over term sequences, e.g., multi-word synonyms and shingles. mm
- Minimum should match. See the DisMax mm parameter for a description of
mm
. The default eDisMaxmm
value differs from that of DisMax:- The default
mm
value is 0%:- if the query contains an explicit operator other than "AND" ("-", "+", "OR", "NOT"); or
- if
q.op
is "OR" or is not specified.
- The default
mm
value is 100% ifq.op
is "AND" and the query does not contain any explicit operators other than "AND".
- The default
mm.autoRelax
If
true
, the number of clauses required (minimum should match) will automatically be relaxed if a clause is removed (by e.g., stopwords filter) from some but not allqf
fields. Use this parameter as a workaround if you experience that queries return zero hits due to uneven stopword removal between theqf
fields.Note that relaxing
mm
may cause undesired side effects, such as hurting the precision of the search, depending on the nature of your index content.boost
- A multivalued list of strings parsed as queries with scores multiplied by the score from the main query for all matching documents. This parameter is shorthand for wrapping the query produced by eDisMax using the
BoostQParserPlugin
. lowercaseOperators
- A Boolean parameter indicating if lowercase "and" and "or" should be treated the same as operators "AND" and "OR".
Defaults to
false
. ps
- Phrase Slop. The default amount of slop - distance between terms - on phrase queries built with
pf
,pf2
and/orpf3
fields (affects boosting). See also the section Using 'Slop' below. pf2
- A multivalued list of fields with optional weights. Similar to
pf
, but based on word pair shingles. ps2
- This is similar to
ps
but overrides the slop factor used forpf2
. If not specified,ps
is used. pf3
- A multivalued list of fields with optional weights, based on triplets of word shingles. Similar to
pf
, except that instead of building a phrase per field out of all the words in the input, it builds a set of phrases for each field out of word triplet shingles. ps3
- This is similar to
ps
but overrides the slop factor used forpf3
. If not specified,ps
is used. stopwords
- A Boolean parameter indicating if the
StopFilterFactory
configured in the query analyzer should be respected when parsing the query. If this is set tofalse
, then theStopFilterFactory
in the query analyzer is ignored. uf
Specifies which schema fields the end user is allowed to explicitly query and to toggle whether embedded Solr queries are supported. This parameter supports wildcards. Multiple fields must be separated by a space.
The default is to allow all fields and no embedded Solr queries, equivalent to
uf=* -_query_
.- To allow only title field, use
uf=title
. - To allow title and all fields ending with '_s', use
uf=title *_s
. - To allow all fields except title, use
uf=* -title
. - To disallow all fielded searches, use
uf=-*
. - To allow embedded Solr queries (e.g.,
_query_:"…"
or_val_:"…"
or{!lucene …}
), you must expressly enable this by referring to the magic field_query_
inuf
.
- To allow only title field, use
Field Aliasing using Per-Field qf Overrides
Per-field overrides of the qf
parameter may be specified to provide 1-to-many aliasing from field names specified in the query string, to field names used in the underlying query. By default, no aliasing is used and field names specified in the query string are treated as literal field names in the index.
Examples of eDismax Queries
All of the sample URLs in this section assume you are running Solr’s “techproducts
” example:
bin/solr -e techproducts
Boost the result of the query term "hello" based on the document’s popularity:
http://localhost:8983/solr/techproducts/select?defType=edismax&q=hello&pf=text&qf=text&boost=popularity
Search for iPods OR video:
http://localhost:8983/solr/techproducts/select?defType=edismax&q=ipod+OR+video
Search across multiple fields, specifying (via boosts) how important each field is relative each other:
http://localhost:8983/solr/techproducts/select?q=video&defType=edismax&qf=features^20.0+text^0.3
You can boost results that have a field that matches a specific value:
http://localhost:8983/solr/techproducts/select?q=video&defType=edismax&qf=features^20.0+text^0.3&bq=cat:electronics^5.0
Using the mm
parameter, 1 and 2 word queries require that all of the optional clauses match, but for queries with three or more clauses one missing clause is allowed:
http://localhost:8983/solr/techproducts/select?q=belkin+ipod&defType=edismax&mm=2
http://localhost:8983/solr/techproducts/select?q=belkin+ipod+gibberish&defType=edismax&mm=2
http://localhost:8983/solr/techproducts/select?q=belkin+ipod+apple&defType=edismax&mm=2
In the example below, we see a per-field override of the qf
parameter being used to alias "name" in the query string to either the “last_name
” and “first_name
” fields:
defType=edismax
q=sysadmin name:Mike
qf=title text last_name first_name
f.name.qf=last_name first_name
Using Negative Boost
Negative query boosts have been supported at the "Query" object level for a long time (resulting in negative scores for matching documents). Now the QueryParsers have been updated to handle this too.
Using 'Slop'
Dismax
and Edismax
can run queries against all query fields, and also run a query in the form of a phrase against the phrase fields. (This will work only for boosting documents, not actually for matching.) However, that phrase query can have a 'slop,' which is the distance between the terms of the query while still considering it a phrase match. For example:
q=foo bar
qf=field1^5 field2^10
pf=field1^50 field2^20
defType=dismax
With these parameters, the Dismax Query Parser generates a query that looks something like this:
(+(field1:foo^5 OR field2:foo^10) AND (field1:bar^5 OR field2:bar^10))
But it also generates another query that will only be used for boosting results:
field1:"foo bar"^50 OR field2:"foo bar"^20
Thus, any document that has the terms "foo" and "bar" will match; however if some of those documents have both of the terms as a phrase, it will score much higher because it’s more relevant.
If you add the parameter ps
(phrase slop), the second query will instead be:
ps=10 field1:"foo bar"~10^50 OR field2:"foo bar"~10^20
This means that if the terms "foo" and "bar" appear in the document with less than 10 terms between each other, the phrase will match. For example the doc that says:
*Foo* term1 term2 term3 *bar*
will match the phrase query.
How does one use phrase slop? Usually it is configured in the request handler (in solrconfig
).
With query slop (qs
) the concept is similar, but it applies to explicit phrase queries from the user. For example, if you want to search for a name, you could enter:
q="Hans Anderson"
A document that contains "Hans Anderson" will match, but a document that contains the middle name "Christian" or where the name is written with the last name first ("Anderson, Hans") won’t. For those cases one could configure the query field qs
, so that even if the user searches for an explicit phrase query, a slop is applied.
Finally, in addition to the phrase fields (pf
) parameter, edismax
also supports the pf2
and pf3
parameters, for fields over which to create bigram and trigram phrase queries. The phrase slop for these parameters' queries can be specified using the ps2
and ps3
parameters, respectively. If you use pf2
/pf3
but not ps2
/ps3
, then the phrase slop for these parameters' queries will be taken from the ps
parameter, if any.
Synonyms Expansion in Phrase Queries with Slop
When a phrase query with slop (e.g., pf
with ps
) triggers synonym expansions, a separate clause will be generated for each combination of synonyms. For example, with configured synonyms dog,canine
and cat,feline
, the query "dog chased cat"
will generate the following phrase query clauses:
"dog chased cat"
"canine chased cat"
"dog chased feline"
"canine chased feline"