Exercise 4: Using ParamSets
This exercise will teach you to use ParamSets to group several query parameters under a label that you can refer to in your queries.
Getting Ready
Make sure you have a running Solr, following the steps in tutorial-films.adoc#restart-solr. Then go ahead to the next section.
Create a New Collection
$ bin/solr create -c films
Because we didn’t specify a ConfigSet, we will end up using the _default ConfigSet.
We’ll explicitly define the schema for a couple of fields whose types Solr would otherwise guess differently than we’d like:
$ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : [
{
"name":"name",
"type":"text_general",
"multiValued":false,
"stored":true
},
{
"name":"initial_release_date",
"type":"pdate",
"stored":true
}
]
}'
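To confirm that both fields were added, you can read them back through the Schema API. This is a minimal sketch, assuming Solr is listening on localhost:8983 as in the command above; the SOLR_URL and field_url variable names are only illustrative.

```shell
# Assumption: Solr runs locally on the default port, as in the command above.
SOLR_URL="http://localhost:8983/solr"

# Read back each field definition from the Schema API.
for field in name initial_release_date; do
  field_url="$SOLR_URL/films/schema/fields/$field"
  echo "GET $field_url"
  curl -s --max-time 5 "$field_url" || echo "Could not reach Solr at $SOLR_URL"
done
```

Each response should echo back the type and attributes you set for that field.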
Without explicitly defining those field types, the name field would have been guessed as a multi-valued string field type
and initial_release_date would have been guessed as a multi-valued |
Index the Data
Now that we have updated our schema, we need to index the sample film data. If you have already indexed it, re-index it to take advantage of the new field definitions we added.
Linux/Mac
$ bin/solr post -c films example/films/films.json
Windows
$ bin/solr post -c films example\films\films.json
Let’s get Searching!
Search for 'Batman':
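The query command itself is not shown here. A minimal sketch, assuming the search targets the name field defined earlier and that Solr is on localhost:8983 (the variable names are illustrative):

```shell
# Assumption: the search runs against the name field on the default local port.
SOLR_URL="http://localhost:8983/solr"
batman_url="$SOLR_URL/films/query?q=name:batman"

echo "GET $batman_url"
curl -s --max-time 5 "$batman_url" || echo "Could not reach Solr at $SOLR_URL"
```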
- If you get an error about the name field not existing, you haven’t yet indexed the data.
- If you don’t get an error but get zero results, chances are that the name field’s schema type override wasn’t set before the data was first indexed (the field ended up as a "string" type, which requires exact, case-sensitive matching). It’s easiest to simply reset your environment and try again, ensuring that each step executes successfully.
Show me all 'Super hero' movies:
$ curl 'http://localhost:8983/solr/films/query?q=*:*&fq=genre:%22Superhero%20movie%22'
Let’s see the distribution of genres across all the movies. See the facet section of the response for the counts:
$ curl 'http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre'
Time for relevancy tuning with ParamSets:
Now that we can query our data, let’s actually use the ParamSets to organize our parameters into two experiments.
Search for 'harry potter':
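The exact command is not shown in the text. One plausible form, assuming the query is issued against the name field with the space URL-encoded as %20 (the variable names are illustrative):

```shell
# Assumption: the baseline query targets the name field; the original command is not shown.
SOLR_URL="http://localhost:8983/solr"
baseline_url="$SOLR_URL/films/query?q=name:harry%20potter"

echo "GET $baseline_url"
curl -s --max-time 5 "$baseline_url" || echo "Could not reach Solr at $SOLR_URL"
```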
Notice that the very first result is the movie Dumb & Dumberer: When Harry Met Lloyd, which is clearly not related to any Harry Potter movie.
Let’s set up two relevancy algorithms, using our APIs, and then compare the quality of the results.
Algorithm A will specify dismax and a qf parameter, while Algorithm B will use dismax, qf, and a minimum-should-match parameter, mm, set to 100%.
$ curl http://localhost:8983/solr/films/config/params -X POST -H 'Content-type:application/json' --data-binary '{
"set": {
"algo_a":{
"defType":"dismax",
"qf":"name"
}
},
"set": {
"algo_b":{
"defType":"dismax",
"qf":"name",
"mm":"100%"
}
}
}'
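Before querying, it can help to check that both ParamSets were stored. A short sketch, assuming the same local Solr instance; a GET on the params endpoint lists the stored ParamSets, and appending a name returns just that one (the variable names are illustrative):

```shell
# Assumption: Solr runs locally on the default port, as in the command above.
SOLR_URL="http://localhost:8983/solr"
params_url="$SOLR_URL/films/config/params"

# List every ParamSet stored for the films collection.
echo "GET $params_url"
curl -s --max-time 5 "$params_url" || echo "Could not reach Solr at $SOLR_URL"

# Fetch a single ParamSet by name.
algo_b_url="$params_url/algo_b"
echo "GET $algo_b_url"
curl -s --max-time 5 "$algo_b_url" || echo "Could not reach Solr at $SOLR_URL"
```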
Search for 'harry potter' with Algorithm A:
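A ParamSet is applied by naming it in the useParams request parameter. A minimal sketch, assuming the same local Solr instance and the algo_a ParamSet defined above; no field prefix is needed because qf already points dismax at the name field:

```shell
# Assumption: the algo_a ParamSet was created by the earlier /config/params request.
SOLR_URL="http://localhost:8983/solr"
algo_a_query_url="$SOLR_URL/films/query?q=harry%20potter&useParams=algo_a"

echo "GET $algo_a_query_url"
curl -s --max-time 5 "$algo_a_query_url" || echo "Could not reach Solr at $SOLR_URL"
```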
This returns five results, including the Harry Potter movies, but notice that Dumb & Dumberer: When Harry Met Lloyd still comes back.
Search for 'harry potter' with Algorithm B:
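The same query can be run against the second ParamSet by changing only the useParams value. Again a sketch under the same assumptions (local Solr, algo_b defined above):

```shell
# Assumption: the algo_b ParamSet was created by the earlier /config/params request.
SOLR_URL="http://localhost:8983/solr"
algo_b_query_url="$SOLR_URL/films/query?q=harry%20potter&useParams=algo_b"

echo "GET $algo_b_query_url"
curl -s --max-time 5 "$algo_b_query_url" || echo "Could not reach Solr at $SOLR_URL"
```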
This returns only the four Harry Potter movies, giving more precise results! We can say that we believe Algorithm B is better than Algorithm A, at least for this one query. You can validate this hypothesis with online A/B testing to confirm with real users that Algorithm B is better overall.