Putting the Pieces Together
At the highest level, schema.xml
is structured as follows.
This example is not real XML, but it gives you an idea of the structure of the file.
<schema>
<types>
<fields>
<uniqueKey>
<copyField>
</schema>
Obviously, most of the excitement is in types
and fields
, where the field types and the actual field definitions live.
These are supplemented by copyFields
.
The uniqueKey
must always be defined.
Types and fields are optional tags
Note that the |
Choosing Appropriate Numeric Types
For general numeric needs, consider using one of the IntPointField
, LongPointField
, FloatPointField
, or DoublePointField
classes, depending on the specific values you expect. These "Dimensional Point" based numeric classes use specially encoded data structures to support efficient range queries regardless of the size of the ranges used. Enable DocValues on these fields as needed for sorting and/or faceting.
Some Solr features may not yet work with "Dimensional Points", in which case you may want to consider the equivalent TrieIntField
, TrieLongField
, TrieFloatField
, and TrieDoubleField
classes. These field types are deprecated and are likely to be removed in a future major Solr release, but they can still be used if necessary. Configure a precisionStep="0"
if you wish to minimize index size, but if you expect users to make frequent range queries on numeric types, use the default precisionStep
(by not specifying it) or specify it as precisionStep="8"
(which is the default). This offers faster speed for range queries at the expense of increasing index size.
Working With Text
Handling text properly will make your users happy by providing them with the best possible results for text searches.
One technique is using a text field as a catch-all for keyword searching. Most users are not sophisticated about their searches and the most common search is likely to be a simple keyword search. You can use copyField
to take a variety of fields and funnel them all into a single text field for keyword searches.
In the schema.xml
file for the “techproducts
” example included with Solr, copyField
declarations are used to dump the contents of cat
, name
, manu
, features
, and includes
into a single field, text
. In addition, it could be a good idea to copy ID
into text
in case users wanted to search for a particular product by passing its product number to a keyword search.
Another technique is using copyField
to use the same field in different ways. Suppose you have a field that is a list of authors, like this:
Schildt, Herbert; Wolpert, Lewis; Davies, P.
For searching by author, you could tokenize the field, convert to lower case, and strip out punctuation:
schildt / herbert / wolpert / lewis / davies / p
For sorting, just use an untokenized field, converted to lower case, with punctuation stripped:
schildt herbert wolpert lewis davies p
Finally, for faceting, use the primary author only via a StrField
:
Schildt, Herbert