Putting the Pieces Together
At the highest level, schema.xml is structured as follows.
This example is not real XML, but it gives you an idea of the structure of the file.
<schema>
  <types>
  <fields>
  <uniqueKey>
  <copyField>
</schema>
Most of the excitement is in types and fields, where the field types and the actual field definitions live. These are supplemented by copyFields. The uniqueKey must always be defined.
Note that the types and fields tags are optional.
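As a rough illustration of how these pieces fit together in real XML, here is a minimal sketch of a schema that omits the optional types and fields wrapper tags. The schema name, the version value, and the field names (id, name, text) are assumptions chosen for the example, not values taken from any particular Solr distribution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch; the name, version, and field names are illustrative assumptions. -->
<schema name="example" version="1.6">

  <!-- Field types: how values are stored and analyzed -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Field definitions: each field refers to a field type by name -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="name" type="text_general" indexed="true" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

  <!-- The unique key must always be defined -->
  <uniqueKey>id</uniqueKey>

  <!-- copyField supplements the field definitions -->
  <copyField source="name" dest="text"/>
</schema>
```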
Choosing Appropriate Numeric Types
For general numeric needs, consider using one of the IntPointField, LongPointField, FloatPointField, or DoublePointField classes, depending on the specific values you expect. These "Dimensional Point" based numeric classes use specially encoded data structures to support efficient range queries regardless of the size of the ranges used. Enable DocValues on these fields as needed for sorting and/or faceting.
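A point-based numeric setup might look like the following sketch. The type names (pint, pdouble) and the field names (popularity, price) are assumptions for illustration; only the class names and the docValues attribute come from the discussion above.

```xml
<!-- Point-based numeric types; docValues is enabled so the fields can be sorted and faceted on. -->
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>

<!-- Hypothetical fields that use the types above -->
<field name="popularity" type="pint" indexed="true" stored="true"/>
<field name="price" type="pdouble" indexed="true" stored="true"/>
```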
Some Solr features may not yet work with "Dimensional Points", in which case you may want to consider the equivalent TrieIntField, TrieLongField, TrieFloatField, or TrieDoubleField classes. These field types are deprecated and are likely to be removed in a future major Solr release, but they can still be used if necessary. Configure precisionStep="0" if you wish to minimize index size. If you expect users to make frequent range queries on numeric types, use the default precisionStep, either by not specifying it or by setting precisionStep="8" explicitly. The default gives faster range queries at the expense of a larger index.
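The two precisionStep choices could be declared as in the sketch below. The type names (int, tint) and the field names are assumptions for illustration, and because the Trie classes are deprecated this is only relevant where the point-based types cannot be used.

```xml
<!-- precisionStep="0": smallest index, slower range queries -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

<!-- precisionStep="8" (the default): larger index, faster range queries -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>

<!-- Hypothetical fields: one rarely range-queried, one frequently range-queried -->
<field name="inventory_count" type="int" indexed="true" stored="true"/>
<field name="weight_grams" type="tint" indexed="true" stored="true"/>
```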
Working With Text
Handling text properly will make your users happy by providing them with the best possible results for text searches.
One technique is using a text field as a catch-all for keyword searching. Most users are not sophisticated about their searches and the most common search is likely to be a simple keyword search. You can use copyField to take a variety of fields and funnel them all into a single text field for keyword searches.
In the schema.xml file for the "techproducts" example included with Solr, copyField declarations are used to dump the contents of cat, name, manu, features, and includes into a single field, text. In addition, it could be a good idea to copy ID into text in case users want to search for a particular product by passing its product number to a keyword search.
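A catch-all arrangement of this kind could be declared roughly as follows. This is a sketch rather than a copy of the shipped example: the text_general type name and the attributes on the text field are assumptions, as is the lowercase id field name used for the product ID.

```xml
<!-- Catch-all field: indexed for searching, not stored, and multiValued because
     it receives values from several source fields. -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Funnel the descriptive fields into the catch-all field -->
<copyField source="cat" dest="text"/>
<copyField source="name" dest="text"/>
<copyField source="manu" dest="text"/>
<copyField source="features" dest="text"/>
<copyField source="includes" dest="text"/>

<!-- Optionally copy the product ID too, so a product number works as a keyword -->
<copyField source="id" dest="text"/>
```

Keyword queries can then be sent against the text field alone instead of listing every source field.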
Another technique is using copyField to use the same field in different ways. Suppose you have a field that is a list of authors, like this:
Schildt, Herbert; Wolpert, Lewis; Davies, P.
For searching by author, you could tokenize the field, convert to lower case, and strip out punctuation:
schildt / herbert / wolpert / lewis / davies / p
For sorting, just use an untokenized field, converted to lower case, with punctuation stripped:
schildt herbert wolpert lewis davies p
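The search and sort variants above could be set up along these lines. All type and field names here (author_search, author_sort, author) are hypothetical, and the regular expression is just one way to strip punctuation; treat this as a sketch under those assumptions.

```xml
<!-- Search variant: tokenize, then lowercase. The tokenizer also drops punctuation. -->
<fieldType name="author_search" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Sort variant: keep the whole value as one token, lowercase it,
     then remove every character that is not a lowercase letter or a space. -->
<fieldType name="author_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z ]" replacement="" replace="all"/>
  </analyzer>
</fieldType>

<field name="author" type="author_search" indexed="true" stored="true"/>
<field name="author_sort" type="author_sort" indexed="true" stored="false"/>

<!-- Index the same source value a second time under the sort analysis -->
<copyField source="author" dest="author_sort"/>
```

Because copyField copies the raw input value rather than the analyzed tokens, each destination field applies its own analysis chain independently.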
Finally, for faceting, use the primary author only via a