Putting the Pieces Together

At the highest level, schema.xml is structured as follows.

This example is not real XML, but it gives you an idea of the structure of the file.

<schema>
  <types>
  <fields>
  <uniqueKey>
  <copyField>
</schema>

Obviously, most of the excitement is in types and fields, where the field types and the actual field definitions live.

These are supplemented by copyFields.

The uniqueKey must always be defined.

Types and fields are optional tags

Note that the types and fields sections are optional, meaning you are free to mix field, dynamicField, copyField and fieldType definitions on the top level. This allows for a more logical grouping of related tags in your schema.

Choosing Appropriate Numeric Types

For general numeric needs, consider using one of the IntPointField, LongPointField, FloatPointField, or DoublePointField classes, depending on the specific values you expect. These "Dimensional Point" based numeric classes use specially encoded data structures to support efficient range queries regardless of the size of the ranges used. Enable DocValues on these fields as needed for sorting and/or faceting.

Some Solr features may not yet work with "Dimensional Points", in which case you may want to consider the equivalent TrieIntField, TrieLongField, TrieFloatField, and TrieDoubleField classes. These field types are deprecated and are likely to be removed in a future major Solr release, but they can still be used if necessary. Configure a precisionStep="0" if you wish to minimize index size, but if you expect users to make frequent range queries on numeric types, use the default precisionStep (by not specifying it) or specify it as precisionStep="8" (which is the default). This offers faster speed for range queries at the expense of increasing index size.

Working With Text

Handling text properly will make your users happy by providing them with the best possible results for text searches.

One technique is using a text field as a catch-all for keyword searching. Most users are not sophisticated about their searches and the most common search is likely to be a simple keyword search. You can use copyField to take a variety of fields and funnel them all into a single text field for keyword searches.

In the schema.xml file for the “techproducts” example included with Solr, copyField declarations are used to dump the contents of cat, name, manu, features, and includes into a single field, text. In addition, it could be a good idea to copy ID into text in case users wanted to search for a particular product by passing its product number to a keyword search.

Another technique is using copyField to use the same field in different ways. Suppose you have a field that is a list of authors, like this:

Schildt, Herbert; Wolpert, Lewis; Davies, P.

For searching by author, you could tokenize the field, convert to lower case, and strip out punctuation:

schildt / herbert / wolpert / lewis / davies / p

For sorting, just use an untokenized field, converted to lower case, with punctuation stripped:

schildt herbert wolpert lewis davies p

Finally, for faceting, use the primary author only via a StrField:

Schildt, Herbert