This section describes several other important elements of schema.xml
not covered in earlier sections.
Unique Key
The uniqueKey element specifies which field is a unique identifier for documents. Although uniqueKey is not required, it is nearly always warranted by your application design. For example, uniqueKey should be used if you will ever update a document in the index.
You can define the unique key field by naming it:
<uniqueKey>id</uniqueKey>
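The name given in uniqueKey has to match a field declared elsewhere in the schema. The following is a minimal sketch, assuming a field called id backed by a StrField-based type named string; the type name and the attribute values are assumptions borrowed from common Solr configurations, not something this section prescribes:

```xml
<!-- Assumed: a non-analyzed, StrField-based type named "string" -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Assumed: a single-valued field named "id" used as the document identifier -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

<uniqueKey>id</uniqueKey>
```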
Schema defaults and copyFields cannot be used to populate the uniqueKey field. The fieldType of uniqueKey must not be analyzed and must not be any of the *PointField types. You can use UUIDUpdateProcessorFactory to have uniqueKey values generated automatically.
Further, the operation will fail if the uniqueKey field is used but is multivalued (or inherits being multivalued from its fieldType). However, uniqueKey will continue to work as long as the field is properly used.
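As an illustration of automatic uniqueKey generation, an update request processor chain can include UUIDUpdateProcessorFactory. The sketch below belongs in solrconfig.xml rather than schema.xml. The chain name uuid is hypothetical, and the target field is assumed to be the single-valued id field named by uniqueKey:

```xml
<!-- Hypothetical chain name "uuid"; declared in solrconfig.xml -->
<updateRequestProcessorChain name="uuid">
  <!-- Adds a generated UUID to "id" when an incoming document has no value for it -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

A chain declared this way is not applied by default. It would typically be selected per request with the update.chain parameter, or marked as the default chain.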
Similarity
Similarity is a Lucene class used to score a document in searching.
Each collection has one "global" Similarity, and by default Solr uses an implicit SchemaSimilarityFactory which allows individual field types to be configured with a "per-type" specific Similarity and implicitly uses BM25Similarity for any field type which does not have an explicit Similarity.
This default behavior can be overridden by declaring a top-level <similarity/> element in your schema.xml, outside of any single field type. This similarity declaration can either refer directly to the name of a class with a no-argument constructor, such as in this example showing BM25Similarity:
<similarity class="solr.BM25SimilarityFactory"/>
or refer to a SimilarityFactory implementation, which may take optional initialization parameters:
<similarity class="solr.DFRSimilarityFactory">
  <str name="basicModel">P</str>
  <str name="afterEffect">L</str>
  <str name="normalization">H2</str>
  <float name="c">7</float>
</similarity>
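The BM25 factory shown earlier accepts initialization parameters in the same way. As a minimal sketch, the declaration below sets k1 and b explicitly; the values are BM25's usual defaults and are included only to show the syntax, not as tuning advice:

```xml
<similarity class="solr.BM25SimilarityFactory">
  <!-- k1 controls term frequency saturation -->
  <float name="k1">1.2</float>
  <!-- b controls how strongly document length normalizes the score -->
  <float name="b">0.75</float>
</similarity>
```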
In most cases, specifying a global similarity like this will cause an error if your schema.xml also includes field type specific <similarity/> declarations. One key exception is that you may explicitly declare a SchemaSimilarityFactory and use defaultSimFromFieldType to name a field type that is configured with a specific similarity. That similarity then becomes the default for all field types that do not declare an explicit Similarity:
<similarity class="solr.SchemaSimilarityFactory">
  <str name="defaultSimFromFieldType">text_dfr</str>
</similarity>
<fieldType name="text_dfr" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.DFRSimilarityFactory">
    <str name="basicModel">I(F)</str>
    <str name="afterEffect">B</str>
    <str name="normalization">H3</str>
    <float name="mu">900</float>
  </similarity>
</fieldType>
<fieldType name="text_ib" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.IBSimilarityFactory">
    <str name="distribution">SPL</str>
    <str name="lambda">DF</str>
    <str name="normalization">H2</str>
  </similarity>
</fieldType>
<fieldType name="text_other" class="solr.TextField">
  <analyzer ... />
</fieldType>
In the example above, IBSimilarityFactory (using the Information-Based model) will be used for any fields of type text_ib, while DFRSimilarityFactory (divergence from random) will be used for any fields of type text_dfr, as well as any fields using a type that does not explicitly specify a <similarity/>.
If SchemaSimilarityFactory is explicitly declared without configuring a defaultSimFromFieldType, then BM25Similarity is implicitly used as the default.
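For comparison, the explicit declaration without a default field type reduces to a single element. This is only a sketch of the case described above:

```xml
<!-- No defaultSimFromFieldType is given, so field types that lack their own
     similarity declaration fall back to BM25Similarity -->
<similarity class="solr.SchemaSimilarityFactory"/>
```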
In addition to the various factories mentioned on this page, there are several other similarity implementations that can be used, such as SweetSpotSimilarityFactory and ClassicSimilarityFactory. For details, see the Solr Javadocs for the similarity factories.
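These factories are declared the same way as the ones shown earlier. As a hedged sketch, the field type below uses ClassicSimilarityFactory as a per-type similarity. The name text_classic is hypothetical, the analyzer is an assumed minimal one, and the global similarity is assumed to be the implicit or an explicit SchemaSimilarityFactory:

```xml
<!-- Hypothetical field type name; the analyzer is an assumption for illustration -->
<fieldType name="text_classic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <!-- Per-type similarity, used only for fields of this type -->
  <similarity class="solr.ClassicSimilarityFactory"/>
</fieldType>
```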