This section describes several other important elements of
schema.xml not covered in earlier sections.
The uniqueKey element specifies which field is a unique identifier for documents. Although uniqueKey is not required, it is nearly always warranted by your application design. For example, uniqueKey should be used if you will ever update a document in the index.
You define the unique key field by naming it in the <uniqueKey> element of schema.xml.
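For example, a schema whose documents are identified by a field named id might declare that field as a single-valued, non-analyzed string and then name it as the unique key. The field name id and the string type are assumptions for illustration, although both are common conventions:

```xml
<!-- Hypothetical non-analyzed type: solr.StrField indexes the value as a single token -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Hypothetical key field: single-valued, required, and using the non-analyzed type -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

<!-- Name that field as the unique identifier for documents -->
<uniqueKey>id</uniqueKey>
```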
Schema defaults and copyFields cannot be used to populate the uniqueKey field. The field type of the uniqueKey field must not be analyzed and must not be any of the *PointField types. You can use UUIDUpdateProcessorFactory to have uniqueKey values generated automatically.
Further, operations that use the uniqueKey field will fail if that field is multivalued (or inherits multivaluedness from its field type). However, uniqueKey will continue to work as long as the field is used properly.
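To have UUIDUpdateProcessorFactory generate missing uniqueKey values, one option is an update request processor chain declared in solrconfig.xml (not schema.xml). The chain name uuid and the target field id below are assumptions for illustration; the fieldName parameter tells the processor which field to fill when an incoming document has no value for it:

```xml
<!-- solrconfig.xml: hypothetical default chain that adds a generated UUID
     to the id field of any document that does not already supply one -->
<updateRequestProcessorChain name="uuid" default="true">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```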
Similarity is a Lucene class used to score documents during search.
Each collection has one "global" Similarity. By default, Solr uses an implicit SchemaSimilarityFactory, which allows individual field types to be configured with a "per-type" Similarity and implicitly uses BM25Similarity for any field type that does not declare an explicit Similarity.
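As a sketch of how this default plays out, the hypothetical text_bm25 type below sets its own per-type Similarity (BM25SimilarityFactory with explicit k1 and b; 1.2 and 0.75 are Lucene's BM25 defaults), while the hypothetical text_plain type declares none and so falls back to BM25Similarity. The type names and analyzer chains are assumptions for illustration:

```xml
<!-- Hypothetical field type with its own per-type Similarity -->
<fieldType name="text_bm25" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>   <!-- term-frequency saturation -->
    <float name="b">0.75</float>   <!-- document-length normalization -->
  </similarity>
</fieldType>

<!-- Hypothetical field type with no <similarity/>: the implicit default applies -->
<fieldType name="text_plain" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```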
This default behavior can be overridden by declaring a top-level <similarity/> element in your schema.xml, outside of any single field type. This similarity declaration can either refer directly to the name of a class with a no-argument constructor, such as BM25Similarity, or reference a SimilarityFactory implementation, which may take optional initialization parameters:
<similarity class="solr.DFRSimilarityFactory">
  <str name="basicModel">P</str>
  <str name="afterEffect">L</str>
  <str name="normalization">H2</str>
  <float name="c">7</float>
</similarity>
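For comparison, a top-level declaration that names a Similarity class directly, rather than going through a factory, might look like the sketch below. BM25Similarity is used only as an illustration; because this form relies on the class's no-argument constructor, the Similarity runs with its built-in defaults:

```xml
<!-- Global Similarity declared by fully qualified class name (no factory) -->
<similarity class="org.apache.lucene.search.similarities.BM25Similarity"/>
```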
In most cases, specifying a global similarity like this will cause an error if your schema.xml also includes field type specific <similarity/> declarations. One key exception is that you may explicitly declare a SchemaSimilarityFactory and set the default for all field types that do not declare an explicit Similarity, by naming (with defaultSimFromFieldType) a field type that is configured with a specific Similarity:
<similarity class="solr.SchemaSimilarityFactory">
  <str name="defaultSimFromFieldType">text_dfr</str>
</similarity>

<fieldType name="text_dfr" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.DFRSimilarityFactory">
    <str name="basicModel">I(F)</str>
    <str name="afterEffect">B</str>
    <str name="normalization">H3</str>
    <float name="mu">900</float>
  </similarity>
</fieldType>

<fieldType name="text_ib" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.IBSimilarityFactory">
    <str name="distribution">SPL</str>
    <str name="lambda">DF</str>
    <str name="normalization">H2</str>
  </similarity>
</fieldType>

<fieldType name="text_other" class="solr.TextField">
  <analyzer ... />
</fieldType>
In the example above, IBSimilarityFactory (using the Information-Based model) will be used for any fields of type text_ib, while DFRSimilarityFactory (divergence from randomness) will be used for any fields of type text_dfr, as well as any fields using a type that does not explicitly specify a <similarity/>, such as text_other.
If SchemaSimilarityFactory is explicitly declared without configuring a defaultSimFromFieldType, then BM25Similarity is implicitly used as the default when luceneMatchVersion >= 8.0.0; otherwise, LegacyBM25Similarity is used to mimic the BM25 formula that was the default in those earlier versions.
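A minimal sketch of that fallback case: the top-level declaration below names SchemaSimilarityFactory without a defaultSimFromFieldType, so field types without their own <similarity/> get BM25Similarity or LegacyBM25Similarity depending on luceneMatchVersion. The luceneMatchVersion element lives in solrconfig.xml rather than schema.xml, and the 8.11.0 value is only an assumed example:

```xml
<!-- schema.xml: explicit SchemaSimilarityFactory with no defaultSimFromFieldType -->
<similarity class="solr.SchemaSimilarityFactory"/>

<!-- solrconfig.xml: a version of 8.0.0 or later selects BM25Similarity as the default;
     an older version would select LegacyBM25Similarity instead -->
<luceneMatchVersion>8.11.0</luceneMatchVersion>
```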
In addition to the various factories mentioned on this page, there are several other similarity implementations that can be used, such as LegacyBM25SimilarityFactory. For details, see the Solr Javadocs for the similarity factories.
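For instance, a schema that should keep scoring with the pre-8.0 BM25 formula regardless of luceneMatchVersion could, as a sketch, declare LegacyBM25SimilarityFactory globally. The k1 and b parameters are assumed to mirror those of BM25SimilarityFactory, and the values shown are the conventional BM25 defaults; check the factory's Javadocs before relying on them:

```xml
<!-- Hypothetical global Similarity that keeps the legacy (pre-8.0) BM25 formula -->
<similarity class="solr.LegacyBM25SimilarityFactory">
  <float name="k1">1.2</float>
  <float name="b">0.75</float>
</similarity>
```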