Solr stores details about the field types and fields it is expected to understand in a schema file.
The name and location of Solr’s schema file may vary depending on how you initially configured Solr or if you modified it later.
You may explicitly configure the managed schema features to use an alternative filename if you choose, but the contents of the files are still updated automatically by Solr.
schema.xmlis the traditional name for a schema file which can be edited manually by users who use the
If you are using SolrCloud you may not be able to find any file by these names on the local filesystem. You will only be able to see the schema through the Schema API (if enabled) or through the Solr Admin UI’s Cloud Screens.
Whichever name of the file in use in your installation, the structure of the file is not changed. However, the way you interact with the file will change. If you are using the managed schema, it is expected that you only interact with the file with the Schema API, and never make manual edits. If you do not use the managed schema, you will only be able to make manual edits to the file, the Schema API will not support any modifications.
Note that if you are not using the Schema API yet you do use SolrCloud, you will need to interact with the schema file through ZooKeeper using
downconfig commands to make a local copy and upload your changes.
The options for doing this are described in Solr Control Script Reference and ZooKeeper File Management.
This example is not real XML, but shows the primary elements that make up a schema file.
<schema> <types> <fieldType> <fields> <field> <copyField> <dynamicField> <similarity> <uniqueKey> </schema>
The most commonly defined elements are
fields, where the field types and the actual fields are configured.
The sections Field Type Definitions and Properties, and Fields describe how to configure these for your schema.
uniqueKey described in Unique Key below must always be defined.
similarity will be used, but can be modified as described in the section Similarity below.
Types and fields are optional tags
Note that the
uniqueKey element specifies which field is a unique identifier for documents.
uniqueKey is not required, it is nearly always warranted by your application design.
uniqueKey should be used if you will ever update a document in the index.
You can define the unique key field by naming it:
Schema defaults and
copyFields cannot be used to populate the
uniqueKey must not be analyzed and must not be any of the
You can use
UUIDUpdateProcessorFactory to have
uniqueKey values generated automatically.
Further, the operation will fail if the
uniqueKey field is used, but is multivalued (or inherits the multivalued-ness from the
uniqueKey will continue to work, as long as the field is properly used.
Similarity is a Lucene class used to score a document in searching.
Each collection has one "global" Similarity.
By default Solr uses an implicit
SchemaSimilarityFactory which allows individual field types to be configured with a "per-type" specific Similarity and implicitly uses
BM25Similarity for any field type which does not have an explicit Similarity.
This default behavior can be overridden by declaring a top level
<similarity/> element in your schema, outside of any single field type.
This similarity declaration can either refer directly to the name of a class with a no-argument constructor, such as in this example showing
or by referencing a
SimilarityFactory implementation, which may take optional initialization parameters:
<similarity class="solr.DFRSimilarityFactory"> <str name="basicModel">P</str> <str name="afterEffect">L</str> <str name="normalization">H2</str> <float name="c">7</float> </similarity>
In most cases, specifying global level similarity like this will cause an error if your schema also includes field type specific
One key exception to this is that you may explicitly declare a
SchemaSimilarityFactory and specify what that default behavior will be for all field types that do not declare an explicit Similarity using the name of field type (specified by
defaultSimFromFieldType) that is configured with a specific similarity:
<similarity class="solr.SchemaSimilarityFactory"> <str name="defaultSimFromFieldType">text_dfr</str> </similarity> <fieldType name="text_dfr" class="solr.TextField"> <analyzer ... /> <similarity class="solr.DFRSimilarityFactory"> <str name="basicModel">I(F)</str> <str name="afterEffect">B</str> <str name="normalization">H3</str> <float name="mu">900</float> </similarity> </fieldType> <fieldType name="text_ib" class="solr.TextField"> <analyzer ... /> <similarity class="solr.IBSimilarityFactory"> <str name="distribution">SPL</str> <str name="lambda">DF</str> <str name="normalization">H2</str> </similarity> </fieldType> <fieldType name="text_other" class="solr.TextField"> <analyzer ... /> </fieldType>
In the example above
IBSimilarityFactory (using the Information-Based model) will be used for any fields of type
DFRSimilarityFactory (divergence from random) will be used for any fields of type
text_dfr, as well as any fields using a type that does not explicitly specify a
SchemaSimilarityFactory is explicitly declared without configuring a
BM25Similarity is implicitly used as the default.
In addition to the various factories mentioned on this page, there are several other similarity implementations that can be used such as the
For details, see the Solr Javadocs for the similarity factories.