Field Type Definitions and Properties
A field type defines the analysis that will occur on a field when documents are indexed or queries are sent to the index.
A field type definition can include four types of information:
- The name of the field type (mandatory).
- An implementation class name (mandatory).
- If the field type is TextField, a description of the field analysis for the field type.
- Field type properties. Depending on the implementation class, some properties may be mandatory.
Field Type Definitions in the Schema
Field types are defined in the collection’s schema.
Each field type is defined between fieldType elements.
They can optionally be grouped within a types element.
Here is an example of a field type definition for a type called text_general:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> (1)
<analyzer type="index"> (2)
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
| 1 | The first line in the example above contains the field type name, text_general, and the name of the implementing class, solr.TextField. |
| 2 | The rest of the definition is about field analysis, described in Document Analysis in Solr. |
The implementing class is responsible for making sure the field is handled correctly.
In the class names, the string solr is shorthand for org.apache.solr.schema or org.apache.solr.analysis.
Therefore, solr.TextField is really org.apache.solr.schema.TextField.
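As an illustration of the shorthand, the two definitions below are a minimal sketch of equivalent declarations. The type names string_short and string_full are hypothetical and chosen only for this example; solr.StrField is assumed to resolve to org.apache.solr.schema.StrField in the same way that solr.TextField resolves to org.apache.solr.schema.TextField.

```xml
<!-- Shorthand: Solr searches its own packages for the class. -->
<fieldType name="string_short" class="solr.StrField"/>

<!-- Fully qualified form of the same class. Third-party classes
     would normally be written this way. -->
<fieldType name="string_full" class="org.apache.solr.schema.StrField"/>
```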
Field Type Properties
The field type class determines most of the behavior of a field type, but optional properties can also be defined.
For example, the following definition of a date field type defines two properties, sortMissingLast and omitNorms.
<fieldType name="date" class="solr.DatePointField"
sortMissingLast="true" omitNorms="true"/>
The properties that can be specified for a given field type fall into three major categories:
- Properties specific to the field type’s class.
- General Properties Solr supports for any field type.
- Field Default Properties that can be specified on the field type that will be inherited by fields that use this type instead of the default behavior.
General Properties
These are the general properties for fields:
name-
Required
Default: none
The name of the fieldType. This value gets used in field definitions, in the "type" attribute. It is strongly recommended that names consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced.
class-
Required
Default: none
The class name used to store and index the data for this type. You may prefix included class names with "solr." and Solr will automatically figure out which packages to search for the class, so solr.TextField will work. If you are using a third-party class, you will probably need to have a fully qualified class name. The fully qualified equivalent for solr.TextField is org.apache.solr.schema.TextField.
positionIncrementGap-
Optional
Default: none
For multivalued fields, specifies a distance between multiple values, which prevents spurious phrase matches.
autoGeneratePhraseQueries-
Optional
Default: none
For text fields. If true, Solr automatically generates phrase queries for adjacent terms. If false, terms must be enclosed in double-quotes to be treated as phrases.
synonymQueryStyle-
Optional
Default: as_same_term
Query used to combine scores of overlapping query terms (i.e., synonyms). Consider a search for "blue tee" with query-time synonyms tshirt,tee.
- as_same_term: Blends terms, i.e., SynonymQuery(tshirt,tee) where each term will be treated as equally important. This option is appropriate when terms are true synonyms (e.g., "television, tv").
- pick_best: Selects the most significant synonym when scoring Dismax(tee,tshirt). Use this when synonyms are expanding to hyponyms (q=jeans w/ jeans⇒jeans,pants) and you want exact to come before parent and sibling concepts.
- as_distinct_terms: Biases scoring towards the most significant synonym (pants OR slacks).
The blog post Solr Synonyms and Taxonomies: Mea Culpa discusses Solr’s behavior with synonym expansion.
enableGraphQueries-
Optional
Default: true
For text fields, applicable when querying with sow=false (the default). Use true for field types with query analyzers including graph-aware filters, e.g., Synonym Graph Filter and Word Delimiter Graph Filter.
Use false for field types with query analyzers including filters that can match docs when some tokens are missing, e.g., Shingle Filter.
docValuesFormat-
Optional
Default: none
Defines a custom DocValuesFormat to use for fields of this type. This requires that a schema-aware codec, such as the Schema Codec Factory, is in use.
postingsFormat-
Optional
Default: none
Defines a custom PostingsFormat to use for fields of this type. This requires that a schema-aware codec, such as the Schema Codec Factory, is in use.
|
Lucene index back-compatibility is only supported for the default codec.
If you choose to customize the |
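To show how several of the general properties combine, here is a minimal sketch of a text field type and a field that refers to it through the type attribute. The names text_phrases and description are hypothetical, the synonyms.txt file is assumed to exist in the collection's configuration, and the property values are illustrative choices, not recommendations.

```xml
<!-- Hypothetical field type that sets several general properties. -->
<fieldType name="text_phrases" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="false"
           synonymQueryStyle="pick_best"
           enableGraphQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Graph-aware filter, which is why enableGraphQueries stays true. -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The field type name is used in the field's "type" attribute.
     positionIncrementGap matters here because the field is multivalued. -->
<field name="description" type="text_phrases" indexed="true" stored="true" multiValued="true"/>
```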
Field Default Properties
These are properties that can be specified either on the field types, or on individual fields to override the values provided by the field types.
The default values for each property depend on the underlying FieldType class, which in turn may depend on the version attribute of the <schema/>.
The table below includes the default value for most FieldType implementations provided by Solr, assuming a schema that declares version="1.6".
| Property | Description | Implicit Default |
|---|---|---|
| | If | |
| | If | |
| | If | |
| | Control the placement of documents when a sort field is not present. | |
| | If | |
| | If | |
| | If | * |
| | If | * |
| | Similar to | * |
| | These options instruct Solr to maintain full term vectors for each document, optionally including position, offset, and payload information for each term occurrence in those vectors. These can be used to accelerate highlighting and other ancillary functionality, but impose a substantial cost in terms of index size. They are not necessary for typical uses of Solr. | |
| | Instructs Solr to reject any attempts to add a document which does not have a value for this field. This property defaults to false. | |
| | If the field has DocValues enabled, setting this to true would allow the field to be returned as if it were a stored field (even if it has | |
| | Large fields are always lazy loaded and will only take up space in the document cache if the actual value is < 512KB. This option requires | |
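The inheritance described in this section can be sketched as follows. The field names sku and internal_tag are hypothetical. The properties shown (indexed, stored, docValues, sortMissingLast) are standard Solr field properties, and the values are only illustrative.

```xml
<!-- The field type sets defaults for every field that uses it. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

<!-- Inherits sortMissingLast and docValues from the field type. -->
<field name="sku" type="string" indexed="true" stored="true"/>

<!-- Overrides the inherited docValues setting for this field only. -->
<field name="internal_tag" type="string" indexed="true" stored="false" docValues="false"/>
```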
Choosing Appropriate Numeric Types
For general numeric needs, consider using one of the IntPointField, LongPointField, FloatPointField, or DoublePointField classes, depending on the specific values you expect.
These "Dimensional Point" based numeric classes use specially encoded data structures to support efficient range queries regardless of the size of the ranges used.
Enable DocValues on these fields as needed for sorting and/or faceting.
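A minimal sketch of such definitions might look like the following. The type names pint, plong, pfloat, and pdouble and the field name price are assumptions made for this example.

```xml
<!-- Point-based numeric types with DocValues enabled for sorting and faceting. -->
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="plong" class="solr.LongPointField" docValues="true"/>
<fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
<fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>

<!-- Hypothetical field using one of the types above. -->
<field name="price" type="pfloat" indexed="true" stored="true"/>
```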
Some Solr features may not yet work with "Dimensional Points", in which case you may want to consider the equivalent TrieIntField, TrieLongField, TrieFloatField, and TrieDoubleField classes.
These field types are deprecated and are likely to be removed in a future major Solr release, but they can still be used if necessary.
Set precisionStep="0" if you wish to minimize index size. If you expect users to make frequent range queries on numeric types, use the default precisionStep, either by not specifying it or by setting precisionStep="8" (the default value).
The default offers faster range queries at the expense of a larger index.
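For Solr versions that still ship the Trie classes, the two precisionStep choices could be declared roughly as shown here. The type names tint and tint_compact are hypothetical.

```xml
<!-- Default precisionStep, written out explicitly: faster range queries, larger index. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" docValues="true"/>

<!-- precisionStep="0": smaller index, slower range queries. -->
<fieldType name="tint_compact" class="solr.TrieIntField" precisionStep="0" docValues="true"/>
```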
Working With Text
Handling text properly will make your users happy by providing them with the best possible results for text searches.
One technique is using a text field as a catch-all for keyword searching.
Most users are not sophisticated about their searches and the most common search is likely to be a simple keyword search.
You can use copyField to take a variety of fields and funnel them all into a single text field for keyword searches.
In the schema for the "techproducts" example included with Solr, copyField declarations are used to dump the contents of cat, name, manu, features, and includes into a single field, text. It could also be a good idea to copy id into text in case users want to search for a particular product by passing its product number to a keyword search.
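The catch-all approach can be sketched with declarations along these lines. The definition of the text field shown here (its type and properties) is an assumption for illustration. The source fields are the ones named above.

```xml
<!-- Catch-all field: searched but not returned, so it is not stored. -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<copyField source="cat" dest="text"/>
<copyField source="name" dest="text"/>
<copyField source="manu" dest="text"/>
<copyField source="features" dest="text"/>
<copyField source="includes" dest="text"/>
<!-- Optional: lets a keyword search match a product number. -->
<copyField source="id" dest="text"/>
```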
Another technique is using copyField to use the same field in different ways.
Suppose you have a field that is a list of authors, like this:
Schildt, Herbert; Wolpert, Lewis; Davies, P.
For searching by author, you could tokenize the field, convert to lower case, and strip out punctuation:
schildt / herbert / wolpert / lewis / davies / p
For sorting, just use an untokenized field, converted to lower case, with punctuation stripped:
schildt herbert wolpert lewis davies p
Finally, for faceting, use the primary author only via a StrField:
Schildt, Herbert
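One possible way to set up these three views of the author data is sketched below. All field and type names (authors, authors_sort, primary_author, author_sort_type) are hypothetical. The sketch also assumes that the indexing client supplies the primary author as a separate value, since copyField copies the whole source value and cannot pick out the first author.

```xml
<!-- Searching: tokenized and lowercased by the text_general analysis chain. -->
<field name="authors" type="text_general" indexed="true" stored="true"/>

<!-- Sorting: the whole value as a single token, lowercased, punctuation removed. -->
<fieldType name="author_sort_type" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z ]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
<field name="authors_sort" type="author_sort_type" indexed="true" stored="false"/>
<copyField source="authors" dest="authors_sort"/>

<!-- Faceting: exact, unanalyzed value, assumed to be sent by the indexing client. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="primary_author" type="string" indexed="true" stored="false" docValues="true"/>
```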
Field Type Similarity
A field type may optionally specify a <similarity/> that will be used when scoring documents that refer to fields with this type, as long as the "global" similarity for the collection allows it.
By default, any field type that does not define a similarity uses BM25Similarity.
For more details, and examples of configuring both global & per-type similarities, please see Similarity.
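As a rough illustration, a per-type similarity could be declared like this. The type name text_bm25_tuned and the k1 and b values are assumptions for the example. The global solr.SchemaSimilarityFactory declaration is shown because it is the kind of global similarity that allows per-type settings.

```xml
<!-- Global similarity that defers to per-field-type similarities. -->
<similarity class="solr.SchemaSimilarityFactory"/>

<!-- Hypothetical field type with its own BM25 parameters. -->
<fieldType name="text_bm25_tuned" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.75</float>
  </similarity>
</fieldType>
```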