You might want to interpret some document fields in more than one way. Solr has a mechanism for making copies of fields so that you can apply several distinct field types to a single piece of incoming information.
The name of the field you want to copy is the source, and the name of the copy is the destination. In schema.xml
, it’s very simple to make copies of fields:
<copyField source="cat" dest="text" maxChars="30000" />
In this example, we want Solr to copy the cat
field to a field named text
. Fields are copied before analysis is done, meaning you can have two fields with identical original content, but which use different analysis chains and are stored in the index differently.
In the example above, if the text
destination field has data of its own in the input documents, the contents of the cat
field will be added as additional values – just as if all of the values had originally been specified by the client. Remember to configure your fields as multivalued="true"
if they will ultimately get multiple values (either from a multivalued source or from multiple copyField
directives).
A common usage for this functionality is to create a single "search" field that will serve as the default query field when users or clients do not specify a field to query. For example, title
, author
, keywords
, and body
may all be fields that should be searched by default, with copy field rules for each field to copy to a catchall
field (for example, it could be named anything). Later you can set a rule in solrconfig.xml
to search the catchall
field by default. One caveat to this is your index will grow when using copy fields. However, whether this becomes problematic for you and the final size will depend on the number of fields being copied, the number of destination fields being copied to, the analysis in use, and the available disk space.
The maxChars
parameter, an int
parameter, establishes an upper limit for the number of characters to be copied from the source value when constructing the value added to the destination field. This limit is useful for situations in which you want to copy some data from the source field, but also control the size of index files.
Both the source and the destination of copyField
can contain either leading or trailing asterisks, which will match anything. For example, the following line will copy the contents of all incoming fields that match the wildcard pattern *_t
to the text field.:
<copyField source="*_t" dest="text" maxChars="25000" />
The |
Copying is done at the stream source level and no copy feeds into another copy. This means that copy fields cannot be chained i.e., you cannot copy from here
to there
and then from there
to elsewhere
. However, the same source field can be copied to multiple destination fields:
<copyField source="here" dest="there"/>
<copyField source="here" dest="elsewhere"/>