The ExternalFileField type makes it possible to specify the values for a field in a file outside the Solr index. For such a field, the file contains mappings from a key field to the field value. Another way to think of this is that, instead of specifying the field in documents as they are indexed, Solr finds values for this field in the external file.
External fields are not searchable. They can be used only for function queries or display. For more information on function queries, see the section on Function Queries.
The ExternalFileField type is handy for cases where you want to update a particular field in many documents more often than you want to update the rest of the documents. For example, suppose you have implemented a document rank based on the number of views. You might want to update the rank of all the documents daily or hourly, while the rest of the contents of the documents might be updated much less frequently. Without ExternalFileField, you would need to update each document just to change the rank. Using ExternalFileField is much more efficient because all document values for a particular field are stored in an external file that can be updated as frequently as you wish.
In schema.xml, the definition of this field type might look like this:
The keyField attribute defines the key that will be defined in the external file. It is usually the unique key for the index, but it doesn’t need to be as long as the keyField can be used to identify documents in the index. A defVal defines a default value that will be used if there is no entry in the external file for a particular document.
Format of the External File
The file itself is located in Solr’s index directory, which by default is $SOLR_HOME/data. The name of the file should be external_fieldname_ or external_fieldname_.*. For the example above, then, the file could be named external_entryRankFile or external_entryRankFile.txt.
If any files using the name pattern .* (such as .txt) appear, the last (after being sorted by name) will be used and previous versions will be deleted. This behavior supports implementations on systems where one may not be able to overwrite a file (for example, on Windows, if the file is in use).
The file contains entries that map a key field, on the left of the equals sign, to a value, on the right. Here are a few example entries:
The keys listed in this file do not need to be unique. The file does not need to be sorted, but Solr will be able to perform the lookup faster if it is.
Reloading an External File
It’s possible to define an event listener to reload an external file when either a searcher is reloaded or when a new searcher is started. See the section Query-Related Listeners for more information, but a sample definition in solrconfig.xml might look like this:
The PreAnalyzedField type provides a way to send to Solr serialized token streams, optionally with independent stored values of a field, and have this information stored and indexed without any additional text processing applied in Solr. This is useful if user wants to submit field content that was already processed by some existing external text processing pipeline (e.g., it has been tokenized, annotated, stemmed, synonyms inserted, etc.), while using all the rich attributes that Lucene’s TokenStream provides (per-token attributes).
The serialization format is pluggable using implementations of PreAnalyzedParser interface. There are two out-of-the-box implementations:
JsonPreAnalyzedParser: as the name suggests, it parses content that uses JSON to represent field’s content. This is the default parser to use if the field type is not configured otherwise.
SimplePreAnalyzedParser: uses a simple strict plain text format, which in some situations may be easier to create than JSON.
There is only one configuration parameter, parserImpl. The value of this parameter should be a fully qualified class name of a class that implements PreAnalyzedParser interface. The default value of this parameter is org.apache.solr.schema.JsonPreAnalyzedParser.
By default, the query-time analyzer for fields of this type will be the same as the index-time analyzer, which expects serialized pre-analyzed text. You must add a query type analyzer to your fieldType in order to perform analysis on non-pre-analyzed queries. In the example below, the index-time analyzer expects the default JSON serialization format, and the query-time analyzer will employ StandardTokenizer/LowerCaseFilter:
Note that unknown attributes and their values are ignored, so in this example, the “a” attribute on the first token and the " " (escaped space) attribute on the second token are ignored, along with their values, because they are not among the supported attribute names.