Near Real Time (NRT) search means that documents are available for search soon after being indexed. NRT searching is one of the main features of SolrCloud and is rarely attempted in master/slave configurations.
Document durability and searchability are controlled by
commits. The "Near" in "Near Real Time" is configurable to meet the needs of your application. Commits are either "hard" or "soft" and can be issued by a client (say SolrJ), via a REST call or configured to occur automatically in
solrconfig.xml. The usual recommendation is to configure your commit strategy in
solrconfig.xml (see below) and avoid issuing commits externally.
Typically in NRT applications, hard commits are configured with
openSearcher=false, and soft commits are configured to make documents visible for search.
When a commit occurs, various background tasks are initiated, segment merging for example. These background tasks do not block additional updates to the index nor do they delay the availability of the documents for search.
When configuring for NRT, pay special attention to cache and autowarm settings as they can have a significant impact on NRT performance. For extremely short autoCommit intervals, consider disabling caching and autowarming completely.
Commits and Searching
A hard commit calls
fsync on the index files to ensure they have been flushed to stable storage. The current transaction log is closed and a new one is opened. See the "transaction log" discussion below for how data is recovered in the absence of a hard commit. Optionally a hard commit can also make documents visible for search, but this is not recommended for NRT searching as it is more expensive than a soft commit.
A soft commit is faster since it only makes index changes visible and does not
fsync index files, start a new segment or start a new transaction log. Search collections that have NRT requirements will want to soft commit often enough to satisfy the visibility requirements of the application. A softCommit may be "less expensive" than a hard commit (openSearcher=true), but it is not free. It is recommended that the soft commit interval be set as long as is reasonable given the application requirements.
Both hard and soft commits have two primary configuration parameters:
- maxDocs: Integer. Defines the number of updates to process before activating.
- maxTime: Integer. The number of milliseconds to wait before activating.
If both of these parameters are specified, the first one to expire is honored. Generally, it is preferred to use
maxTime rather than
maxDocs, especially when indexing large numbers of documents in batches. Use
maxTime judiciously to fine-tune your commit strategies.
Hard commit has an additional parameter:
- openSearcher: true|false, whether to make documents visible for search. For NRT applications this is usually set to false, and soft commit is configured to control when documents are visible for search.
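As a rough illustration of how these parameters combine, the fragment below sketches a hard commit that sets both limits. It assumes the autoCommit element is placed inside the updateHandler section of solrconfig.xml; the specific numbers (10000 updates, 15000 milliseconds) are illustrative values, not recommendations.

```xml
<!-- Hard commit: triggers after 10,000 updates or 15 seconds, whichever limit is reached first. -->
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>15000</maxTime>
  <!-- Do not open a new searcher; document visibility is left to soft commits. -->
  <openSearcher>false</openSearcher>
</autoCommit>
```

In practice the maxDocs line is often omitted so that commits happen on a predictable time interval regardless of batch size.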
Transaction Logs (tlogs)
Transaction logs are a "rolling window" of updates since the last hard commit. The current transaction log is closed and a new one opened each time any variety of hard commit occurs. Soft commits have no effect on the transaction log.
When tlogs are enabled, documents being added to the index are written to the tlog before the indexing call returns to the client. In the event of an un-graceful shutdown (power loss, JVM crash,
kill -9, etc.) any documents written to the tlog but not yet committed with a hard commit when Solr was stopped are replayed on startup. Therefore the data is not lost.
When Solr is shut down gracefully (using the
bin/solr stop command) Solr will close the tlog file and index segments so no replay will be necessary on startup.
One point of confusion is how much data is contained in a transaction log. A tlog does not contain all documents, only the ones since the last hard commit. Older transaction log files are deleted when no longer needed.
Implicit in the above is that transaction logs will grow forever if hard commits are disabled. Therefore it is important that hard commits be enabled when indexing.
As mentioned above, it is usually preferable to configure your commits (both hard and soft) in
solrconfig.xml and avoid sending commits from an external source. Check your
solrconfig.xml file since the defaults are likely not tuned to your needs. Here is an example NRT configuration for the two flavors of commit, a hard commit every 60 seconds and a soft commit every 30 seconds. Note that these are not the values in some of the examples!
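A minimal sketch of a configuration matching that description (hard commit every 60 seconds, soft commit every 30 seconds) might look as follows. It assumes both elements sit inside the updateHandler section of solrconfig.xml. The ${name:default} form reads a system property and falls back to the default after the colon; the property name solr.autoSoftCommit.maxTime is an assumption made by analogy with solr.autoCommit.maxTime.

```xml
<!-- Hard commit every 60 seconds; flushes to stable storage and rolls the tlog, without opening a searcher. -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit every 30 seconds; makes newly indexed documents visible for search. -->
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
</autoSoftCommit>
```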
These parameters can be overridden at run time by defining Java system properties. For example, specifying
-Dsolr.autoCommit.maxTime=15000 overrides the hard commit interval with a value of 15 seconds.
The choices for autoCommit and
autoSoftCommit have different consequences. In the event of un-graceful shutdown, it can take up to the time specified in
autoCommit for Solr to replay the uncommitted documents from the transaction log.
The time chosen for
autoSoftCommit determines the maximum time after a document is sent to Solr before it becomes searchable and does not affect the transaction log. Choose as long an interval as your application can tolerate for this value; 15-60 seconds is often reasonable, or even longer depending on the requirements. In situations where the time is set to a very short interval (say 1 second), consider disabling your caches (queryResultCache and filterCache especially) as they will have little utility.
For extremely high bulk indexing, especially for the initial load if there is no searching, consider turning off
autoSoftCommit by specifying a value of
-1 for the maxTime parameter.
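For a bulk-load scenario like this, the setting could be sketched as below, again assuming the element lives in the updateHandler section of solrconfig.xml. Hard commits should remain enabled so the transaction log does not grow without bound.

```xml
<!-- Soft commits disabled: a maxTime of -1 means no automatic soft commit is triggered. -->
<autoSoftCommit>
  <maxTime>-1</maxTime>
</autoSoftCommit>
```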
Advanced Commit Options
All varieties of commits can be invoked from a SolrJ client or via a URL. The usual recommendation is to not call commits externally. For those cases where it is desirable, see Update Commands. These options are listed for XML update commands that can be issued from a browser or curl, etc., and the equivalents are available from a SolrJ client.
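For orientation only, explicit commit messages in the XML update format might look like the sketch below. Each element is a separate request body sent to the update handler. The waitSearcher and expungeDeletes attributes are shown as commonly documented options; confirm against Update Commands which attributes your Solr version supports.

```xml
<!-- Request 1: a plain hard commit. -->
<commit/>

<!-- Request 2: a commit that returns without waiting for the new searcher to be registered. -->
<commit waitSearcher="false"/>

<!-- Request 3: a commit that also merges away segments containing deleted documents. -->
<commit waitSearcher="false" expungeDeletes="true"/>
```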