UpdateHandlers in SolrConfig
The settings in this section are configured in the <updateHandler> element in solrconfig.xml and may affect the performance of index updates. These settings affect how updates are done internally. <updateHandler> configurations do not affect the higher-level configuration of RequestHandlers that process client update requests.
<updateHandler class="solr.DirectUpdateHandler2">
...
</updateHandler>
Commits
Data sent to Solr is not searchable until it has been committed to the index. The reason for this is that in some cases commits can be slow and they should be done in isolation from other possible commit requests to avoid overwriting data. So, it’s preferable to provide control over when data is committed. Several options are available to control the timing of commits.
commit and softCommit
In Solr, a commit is an action which asks Solr to "commit" pending changes to the Lucene index files. By default, commit actions result in a "hard commit" of all the Lucene index files to stable storage (disk). When a client includes a commit=true parameter with an update request, this ensures that all index segments affected by the adds and deletes on an update are written to disk as soon as index updates are completed.
If an additional flag softCommit=true is specified, then Solr performs a 'soft commit', meaning that Solr will commit your changes to the Lucene data structures quickly but not guarantee that the Lucene index files are written to stable storage. This is Solr's implementation of Near Real Time (NRT) searching, a feature that boosts document visibility, since you don’t have to wait for the index files to be written to stable storage before new documents become searchable. After a hard commit, if a server crashes, the committed changes are safely recorded in the index files on disk; after a soft commit, the changes are searchable but are not yet guaranteed to be persisted in the index files. The tradeoff is that a soft commit gives you faster visibility in exchange for that durability guarantee.
For more information about Near Real Time operations, see Near Real Time Searching.
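As a sketch, the two kinds of explicit commit can also be expressed as XML update messages, each sent as its own request body to an update handler (the /update handler path is an assumption about your setup). The first message requests a hard commit; the second requests a soft commit through the softCommit attribute:

```xml
<!-- Hard commit: write pending changes to the index files on stable storage -->
<commit/>

<!-- Soft commit: make pending changes searchable without guaranteeing they are on stable storage -->
<commit softCommit="true"/>
```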
autoCommit
These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.
maxDocs
- The number of updates that have occurred since the last commit.
maxTime
- The number of milliseconds since the oldest uncommitted update.
maxSize
- The maximum size of the transaction log (tlog) on disk, after which a hard commit is triggered. This is useful when the size of documents is unknown and the intention is to restrict the size of the transaction log to a reasonable size. Valid values can be bytes (default with no suffix), kilobytes (if defined with a k suffix, as in 25k), megabytes (m) or gigabytes (g).
openSearcher
- Whether to open a new searcher when performing a commit. If this is false, the commit will flush recent index changes to stable storage, but does not cause a new searcher to be opened to make those changes visible. The default is true.
If any of the maxDocs, maxTime, or maxSize limits are reached, Solr automatically performs a commit operation. If the autoCommit tag is missing, then only explicit commits will update the index. The decision whether to use auto-commit or not depends on the needs of your application.
Determining the best auto-commit settings is a tradeoff between performance and accuracy. Settings that cause frequent commits will improve the accuracy of searches because new content will be searchable more quickly, but performance may suffer because of the frequent commits. Less frequent commits may improve performance, but it will take longer for updates to show up in queries.
<autoCommit>
<maxDocs>10000</maxDocs>
<maxTime>30000</maxTime>
<maxSize>512m</maxSize>
<openSearcher>false</openSearcher>
</autoCommit>
You can also specify 'soft' autoCommits in the same way that you can specify 'soft' commits, except that instead of using autoCommit you set the autoSoftCommit tag.
<autoSoftCommit>
<maxTime>60000</maxTime>
</autoSoftCommit>
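One widely used pattern, sketched here with illustrative values rather than recommended ones, pairs an infrequent hard autoCommit with openSearcher set to false (for durability) with a more frequent autoSoftCommit (for visibility):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 60 seconds: flushes changes to stable storage, opens no new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit every 5 seconds: makes newly indexed documents searchable -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With these values, new documents should become visible within roughly 5 seconds, while the less frequent hard commits take care of writing the index files to disk without the cost of opening a new searcher each time.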
commitWithin
The commitWithin settings allow forcing document commits to happen in a defined time period. This is used most frequently with Near Real Time Searching, and for that reason the default is to perform a soft commit. This does not, however, replicate new documents to slave servers in a master/slave environment. If that’s a requirement for your implementation, you can force a hard commit by adding a parameter, as in this example:
<commitWithin>
<softCommit>false</softCommit>
</commitWithin>
With this configuration, when you call commitWithin as part of your update message, it will automatically perform a hard commit every time.
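For illustration, commitWithin can be set as an attribute of <add> in an XML update message, with the value in milliseconds. This sketch assumes the schema's uniqueKey field is named id; the document ID itself is a hypothetical placeholder:

```xml
<!-- Ask Solr to commit this document within 10 seconds (10000 ms) -->
<add commitWithin="10000">
  <doc>
    <field name="id">example-doc-1</field>
  </doc>
</add>
```

By default the resulting commit is a soft commit; with the <commitWithin><softCommit>false</softCommit></commitWithin> configuration in the updateHandler section, it becomes a hard commit.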
Event Listeners
The UpdateHandler section is also where update-related event listeners can be configured. These can be triggered to occur after any commit (event="postCommit") or only after optimize commands (event="postOptimize").
Users can write custom update event listener classes in Solr plugins. As of Solr 7.1, RunExecutableListener was removed for security reasons.
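A sketch of how such listeners might be registered inside the updateHandler section. The class names are hypothetical placeholders for your own plugin classes, which would need to implement Solr's SolrEventListener interface:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  ...
  <!-- Fires after every commit; com.example.MyCommitListener is hypothetical -->
  <listener event="postCommit" class="com.example.MyCommitListener"/>
  <!-- Fires only after optimize commands; com.example.MyOptimizeListener is hypothetical -->
  <listener event="postOptimize" class="com.example.MyOptimizeListener"/>
</updateHandler>
```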
Transaction Log
As described in the section RealTime Get, a transaction log is required for that feature. It is configured in the updateHandler section of solrconfig.xml.
RealTime Get currently relies on the update log feature, which is enabled by default. The update log is configured in solrconfig.xml, in a section like:
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>
Three additional expert-level configuration settings affect indexing performance and how far a replica can fall behind on updates before it must enter into full recovery - see the section on write side fault tolerance for more information:
numRecordsToKeep
- The number of update records to keep per log. The default is 100.
maxNumLogsToKeep
- The maximum number of logs to keep. The default is 10.
numVersionBuckets
- The number of buckets used to keep track of max version values when checking for re-ordered updates. Increase this value to reduce the cost of synchronizing access to version buckets during high-volume indexing; this requires (8 bytes (long) * numVersionBuckets) of heap space per Solr core. The default is 65536.
An example, to be included under <config><updateHandler> in solrconfig.xml, employing the above advanced settings:
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numRecordsToKeep">500</int>
<int name="maxNumLogsToKeep">20</int>
<int name="numVersionBuckets">65536</int>
</updateLog>
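To make the numVersionBuckets heap cost concrete, applying the stated formula to the value 65536 used in the example above (which is also the default) gives:

```latex
\text{heap per core} = 8\ \text{bytes} \times \texttt{numVersionBuckets}
  = 8 \times 65536\ \text{bytes} = 524288\ \text{bytes} = 512\ \text{KiB}
```

This figure follows only from the formula above and may not include JVM per-object overhead.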
Other Options
In some cases, complex updates (such as spatial/shape updates) may take a very long time to complete. In the default configuration, other updates that fall into the same internal version bucket will wait indefinitely; these outstanding requests may pile up and lead to thread exhaustion and, eventually, to OutOfMemory errors.
The option versionBucketLockTimeoutMs in the updateHandler section helps to prevent that by limiting how long an update waits behind such an extremely long-running update request. If this limit is reached, the waiting update fails instead of blocking, so other updates are not blocked forever. See SOLR-12833 for more details.
There’s a memory cost associated with this setting. Values greater than the default 0 (meaning unlimited timeout) cause Solr to use a different internal implementation of the version bucket, which increases memory consumption from ~1.5MB to ~6.8MB per Solr core.
An example of specifying this option under the <config> section of solrconfig.xml:
<updateHandler class="solr.DirectUpdateHandler2">
...
<int name="versionBucketLockTimeoutMs">10000</int>
</updateHandler>