Major Changes in Solr 8
Solr 8.0 is a major new release of Solr.
This page highlights the biggest changes, including new features you may want to be aware of, and changes in default behavior and deprecated features that have been removed.
Solr 8 Upgrade Planning
Before staring an upgrade to Solr 8, please take the time to review all information about changes from the version you are currently on up to Solr 8.
You should also consider all changes that have been made to Solr in any version you have not upgraded to already. For example, if you are currently using Solr 7.4, you should review changes made in all subsequent 7.x releases in addition to changes for 8.0.
A thorough review of the list in Major Changes in Earlier 7.x Versions, below, as well as the CHANGES.txt in your Solr instance will help you plan your migration to Solr 8.
Upgrade Prerequisites
If using SolrCloud, you must be on Solr 7.3.0 or higher. Solr’s LeaderInRecovery (LIR) functionality changed significantly in Solr 7.3. While these changes were back-compatible for all subsequent 7.x releases, that compatibility has been removed in 8.0. In order to upgrade to Solr 8.x, all nodes of your cluster must be running Solr 7.3 or higher. If an upgrade is attempted with nodes running versions earlier than 7.3, documents could be lost.
If you are not using Solr in SolrCloud mode (you use Standalone Mode instead), we expect you can upgrade to Solr 8 from any 7.x version without major issues.
Rolling Upgrades with Solr 8
If you are planning to upgrade your cluster using a rolling upgrade model (upgrade each node in succession, as opposed to standing up a brand new 8.x cluster), please read the following carefully.
Solr nodes can listen and serve HTTP/2 or HTTP/1 requests. By default, most internal requests are sent using HTTP/2. This means, though, that by default Solr 8.0 nodes cannot communicate with nodes running pre-8.0 versions of Solr.
However you can start Solr 8.0 with a parameter to force HTTP/1.1 communication until all nodes of the cluster have been upgraded. These are the steps to do rolling updates:
Do rolling updates as normally, but start the Solr 8.0 nodes with
-Dsolr.http1=true
as startup parameter. When using this parameter internal requests are sent by using HTTP/1.1../bin/solr start -c -Dsolr.http1=true -z localhost:2481/solr -s /path/to/solr/home
Note the above command must be customized for your environment. The section Solr Control Script Reference has all the possible options. If you are running Solr as a service, you may prefer to review the section Upgrading a Solr Cluster.
- When all nodes have been upgraded to 8.0, restart each one without the
-Dsolr.http1
parameter.
Reindexing After Upgrades
It is always strongly recommended that you fully reindex your documents after a major version upgrade.
Solr has a new section of the Reference Guide, Reindexing which covers several strategies for how to reindex.
New Features & Enhancements
HTTP/2 Support
As of Solr 8, Solr nodes support HTTP/2 requests.
Until now, Solr was limited to HTTP/1.1 only. HTTP/1.1 practically allows only one outstanding request per TCP connection which means that for sending multiple requests at the same time multiple TCP connections must be established. This leads to a waste of resources on both-sides and long garbage collection (GC) pauses.
Solr 8 with HTTP/2 support overcomes that problem by allowing multiple requests to be sent in parallel using a same TCP connection.
SSL Support with HTTP/2
In order to support SSL over HTTP/2 connections, Solr uses ALPN.
Java 8 does not include an implementation of ALPN, therefore Solr will start with HTTP/1 only when SSL is enabled and Java 8 is in use.
Client Changes for HTTP/2
Http2SolrClient
with HTTP/2 and async capabilities based on Jetty Client is introduced. This client replaced
HttpSolrClient
and ConcurrentUpdateSolrClient
for sending most internal requests (sent by
UpdateShardHandler
and HttpShardHandler
).
However this causes the following changes in configuration and authentication setup:
- The
updateShardHandler
parametermaxConnections
is no longer used and has been removed. - The
HttpShardHandler
parametermaxConnections
parameter is no longer being used and has been removed. - Custom
AuthenticationPlugin
implementations must provide their own setup forHttp2SolrClient
through implementingHttpClientBuilderPlugin.setup
, or internal requests will not be able to be authenticated.
Metrics Changes for HTTP/2
The Http2SolrClient
does not support exposing connection-related metrics. For this reason, the following metrics
are no longer available:
- Metrics from
QUERY.httpShardHandler
:availableConnections
leasedConnections
maxConnections
pendingConnections
- Metrics from
UPDATE.updateShardHandler
availableConnections
leasedConnections
maxConnections
pendingConnections
Nested Documents
Several improvements have been made for nested document support.
Solr now has the ability to store information about document relationships in the index. This stored information can be used for queries.
The Child Document Transformer can also now return children in nested form if the relationships have been properly stored in the index.
There are a few important changes to highlight in the context of upgrading to Solr 8:
When JSON data is sent to Solr with nested child documents split using the
split
parameter, the child documents will now be associated to their parents by the field/label string used in the JSON instead of anonymously.Most users probably won’t notice the distinction since the label is lost unless special fields are in the schema. This choice used to be toggleable with an internal/expert
anonChildDocs
parameter flag, which has been removed.Deleting (or updating) documents by their uniqueKey is now scoped to only consider root documents, not child/nested documents. Thus a delete-by-id won’t work on a child document (it will fail silently), and an attempt to update a child document by providing a new document with the same ID would add a new document (which will probably be erroneous).
Both these actions were and still are problematic. In-place-updates are safe though. If you want to delete certain child documents and if you know they don’t themselves have nested children then you must do so with a delete-by-query technique.
- Solr has a new field in the
_default
configSet, called_nest_path_
. This field stores the path of the document in the hierarchy for non-root documents.
See the sections Indexing Nested Documents and Searching Nested Documents for more information and configuration details.
Configuration and Default Parameter Changes
Schema Changes in 8.0
The following changes impact how fields behave.
Default Scoring (SimilarityFactory)
If you explicitly use
BM25SimilarityFactory
in your schema, the absolute scoring will be lower since Lucene changed the calculation of BM25 to remove a multiplication factor (for technical details, see LUCENE-8563 or SOLR-13025). Ordering of documents will not change in the normal case. UseLegacyBM25SimilarityFactory
if you need to force the old 6.x/7.x scoring.Note that if you have not specified any similarityFactory in the schema, or use the default
SchemaSimilarityFactory
, thenLegacyBM25Similarity
is automatically selected when the value forluceneMatchVersion
is lower than8.0.0
.See also the section Similarity for more information.
Memory Codecs Removed
Memory codecs have been removed from Lucene (
MemoryPostings
,MemoryDocValues
) and are no longer available in Solr. If you usedpostingsFormat="Memory"
ordocValuesFormat="Memory"
on any field or field type configuration then either remove that setting to use the default or experiment with one of the other options.For more information on defining a codec, see the section Codec Factory; for more information on field properties, see the section Field Type Definitions and Properties.
LowerCaseTokenizer
- The
LowerCaseTokenizer
has been deprecated and is likely to be removed in Solr 9. Users are encouraged to use theLetterTokenizer
and theLowerCaseFilter
instead.
Default Configset
- The
_default
configset now includes aignored_*
dynamic field rule.
Indexing Changes in 8.0
The following changes impact how documents are indexed.
Index-time Boosts
Index-time boosts were removed from Lucene in version 7.0, and in Solr 7.x the syntax was still allowed (although it logged a warning in the logs). The syntax was similar to:
{"id":"1", "val_s":{"value":"foo", "boost":2.0}}
This syntax has been removed entirely and if sent to Solr it will now produce an error. This was done in conjunction with the improvements for nested document support.
ParseDateFieldUpdateProcessorFactory
The date format patterns used by
ParseDateFieldUpdateProcessorFactory
(used by default in "schemaless mode") are now interpreted by Java 8’sjava.time.DateTimeFormatter
instead of Joda Time. The pattern language is very similar but not the same. Typically, simply update the pattern by changing an uppercase 'Z' to lowercase 'z' and that’s it.For the current recommended set of patterns in schemaless mode, see the section Schemaless Mode, or simply examine the
_default
configSet (found inserver/solr/configsets
).Also note that the default set of date patterns (formats) have expanded from previous releases to subsume those patterns previously handled by the "extract" contrib (Solr Cell / Tika).
Solr Cell
- The extraction contrib (Solr Cell) no longer does any date parsing, and thus no longer supports the
date.formats
parameter. To ensure date strings are properly parsed, use theParseDateFieldUpdateProcessorFactory
in your update chain. This update request processor is found by default with the "parse-date" update processor when running Solr in "schemaless mode".
Langid Contrib
- The
LanguageIdentifierUpdateProcessor
base class in the langid contrib (found incontrib/langid
) changed some method signatures. If you have a custom language identifier implementation you will need to adapt your code. See the Jira issue SOLR-11774 for details of the changes.
Query Changes in 8.0
The following changes impact query behavior.
Highlighting
- The Unified Highlighter parameter
hl.weightMatches
now defaults totrue
. See the section Highlighting for more information about Highlighter parameters.
eDisMax Query Parser
- The eDisMax query parser will now thrown an error when the
qf
parameter refers to a nonexistent field.
Function Query Parser
- The Function Query Parser now returns scores that are equal to zero (0) when a negative value is produced. This change is due to the fact that Lucene now requires scores to be positive.
Authentication & Security Changes in 8.0
- Authentication plugins can now intercept internode requests on a per-request basis.
- The Basic Authentication plugin now has an option
forwardCredentials
to prevent Basic Auth headers from being sent for inter-node requests. - Metrics are now reported for authentication requests.
UI Changes in 8.0
- The Radial Graph view of a Solr cluster when running in SolrCloud mode has been removed.
- The Nodes view introduced in Solr 7.5 is now the default when choosing the "Cloud" tab in the left navigation menu.
Autoscaling Changes in 8.0
The default replica placement strategy used in Solr has been reverted to the "legacy" policy used by Solr 7.4 and previous versions. This is due to multiple bugs in the autoscaling based replica placement strategy that was made default in Solr 7.5 which causes multiple replicas of the same shard to be placed on the same node in addition to the
maxShardsPerNode
andcreateNodeSet
parameters being ignored.Although the default has changed, autoscaling will continue to be used if a cluster policy or preference is specified or a collection level policy is in use.
The default replica placement strategy can be changed to use autoscaling again by setting a cluster property:
curl -X POST -H 'Content-type:application/json' --data-binary ' { "set-obj-property": { "defaults" : { "cluster": { "useLegacyReplicaAssignment":false } } } }' http://$SOLR_HOST:$SOLR_PORT/api/cluster
- A new command-line option is available via
bin/solr autoscaling
to calculate autoscaling policy suggestions and diagnostic information outside of the running Solr cluster. This option can use the existing autoscaling policy, or test the impact of a new one from a file located on the server filesystem.
Dependency Updates in 8.0
- All Hadoop dependencies have been upgraded to Hadoop 3.2.0 (from 2.7.2).
Major Changes in Earlier 7.x Versions
The following is a list of major changes released between Solr 7.1 and 7.7.
Please be sure to review this list so you understand what may have changed between the version of Solr you are currently running and Solr 8.0.
Solr 7.7
See the 7.7 Release Notes for an overview of the main new features in Solr 7.7.
When upgrading to Solr 7.7.x, users should be aware of the following major changes from v7.6:
Admin UI
The Admin UI now presents a login screen for any users with authentication enabled on their cluster. Clusters with Basic Authentication will prompt users to enter a username and password. On clusters configured to use Kerberos Authentication, users will be directed to configure their browser to provide an appropriate Kerberos ticket.
The login screen’s purpose is cosmetic only - Admin UI-triggered Solr requests were subject to authentication prior to 7.7 and still are today. The login screen changes only the user experience of providing this authentication.
Distributed Requests
The
shards
parameter, used to manually select the shards and replicas that receive distributed requests, now checks nodes against a whitelist of acceptable values for security reasons.In SolrCloud mode this whitelist is automatically configured to contain all live nodes. In standalone mode the whitelist is empty by default. Upgrading users who use the
shards
parameter in standalone mode can correct this value by setting theshardsWhitelist
property in anyshardHandler
configurations in theirsolrconfig.xml
file.For more information, see the Distributed Request documentation.
Solr 7.6
See the 7.6 Release Notes for an overview of the main new features in Solr 7.6.
When upgrading to Solr 7.6, users should be aware of the following major changes from v7.5:
Collections
The JSON parameter to set cluster-wide default cluster properties with the CLUSTERPROP command has changed.
The old syntax nested the defaults into a property named
clusterDefaults
. The new syntax uses onlydefaults
. The command to use is stillset-obj-property
.An example of the new syntax is:
{ "set-obj-property": { "defaults" : { "collection": { "numShards": 2, "nrtReplicas": 1, "tlogReplicas": 1, "pullReplicas": 1 } } } }
The old syntax will be supported until at least Solr 9, but users are advised to begin using the new syntax as soon as possible.
- The parameter
min_rf
has been deprecated and no longer needs to be provided in order to see the achieved replication factor. This information will now always be returned to the client with the response.
Autoscaling
An autoscaling policy is now used as the default strategy for selecting nodes on which new replicas or replicas of new collections are created.
A default policy is now in place for all users, which will sort nodes by the number of cores and available freedisk, which means by default a node with the fewest number of cores already on it and the highest available freedisk will be selected for new core creation.
- The change described above has two additional impacts on the
maxShardsPerNode
parameter:- It removes the restriction against using
maxShardsPerNode
when an autoscaling policy is in place. This parameter can now always be set when creating a collection. It removes the default setting of
maxShardsPerNode=1
when an autoscaling policy is in place. It will be set correctly (if required) regardless of whether an autoscaling policy is in place or not.The default value of
maxShardsPerNode
is still1
. It can be set to-1
if the old behavior of unlimitedmaxSharedsPerNode
is desired.
- It removes the restriction against using
DirectoryFactory
Lucene has introduced the
ByteBuffersDirectoryFactory
as a replacement for theRAMDirectoryFactory
, which will be removed in Solr 9.While most users are still encouraged to use the
NRTCachingDirectoryFactory
, which allows Lucene to select the best directory factory to use, if you have explicitly configured Solr to use theRAMDirectoryFactory
, you are encouraged to switch to the new implementation as soon as possible before Solr 9 is released.For more information about the new directory factory, see the Jira issue LUCENE-8438.
For more information about the directory factory configuration in Solr, see the section DataDir and DirectoryFactory in SolrConfig.
Solr 7.5
See the 7.5 Release Notes for an overview of the main new features in Solr 7.5.
When upgrading to Solr 7.5, users should be aware of the following major changes from v7.4:
Schema Changes
- Since Solr 7.0, Solr’s schema field-guessing has created
_str
fields for all_txt
fields, and returned those by default with queries. As of 7.5,_str
fields will no longer be returned by default. They will still be available and can be requested with thefl
parameter on queries. See also the section on field guessing for more information about how schema field guessing works. - The Standard Filter, which has been non-operational since at least Solr v4, has been removed.
Index Merge Policy
When using the
TieredMergePolicy
, the default merge policy for Solr,optimize
andexpungeDeletes
now respect themaxMergedSegmentMB
configuration parameter, which defaults to5000
(5GB).If it is absolutely necessary to control the number of segments present after optimize, specify
maxSegments
as a positive integer. SettingmaxSegments
higher than1
are honored on a "best effort" basis.The
TieredMergePolicy
will also reclaim resources from segments that exceedmaxMergedSegmentMB
more aggressively than earlier.
UIMA Removed
- The UIMA contrib has been removed from Solr and is no longer available.
Logging
- Solr’s logging configuration file is now located in
server/resources/log4j2.xml
by default. - A bug for Windows users has been corrected. When using Solr’s examples (
bin/solr start -e
) log files will now be put in the correct location (example/
instead ofserver
). See also Solr Examples and Solr Control Script Reference for more information.
Solr 7.4
See the 7.4 Release Notes for an overview of the main new features in Solr 7.4.
When upgrading to Solr 7.4, users should be aware of the following major changes from v7.3:
Logging
- Solr now uses Log4j v2.11. The Log4j configuration is now in
log4j2.xml
rather thanlog4j.properties
files. This is a server side change only and clients using SolrJ won’t need any changes. Clients can still use any logging implementation which is compatible with SLF4J. We now let Log4j handle rotation of Solr logs at startup, andbin/solr
start scripts will no longer attempt this nor move existing console or garbage collection logs intologs/archived
either. See Configuring Logging for more details about Solr logging. - Configuring
slowQueryThresholdMillis
now logs slow requests to a separate file namedsolr_slow_requests.log
. Previously they would get logged in thesolr.log
file.
Legacy Scaling (non-SolrCloud)
- In the master-slave model of scaling Solr, a slave no longer commits an empty index when a completely new index is detected on master during replication. To return to the previous behavior pass
false
toskipCommitOnMasterVersionZero
in the slave section of replication handler configuration, or pass it to thefetchindex
command.
If you are upgrading from a version earlier than Solr 7.3, please see previous version notes below.
Solr 7.3
See the 7.3 Release Notes for an overview of the main new features in Solr 7.3.
When upgrading to Solr 7.3, users should be aware of the following major changes from v7.2:
ConfigSets
- Collections created without specifying a configset name have used a copy of the
_default
configset since Solr 7.0. Before 7.3, the copied configset was named the same as the collection name, but from 7.3 onwards it will be named with a new ".AUTOCREATED" suffix. This is to prevent overwriting custom configset names.
Learning to Rank
- The
rq
parameter used with Learning to Rankrerank
query parsing no longer considers thedefType
parameter. See Running a Rerank Query for more information about this parameter.
Autoscaling & AutoAddReplicas
- The behaviour of the autoscaling system will now pause all triggers from execution between the start of actions and the end of a cool down period. The triggers will resume after the cool down period expires. Previously, the cool down period was a fixed period started after actions for a trigger event completed and during this time all triggers continued to run but any events were rejected and tried later.
- The throttling mechanism used to limit the rate of autoscaling events processed has been removed. This deprecates the
actionThrottlePeriodSeconds
setting in theset-properties
Autoscaling API which is now non-operational. Use thetriggerCooldownPeriodSeconds
parameter instead to pause event processing. - The default value of
autoReplicaFailoverWaitAfterExpiration
, used with the AutoAddReplicas feature, has increased to 120 seconds from the previous default of 30 seconds. This affects how soon Solr adds new replicas to replace the replicas on nodes which have either crashed or shutdown.
Logging
- The default Solr log file size and number of backups have been raised to 32MB and 10 respectively. See the section Configuring Logging for more information about how to configure logging.
SolrCloud
The old Leader-In-Recovery implementation (implemented in Solr 4.9) is now deprecated and replaced. Solr will support rolling upgrades from old 7.x versions of Solr to future 7.x releases until the last release of the 7.x major version.
This means to upgrade to Solr 8 in the future, you will need to be on Solr 7.3 or higher.
- Replicas which are not up-to-date are no longer allowed to become leader. Use the FORCELEADER command of the Collections API to allow these replicas become leader.
Spatial
- If you are using the spatial JTS library with Solr, you must upgrade to 1.15.0. This new version of JTS is now dual-licensed to include a BSD style license. See the section on Spatial Search for more information.
Highlighting
- The top-level
<highlighting>
element insolrconfig.xml
is now officially deprecated in favour of the equivalent<searchComponent>
syntax. This element has been out of use in default Solr installations for several releases already.
If you are upgrading from a version earlier than Solr 7.2, please see previous version notes below.
Solr 7.2
See the 7.2 Release Notes for an overview of the main new features in Solr 7.2.
When upgrading to Solr 7.2, users should be aware of the following major changes from v7.1:
Local Parameters
Starting a query string with local parameters
{!myparser …}
is used to switch from one query parser to another, and is intended for use by Solr system developers, not end users doing searches. To reduce negative side-effects of unintended hack-ability, Solr now limits the cases when local parameters will be parsed to only contexts in which the default parser is "lucene" or "func".So, if
defType=edismax
thenq={!myparser …}
won’t work. In that example, put the desired query parser into thedefType
parameter.Another example is if
deftype=edismax
thenhl.q={!myparser …}
won’t work for the same reason. In this example, either put the desired query parser into thehl.qparser
parameter or sethl.qparser=lucene
. Most users won’t run into these cases but some will need to change.If you must have full backwards compatibility, use
luceneMatchVersion=7.1.0
or an earlier version.
eDisMax Query Parser
The eDisMax parser by default no longer allows subqueries that specify a Solr parser using either local parameters, or the older
_query_
magic field trick.For example,
{!prefix f=myfield v=enterp}
or_query_:"{!prefix f=myfield v=enterp}"
are not supported by default any longer. If you want to allow power-users to do this, setuf=* query
or some other value that includes_query_
.If you need full backwards compatibility for the time being, use
luceneMatchVersion=7.1.0
or something earlier.
If you are upgrading from a version earlier than Solr 7.1, please see previous version notes below.
Solr 7.1
See the 7.1 Release Notes for an overview of the main new features of Solr 7.1.
When upgrading to Solr 7.1, users should be aware of the following major changes from v7.0:
AutoAddReplicas
The feature to automatically add replicas if a replica goes down, previously available only when storing indexes in HDFS, has been ported to the autoscaling framework. Due to this,
autoAddReplicas
is now available to all users even if their indexes are on local disks.Existing users of this feature should not have to change anything. However, they should note these changes:
- Behavior: Changing the
autoAddReplicas
property from disabled (false
) to enabled (true
) using MODIFYCOLLECTION API no longer replaces down replicas for the collection immediately. Instead, replicas are only added if a node containing them went down whileautoAddReplicas
was enabled. The parametersautoReplicaFailoverBadNodeExpiration
andautoReplicaFailoverWorkLoopDelay
are no longer used. Deprecations: Enabling/disabling autoAddReplicas cluster-wide with the API will be deprecated; use suspend/resume trigger APIs with
name=".auto_add_replicas"
instead.More information about the changes to this feature can be found in the section SolrCloud Automatically Adding Replicas.
- Behavior: Changing the
Metrics Reporters
- Shard and cluster metric reporter configuration now require a
class
attribute.- If a reporter configures the
group="shard"
attribute then please also configure theclass="org.apache.solr.metrics.reporters.solr.SolrShardReporter"
attribute. If a reporter configures the
group="cluster"
attribute then please also configure theclass="org.apache.solr.metrics.reporters.solr.SolrClusterReporter"
attribute.See the section Shard and Cluster Reporters for more information.
- If a reporter configures the
Streaming Expressions
- All Stream Evaluators in
solrj.io.eval
have been refactored to have a simpler and more robust structure. This simplifies and condenses the code required to implement a new Evaluator and makes it much easier for evaluators to handle differing data types (primitives, objects, arrays, lists, and so forth).
ReplicationHandler
- In the ReplicationHandler, the
master.commitReserveDuration
sub-element is deprecated. Instead please configure a directcommitReserveDuration
element for use in all modes (master, slave, cloud).
RunExecutableListener
- The
RunExecutableListener
was removed for security reasons. If you want to listen to events caused by updates, commits, or optimize, write your own listener as native Java class as part of a Solr plugin.
XML Query Parser
- In the XML query parser (
defType=xmlparser
or{!xmlparser … }
) the resolving of external entities is now disallowed by default.
If you are upgrading from a version earlier than Solr 7.0, please see Major Changes in Solr 7 before starting your upgrade.