Major Changes in Solr 8

Solr 8.0 is a major new release of Solr.

This page highlights the biggest changes, including new features you may want to be aware of, and changes in default behavior and deprecated features that have been removed.

Solr 8 Upgrade Planning

Before starting an upgrade to Solr 8, please take the time to review all information about changes from the version you are currently on up to Solr 8.

You should also consider all changes that have been made to Solr in any version you have not upgraded to already. For example, if you are currently using Solr 7.4, you should review changes made in all subsequent 7.x releases in addition to changes for 8.0.

A thorough review of the list in Major Changes in Earlier 7.x Versions, below, as well as the CHANGES.txt in your Solr instance will help you plan your migration to Solr 8.

Upgrade Prerequisites

If using SolrCloud, you must be on Solr 7.3.0 or higher. Solr’s LeaderInRecovery (LIR) functionality changed significantly in Solr 7.3. While these changes were back-compatible for all subsequent 7.x releases, that compatibility has been removed in 8.0. In order to upgrade to Solr 8.x, all nodes of your cluster must be running Solr 7.3 or higher. If an upgrade is attempted with nodes running versions earlier than 7.3, documents could be lost.

If you are not using Solr in SolrCloud mode (you use Standalone Mode instead), we expect you can upgrade to Solr 8 from any 7.x version without major issues.

Rolling Upgrades with Solr 8

If you are planning to upgrade your cluster using a rolling upgrade model (upgrade each node in succession, as opposed to standing up a brand new 8.x cluster), please read the following carefully.

Solr nodes can listen and serve HTTP/2 or HTTP/1 requests. By default, most internal requests are sent using HTTP/2. This means, though, that by default Solr 8.0 nodes cannot communicate with nodes running pre-8.0 versions of Solr.

However you can start Solr 8.0 with a parameter to force HTTP/1.1 communication until all nodes of the cluster have been upgraded. These are the steps to do rolling updates:

  1. Do rolling updates as normally, but start the Solr 8.0 nodes with -Dsolr.http1=true as startup parameter. When using this parameter internal requests are sent by using HTTP/1.1.

    ./bin/solr start -c -Dsolr.http1=true -z localhost:2481/solr -s /path/to/solr/home

    Note the above command must be customized for your environment. The section Solr Control Script Reference has all the possible options. If you are running Solr as a service, you may prefer to review the section Upgrading a Solr Cluster.

  2. When all nodes have been upgraded to 8.0, restart each one without the -Dsolr.http1 parameter.

Reindexing After Upgrades

It is always strongly recommended that you fully reindex your documents after a major version upgrade.

Solr has a new section of the Reference Guide, Reindexing which covers several strategies for how to reindex.

New Features & Enhancements

HTTP/2 Support

As of Solr 8, Solr nodes support HTTP/2 requests.

Until now, Solr was limited to HTTP/1.1 only. HTTP/1.1 practically allows only one outstanding request per TCP connection which means that for sending multiple requests at the same time multiple TCP connections must be established. This leads to a waste of resources on both-sides and long garbage collection (GC) pauses.

Solr 8 with HTTP/2 support overcomes that problem by allowing multiple requests to be sent in parallel using a same TCP connection.

SSL Support with HTTP/2

In order to support SSL over HTTP/2 connections, Solr uses ALPN.

Java 8 does not include an implementation of ALPN, therefore Solr will start with HTTP/1 only when SSL is enabled and Java 8 is in use.

Client Changes for HTTP/2

Http2SolrClient with HTTP/2 and async capabilities based on Jetty Client is introduced. This client replaced HttpSolrClient and ConcurrentUpdateSolrClient for sending most internal requests (sent by UpdateShardHandler and HttpShardHandler).

However this causes the following changes in configuration and authentication setup:

  • The updateShardHandler parameter maxConnections is no longer used and has been removed.

  • The HttpShardHandler parameter maxConnections parameter is no longer being used and has been removed.

  • Custom AuthenticationPlugin implementations must provide their own setup for Http2SolrClient through implementing HttpClientBuilderPlugin.setup, or internal requests will not be able to be authenticated.

Metrics Changes for HTTP/2

The Http2SolrClient does not support exposing connection-related metrics. For this reason, the following metrics are no longer available:

  • Metrics from QUERY.httpShardHandler:

    • availableConnections

    • leasedConnections

    • maxConnections

    • pendingConnections

  • Metrics from UPDATE.updateShardHandler

    • availableConnections

    • leasedConnections

    • maxConnections

    • pendingConnections

Nested Documents

Several improvements have been made for nested document support.

Solr now has the ability to store information about document relationships in the index. This stored information can be used for queries.

The Child Document Transformer can also now return children in nested form if the relationships have been properly stored in the index.

There are a few important changes to highlight in the context of upgrading to Solr 8:

  • When JSON data is sent to Solr with nested child documents split using the split parameter, the child documents will now be associated to their parents by the field/label string used in the JSON instead of anonymously.

    Most users probably won’t notice the distinction since the label is lost unless special fields are in the schema. This choice used to be toggleable with an internal/expert anonChildDocs parameter flag, which has been removed.

  • Deleting (or updating) documents by their uniqueKey is now scoped to only consider root documents, not child/nested documents. Thus a delete-by-id won’t work on a child document (it will fail silently), and an attempt to update a child document by providing a new document with the same ID would add a new document (which will probably be erroneous).

    Both these actions were and still are problematic. In-place-updates are safe though. If you want to delete certain child documents and if you know they don’t themselves have nested children then you must do so with a delete-by-query technique.

  • Solr has a new field in the _default configset, called _nest_path_. This field stores the path of the document in the hierarchy for non-root documents.

See the sections Indexing Nested Documents and Searching Nested Documents for more information and configuration details.

Configuration and Default Parameter Changes

Schema Changes in 8.0

The following changes impact how fields behave.

Default Scoring (SimilarityFactory)

  • If you explicitly use BM25SimilarityFactory in your schema, the absolute scoring will be lower since Lucene changed the calculation of BM25 to remove a multiplication factor (for technical details, see LUCENE-8563 or SOLR-13025). Ordering of documents will not change in the normal case. Use LegacyBM25SimilarityFactory if you need to force the old 6.x/7.x scoring.

    Note that if you have not specified any similarityFactory in the schema, or use the default SchemaSimilarityFactory, then LegacyBM25Similarity is automatically selected when the value for luceneMatchVersion is lower than 8.0.0.

    See also the section Similarity for more information.

Memory Codecs Removed

  • Memory codecs have been removed from Lucene (MemoryPostings, MemoryDocValues) and are no longer available in Solr. If you used postingsFormat="Memory" or docValuesFormat="Memory" on any field or field type configuration then either remove that setting to use the default or experiment with one of the other options.

    For more information on defining a codec, see the section Codec Factory; for more information on field properties, see the section Field Type Definitions and Properties.

LowerCaseTokenizer

  • The LowerCaseTokenizer has been deprecated and is likely to be removed in Solr 9. Users are encouraged to use the LetterTokenizer and the LowerCaseFilter instead.

Default Configset

  • The _default configset now includes a ignored_* dynamic field rule.

Indexing Changes in 8.0

The following changes impact how documents are indexed.

Index-time Boosts

  • Index-time boosts were removed from Lucene in version 7.0, and in Solr 7.x the syntax was still allowed (although it logged a warning in the logs). The syntax was similar to:

    {"id":"1", "val_s":{"value":"foo", "boost":2.0}}

    This syntax has been removed entirely and if sent to Solr it will now produce an error. This was done in conjunction with the improvements for nested document support.

ParseDateFieldUpdateProcessorFactory

  • The date format patterns used by ParseDateFieldUpdateProcessorFactory (used by default in "schemaless mode") are now interpreted by Java 8’s java.time.DateTimeFormatter instead of Joda Time. The pattern language is very similar but not the same. Typically, simply update the pattern by changing an uppercase 'Z' to lowercase 'z' and that’s it.

    For the current recommended set of patterns in schemaless mode, see the section Schemaless Mode, or simply examine the _default configset (found in server/solr/configsets).

    Also note that the default set of date patterns (formats) have expanded from previous releases to subsume those patterns previously handled by the "extract" contrib (Solr Cell / Tika).

Solr Cell

  • The extraction contrib (Solr Cell) no longer does any date parsing, and thus no longer supports the date.formats parameter. To ensure date strings are properly parsed, use the ParseDateFieldUpdateProcessorFactory in your update chain. This update request processor is found by default with the "parse-date" update processor when running Solr in "schemaless mode".

Langid Contrib

  • The LanguageIdentifierUpdateProcessor base class in the langid contrib (found in contrib/langid) changed some method signatures. If you have a custom language identifier implementation you will need to adapt your code. See the Jira issue SOLR-11774 for details of the changes.

Query Changes in 8.0

The following changes impact query behavior.

Highlighting

  • The Unified Highlighter parameter hl.weightMatches now defaults to true. See the section Highlighting for more information about Highlighter parameters.

eDisMax Query Parser

  • The eDisMax query parser will now thrown an error when the qf parameter refers to a nonexistent field.

Function Query Parser

  • The Function Query Parser now returns scores that are equal to zero (0) when a negative value is produced. This change is due to the fact that Lucene now requires scores to be positive.

Authentication & Security Changes in 8.0

  • Authentication plugins can now intercept internode requests on a per-request basis.

  • The Basic Authentication plugin now has an option forwardCredentials to let Basic Auth headers be forwarded on inter-node requests in case of distributed search, instead of falling back to PKI.

  • Metrics are now reported for authentication requests.

UI Changes in 8.0

  • The Radial Graph view of a Solr cluster when running in SolrCloud mode has been removed.

  • The Nodes view introduced in Solr 7.5 is now the default when choosing the "Cloud" tab in the left navigation menu.

Autoscaling Changes in 8.0

  • The default replica placement strategy used in Solr has been reverted to the "legacy" policy used by Solr 7.4 and previous versions. This is due to multiple bugs in the autoscaling based replica placement strategy that was made default in Solr 7.5 which causes multiple replicas of the same shard to be placed on the same node in addition to the maxShardsPerNode and createNodeSet parameters being ignored.

    Although the default has changed, autoscaling will continue to be used if a cluster policy or preference is specified or a collection level policy is in use.

    The default replica placement strategy can be changed to use autoscaling again by setting a cluster property:

    curl -X POST -H 'Content-type:application/json' --data-binary '
      {
        "set-obj-property": {
          "defaults" : {
            "cluster": {
              "useLegacyReplicaAssignment":false
            }
          }
        }
      }' http://$SOLR_HOST:$SOLR_PORT/api/cluster

  • A new command-line option is available via bin/solr autoscaling to calculate autoscaling policy suggestions and diagnostic information outside of the running Solr cluster. This option can use the existing autoscaling policy, or test the impact of a new one from a file located on the server filesystem.

    These options have been documented in the section Testing Autoscaling Configuration and Suggestions.

Dependency Updates in 8.0

  • All Hadoop dependencies have been upgraded to Hadoop 3.2.0 (from 2.7.2).

Major Changes in Earlier 7.x Versions

The following is a list of major changes released between Solr 7.1 and 7.7.

Please be sure to review this list so you understand what may have changed between the version of Solr you are currently running and Solr 8.0.

Solr 7.7

See the 7.7 Release Notes for an overview of the main new features in Solr 7.7.

When upgrading to Solr 7.7.x, users should be aware of the following major changes from v7.6:

Admin UI

  • The Admin UI now presents a login screen for any users with authentication enabled on their cluster. Clusters with Basic Authentication will prompt users to enter a username and password. On clusters configured to use Kerberos Authentication, authentication is handled transparently by the browser as before, but if authentication fails, users will be directed to configure their browser to provide an appropriate Kerberos ticket.

    The login screen’s purpose is cosmetic only - Admin UI-triggered Solr requests were subject to authentication prior to 7.7 and still are today. The login screen changes only the user experience of providing this authentication.

Distributed Requests

  • The shards parameter, used to manually select the shards and replicas that receive distributed requests, now checks nodes against a whitelist of acceptable values for security reasons.

    In SolrCloud mode this whitelist is automatically configured to contain all live nodes. In standalone mode the whitelist is empty by default. Upgrading users who use the shards parameter in standalone mode can correct this value by setting the shardsWhitelist property in any shardHandler configurations in their solrconfig.xml file.

    For more information, see the Distributed Request documentation.

Solr 7.6

See the 7.6 Release Notes for an overview of the main new features in Solr 7.6.

When upgrading to Solr 7.6, users should be aware of the following major changes from v7.5:

Collections

  • The JSON parameter to set cluster-wide default cluster properties with the CLUSTERPROP command has changed.

    The old syntax nested the defaults into a property named clusterDefaults. The new syntax uses only defaults. The command to use is still set-obj-property.

    An example of the new syntax is:

    {
      "set-obj-property": {
        "defaults" : {
          "collection": {
            "numShards": 2,
            "nrtReplicas": 1,
            "tlogReplicas": 1,
            "pullReplicas": 1
          }
        }
      }
    }

    The old syntax will be supported until at least Solr 9, but users are advised to begin using the new syntax as soon as possible.

  • The parameter min_rf has been deprecated and no longer needs to be provided in order to see the achieved replication factor. This information will now always be returned to the client with the response.

Autoscaling

  • An autoscaling policy is now used as the default strategy for selecting nodes on which new replicas or replicas of new collections are created.

    A default policy is now in place for all users, which will sort nodes by the number of cores and available freedisk, which means by default a node with the fewest number of cores already on it and the highest available freedisk will be selected for new core creation.

  • The change described above has two additional impacts on the maxShardsPerNode parameter:

    1. It removes the restriction against using maxShardsPerNode when an autoscaling policy is in place. This parameter can now always be set when creating a collection.
    2. It removes the default setting of maxShardsPerNode=1 when an autoscaling policy is in place. It will be set correctly (if required) regardless of whether an autoscaling policy is in place or not.

      The default value of maxShardsPerNode is still 1. It can be set to -1 if the old behavior of unlimited maxShardsPerNode is desired.

DirectoryFactory

  • Lucene has introduced the ByteBuffersDirectoryFactory as a replacement for the RAMDirectoryFactory, which will be removed in Solr 9.

    While most users are still encouraged to use the NRTCachingDirectoryFactory, which allows Lucene to select the best directory factory to use, if you have explicitly configured Solr to use the RAMDirectoryFactory, you are encouraged to switch to the new implementation as soon as possible before Solr 9 is released.

    For more information about the new directory factory, see the Jira issue LUCENE-8438.

    For more information about the directory factory configuration in Solr, see the section DataDir and DirectoryFactory in SolrConfig.

Solr 7.5

See the 7.5 Release Notes for an overview of the main new features in Solr 7.5.

When upgrading to Solr 7.5, users should be aware of the following major changes from v7.4:

Schema Changes

  • Since Solr 7.0, Solr’s schema field-guessing has created _str fields for all _txt fields, and returned those by default with queries. As of 7.5, _str fields will no longer be returned by default. They will still be available and can be requested with the fl parameter on queries. See also the section on field guessing for more information about how schema field guessing works.

  • The Standard Filter, which has been non-operational since at least Solr v4, has been removed.

Index Merge Policy

  • When using the TieredMergePolicy, the default merge policy for Solr, optimize and expungeDeletes now respect the maxMergedSegmentMB configuration parameter, which defaults to 5000 (5GB).

    If it is absolutely necessary to control the number of segments present after optimize, specify maxSegments as a positive integer. Setting maxSegments higher than 1 are honored on a "best effort" basis.

    The TieredMergePolicy will also reclaim resources from segments that exceed maxMergedSegmentMB more aggressively than earlier.

UIMA Removed

  • The UIMA contrib has been removed from Solr and is no longer available.

Logging

  • Solr’s logging configuration file is now located in server/resources/log4j2.xml by default.

  • A bug for Windows users has been corrected. When using Solr’s examples (bin/solr start -e) log files will now be put in the correct location (example/ instead of server). See also Solr Examples and Solr Control Script Reference for more information.

Solr 7.4

See the 7.4 Release Notes for an overview of the main new features in Solr 7.4.

When upgrading to Solr 7.4, users should be aware of the following major changes from v7.3:

Logging

  • Solr now uses Log4j v2.11. The Log4j configuration is now in log4j2.xml rather than log4j.properties files. This is a server side change only and clients using SolrJ won’t need any changes. Clients can still use any logging implementation which is compatible with SLF4J. We now let Log4j handle rotation of Solr logs at startup, and bin/solr start scripts will no longer attempt this nor move existing console or garbage collection logs into logs/archived either. See Configuring Logging for more details about Solr logging.

  • Configuring slowQueryThresholdMillis now logs slow requests to a separate file named solr_slow_requests.log. Previously they would get logged in the solr.log file.

Legacy Scaling (non-SolrCloud)

  • In the leader-follower model of scaling Solr, a follower no longer commits an empty index when a completely new index is detected on leader during replication. To return to the previous behavior pass false to skipCommitOnLeaderVersionZero in the follower section of replication handler configuration, or pass it to the fetchindex command.

If you are upgrading from a version earlier than Solr 7.3, please see previous version notes below.

Solr 7.3

See the 7.3 Release Notes for an overview of the main new features in Solr 7.3.

When upgrading to Solr 7.3, users should be aware of the following major changes from v7.2:

Configsets

  • Collections created without specifying a configset name have used a copy of the _default configset since Solr 7.0. Before 7.3, the copied configset was named the same as the collection name, but from 7.3 onwards it will be named with a new ".AUTOCREATED" suffix. This is to prevent overwriting custom configset names.

Learning to Rank

  • The rq parameter used with Learning to Rank rerank query parsing no longer considers the defType parameter. See Running a Rerank Query for more information about this parameter.

Autoscaling & AutoAddReplicas

  • The behaviour of the autoscaling system will now pause all triggers from execution between the start of actions and the end of a cool down period. The triggers will resume after the cool down period expires. Previously, the cool down period was a fixed period started after actions for a trigger event completed and during this time all triggers continued to run but any events were rejected and tried later.

  • The throttling mechanism used to limit the rate of autoscaling events processed has been removed. This deprecates the actionThrottlePeriodSeconds setting in the set-properties Autoscaling API which is now non-operational. Use the triggerCooldownPeriodSeconds parameter instead to pause event processing.

  • The default value of autoReplicaFailoverWaitAfterExpiration, used with the AutoAddReplicas feature, has increased to 120 seconds from the previous default of 30 seconds. This affects how soon Solr adds new replicas to replace the replicas on nodes which have either crashed or shutdown.

Logging

  • The default Solr log file size and number of backups have been raised to 32MB and 10 respectively. See the section Configuring Logging for more information about how to configure logging.

SolrCloud

  • The old Leader-In-Recovery implementation (implemented in Solr 4.9) is now deprecated and replaced. Solr will support rolling upgrades from old 7.x versions of Solr to future 7.x releases until the last release of the 7.x major version.

    This means to upgrade to Solr 8 in the future, you will need to be on Solr 7.3 or higher.

  • Replicas which are not up-to-date are no longer allowed to become leader. Use the FORCELEADER command of the Collections API to allow these replicas become leader.

Spatial

  • If you are using the spatial JTS library with Solr, you must upgrade to 1.15.0. This new version of JTS is now dual-licensed to include a BSD style license. See the section on Spatial Search for more information.

Highlighting

  • The top-level <highlighting> element in solrconfig.xml is now officially deprecated in favour of the equivalent <searchComponent> syntax. This element has been out of use in default Solr installations for several releases already.

If you are upgrading from a version earlier than Solr 7.2, please see previous version notes below.

Solr 7.2

See the 7.2 Release Notes for an overview of the main new features in Solr 7.2.

When upgrading to Solr 7.2, users should be aware of the following major changes from v7.1:

Local Parameters

  • Starting a query string with local parameters {!myparser …​} is used to switch from one query parser to another, and is intended for use by Solr system developers, not end users doing searches. To reduce negative side-effects of unintended hack-ability, Solr now limits the cases when local parameters will be parsed to only contexts in which the default parser is "lucene" or "func".

    So, if defType=edismax then q={!myparser …​} won’t work. In that example, put the desired query parser into the defType parameter.

    Another example is if deftype=edismax then hl.q={!myparser …​} won’t work for the same reason. In this example, either put the desired query parser into the hl.qparser parameter or set hl.qparser=lucene. Most users won’t run into these cases but some will need to change.

    If you must have full backwards compatibility, use luceneMatchVersion=7.1.0 or an earlier version.

eDisMax Query Parser

  • The eDisMax parser by default no longer allows subqueries that specify a Solr parser using either local parameters, or the older _query_ magic field trick.

    For example, {!prefix f=myfield v=enterp} or _query_:"{!prefix f=myfield v=enterp}" are not supported by default any longer. If you want to allow power-users to do this, set uf=* query or some other value that includes _query_.

    If you need full backwards compatibility for the time being, use luceneMatchVersion=7.1.0 or something earlier.

If you are upgrading from a version earlier than Solr 7.1, please see previous version notes below.

Solr 7.1

See the 7.1 Release Notes for an overview of the main new features of Solr 7.1.

When upgrading to Solr 7.1, users should be aware of the following major changes from v7.0:

AutoAddReplicas

  • The feature to automatically add replicas if a replica goes down, previously available only when storing indexes in HDFS, has been ported to the autoscaling framework. Due to this, autoAddReplicas is now available to all users even if their indexes are on local disks.

    Existing users of this feature should not have to change anything. However, they should note these changes:

    • Behavior: Changing the autoAddReplicas property from disabled (false) to enabled (true) using MODIFYCOLLECTION API no longer replaces down replicas for the collection immediately. Instead, replicas are only added if a node containing them went down while autoAddReplicas was enabled. The parameters autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay are no longer used.

    • Deprecations: Enabling/disabling autoAddReplicas cluster-wide with the API will be deprecated; use suspend/resume trigger APIs with name=".auto_add_replicas" instead.

      More information about the changes to this feature can be found in the section SolrCloud Automatically Adding Replicas.

Metrics Reporters

  • Shard and cluster metric reporter configuration now require a class attribute.

    • If a reporter configures the group="shard" attribute then please also configure the class="org.apache.solr.metrics.reporters.solr.SolrShardReporter" attribute.

    • If a reporter configures the group="cluster" attribute then please also configure the class="org.apache.solr.metrics.reporters.solr.SolrClusterReporter" attribute.

      See the section Shard and Cluster Reporters for more information.

Streaming Expressions

  • All stream evaluators in solrj.io.eval have been refactored to have a simpler and more robust structure. This simplifies and condenses the code required to implement a new Evaluator and makes it much easier for evaluators to handle differing data types (primitives, objects, arrays, lists, and so forth).

ReplicationHandler

  • In the ReplicationHandler, the leader.commitReserveDuration sub-element is deprecated. Instead please configure a direct commitReserveDuration element for use in all modes (leader, follower, cloud).

RunExecutableListener

  • The RunExecutableListener was removed for security reasons. If you want to listen to events caused by updates, commits, or optimize, write your own listener as native Java class as part of a Solr plugin.

XML Query Parser

  • In the XML query parser (defType=xmlparser or {!xmlparser …​ }) the resolving of external entities is now disallowed by default.

If you are upgrading from a version earlier than Solr 7.0, please see Major Changes in Solr 7 before starting your upgrade.