Major Changes in Solr 7
Solr 7 is a major new release of Solr which introduces new features and a number of other changes that may impact your existing installation.
Upgrade Planning
There are major changes in Solr 7 to consider before starting to migrate your configurations and indexes. This page is designed to highlight the biggest changes - new features you may want to be aware of, but also changes in default behavior and deprecated features that have been removed.
There are many hundreds of changes in Solr 7, however, so a thorough review of the Solr Upgrade Notes as well as the CHANGES.txt file in your Solr instance will help you plan your migration to Solr 7. This section attempts to highlight some of the major changes you should be aware of.
You should also consider all changes that have been made to Solr in any version you have not upgraded to already. For example, if you are currently using Solr 6.2, you should review changes made in all subsequent 6.x releases in addition to changes for 7.0.
Reindexing your data is considered the best practice and you should try to do so if possible. However, if reindexing is not feasible, keep in mind you can only upgrade one major version at a time. Thus, Solr 6.x indexes will be compatible with Solr 7 but Solr 5.x indexes will not be.
If you do not reindex now, keep in mind that you will need to either reindex your data or upgrade your indexes before you will be able to move to Solr 8 when it is released in the future. See the section IndexUpgrader Tool for more details on how to upgrade your indexes.
See also the section Upgrading a Solr Cluster for details on how to upgrade a SolrCloud cluster.
New Features & Enhancements
Replication Modes
Until Solr 7, the SolrCloud model for replicas has been to allow any replica to become a leader when a leader is lost. This is highly effective for most users, providing reliable failover in case of issues in the cluster. However, it comes at a cost in large clusters because all replicas must be in sync at all times.
To provide additional flexibility, two new types of replicas have been added, named TLOG & PULL. These new types provide options to have replicas which only sync with the leader by copying index segments from the leader. The TLOG type has an additional benefit of maintaining a transaction log (the "tlog" of its name), which would allow it to recover and become a leader if necessary; the PULL type does not maintain a transaction log, so cannot become a leader.
As part of this change, the traditional type of replica is now named NRT. If you do not explicitly define a number of TLOG or PULL replicas, Solr defaults to creating NRT replicas. If this model is working for you, you will not have to change anything.
See the section Types of Replicas for more details on the new replica modes, and how to define the replica type in your cluster.
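As a rough illustration of requesting the new replica types, the following Python sketch builds a Collections API CREATE URL using the nrtReplicas, tlogReplicas and pullReplicas parameters. The host, collection name and replica counts are assumptions chosen for the example, and the helper function is hypothetical; it only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, nrt=0, tlog=0, pull=0):
    """Build a Collections API CREATE URL with explicit replica-type counts."""
    params = {"action": "CREATE", "name": name, "numShards": num_shards}
    # Only include the replica-type parameters that were actually requested,
    # so Solr's own defaults apply to everything else.
    for key, count in (("nrtReplicas", nrt), ("tlogReplicas", tlog), ("pullReplicas", pull)):
        if count:
            params[key] = count
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical cluster: 2 shards, each with 2 TLOG replicas and 1 PULL replica.
url = create_collection_url(
    "http://localhost:8983/solr", "mycollection", 2, tlog=2, pull=1
)
print(url)
```

If none of the replica-type parameters are supplied, the URL carries no replica type at all, and Solr falls back to creating NRT replicas as described above.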
Autoscaling
Solr autoscaling is a new suite of features in Solr to make managing a SolrCloud cluster easier and more automated.
At its core, Solr autoscaling provides users with a rule syntax to define preferences and policies for how to distribute nodes and shards in a cluster, with the goal of maintaining a balance in the cluster. As of Solr 7, Solr will take any policy or preference rules into account when determining where to place new shards and replicas created or moved with various Collections API commands.
See the section SolrCloud Autoscaling for details on the options available in 7.0. Expect more features to be released in subsequent 7.x releases in this area.
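To give a feel for the rule syntax, here is a minimal Python sketch that prepares a request setting a cluster-wide policy. It assumes the autoscaling endpoint is available at /admin/autoscaling under the Solr base URL and that the set-cluster-policy command accepts a list of rules; the rule shown (fewer than two replicas of each shard on any node) and the host are illustrative. Check the SolrCloud Autoscaling section for the syntax supported by your release.

```python
import json
from urllib.request import Request


def autoscaling_request(base_url, command, rules):
    """Prepare (but do not send) a POST request for an autoscaling command."""
    body = json.dumps({command: rules}).encode("utf-8")
    return Request(
        f"{base_url}/admin/autoscaling",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative rule: keep fewer than 2 replicas of each shard on any one node.
policy = [{"replica": "<2", "shard": "#EACH", "node": "#ANY"}]
req = autoscaling_request("http://localhost:8983/solr", "set-cluster-policy", policy)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Passing the prepared request to urllib.request.urlopen would send it to a running cluster.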
Other Features & Enhancements
- The Analytics Component has been refactored.
  - The documentation for this component is in progress; until it is available, please refer to SOLR-11144 for more details.
- There were several other new features released in earlier 6.x releases, which you may have missed:
  - Learning to Rank
  - Unified Highlighter
  - Metrics API. See also information about related deprecations in the section JMX Support and MBeans below.
  - Payload queries
  - Streaming Evaluators
  - /v2 API
  - Graph streaming expressions
Configuration and Default Changes
New Default ConfigSet
Several changes have been made to configsets that ship with Solr; not only their content but how Solr behaves in regard to them:
- The data_driven_configset and basic_configset have been removed, and replaced by the _default configset. The sample_techproducts_configset also remains, and is designed for use with the example documents shipped with Solr in the example/exampledocs directory.
- When creating a new collection, if you do not specify a configset, the _default will be used.
  - If you use SolrCloud, the _default configset will be automatically uploaded to ZooKeeper.
  - If you use standalone mode, the instanceDir will be created automatically, using the _default configset as its basis.
Schemaless Improvements
To improve the functionality of Schemaless Mode, Solr now behaves differently when it detects that data in an incoming field should have a text-based field type.
- Incoming fields will be indexed as text_general by default (you can change this). The name of the field will be the same as the field name defined in the document.
- A copy field rule will be inserted into your schema to copy the new text_general field to a new field with the name <name>_str. This field’s type will be a strings field (to allow for multiple values). The first 256 characters of the text field will be inserted to the new strings field.
This behavior can be customized if you wish to remove the copy field rule, or to change the number of characters inserted to the string field, or the field type used. See the section Schemaless Mode for details.
Because copy field rules can slow indexing and increase index size, it’s recommended you only use copy fields when you need to. If you do not need to sort or facet on a field, you should remove the automatically-generated copy field rule.
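The effect of the default rules on a single incoming text value can be modeled with a few lines of Python. This is only a conceptual sketch of the outcome described above, not Solr code: the function name is hypothetical, and the 256-character limit is the default mentioned in this section.

```python
def schemaless_fields(name, value, max_chars=256):
    """Model the fields produced for one incoming text value."""
    return {
        name: value,  # analyzed as text_general
        f"{name}_str": value[:max_chars],  # copy field, truncated strings value
    }


doc = schemaless_fields("title", "x" * 300)
print(len(doc["title"]), len(doc["title_str"]))
```

The full value stays searchable in the text field, while the truncated copy is what sorting and faceting would operate on.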
Automatic field creation can be disabled with the update.autoCreateFields property. To do this, you can use the Config API with a command such as:
V1 API
curl http://host:8983/solr/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
V2 API
curl http://host:8983/api/collections/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
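The same Config API call can be prepared from a script. The Python sketch below mirrors the two curl commands above; the helper name is hypothetical, the host and collection are the placeholders used in this section, and the request is only constructed, not sent.

```python
import json
from urllib.request import Request


def disable_auto_create_fields(base_url, collection, use_v2=False):
    """Prepare a Config API request that turns off automatic field creation."""
    if use_v2:
        path = f"/api/collections/{collection}/config"
    else:
        path = f"/solr/{collection}/config"
    body = json.dumps({"set-user-property": {"update.autoCreateFields": "false"}})
    return Request(
        base_url + path,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


v1 = disable_auto_create_fields("http://host:8983", "mycollection")
v2 = disable_auto_create_fields("http://host:8983", "mycollection", use_v2=True)
print(v1.full_url)
print(v2.full_url)
```

Sending either request with urllib.request.urlopen should have the same effect as the corresponding curl command.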
Changes to Default Behaviors
- JSON is now the default response format. If you rely on XML responses, you must now define wt=xml in your request. In addition, line indentation is enabled by default (indent=on).
- The sow parameter (short for "Split on Whitespace") now defaults to false, which allows support for multi-word synonyms out of the box. This parameter is used with the eDismax and standard/"lucene" query parsers. If this parameter is not explicitly specified as true, query text will not be split on whitespace before analysis.
- The legacyCloud parameter now defaults to false. If an entry for a replica does not exist in state.json, that replica will not get registered.
  This may affect users who bring up replicas and expect them to be automatically registered as part of a shard. It is possible to fall back to the old behavior by setting the property legacyCloud=true in the cluster properties, using the following command:
  ./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:2181 -cmd clusterprop -name legacyCloud -val true
- The eDismax query parser parameter lowercaseOperators now defaults to false if the luceneMatchVersion in solrconfig.xml is 7.0.0 or above. Behavior for luceneMatchVersion lower than 7.0.0 is unchanged (so, true). This means that clients must send boolean operators (such as AND, OR and NOT) in upper case in order to be recognized, or you must explicitly set this parameter to true.
- The handleSelect parameter in solrconfig.xml now defaults to false if the luceneMatchVersion is 7.0.0 or above. This causes Solr to ignore the qt parameter if it is present in a request. If you have request handlers without a leading '/', you can set handleSelect="true" or consider migrating your configuration.
  The qt parameter is still used as a SolrJ special parameter that specifies the request handler (tail URL path) to use.
- The lucenePlusSort query parser (aka the "Old Lucene Query Parser") has been deprecated and is no longer implicitly defined. If you wish to continue using this parser until Solr 8 (when it will be removed), you must register it in your solrconfig.xml, as in: <queryParser name="lucenePlusSort" class="solr.OldLuceneQParserPlugin"/>.
- The name of TemplateUpdateRequestProcessorFactory is changed to template from Template and the name of AtomicUpdateProcessorFactory is changed to atomic from Atomic.
  - Also, TemplateUpdateRequestProcessorFactory now uses {} instead of ${} for template.
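Clients that depend on the previous request-time defaults can restore them explicitly on each request. The following Python sketch builds a select URL that sets wt, indent, sow and (for eDismax) lowercaseOperators to their earlier values. The helper name, host and collection are assumptions for illustration, and the URL is only constructed, not requested.

```python
from urllib.parse import urlencode

# Request parameters that restore the pre-7.0 defaults described above.
SOLR6_DEFAULTS = {"wt": "xml", "indent": "false", "sow": "true"}


def legacy_select_url(base_url, collection, query, edismax=False):
    """Build a /select URL that pins the earlier default behaviors."""
    params = {"q": query, **SOLR6_DEFAULTS}
    if edismax:
        # lowercaseOperators only applies to the eDismax query parser.
        params["defType"] = "edismax"
        params["lowercaseOperators"] = "true"
    return f"{base_url}/{collection}/select?{urlencode(params)}"


standard_url = legacy_select_url("http://localhost:8983/solr", "mycollection", "solr lucene")
edismax_url = legacy_select_url(
    "http://localhost:8983/solr", "mycollection", "solr and lucene", edismax=True
)
print(standard_url)
print(edismax_url)
```

Setting the parameters per request keeps the change visible in client code; the same values could instead be set as request handler defaults in solrconfig.xml.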
Deprecations and Removed Features
Point Fields Are Default Numeric Types
Solr has implemented *PointField types across the board, to replace Trie* based numeric fields. All Trie* fields are now considered deprecated, and will be removed in Solr 8.
If you are using Trie* fields in your schema, you should consider moving to PointFields as soon as feasible. Changing to the new PointField types will require you to reindex your data.
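As a starting point for a schema review, the sketch below maps each Trie* class to its PointField counterpart and rewrites the class attributes in a schema fragment. It is a simplified text substitution, not a migration tool: the sample fieldType definitions are hypothetical, other attributes (such as precisionStep) are left untouched and would need separate review, and reindexing is still required afterwards.

```python
import re

# Trie* field classes and their *PointField replacements.
TRIE_TO_POINT = {
    "solr.TrieIntField": "solr.IntPointField",
    "solr.TrieLongField": "solr.LongPointField",
    "solr.TrieFloatField": "solr.FloatPointField",
    "solr.TrieDoubleField": "solr.DoublePointField",
    "solr.TrieDateField": "solr.DatePointField",
}


def migrate_field_types(schema_xml):
    """Replace Trie* class names in class="..." attributes; leave others as-is."""

    def swap(match):
        old = match.group(1)
        return f'class="{TRIE_TO_POINT.get(old, old)}"'

    return re.sub(r'class="([^"]+)"', swap, schema_xml)


# Hypothetical schema fragment used only to exercise the function.
sample = (
    '<fieldType name="tint" class="solr.TrieIntField" docValues="true"/>\n'
    '<fieldType name="string" class="solr.StrField"/>'
)
migrated = migrate_field_types(sample)
print(migrated)
```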
Spatial Fields
The following spatial-related fields have been deprecated:
- LatLonType
- GeoHashField
- SpatialVectorFieldType
- SpatialTermQueryPrefixTreeFieldType
Choose one of these field types instead:
- LatLonPointSpatialField
- SpatialRecursivePrefixTreeField
- RptWithGeometrySpatialField
See the section Spatial Search for more information.
JMX Support and MBeans
- The <jmx> element in solrconfig.xml has been removed in favor of <metrics><reporter> elements defined in solr.xml.
  Limited back-compatibility is offered by automatically adding a default instance of SolrJmxReporter if it’s missing AND when a local MBean server is found. A local MBean server can be activated either via ENABLE_REMOTE_JMX_OPTS in solr.in.sh or via system properties, e.g., -Dcom.sun.management.jmxremote. This default instance exports all Solr metrics from all registries as hierarchical MBeans.
  This behavior can also be disabled by specifying a SolrJmxReporter configuration with a boolean init argument enabled set to false. For more fine-grained control, users should explicitly specify at least one SolrJmxReporter configuration.
  See also the section The <metrics><reporters> Element, which describes how to set up Metrics Reporters in solr.xml. Note that back-compatibility support may be removed in Solr 8.
- MBean names and attributes now follow the hierarchical names used in metrics. This is reflected also in /admin/mbeans and /admin/plugins output, and can be observed in the UI Plugins tab, because now all these APIs get their data from the metrics API. The old (mostly flat) JMX view has been removed.
SolrJ
The following changes were made in SolrJ.
- HttpClientInterceptorPlugin is now HttpClientBuilderPlugin and must work with a SolrHttpClientBuilder rather than an HttpClientConfigurer.
- HttpClientUtil now allows configuring HttpClient instances via SolrHttpClientBuilder rather than an HttpClientConfigurer. The environment variable SOLR_AUTHENTICATION_CLIENT_CONFIGURER no longer works; please use SOLR_AUTHENTICATION_CLIENT_BUILDER instead.
- SolrClient implementations now use their own internal configuration for socket timeouts, connect timeouts, and allowing redirects rather than what is set as the default when building the HttpClient instance. Use the appropriate setters on the SolrClient instance.
- HttpSolrClient#setAllowCompression has been removed and compression must be enabled as a constructor parameter.
- HttpSolrClient#setDefaultMaxConnectionsPerHost and HttpSolrClient#setMaxTotalConnections have been removed. These now default very high and can only be changed via parameter when creating an HttpClient instance.
Other Deprecations and Removals
- The defaultOperator parameter in the schema is no longer supported. Use the q.op parameter instead. This option had been deprecated for several releases. See the section Standard Query Parser Parameters for more information.
- The defaultSearchField parameter in the schema is no longer supported. Use the df parameter instead. This option had been deprecated for several releases. See the section Standard Query Parser Parameters for more information.
- The mergePolicy, mergeFactor and maxMergeDocs parameters have been removed and are no longer supported. You should define a mergePolicyFactory instead. See the section mergePolicyFactory for more information.
- The PostingsSolrHighlighter has been deprecated. It’s recommended that you move to using the UnifiedHighlighter instead. See the section Unified Highlighter for more information about this highlighter.
- Index-time boosts have been removed from Lucene, and are no longer available from Solr. If any boosts are provided, they will be ignored by the indexing chain. As a replacement, index-time scoring factors should be indexed in a separate field and combined with the query score using a function query. See the section Function Queries for more information.
- The StandardRequestHandler is deprecated. Use SearchHandler instead.
- To improve parameter consistency in the Collections API, the parameter names fromNode for the MOVEREPLICA command and source, target for the REPLACENODE command have been deprecated and replaced with sourceNode and targetNode instead. The old names will continue to work for back-compatibility but they will be removed in Solr 8.
- The unused valType option has been removed from ExternalFileField. If you have this in your schema, you can safely remove it.
Major Changes in Earlier 6.x Versions
The following summary of changes in earlier 6.x releases highlights significant changes released between Solr 6.0 and 6.6 that were listed in earlier versions of this Guide. Mentions of deprecations are likely superseded by removal in Solr 7, as noted in the above sections.
Note again that this is not a complete list of all changes that may impact your installation, so a thorough review of CHANGES.txt is highly recommended if upgrading from any version earlier than 6.6.
- The Solr contribs map-reduce, morphlines-core and morphlines-cell have been removed.
- JSON Facet API now uses hyper-log-log for numBuckets cardinality calculation and calculates cardinality before filtering buckets by any mincount greater than 1.
- If you use historical dates, specifically on or before the year 1582, you should reindex for better date handling.
- If you use the JSON Facet API (json.facet) with method=stream, you must now set sort='index asc' to get the streaming behavior; otherwise it won’t stream. Reminder: method is a hint that doesn’t change defaults of other parameters.
- If you use the JSON Facet API (json.facet) to facet on a numeric field and if you use mincount=0 or if you set the prefix, you will now get an error as these options are incompatible with numeric faceting.
- Solr’s logging verbosity at the INFO level has been greatly reduced, and you may need to update the log configs to use the DEBUG level to see all the logging messages you used to see at INFO level before.
- We are no longer backing up solr.log and solr_gc.log files in date-stamped copies forever. If you relied on the solr_log_<date> or solr_gc_log_<date> being in the logs folder that will no longer be the case. See the section Configuring Logging for details on how log rotation works as of Solr 6.3.
- The create/deleteCollection methods on MiniSolrCloudCluster have been deprecated. Clients should instead use the CollectionAdminRequest API. In addition, MiniSolrCloudCluster#uploadConfigDir(File, String) has been deprecated in favour of #uploadConfigSet(Path, String).
- The bin/solr.in.sh (bin/solr.in.cmd on Windows) is now completely commented by default. Previously, this wasn’t so, which had the effect of masking existing environment variables.
- The _version_ field is no longer indexed and is now defined with indexed=false by default, because the field has DocValues enabled.
- The /export handler has been changed so it no longer returns zero (0) for numeric fields that are not in the original document. One consequence of this change is that you must be aware that some tuples will not have values if there were none in the original document.
- Metrics-related classes in org.apache.solr.util.stats have been removed in favor of the Dropwizard metrics library. Any custom plugins using these classes should be changed to use the equivalent classes from the metrics library. As part of this, the following changes were made to the output of Overseer Status API:
  - The "totalTime" metric has been removed because it is no longer supported.
  - The metrics "75thPctlRequestTime", "95thPctlRequestTime", "99thPctlRequestTime" and "999thPctlRequestTime" in Overseer Status API have been renamed to "75thPcRequestTime", "95thPcRequestTime" and so on for consistency with stats output in other parts of Solr.
  - The metrics "avgRequestsPerMinute", "5minRateRequestsPerMinute" and "15minRateRequestsPerMinute" have been replaced by corresponding per-second rates viz. "avgRequestsPerSecond", "5minRateRequestsPerSecond" and "15minRateRequestsPerSecond" for consistency with stats output in other parts of Solr.
- A new highlighter named UnifiedHighlighter has been added. You are encouraged to try out the UnifiedHighlighter by setting hl.method=unified and report feedback. It’s more efficient/faster than the other highlighters, especially compared to the original Highlighter. See HighlightParams.java for a listing of highlight parameters annotated with which highlighters use them. hl.useFastVectorHighlighter is now considered deprecated in favor of hl.method=fastVector.
- The maxWarmingSearchers parameter now defaults to 1, and more importantly commits will now block if this limit is exceeded instead of throwing an exception (a good thing). Consequently there is no longer a risk in overlapping commits. Nonetheless users should continue to avoid excessive committing. Users are advised to remove any pre-existing maxWarmingSearchers entries from their solrconfig.xml files.
- The Complex Phrase query parser now supports leading wildcards. Be aware that leading wildcards can be expensive to evaluate; users are encouraged to use ReversedWildcardFilter in index-time analysis.
- The JMX metric "avgTimePerRequest" (and the corresponding metric in the metrics API for each handler) used to be a simple non-decaying average based on total cumulative time and the number of requests. The Codahale Metrics implementation applies exponential decay to this value, which heavily biases the average towards the last 5 minutes.
- Parallel SQL now uses Apache Calcite as its SQL framework. As part of this change the default aggregation mode has been changed to facet rather than map_reduce. There have also been changes to the SQL aggregate response and some SQL syntax changes. Consult the Parallel SQL Interface documentation for full details.