These are common terms used with Solr.
Where possible, terms are linked to relevant parts of the Solr Reference Guide for more information.
- Atomic updates
An approach to updating only one or more fields of a document, instead of reindexing the entire document.
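As a rough illustration, an atomic update request body can be written as JSON in which each changed field maps to a modifier such as `set` or `inc`. The document ID, field names, and values below are hypothetical, and the sketch only builds the payload; it does not contact a Solr server.

```python
import json


def build_atomic_update(doc_id, changes):
    """Build a JSON body that updates only the listed fields of one document.

    `changes` maps a field name to a (modifier, value) pair, for example
    ("set", 99) or ("inc", 1). The field names used here are hypothetical.
    """
    doc = {"id": doc_id}
    for field, (modifier, value) in changes.items():
        # Each changed field becomes {"<modifier>": <value>} instead of a plain value.
        doc[field] = {modifier: value}
    # The update is sent as a list of documents; fields not listed stay unchanged.
    return json.dumps([doc], sort_keys=True)


payload = build_atomic_update(
    "book-1",
    {"price": ("set", 99), "copies_sold": ("inc", 1)},
)
print(payload)
```

Fields that are not named in the payload are left as they were, which is what distinguishes this from reindexing the whole document.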
- Collection
In Solr, one or more Documents grouped together in a single logical index using a single configuration and Schema.
In SolrCloud, a collection may be divided into multiple logical shards, which may in turn be distributed across many nodes.
Single-node installations and user-managed clusters instead use the concept of a Core.
- Commit
To make document changes permanent in the index. Added documents, for example, become searchable after a commit.
- Core
An individual Solr instance that represents a logical index. Multiple cores can run on a single node. See also SolrCloud.
- Core reload
To re-initialize a Solr core after changes to the schema file, solrconfig.xml, or other configuration files.
- Distributed search
A search in which queries are processed across more than one Shard.
- Document
A group of fields and their values. Documents are the basic unit of data in a collection. Documents are assigned to shards using standard hashing, or by specifically assigning a shard within the document ID. Documents are versioned after each write operation.
- Ensemble
A ZooKeeper term for multiple ZooKeeper instances running simultaneously and in coordination with each other for fault tolerance.
- Inverse document frequency (IDF)
A measure of the general importance of a term. It is calculated as the total number of Documents divided by the number of Documents in the collection in which a particular word occurs. See http://en.wikipedia.org/wiki/Tf-idf and the Lucene TFIDFSimilarity javadocs for more information on TF-IDF based scoring and Lucene scoring in particular. See also Term frequency.
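The ratio described above can be sketched in a few lines. The second function is an assumption: it shows a log-scaled, smoothed variant of the kind commonly used in TF-IDF scoring, and is not presented as the exact formula Solr applies. The function names and document counts are hypothetical.

```python
import math


def raw_idf(total_docs, docs_with_term):
    """Plain ratio: total documents divided by documents containing the term."""
    if docs_with_term <= 0:
        raise ValueError("the term must occur in at least one document")
    return total_docs / docs_with_term


def smoothed_idf(total_docs, docs_with_term):
    """Log-scaled variant with +1 smoothing (an illustrative assumption)."""
    return 1.0 + math.log((total_docs + 1) / (docs_with_term + 1))


# A term found in 10 of 1000 documents is rarer, and so weighted higher,
# than a term found in 500 of them.
print(raw_idf(1000, 10), raw_idf(1000, 500))
print(smoothed_idf(1000, 10) > smoothed_idf(1000, 500))
```

The logarithm keeps very rare terms from dominating the score, which is why scoring formulas typically dampen the raw ratio.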
- Inverted index
A way of creating a searchable index that lists every word and the documents that contain those words, similar to an index in the back of a book which lists words and the pages on which they can be found. When performing keyword searches, this method is considered more efficient than the alternative, which would be to create a list of documents paired with every word used in each document. Since users search using terms they expect to be in documents, finding the term before the document saves processing resources and time.
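A minimal sketch of the idea, assuming whitespace tokenization and lowercasing only (a real analysis chain does considerably more). The sample documents and function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, word):
    """Look up the term first, then return the matching document IDs."""
    return sorted(index.get(word.lower(), set()))


docs = {
    1: "Solr is a search platform",
    2: "Lucene powers Solr",
    3: "a search library",
}
index = build_inverted_index(docs)
print(search(index, "solr"))    # [1, 2]
print(search(index, "search"))  # [1, 3]
```

A lookup touches only the entry for the queried term, so no document text needs to be scanned at query time.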
- Leader
A single Replica for each Shard that takes charge of coordinating index updates (document additions or deletions) to other replicas in the same shard. This is a transient responsibility assigned to a node via an election; if the current Shard Leader goes down, a new node will automatically be elected to take its place. See also SolrCloud.
- Optimistic concurrency
Also known as "optimistic locking", this is an approach that allows for updates to documents currently in the index while retaining locking or version control.
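The version check behind this approach can be sketched with a toy in-memory store. This is an assumption-laden model, not Solr's implementation: the class names, the integer version counter, and the `expected_version` parameter are all hypothetical.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version does not match the stored one."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, fields)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, fields, expected_version=None):
        """Write a document, optionally only if its version is still as expected."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1  # every successful write bumps the version
        self._docs[doc_id] = (new_version, dict(fields))
        return new_version


store = VersionedStore()
v1 = store.put("doc-1", {"title": "first draft"})
v2 = store.put("doc-1", {"title": "second draft"}, expected_version=v1)

try:
    # A writer still holding the stale version v1 is rejected instead of
    # silently overwriting the newer change.
    store.put("doc-1", {"title": "stale edit"}, expected_version=v1)
except VersionConflict as err:
    print("rejected:", err)
```

No lock is held between reading and writing; a conflict is only detected at write time, and the rejected client is expected to re-read and retry.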
- Overseer
A single node in SolrCloud that is responsible for processing and coordinating actions involving the entire cluster. It keeps track of the state of existing nodes, collections, shards, and replicas, and assigns new replicas to nodes. This is a transient responsibility assigned to a node via an election; if the current Overseer goes down, a new node will automatically be elected to take its place. See also SolrCloud.
- Recall
The ability of a search engine to retrieve all of the possible matches to a user’s query.
- Relevance
The appropriateness of a document to the search conducted by the user.
- Replication
A method of copying a leader index from one server to one or more "follower" or "child" servers.
- Request handler
Logic and configuration parameters that tell Solr how to handle incoming "requests", whether the requests are to return search results, to index documents, or to handle other custom situations.
- Search component
Logic and configuration parameters used by request handlers to process query requests. Examples of search components include faceting, highlighting, and "more like this" functionality.
- Shard
In SolrCloud, a logical partition of a single Collection. Every shard consists of at least one physical Replica, but there may be multiple Replicas distributed across multiple Nodes for fault tolerance. See also SolrCloud.
- Solr Schema (managed-schema.xml or schema.xml)
The Solr index Schema defines the fields to be indexed and the type of each field (text, integers, etc.). By default, schema data can be "managed" at run time using the Schema API and is typically kept in a file named managed-schema.xml, which Solr modifies as needed. A collection may instead be configured to use a static Schema, which is only loaded on startup from a human-edited configuration file, typically named schema.xml. See Schema Factory Configuration for details.
- SolrConfig (solrconfig.xml)
The Apache Solr configuration file. Defines indexing options, RequestHandlers, highlighting, spellchecking and various other configurations. The file,
solrconfig.xml, is located in the Solr home
- Spell Check
The ability to suggest alternative spellings of search terms to a user, as a check against spelling errors causing few or zero results.
- Stopwords
Generally, words that have little meaning to a user’s search but which may have been entered as part of a natural language query. Stopwords are generally very small pronouns, conjunctions, and prepositions (such as "the", "with", or "and").
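Stopword removal can be sketched as a simple filter over query tokens. The three-word list and the sample query are illustrative assumptions, and the sketch uses plain whitespace tokenization rather than a real analyzer.

```python
# A tiny illustrative stopword list; real lists are longer and language-specific.
STOPWORDS = {"the", "with", "and"}


def remove_stopwords(query, stopwords=STOPWORDS):
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return [word for word in query.lower().split() if word not in stopwords]


print(remove_stopwords("The laptop with backlit keyboard and touchscreen"))
# ['laptop', 'backlit', 'keyboard', 'touchscreen']
```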
- Suggester
Functionality in Solr that suggests possible query terms to users as they type.
- Synonyms
Synonyms are generally terms that are close to each other in meaning and may substitute for one another. In a search engine implementation, synonyms may be abbreviations as well as words, or terms that are not consistently hyphenated. Examples of synonyms in this context would be "Inc." and "Incorporated" or "iPod" and "i-pod".
- Term frequency
The number of times a word occurs in a given document. See http://en.wikipedia.org/wiki/Tf-idf and the Lucene TFIDFSimilarity javadocs for more info on TF-IDF based scoring and Lucene scoring in particular. See also Inverse document frequency (IDF).
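Counting term frequency for one document can be sketched as follows, assuming lowercased, whitespace-separated tokens. The sample text and function name are hypothetical.

```python
from collections import Counter


def term_frequencies(text):
    """Count how many times each word occurs in a single document."""
    return Counter(text.lower().split())


tf = term_frequencies("Solr indexes documents and Solr searches documents")
print(tf["solr"])     # 2
print(tf["indexes"])  # 1
print(tf["missing"])  # 0, since Counter returns 0 for unseen words
```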
- Transaction log
An append-only log of write operations maintained by each Replica. This log is required with SolrCloud implementations and is created and managed automatically by Solr.
- ZooKeeper
Also known as Apache ZooKeeper. The system used by SolrCloud to keep track of configuration files and node names for a cluster. A ZooKeeper cluster is used as the central configuration store for the cluster, a coordinator for operations requiring distributed synchronization, and the system of record for cluster topology. See also SolrCloud.