Dense Vector Search

Solr’s Dense Vector Search adds support for indexing and searching dense numerical vectors.

Deep learning can be used to produce a vector representation of both the query and the documents in a corpus of information.

These neural network-based techniques are usually referred to as neural search, an industry derivation from the academic field of Neural information Retrieval.

Important Concepts

Dense Vector Representation

A traditional tokenized inverted index can be considered to model text as a "sparse" vector, in which each term in the corpus corresponds to one vector dimension. In such a model, the number of dimensions is generally quite high (corresponding to the term dictionary cardinality), and the vector for any given document contains mostly zeros (hence it is sparse, as only a handful of terms that exist in the overall index will be present in any given document).

Dense vector representation contrasts with term-based sparse vector representation in that it distills approximate semantic meaning into a fixed (and limited) number of dimensions.

The number of dimensions in this approach is generally much lower than the sparse case, and the vector for any given document is dense, as most of its dimensions are populated by non-zero values.

In contrast to the sparse approach (for which tokenizers are used to generate sparse vectors directly from text input) the task of generating vectors must be handled in application logic external to Apache Solr.

There may be cases where it makes sense to directly search data that natively exists as a vector (e.g., scientific data); but in a text search context, it is likely that users will leverage deep learning models such as BERT to encode textual information as dense vectors, supplying the resulting vectors to Apache Solr explicitly at index and query time.

For additional information you can refer to this blog post.

Dense Retrieval

Given a dense vector v that models the information need, the easiest approach for providing dense vector retrieval would be to calculate the distance (euclidean, dot product, etc.) between v and each vector d that represents a document in the corpus of information.

This approach is quite expensive, so many approximate strategies are currently under active research.

The strategy implemented in Apache Lucene and used by Apache Solr is based on Navigable Small-world graph.

It provides efficient approximate nearest neighbor search for high dimensional vectors.

See Approximate nearest neighbor algorithm based on navigable small world graphs (2014) and Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs (2018) for details.

Index Time

This is the Apache Solr field type designed to support dense vector search:

DenseVectorField

The dense vector field gives the possibility of indexing and searching dense vectors of float elements.

For example:

[1.0, 2.5, 3.7, 4.1]

Here’s how DenseVectorField should be configured in the schema:

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

vectorDimension

Required

Default: none

The dimension of the dense vector to pass in.

Accepted values: Any integer.

similarityFunction

Optional

Default: euclidean

Vector similarity function; used in search to return top K most similar vectors to a target vector.

Accepted values: euclidean, dot_product or cosine.

euclidean: Euclidean distance
dot_product: Dot product

this similarity is intended as an optimized way to perform cosine similarity. In order to use it, all vectors must be of unit length, including both document and query vectors. Using dot product with vectors that are not unit length can result in errors or poor search results.

cosine: Cosine similarity

the cosine similarity scores returned by Solr are normalized like this : (1 + cosine_similarity) / 2.

the preferred way to perform cosine similarity is to normalize all vectors to unit length, and instead use DOT_PRODUCT. You should only use this function if you need to preserve the original vectors and cannot normalize them in advance.

The HNSW parameters hnswM and hnswEfConstruction, previously known as hnswMaxConnections and hnswBeamWidth respectively.

To use the following advanced parameters that customise the codec format and the hyperparameter of the HNSW algorithm, make sure the Schema Codec Factory, is in use.

Here’s how DenseVectorField can be configured with the advanced hyperparameters:

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine" knnAlgorithm="hnsw" hnswM="10" hnswEfConstruction="40"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

knnAlgorithm

Optional

Default: hnsw

(advanced) Specifies the underlying knn algorithm to use

Accepted values: hnsw, cagra_hnsw (requires GPU acceleration setup).

Please note that the knnAlgorithm accepted values may change in future releases.

vectorEncoding

Optional

Default: FLOAT32

(advanced) Specifies the underlying encoding of the dense vector elements. This affects memory/disk impact for both the indexed and stored fields (if enabled)

Accepted values: FLOAT32, BYTE.

hnswM

Optional

Default: 16

(advanced) This parameter is specific for the hnsw knn algorithm:

Controls how many of the nearest neighbor candidates are connected to the new node.

For more details, refer to the official 2018 paper, where this parameter M is defined.

Accepted values: Any integer.

hnswEfConstruction

Optional

Default: 100

(advanced) This parameter is specific for the hnsw knn algorithm:

It is the number of nearest neighbor candidates to track while searching the graph for each newly inserted node.

For more details, refer to the official 2018 paper, where this parameter efConstruction is defined.

Accepted values: Any integer.

DenseVectorField supports the attributes: indexed, stored.

currently multivalue is not supported

Here’s how a DenseVectorField should be indexed:

JSON
XML
SolrJ

[{ "id": "1",
"vector": [1.0, 2.5, 3.7, 4.1]
},
{ "id": "2",
"vector": [1.5, 5.5, 6.7, 65.1]
}
]

<add>
<doc>
<field name="id">1</field>
<field name="vector">1.0</field>
<field name="vector">2.5</field>
<field name="vector">3.7</field>
<field name="vector">4.1</field>
</doc>
<doc>
<field name="id">2</field>
<field name="vector">1.5</field>
<field name="vector">5.5</field>
<field name="vector">6.7</field>
<field name="vector">65.1</field>
</doc>
</add>

final SolrClient client = getSolrClient();

final SolrInputDocument d1 = new SolrInputDocument();
d1.setField("id", "1");
d1.setField("vector", Arrays.asList(1.0f, 2.5f, 3.7f, 4.1f));


final SolrInputDocument d2 = new SolrInputDocument();
d2.setField("id", "2");
d2.setField("vector", Arrays.asList(1.5f, 5.5f, 6.7f, 65.1f));

client.add(Arrays.asList(d1, d2));

ScalarQuantizedDenseVectorField

Because dense vectors can have a costly size, it may be worthwhile to use a technique called "quantization" which creates a compressed representation of the original vectors. This allows more of the index to be stored in faster memory at the cost of some precision.

This dense vector type uses a conversion that projects a 32 bit float precision feature down to an 8 bit int (or smaller) by linearly mapping the float range of each dimension down to evenly sized "buckets" of values that fit into an int. For example: with 8 bits we can store up to 256 discrete values, so a float dimension with values from 0.0 to 1.0 may be mapped as

[0.0, 0.0039) ⇒ 0, [0.0039, 0.0078) ⇒ 1 … etc

As a specific type of DenseVectorField, this field type supports all the same configurable properties outlined above as well as some additional ones.

Here is how a ScalarQuantizedDenseVectorField can be defined in the schema:

<fieldType name="scalar_quantized_vector" class="solr.ScalarQuantizedDenseVectorField" vectorDimension="4" similarityFunction="cosine"/>
<field name="vector" type="scalar_quantized_vector" indexed="true" stored="true"/>

bits

Optional

Default: 7

The number of bits to use for each quantized dimension value

Accepted values: 4 (half byte) or 7 (unsigned byte).

confidenceInterval

Optional

Default: dimension-scaled

Statistically, outlier values are rarely meaningfully relevant to searches, so to increase the size of each bucket for quantization (and therefore information gain) we can scale the quantization intervals to the middle n % of values and place the remaining outliers in the outermost intervals.

For example: 0.9 means scale interval sizes to the middle 90% of values

If this param is omitted a default is used; scaled to the number of dimensions according to 1-1/(vector_dimensions + 1)

Accepted values: FLOAT32 (within 0.9 and 1.0)

dynamicConfidenceInterval

Optional

Default: false

If set to true, enables dynamically determining confidence interval (per dimension) by sampling values each time a merge occurs.

NOTE: when this is enabled, it will take precedence over any value configured for confidenceInterval

Accepted values: BOOLEAN

compress

Optional

Default: false

If set to true, this will further pack multiple dimension values within a one byte alignment. This further decreases the quantized vector disk storage size by 50% at some decode penalty. This does not affect the raw vector which is always preserved when stored is true.

NOTE: this can only be enabled when bits=4

Accepted values: BOOLEAN

BinaryQuantizedDenseVectorField

Binary quantization is a quantization technique that extends scalar quantization, and is even more aggressive in its compression; able to reduce in-memory representation of each vector dimension from a 32 bit float down to a single bit. This is done by normalizing each dimension of a vector relative to a centroid (mid-point pre-calculated against all vectors in the index) with the stored bit representing whether the actual value is "above" or "below" the centroid’s value. A further "corrective factor" is also computed and stored to help compensate accuracy in the estimated distance. At query time asymmetric quantization is applied to the query vector (reducing its dimension values down to 4 bits each), but allowing comparison with the stored binary quantized vector via bit arithmetic.

This implementation comprises of LVQ, proposed in Similarity Search in the Blink of an Eye With Compressed Indices by Cecilia Aguerrebere et al., previous work on globally optimized scalar quantization in Apache Lucene, and ideas from Accelerating Large-Scale Inference with Anisotropic Vector Quantization by Ruiqi Guo et al.

This vector type is best utilized for data sets consisting of large amounts of high dimensionality vectors.

Here is how a BinaryQuantizedDenseVectorField can be defined in the schema:

<fieldType name="binary_quantized_vector" class="solr.BinaryQuantizedDenseVectorField" vectorDimension="4"/>
<field name="vector" type="binary_quantized_vector" indexed="true" stored="true"/>

BinaryQuantizedDenseVectorField accepts the same parameters as DenseVectorField with the only notable exception being similarityFunction. Bit quantization uses its own distance calculation and so does not require nor use the similarityFunction param.

Query Time

Apache Solr provides three query parsers that work with dense vector fields, that each support different ways of matching documents based on vector similarity: The knn query parser, the vectorSimilarity query parser and the knn_text_to_vector query parser.

All parsers return scores for retrieved documents that are the approximate distance to the target vector (defined by the similarityFunction configured at indexing time) and both support "Pre-Filtering" the document graph to reduce the number of candidate vectors evaluated (without needing to compute their vector similarity distances).

Common parameters for both query parsers are:

f

Required

Default: none

The DenseVectorField to search in.

preFilter

Optional

Default: Depends on usage, see below.

Specifies an explicit list of Pre-Filter query strings to use.

includeTags

Optional

Default: none

Indicates that only fq filters with the specified tag should be considered for implicit Pre-Filtering. Must not be combined with preFilter.

excludeTags

Optional

Default: none

Indicates that fq filters with the specified tag should be excluded from consideration for implicit Pre-Filtering. Must not be combined with preFilter.

knn Query Parser

The knn k-nearest neighbors query parser matches k-nearest documents to the target vector.

In addition to the common parameters described above, it takes the following parameters:

topK

Optional

Default: 10

How many k-nearest results to return.

Here’s an example of a simple knn search:

?q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]

The search results retrieved are the k=10 nearest documents to the vector in input [1.0, 2.0, 3.0, 4.0], ranked by the similarityFunction configured at indexing time.

earlyTermination

Optional

Default: false

Early termination is an HNSW optimization. Solr relies on the Lucene’s implementation of early termination for kNN queries, based on Patience in Proximity: A Simple Early Termination Strategy for HNSW Graph Traversal in Approximate k-Nearest Neighbor Search (2025).

When enabled (true), the search may exit early when the HNSW candidate queue remains saturated over a threshold (saturationThreshold) for more than a given number of iterations (patience). Refer to the two parameters below for more details.

Enabling early termination typically reduces query latency and resource usage, with a potential small trade-off in recall.

saturationThreshold

Optional

Default: 0.995

(advanced) The early exit saturation threshold.

Our recommendation is to rely on the default value and change this parameter only if you are confident about its impact. Using values that are too low can cause the search to terminate prematurely, leading to poor recall.

This parameter must be used together with patience; either specify both to customize the behavior, or omit both to rely on the default values.

patience

Optional

Default: max(7, topK * 0.3)

(advanced) The number of consecutive iterations the search will continue after the candidate queue is considered saturated. The default value is not a fixed value (integer) but a formula based on the topK parameter.

Our recommendation is to rely on the default value and change this parameter only if you are confident about its impact:

Using values that are too low can make the search stop too aggressively, reducing recall.
Using values that are too high reduces the benefit of early termination, since the search runs nearly as long as without it.

This parameter must be used together with saturationThreshold; either specify both to customize the behavior, or omit both to rely on the default values.

efSearchScaleFactor

Optional

Default: 1.0

(advanced) Multiplier factor for calculating how many candidates the HNSW algorithm examines during search.

The effective efSearch value is calculated internally as efSearchScaleFactor * topK. Lower values fetch fewer candidates for faster performance but may miss some good matches. Higher values fetch more candidates, improving recall but slowing down the search.

Accepted values: Any float >= 1.0.

Here’s an example of a knn search using the early termination with input parameters:

?q={!knn f=vector topK=10 earlyTermination=true saturationThreshold=0.989 patience=10 efSearchScaleFactor=3.0}[1.0, 2.0, 3.0, 4.0]

seedQuery

Optional

Default: none

A query seed to initiate the vector search, i.e. entry points in the HNSW graph exploration. Solr relies on Lucene’s implementation of SeededKnnVectorQuery based on Lexically-Accelerated Dense Retrieval (2023).

The seedQuery is primarily intended to be a lexical query, guiding the vector search in a hybrid-like way through traditional query logic. Although a knn query can also be used as a seed — which might make sense in specific scenarios and has been verified by a dedicated test — this approach is not considered a best practice.

The seedQuery can also be used in combination with earlyTermination.

Here is an example of a knn search using a seedQuery:

?q={!knn f=vector topK=10 seedQuery='id:(1 4 10)'}[1.0, 2.0, 3.0, 4.0]

The search results retrieved are the k=10 nearest documents to the vector in input [1.0, 2.0, 3.0, 4.0]. Documents matching the query id:(1 4 10) are used as entry points for the ANN search. If no documents match the seed, Solr falls back to a regular knn search without seeding, starting instead from random entry points.

filteredSearchThreshold

Optional

Default: Lucene default

An integer value from 0 to 100

ACORN is an algorithm designed to make hybrid searches consisting of a filter and a vector search more efficient. This approach tackles both the performance limitations of pre- and post- filtering. It modifies the construction of the HNSW graph and the search on it. Based on ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data (2024).

Solr relies on Lucene’s implementation of the filteredSearchThreshold in the KnnSearchStrategy.

A suggested value is 60 based on a benchmark you can read more about in this Github comment.

The filteredSearchThreshold regulates this behavior. If the percentage of documents that satisfies the filter is less than the threshold ACORN will be used.

Here is an example of a knn search using a filteredSearchThreshold:

?q={!knn f=vector topK=10 filteredSearchThreshold=60}[1.0, 2.0, 3.0, 4.0]

knn_text_to_vector Query Parser

The knn_text_to_vector query parser encode a textual query to a vector using a dedicated Large Language Model(fine tuned for the task of encoding text to vector for sentence similarity) and matches k-nearest neighbours documents to such query vector.

In addition to the parameters in common with the other dense-retrieval query parsers, it takes the following:

model

Required

Default: none

The model to use to encode the text to a vector. Must reference an existing model loaded into the /schema/text-to-vector-model-store.

topK

Optional

Default: 10

How many k-nearest results to return.

Here’s an example of a simple knn_text_to_vector search:

?q={!knn_text_to_vector model=a-model f=vector topK=10}hello world query

The search results retrieved are the k=10 nearest documents to the vector encoded from the query hello world query, using the model a-model.

For more details on how to work with vectorise text in Apache Solr, please refer to the dedicated page: Text to Vector

vectorSimilarity Query Parser

The vectorSimilarity vector similarity query parser matches documents whose similarity with the target vector is a above a minimum threshold.

In addition to the common parameters described above, it takes the following parameters:

minReturn

Required

Default: none

Minimum similarity threshold of nodes in the graph to be returned as matches

minTraverse

Optional

Default: -Infinity

Minimum similarity of nodes in the graph to continue traversal of their neighbors

Here’s an example of a simple vectorSimilarity search:

?q={!vectorSimilarity f=vector minReturn=0.7}[1.0, 2.0, 3.0, 4.0]

The search results retrieved are all documents whose similarity with the input vector [1.0, 2.0, 3.0, 4.0] is at least 0.7 based on the similarityFunction configured at indexing time

Which one to use?

Let’s see when to use each of the dense retrieval query parsers available:

knn Query Parser

You should use the knn query parser when:

you search for the top-K closest vectors to a query vector
you work directly with vectors (no text encoding is involved)
you want to a have a fine-grained control over the way you encode text to vector and prefer to do it outside of Apache Solr

knn_text_to_vector Query Parser

You should use the knn_text_to_vector query parser when:

you search for the top-K closest vectors to a query text
you work directly with text and want Solr to handle the encoding to vector behind the scenes
you are building demos/prototypes

Apache Solr uses LangChain4j to interact with Large Language Models. The integration is experimental and we are going to improve our stress-test and benchmarking coverage of this query parser in future iterations: if you care about raw performance you may prefer to encode the text outside of Solr

vectorSimilarity Query Parser

You should use the vectorSimilarity query parser when:

you search for the closest vectors to a query vector within a similarity threshold
you work directly with vectors (no text encoding is involved)
you want to a have a fine-grained control over the way you encode text to vector and prefer to do it outside of Apache Solr

Graph Pre-Filtering

Pre-Filtering the set of candidate documents considered when walking the graph can be specified either explicitly, or implicitly (based on existing fq params) depending on how and when these dense vector query parsers are used.

Explicit Pre-Filtering

The preFilter parameter can be specified explicitly to reduce the number of candidate documents evaluated for the distance calculation:

?q={!vectorSimilarity f=vector minReturn=0.7 preFilter=inStock:true}[1.0, 2.0, 3.0, 4.0]

In the above example, only documents matching the Pre-Filter inStock:true will be candidates for consideration when evaluating the vectorSimilarity search against the specified vector.

The preFilter parameter may be blank (ex: preFilter="") to indicate that no Pre-Filtering should be performed; or it may be multi-valued — either through repetition, or via duplicated Parameter References.

These two examples are equivalent:

?q={!knn f=vector topK=10 preFilter=category:AAA preFilter=inStock:true}[1.0, 2.0, 3.0, 4.0]

?q={!knn f=vector topK=10 preFilter=$knnPreFilter}[1.0, 2.0, 3.0, 4.0]
&knnPreFilter=category:AAA
&knnPreFilter=inStock:true

Implicit Pre-Filtering

While the preFilter parameter may be explicitly specified on any usage of the knn or vectorSimilarity query parsers, the default Pre-Filtering behavior (when no preFilter parameter is specified) will vary based on how the query parser is used:

When used as the main q param: fq filters in the request (that are not Solr Post Filters) will be combined to form an implicit Graph Pre-Filter.
- This default behavior optimizes the number of vector distance calculations considered, eliminating documents that would eventually be excluded by an fq filter anyway.
- includeTags and excludeTags may be used to limit the set of fq filters used in the Pre-Filter.
When a vector search query parser is used as an fq param, or as a subquery clause in a larger query: No implicit Pre-Filter is used.
- includeTags and excludeTags must not be used in these situations.

The example request below shows two usages of vector query parsers that will get no implicit Pre-Filtering from any of the fq parameters, because neither usage is as the main q param:

?q=(color_str:red OR {!vectorSimilarity f=color_vector minReturn=0.7 v="[1.0, 2.0, 3.0, 4.0]"})
&fq={!knn f=title_vector topK=10}[9.0, 8.0, 7.0, 6.0]
&fq=inStock:true

However, the next example shows a basic request where all fq parameters will be used as implicit Pre-Filters on the main knn query:

?q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
&fq=category:AAA
&fq=inStock:true

If we modify the above request to add tags to the fq parameters, we can specify an includeTags option on the knn parser to limit which fq filters are used for Pre-Filtering:

?q={!knn f=vector topK=10 includeTags=for_knn}[1.0, 2.0, 3.0, 4.0]
&fq=category:AAA
&fq={!tag=for_knn}inStock:true

In this example, only the inStock:true filter will be used for Pre-Filtering to find the topK=10 documents, and the category:AAA filter will be applied independently; possibly resulting in less then 10 total matches.

Some use cases where includeTags and/or excludeTags may be more useful then an explicit preFilter parameters:

You have some fq parameters that are re-used on many requests (even when you don’t use search dense vector fields) that you wish to be used as Pre-Filters when you do search dense vector fields.
You typically want all fq params to be used as graph Pre-Filters on your knn queries, but when users "drill down" on Facets, you want the fq parameters you add to be excluded from the Pre-Filtering so that the result set gets smaller; instead of just computing a new topK set.

Usage in Re-Ranking Query

Both dense vector search query parsers can be used to rerank first pass query results:

&q=id:(3 4 9 2)&rq={!rerank reRankQuery=$rqq reRankDocs=4 reRankWeight=1}&rqq={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]

When using knn in re-ranking pay attention to the topK parameter.

The second pass score(deriving from knn) is calculated only if the document d from the first pass is within the k-nearest neighbors(in the whole index) of the target vector to search.

This means the second pass knn is executed on the whole index anyway, which is a current limitation.

The final ranked list of results will have the first pass score(main query q) added to the second pass score(the approximated similarityFunction distance to the target vector to search) multiplied by a multiplicative factor(reRankWeight).

Details about using the ReRank Query Parser can be found in the Query Re-Ranking section.

GPU Acceleration

This is feature is currently experimental.

Building HNSW graphs, esp. with high dimensions and cardinality, is usually slow. If you have a NVIDIA GPU, then building HNSW based indexes can be sped up manifold. This is powered by the cuVS-Lucene library, a pluggable vectors format for Apache Lucene. It uses the state of the art CAGRA algorithm for quickly building a fixed degree connected graph, which is then serialized into a HNSW graph. CUDA 13.0+ and JDK 22 are required to use this feature.

You can know more about Nvidia’s cuVS library here: https://developer.nvidia.com/cuvs

To try this out, first copy the module jar files (found in the regular Solr tarball, not the slim one) before starting Solr.

cp modules/cuvs/lib/*.jar server/solr-webapp/webapp/WEB-INF/lib/

Define the fieldType in the schema, with knnAlgorithm set to cagra_hnsw:

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="8" knnAlgorithm="cagra_hnsw" similarityFunction="cosine" cuvsWriterThreads="32" cuvsIntGraphDegree="128" cuvsGraphDegree="64" cuvsHnswLayers="1" cuvsHnswM="16" cuvsHnswEfConstruction="100"/>

Define the codecFactory in solrconfig.xml

<codecFactory name="CuVSCodecFactory" class="org.apache.solr.cuvs.CuVSCodecFactory"/>

Where:

cuvsWriterThreads - number of threads to use
cuvsIntGraphDegree - Intermediate graph degree for building the CAGRA index
cuvsGraphDegree - Graph degree for building the CAGRA index
cuvsHnswLayers - Number of HNSW graph layers to construct while building the HNSW index
cuvsHnswM - cuvsHnswM parameter passed to the fallback Lucene99HnswVectorsWriter
cuvsHnswEfConstruction - cuvsHnswEfConstruction parameter passed to the fallback Lucene99HnswVectorsWriter

Example

Following is a complete example of setting up a collection with cuVS.

Install CUDA 13.0

Ubuntu 22.04 LTS
Ubuntu 24.04 LTS
Fedora 39+

# Install CUDA 13.0 from NVIDIA's repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-13

# Set up environment variables
echo 'export PATH=/usr/local/cuda-13/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-13/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify installation
nvcc --version

# Install CUDA 13.0 from NVIDIA's repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-13

# Set up environment variables
echo 'export PATH=/usr/local/cuda-13/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-13/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify installation
nvcc --version

# Install CUDA 13.0 from NVIDIA's repository
# For Fedora 39, 40, and newer versions:
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora39/x86_64/cuda-fedora39.repo
sudo dnf clean all
sudo dnf -y install cuda-toolkit-13

# Set up environment variables
echo 'export PATH=/usr/local/cuda-13/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-13/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify installation
nvcc --version

Fetch libcuvs native libraries

# Create virtual environment and install libcuvs-cu13 from NVIDIA's RAPIDS repositories
python3 -m venv libcuvs-env
source libcuvs-env/bin/activate

# Install libcuvs-cu13 from NVIDIA's RAPIDS wheels (fetches latest 25.10.x artifact)
pip install "libcuvs-cu13<25.11.0" --pre --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/

# Set LD_LIBRARY_PATH to include libcuvs libraries
SITE_PACKAGES_PATH=$(pwd)/$(find libcuvs-env -name site-packages)
export VENV_LIB=$SITE_PACKAGES_PATH/libcuvs/lib64:$SITE_PACKAGES_PATH/librmm/lib64:$SITE_PACKAGES_PATH/rapids_logger/lib64
export LD_LIBRARY_PATH=$VENV_LIB:$LD_LIBRARY_PATH:/usr/local/cuda-13/lib64

# Verify libcuvs_c.so is available
find $LD_LIBRARY_PATH -name "libcuvs_c.so" | head -1

# Deactivate virtual environment (optional - libraries are now accessible via LD_LIBRARY_PATH)
deactivate

Copy the cuvs module jar files (before starting Solr).

cp modules/cuvs/lib/*.jar server/solr-webapp/webapp/WEB-INF/lib/

Create a configset

mkdir -p cuvs_configset/conf

cat > cuvs_configset/conf/solrconfig.xml << 'EOF'
<?xml version="1.0" ?>
<config>
    <luceneMatchVersion>10.0.0</luceneMatchVersion>
    <dataDir>${solr.data.dir:}</dataDir>
    <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

    <updateHandler class="solr.DirectUpdateHandler2">
        <updateLog>
            <str name="dir">${solr.ulog.dir:}</str>
        </updateLog>
        <autoCommit>
            <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
            <openSearcher>false</openSearcher>
        </autoCommit>
        <autoSoftCommit>
            <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
        </autoSoftCommit>
    </updateHandler>

    <codecFactory name="CuVSCodecFactory" class="org.apache.solr.cuvs.CuVSCodecFactory"/>

    <requestHandler name="/select" class="solr.SearchHandler">
        <lst name="defaults">
            <str name="echoParams">explicit</str>
            <int name="rows">10</int>
        </lst>
    </requestHandler>

    <requestHandler name="/update" class="solr.UpdateRequestHandler" />
</config>
EOF

cat > cuvs_configset/conf/managed-schema << 'EOF'
<?xml version="1.0" ?>
<schema name="schema-densevector" version="1.7">
    <fieldType name="string" class="solr.StrField" multiValued="true"/>
    <fieldType name="knn_vector" class="solr.DenseVectorField"
               vectorDimension="8"
               knnAlgorithm="cagra_hnsw"
               similarityFunction="cosine"
               cuvsWriterThreads="32"
               cuvsIntGraphDegree="128"
               cuvsGraphDegree="64"
               cuvsHnswLayers="1"
               cuvsHnswM="16"
               cuvsHnswEfConstruction="100"/>
    <fieldType name="plong" class="solr.LongPointField" useDocValuesAsStored="false"/>

    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="false"/>
    <field name="article_vector" type="knn_vector" indexed="true" stored="true"/>
    <field name="_version_" type="plong" indexed="true" stored="true" multiValued="false" />

    <uniqueKey>id</uniqueKey>
</schema>
EOF

Start Solr
```
./bin/solr start
```

Upload the configset and create a collection

./bin/solr zk upconfig -n cuvs_vectors -d cuvs_configset/conf && ./bin/solr create -c vectors -n cuvs_vectors

Index documents

curl -s -X POST "http://localhost:8983/solr/vectors/update?commit=true" \
     -H "Content-Type: application/json" \
     -d '[
       {"id": "doc1", "article_vector": [0.35648, 0.11664, 0.85660, 0.25043, 0.80778, 0.08031, 0.48444, 0.39083]},
       {"id": "doc2", "article_vector": [0.86821, 0.24947, 0.38601, 0.22615, 0.31498, 0.74612, 0.69403, 0.19691]},
       {"id": "doc3", "article_vector": [0.34098, 0.49236, 0.35950, 0.17840, 0.49470, 0.97242, 0.28249, 0.72526]},
       {"id": "doc4", "article_vector": [0.44979, 0.49473, 0.47197, 0.02869, 0.05262, 0.60855, 0.67370, 0.78656]},
       {"id": "doc5", "article_vector": [0.23235, 0.70062, 0.95036, 0.36251, 0.41233, 0.53170, 0.25459, 0.81606]}
     ]'

Query the index

curl -s 'http://localhost:8983/solr/vectors/select?q=%7B!knn%20f=article_vector%20topK=1%7D%5B0.84393,0.50073,0.57059,0.89899,-0.08722,0.26803,0.00807,0.09877%5D&fl=id,score&rows=3&omitHeader=true'

Should return the following

{
  "response":{
    "numFound":1,
    "start":0,
    "maxScore":0.8377289,
    "numFoundExact":true,
    "docs":[{
      "id":"doc2",
      "score":0.8377289
    }]
  }
}

Dense Vector Search

Important Concepts

Dense Vector Representation

Dense Retrieval

Index Time

DenseVectorField

ScalarQuantizedDenseVectorField

BinaryQuantizedDenseVectorField

Query Time

knn Query Parser

knn_text_to_vector Query Parser

vectorSimilarity Query Parser

Which one to use?

knn Query Parser

knn_text_to_vector Query Parser

vectorSimilarity Query Parser

Graph Pre-Filtering

Explicit Pre-Filtering

Implicit Pre-Filtering

Usage in Re-Ranking Query

GPU Acceleration

Example

Additional Resources