SolrJ

SolrJ is an API that makes it easy for applications written in Java (or any language based on the JVM) to talk to Solr. SolrJ hides a lot of the details of connecting to Solr and allows your application to interact with Solr with simple high-level methods. SolrJ supports most Solr APIs, and is highly configurable.

Building and Running SolrJ Applications

The SolrJ API ships with Solr, so you do not have to download or install anything else. But you will need to configure your build to include SolrJ and its dependencies.

Common Build Systems

Most mainstream build systems greatly simplify dependency management, making it easy to add SolrJ to your project.

For projects built with Ant (using Ivy), place the following in your ivy.xml:

<dependency org="org.apache.solr" name="solr-solrj" rev="9.2.1"/>

For projects built with Maven, place the following in your pom.xml:

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>9.2.1</version>
</dependency>

For projects built with Gradle, place the following in your build.gradle:

implementation group: 'org.apache.solr', name: 'solr-solrj', version: '9.2.1'

If you want to use CloudSolrClient and have it talk directly to ZooKeeper, you will need to add a dependency on the solr-solrj-zookeeper artifact.

If you are not using Streaming Expressions classes in your Java code, you can exclude the solr-solrj-streaming dependency.

Adding SolrJ to the Classpath Manually

If you are not using one of the above build systems, it’s still easy to add SolrJ to your build.

At build time, all that is required is the SolrJ jar itself: solr-solrj-9.2.1.jar. To compile code manually that uses SolrJ, use a javac command similar to:

javac -cp .:$SOLR_TIP/server/solr-webapp/webapp/WEB-INF/lib/solr-solrj-9.2.1.jar ...

At runtime, you need a few of SolrJ’s dependencies in addition to SolrJ itself. In the Solr distribution these dependencies are not separated from Solr’s own dependencies, so you must either include all of them or manually choose the exact set that is needed. Please refer to the Maven release for the exact dependencies needed for your version. Run your project with a classpath like:

java -cp .:$SOLR_TIP/server/lib/ext:$SOLR_TIP/server/solr-webapp/webapp/WEB-INF/lib/* ...

If you are worried about the SolrJ libraries expanding the size of your client application, you can use a code shrinker such as ProGuard to remove APIs that you are not using.

SolrJ Overview

For all its flexibility, SolrJ is built around a few simple interfaces.

All requests to Solr are sent by a SolrClient. SolrClients are the main workhorses at the core of SolrJ. They handle the work of connecting to and communicating with Solr, and are where most of the user configuration happens.

Requests are sent in the form of SolrRequests, and are returned as SolrResponses.

Types of SolrClients

SolrClient has a few concrete implementations, each geared towards a different usage-pattern or resiliency model:

  • HttpSolrClient - geared towards query-centric workloads, though also a good general-purpose client. Communicates directly with a single Solr node.

  • Http2SolrClient - async, non-blocking, general-purpose client that leverages HTTP/2. This class is experimental, so its APIs might change or be removed in minor versions of SolrJ.

  • LBHttpSolrClient - balances request load across a list of Solr nodes. Adjusts the list of "in-service" nodes based on node health.

  • LBHttp2SolrClient - just like LBHttpSolrClient but using Http2SolrClient instead. This class is experimental, so its APIs might change or be removed in minor versions of SolrJ.

  • CloudSolrClient - geared towards communicating with SolrCloud deployments. Uses already-recorded ZooKeeper state to discover and route requests to healthy Solr nodes.

  • ConcurrentUpdateSolrClient - geared towards indexing-centric workloads. Buffers documents internally before sending larger batches to Solr.

  • ConcurrentUpdateHttp2SolrClient - just like ConcurrentUpdateSolrClient but using Http2SolrClient instead. This class is experimental, so its APIs might change or be removed in minor versions of SolrJ.

Common Configuration Options

Most SolrJ configuration happens at the SolrClient level. The most common/important of these are discussed below. For comprehensive information on how to tweak your SolrClient, see the Javadocs for the involved client, and its corresponding builder object.

Base URLs

Most SolrClient implementations (except for CloudSolrClient and Http2SolrClient) require users to specify one or more Solr base URLs, which the client then uses to send HTTP requests to Solr. The path users include on the base URL they provide has an effect on the behavior of the created client from that point on.

  1. A URL with a path pointing to a specific core or collection (e.g., http://hostname:8983/solr/core1). When a core or collection is specified in the base URL, subsequent requests made with that client are not required to re-specify the affected collection. However, the client is limited to sending requests to that core/collection, and cannot send requests to any others.

  2. A URL pointing to the root Solr path (e.g., http://hostname:8983/solr). When no core or collection is specified in the base URL, requests can be made to any core/collection, but the affected core/collection must be specified on all requests.

Generally speaking, if your SolrClient will only be used on a single core/collection, including that entity in the path is the most convenient. Where more flexibility is required, the collection/core should be excluded.
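
The distinction between the two kinds of base URL can be sketched in plain Java. The helper below is hypothetical (it is not part of SolrJ) and assumes Solr is served under the /solr context path, as in the example URLs above. It reports which core or collection, if any, a base URL pins the client to.

```java
import java.net.URI;
import java.util.Optional;

final class BaseUrlSketch {
  // Assumption: Solr is served under the "/solr" context path, as in the example URLs.
  private static final String SOLR_ROOT = "/solr";

  /**
   * Returns the core/collection named in a base URL, or empty when the URL
   * points at the root Solr path. Hypothetical helper, not a SolrJ API.
   */
  static Optional<String> coreInBaseUrl(String baseUrl) {
    String path = URI.create(baseUrl).getPath();
    if (path == null) {
      throw new IllegalArgumentException("Not a hierarchical URL: " + baseUrl);
    }
    // Ignore trailing slashes so ".../solr/" is treated like ".../solr".
    while (path.endsWith("/")) {
      path = path.substring(0, path.length() - 1);
    }
    if (path.equals(SOLR_ROOT)) {
      return Optional.empty(); // root URL: every request must name its collection
    }
    if (path.startsWith(SOLR_ROOT + "/")) {
      String rest = path.substring(SOLR_ROOT.length() + 1);
      if (!rest.contains("/")) {
        return Optional.of(rest); // core URL: the client is limited to this core
      }
    }
    throw new IllegalArgumentException("Not a recognised Solr base URL: " + baseUrl);
  }

  public static void main(String[] args) {
    System.out.println(coreInBaseUrl("http://hostname:8983/solr/core1"));
    System.out.println(coreInBaseUrl("http://hostname:8983/solr"));
  }
}
```

A check like this could be used when wiring up a client, to decide whether application code must pass a collection name on each request.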

Base URLs of Http2SolrClient

The Http2SolrClient manages connections to different nodes efficiently. Http2SolrClient does not require a baseUrl. If a baseUrl is not provided, then SolrRequest.basePath must be set so that Http2SolrClient knows which nodes to send requests to. Otherwise, an IllegalArgumentException will be thrown.

Base URLs of CloudSolrClient

It is also possible to specify base URLs for CloudSolrClient, but URLs are expected to point to the root Solr path (e.g., http://hostname:8983/solr). They should not include any collections, cores, or other path components.

final List<String> solrUrls = new ArrayList<>();
solrUrls.add("http://solr1:8983/solr");
solrUrls.add("http://solr2:8983/solr");
return new CloudSolrClient.Builder(solrUrls).build();

If a base URL is not provided, a list of ZooKeeper hosts (with ports) and the ZooKeeper root must be provided instead. If no ZooKeeper root is used, java.util.Optional.empty() must be passed as the root argument.

final List<String> zkServers = new ArrayList<>();
zkServers.add("zookeeper1:2181");
zkServers.add("zookeeper2:2181");
zkServers.add("zookeeper3:2181");
return new CloudSolrClient.Builder(zkServers, Optional.empty()).build();

final List<String> zkServers = new ArrayList<>();
zkServers.add("zookeeper1:2181");
zkServers.add("zookeeper2:2181");
zkServers.add("zookeeper3:2181");
return new CloudSolrClient.Builder(zkServers, Optional.of("/solr")).build();

Additionally, you will need to depend on the solr-solrj-zookeeper artifact or else you will get a ClassNotFoundException.
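
To show how the host list and the optional root relate, here is a minimal plain-Java sketch that joins them into a ZooKeeper-style connection string (hosts separated by commas, followed by the root if there is one). The class and method names are hypothetical and not part of SolrJ. It only illustrates the two inputs; it does not show how CloudSolrClient handles them internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ZkConnectStringSketch {
  /**
   * Joins ZooKeeper hosts and an optional root (chroot) into one connection string.
   * Hypothetical helper for illustration only.
   */
  static String toConnectString(List<String> zkHosts, Optional<String> zkRoot) {
    if (zkHosts.isEmpty()) {
      throw new IllegalArgumentException("At least one ZooKeeper host is required");
    }
    // Hosts are comma-separated; the root, when present, is appended once at the end.
    return String.join(",", zkHosts) + zkRoot.orElse("");
  }

  public static void main(String[] args) {
    List<String> zkServers =
        Arrays.asList("zookeeper1:2181", "zookeeper2:2181", "zookeeper3:2181");
    System.out.println(toConnectString(zkServers, Optional.empty()));
    System.out.println(toConnectString(zkServers, Optional.of("/solr")));
  }
}
```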

The ZooKeeper-based connection is the most reliable and performant way for CloudSolrClient to work. On the other hand, it means exposing ZooKeeper more broadly than to Solr nodes, which is a security risk. It also adds more JAR dependencies.

Timeouts

All SolrClient implementations allow users to specify the connection and read timeouts for communicating with Solr. These are provided at client creation time, as in the example below:

final String solrUrl = "http://localhost:8983/solr";
return new HttpSolrClient.Builder(solrUrl)
    .withConnectionTimeout(10000)
    .withSocketTimeout(60000)
    .build();

When these values are not explicitly provided, SolrJ falls back to using the defaults for the OS/environment it is running on.

ConcurrentUpdateSolrClient and its counterpart ConcurrentUpdateHttp2SolrClient also implement a stall prevention timeout that allows requests to non-responsive nodes to fail quicker than waiting for a socket timeout. The default value of this timeout is 15000 ms and can be adjusted with the system property solr.cloud.client.stallTime. This value should be smaller than solr.jetty.http.idleTimeout (which is 120000 ms by default) and greater than the processing time of the largest update request.
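
The recommended range for the stall time can be checked with a few lines of plain Java. The sketch below is illustrative only: the class name, the method name, and the 5000 ms "largest update" figure are assumptions, and the server-side idle timeout is passed in as a plain number because a client process cannot normally read the server's setting.

```java
final class StallTimeCheck {
  // Defaults quoted in the text above.
  static final long DEFAULT_STALL_TIME_MS = 15_000L;
  static final long DEFAULT_JETTY_IDLE_TIMEOUT_MS = 120_000L;

  /** True when largestUpdateMs < stallTimeMs < idleTimeoutMs. */
  static boolean isWithinRecommendedRange(
      long stallTimeMs, long idleTimeoutMs, long largestUpdateMs) {
    return stallTimeMs > largestUpdateMs && stallTimeMs < idleTimeoutMs;
  }

  public static void main(String[] args) {
    // Reads -Dsolr.cloud.client.stallTime=... if set, otherwise uses the default.
    long stallTimeMs = Long.getLong("solr.cloud.client.stallTime", DEFAULT_STALL_TIME_MS);
    // Hypothetical measurement of the slowest update request in this application.
    long largestUpdateMs = 5_000L;
    boolean ok =
        isWithinRecommendedRange(stallTimeMs, DEFAULT_JETTY_IDLE_TIMEOUT_MS, largestUpdateMs);
    System.out.println("stallTime=" + stallTimeMs + " ms, within recommended range: " + ok);
  }
}
```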

Cloud Request Routing

The SolrJ CloudSolrClient implementations (CloudSolrClient and CloudHttp2SolrClient) respect the shards.preference parameter. Therefore, requests sent to single-sharded collections using either of these clients are routed the same way that distributed requests are routed to individual shards. If no shards.preference parameter is provided, the clients default to sorting replicas randomly.

For update requests, while the replicas are sorted in the order defined by the request, leader replicas will always be sorted first.
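
The "leaders first, otherwise keep the requested order" rule for update requests can be pictured as a stable sort. The following plain-Java sketch is an illustration of that ordering only. The Replica class here is a hypothetical stand-in, and the code is not taken from SolrJ's routing implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class LeaderFirstSketch {
  /** Hypothetical stand-in for a replica; not a SolrJ class. */
  static final class Replica {
    final String name;
    final boolean leader;

    Replica(String name, boolean leader) {
      this.name = name;
      this.leader = leader;
    }
  }

  /** Moves leaders to the front and otherwise keeps the incoming preference order. */
  static List<String> leadersFirst(List<Replica> preferenceOrder) {
    List<Replica> sorted = new ArrayList<>(preferenceOrder);
    // List.sort is stable, and false sorts before true, so keying on
    // "is not a leader" puts leaders first without reordering the rest.
    sorted.sort(Comparator.comparing((Replica r) -> !r.leader));
    List<String> names = new ArrayList<>();
    for (Replica r : sorted) {
      names.add(r.name);
    }
    return names;
  }

  public static void main(String[] args) {
    List<Replica> byPreference = Arrays.asList(
        new Replica("replica1", false),
        new Replica("replica2", true),
        new Replica("replica3", false));
    System.out.println(leadersFirst(byPreference));
  }
}
```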

Querying in SolrJ

SolrClient has a number of query() methods for fetching results from Solr. Each of these methods takes a SolrParams, an object encapsulating arbitrary query parameters, and returns a QueryResponse, a wrapper that can be used to access the result documents and other related metadata.

The following snippet uses a SolrClient to query Solr’s "techproducts" example collection, and iterate over the results.

final SolrClient client = getSolrClient();

final Map<String, String> queryParamMap = new HashMap<>();
queryParamMap.put("q", "*:*");
queryParamMap.put("fl", "id, name");
queryParamMap.put("sort", "id asc");
MapSolrParams queryParams = new MapSolrParams(queryParamMap);

final QueryResponse response = client.query("techproducts", queryParams);
final SolrDocumentList documents = response.getResults();

print("Found " + documents.getNumFound() + " documents");
for (SolrDocument document : documents) {
  final String id = (String) document.getFirstValue("id");
  final String name = (String) document.getFirstValue("name");

  print("id: " + id + "; name: " + name);
}

SolrParams has a SolrQuery subclass, which provides some convenience methods that greatly simplify query creation. The following snippet shows how the query from the previous example can be built using some of the convenience methods in SolrQuery:

final SolrQuery query = new SolrQuery("*:*");
query.addField("id");
query.addField("name");
query.setSort("id", ORDER.asc);
query.setRows(numResultsToReturn);

Indexing in SolrJ

Indexing is also simple using SolrJ. Users build the documents they want to index as instances of SolrInputDocument, and provide them as arguments to one of the add() methods on SolrClient.

The following example shows how to use SolrJ to add a document to Solr’s "techproducts" example collection:

final SolrClient client = getSolrClient();

final SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", UUID.randomUUID().toString());
doc.addField("name", "Amazon Kindle Paperwhite");

final UpdateResponse updateResponse = client.add("techproducts", doc);
// Indexed documents must be committed
client.commit("techproducts");

The indexing example above is intended to show syntax. For brevity, it breaks several Solr indexing best practices. Under normal circumstances, documents should be indexed in larger batches, instead of one at a time. It is also suggested that Solr administrators commit documents using Solr’s autocommit settings, and not using explicit commit() invocations.
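
As a rough illustration of the batching advice, the plain-Java sketch below splits a list into fixed-size batches and hands each one to a callback. The class name, method name, and batch size of 1000 are assumptions chosen for the example. In a real application the callback would typically wrap the SolrClient add() overload that accepts a collection of documents, with handling for its checked exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchingSketch {
  /**
   * Passes items to the sender in batches of at most batchSize and
   * returns the number of batches sent. Hypothetical helper, not a SolrJ API.
   */
  static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> sender) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int batches = 0;
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      // Copy the slice so the sender does not hold a view of the caller's list.
      sender.accept(new ArrayList<>(items.subList(start, end)));
      batches++;
    }
    return batches;
  }

  public static void main(String[] args) {
    // Strings stand in for documents to keep the sketch free of dependencies.
    List<String> docs = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      docs.add("doc-" + i);
    }
    int batches = sendInBatches(
        docs, 1000, batch -> System.out.println("sending " + batch.size() + " documents"));
    System.out.println(batches + " batches sent");
  }
}
```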

Java Object Binding

While the UpdateResponse and QueryResponse classes that SolrJ provides are useful, it is often more convenient to work with domain-specific objects that can more easily be understood by your application. Thankfully, SolrJ supports this by implicitly converting documents to and from any class that has been specially marked with Field annotations.

Each instance variable in a Java object can be mapped to a corresponding Solr field using the Field annotation. By default the Solr field shares the name of the annotated variable; however, this can be overridden by providing the annotation with an explicit field name.

The example snippet below shows an annotated TechProduct class that can be used to represent results from Solr’s "techproducts" example collection.

public static class TechProduct {
  @Field public String id;
  @Field public String name;

  public TechProduct(String id, String name) {
    this.id = id;
    this.name = name;
  }

  public TechProduct() {}
}

Application code with access to the annotated TechProduct class above can index TechProduct objects directly without any conversion, as in the example snippet below:

final SolrClient client = getSolrClient();

final TechProduct kindle = new TechProduct("kindle-id-4", "Amazon Kindle Paperwhite");
final UpdateResponse response = client.addBean("techproducts", kindle);

client.commit("techproducts");

Similarly, search results can be converted directly into bean objects using the getBeans() method on QueryResponse:

final SolrClient client = getSolrClient();

final SolrQuery query = new SolrQuery("*:*");
query.addField("id");
query.addField("name");
query.setSort("id", ORDER.asc);

final QueryResponse response = client.query("techproducts", query);
final List<TechProduct> products = response.getBeans(TechProduct.class);

Other APIs

SolrJ allows more than just querying and indexing. It supports all of Solr’s APIs. Accessing Solr’s other APIs is as easy as finding the appropriate request object, providing any necessary parameters, and passing it to the request() method of your SolrClient. request() returns a NamedList: a generic object that mirrors the hierarchical structure of the JSON or XML returned by the request.

The example below shows how SolrJ users can call the CLUSTERSTATUS API of SolrCloud deployments, and manipulate the returned NamedList:

final SolrClient client = getSolrClient();

@SuppressWarnings({"rawtypes"})
final SolrRequest request = new CollectionAdminRequest.ClusterStatus();

final NamedList<Object> response = client.request(request);
@SuppressWarnings({"unchecked"})
final NamedList<Object> cluster = (NamedList<Object>) response.get("cluster");
@SuppressWarnings({"unchecked"})
final List<String> liveNodes = (List<String>) cluster.get("live_nodes");

print("Found " + liveNodes.size() + " live nodes");