Script Update Processor

The ScriptUpdateProcessorFactory allows Java scripting engines to be used during Solr document update processing, giving you great flexibility in expressing custom processing logic that runs before documents are indexed.

It has hooks for the commit, delete, rollback, and other indexing actions, although add is the most commonly used. It is implemented as an UpdateProcessor to be placed in an UpdateChain.

This used to be known as the StatelessScriptingUpdateProcessor; it was renamed to make clear that the key aspect of this update processor is that it enables scripting.

The script can be written in any scripting language supported by your JVM (such as JavaScript) and is executed dynamically, so no pre-compilation is necessary.

Being able to run a script of your choice as part of the indexing pipeline is a really powerful tool, one that I sometimes call the "Get out of jail free" card because it lets you solve some problems that you can’t solve any other way. However, running arbitrary scripts also introduces potential security vulnerabilities.

Module

This is provided via the scripting Solr Module, which needs to be enabled before use.

Enabling the ScriptingUpdateProcessor and Scripting Engines

Java versions up to 14 come with a JavaScript engine called Nashorn (deprecated since Java 11), but Java 15 and later require you to add your own JavaScript engine. Other supported scripting engines, such as JRuby, Jython, and Groovy, all require you to add JAR files to Solr.

To learn how to add any other JAR files your scripting engine needs, see Solr’s Lib Directories.

Configuration

<updateRequestProcessorChain name="script">
  <processor class="org.apache.solr.scripting.update.ScriptUpdateProcessorFactory">
    <str name="script">update-script.js</str>
    <!-- optional parameters passed to script
      <lst name="params">
        <str name="config_param">example config parameter</str>
      </lst>
    -->
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
The processor supports the defaults/appends/invariants concept for its config. However, it is also possible to skip this level and configure the parameters directly underneath the <processor> tag.

The configuration parameters and their meanings are listed below:

script

Required

Default: none

The script file name. The script file must be placed in the conf/ directory. There can be one or more "script" parameters specified; multiple scripts are executed in the order specified.

engine

Optional

Default: none

Specifies the scripting engine to use. This is only needed if the extension of the script file does not map to the scripting engine in the standard way. For example, if your script is written in JavaScript but the file is named update-script.foo, use javascript as the engine name.

params

Optional

Default: none

Optional parameters that are passed into the script execution context. This is specified as a named list (<lst>) structure with nested typed parameters. If specified, the script context will get a "params" object, otherwise there will be no "params" object available.
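
Because the "params" object is absent when no parameters are configured, a script that reads it unconditionally can fail. The following is a minimal sketch of a defensive lookup, not part of Solr itself: the helper name configParam and the fallback argument are assumptions made for illustration, and it assumes only that "params" exposes a get(name) method, as in the examples on this page.

```javascript
// Hypothetical helper (not a Solr API): returns the named configuration
// parameter, or `fallback` when the processor was configured without
// <lst name="params"> and therefore no "params" variable exists.
function configParam(name, fallback) {
  // typeof is safe to use even on a variable that was never declared
  if (typeof params === "undefined" || params === null) {
    return fallback;
  }
  var value = params.get(name);
  // treat a missing entry (null or undefined) the same as a missing object
  return value == null ? fallback : value;
}
```

Inside processAdd(), a call such as configParam("config_param", "none") would then work whether or not the processor declares any parameters.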

Script Execution Context

Every script has some variables provided to it.

logger

Logger (org.slf4j.Logger) instance. This is useful for logging information from the script.

req

SolrQueryRequest instance.

rsp

SolrQueryResponse instance.

params

The "params" object, if any specified, from the configuration.
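
As an illustration of how these variables fit together, the sketch below reads a request parameter, copies it onto the document, logs the result, and records the document id in the response. The helper name tagDocument, the request parameter "source", and the field "source_s" are assumptions chosen for this example. The context objects are passed in explicitly so the helper can also be exercised outside Solr with simple stand-ins. The other process functions are omitted here for brevity.

```javascript
// Hypothetical helper: receives the context objects as arguments instead of
// reading them as globals, which keeps it testable outside Solr.
function tagDocument(doc, req, rsp, logger) {
  var id = doc.getFieldValue("id");

  // "source" is a hypothetical request parameter, e.g. sent as ?source=crawler
  var source = req.getParams().get("source");
  if (source != null) {
    doc.setField("source_s", String(source));
  }

  logger.info("update-script#processAdd: id=" + id + " source=" + source);

  // Record in the response that the script handled this document
  rsp.add("script_processed", id);
}

function processAdd(cmd) {
  // logger, req, and rsp are the variables provided to every script
  tagDocument(cmd.solrDoc, req, rsp, logger);
}
```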

Try it Out

There is a JavaScript example update-script.js as part of the "techproducts" configset.

To try out scripting, enable the <updateRequestProcessorChain name="script"> configuration in the file ./server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml. Then start Solr via bin/solr start -e techproducts -Dsolr.modules=scripting. When a document is added through the script chain, the example script logs a line such as:

INFO: update-script#processAdd: id=1

You can see the message recorded in the Solr logging UI.

Examples

The processAdd() and the other script methods can return false to skip further processing of the document. All methods must be defined, though generally the processAdd() method is where the action is.
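
The skip behavior can be sketched as follows. This is an illustration under assumptions rather than a recommended filter: the field "status_s", the value "draft", and the helper name shouldIndex are all hypothetical. The remaining process functions are omitted for brevity but still need to be defined in a real script.

```javascript
// Hypothetical rule: documents whose status_s field equals "draft" are skipped.
function shouldIndex(doc) {
  var status = doc.getFieldValue("status_s");
  // String() normalizes the value in case the engine hands back a Java String
  return status == null || String(status) !== "draft";
}

function processAdd(cmd) {
  var doc = cmd.solrDoc;
  if (!shouldIndex(doc)) {
    logger.info("update-script#processAdd: skipping id=" + doc.getFieldValue("id"));
    return false; // skip further processing of this document
  }
  // no return value: processing continues with the next processor in the chain
}
```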

JavaScript

Note: Check solrconfig.xml and uncomment the update request processor definition to enable this feature.

function processAdd(cmd) {

  var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
  var id = doc.getFieldValue("id");
  logger.info("update-script#processAdd: id=" + id);

  // Set a field value:
  //   doc.setField("foo_s", "whatever");

  // Get a configuration parameter:
  //   var config_param = params.get("config_param");  // "params" only exists if processor configured with <lst name="params">

  // Get a request parameter:
  //   var some_param = req.getParams().get("some_param");

  // Add a field of field names that match a pattern:
  //   - Potentially useful to determine the fields/attributes represented in a result set, via faceting on attribute_ss
  //   var field_names = doc.getFieldNames().toArray();
  //   for (var i = 0; i < field_names.length; i++) {
  //     var field_name = field_names[i];
  //     if (/attr_.*/.test(field_name)) { doc.addField("attribute_ss", field_name); }
  //   }

}

function processDelete(cmd) {
  // no-op
}

function processMergeIndexes(cmd) {
  // no-op
}

function processCommit(cmd) {
  // no-op
}

function processRollback(cmd) {
  // no-op
}

function finish() {
  // no-op
}

Ruby

Ruby support is implemented via the JRuby project. To use JRuby as the scripting engine, add jruby.jar to Solr.

Here’s an example of a JRuby update processing script (note that all variables passed in require prefixing with $, such as $logger):

def processAdd(cmd)
  doc = cmd.solrDoc  # org.apache.solr.common.SolrInputDocument
  id = doc.getFieldValue('id')

  $logger.info "update-script#processAdd: id=#{id}"

  doc.setField('source_s', 'ruby')

  $logger.info "update-script#processAdd: config_param=#{$params.get('config_param')}"
end

def processDelete(cmd)
  # no-op
end

def processMergeIndexes(cmd)
  # no-op
end

def processCommit(cmd)
  # no-op
end

def processRollback(cmd)
  # no-op
end

def finish()
  # no-op
end

Known Issues

The following in JRuby does not work as expected, though it does work properly in JavaScript:

#  $logger.info "update-script#processAdd: request_param=#{$req.params.get('request_param')}"
#  $rsp.add('script_processed',id)

Groovy

Add JARs from a Groovy distribution’s lib/ directory to Solr. Probably not all of the distribution’s JARs are required, but more than just the main groovy.jar file is needed (at least when this was tested using Groovy 2.0.6).

def processAdd(cmd) {
  doc = cmd.solrDoc  // org.apache.solr.common.SolrInputDocument
  id = doc.getFieldValue('id')

  logger.info "update-script#processAdd: id=" + id

  doc.setField('source_s', 'groovy')

  logger.info "update-script#processAdd: config_param=" + params.get('config_param')

  logger.info "update-script#processAdd: request_param=" + req.params.get('request_param')
  rsp.add('script_processed',id)
}

def processDelete(cmd) {
 //  no-op
}

def processMergeIndexes(cmd) {
 // no-op
}

def processCommit(cmd) {
 //  no-op
}

def processRollback(cmd) {
 // no-op
}

def finish() {
 // no-op
}

Python

Python support is implemented via the Jython project. Add the standalone jython.jar (the JAR that contains all the dependencies) into Solr.

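def processAdd(cmd):
    doc = cmd.solrDoc  # org.apache.solr.common.SolrInputDocument
    doc_id = doc.getFieldValue("id")
    # %s formatting also works when the id is not a string
    logger.info("update-script#processAdd: id=%s" % doc_id)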

def processDelete(cmd):
    logger.info("update-script#processDelete")

def processMergeIndexes(cmd):
    logger.info("update-script#processMergeIndexes")

def processCommit(cmd):
    logger.info("update-script#processCommit")

def processRollback(cmd):
    logger.info("update-script#processRollback")

def finish():
    logger.info("update-script#finish")