Class DocumentCategorizerUpdateProcessorFactory

java.lang.Object
org.apache.solr.update.processor.UpdateRequestProcessorFactory
org.apache.solr.update.processor.DocumentCategorizerUpdateProcessorFactory
All Implemented Interfaces:
NamedListInitializedPlugin, SolrCoreAware

public class DocumentCategorizerUpdateProcessorFactory extends UpdateRequestProcessorFactory implements SolrCoreAware
Classifies the values found in any matching source field using the OpenNLP model specified by modelFile, and writes the result to a configured dest field.

See the Tutorial for a step-by-step guide.

The source field(s) can be configured as either:

The dest field can be a single <str> containing the literal name of a destination field, or it may be a <lst> specifying a regex pattern and a replacement string. If the pattern + replacement option is used, the pattern will be matched against all fields matched by the source selector, and the replacement string (including any capture groups from the pattern) will be evaluated using Matcher.replaceAll(String) to generate the literal name of the destination field.
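The pattern + replacement resolution described above can be sketched in plain Java. This is only an illustration of how Matcher.replaceAll(String) turns a source field name into a destination field name; it is not the processor's actual code. The class DestFieldNameSketch, the method resolveDest, the pattern ^(.*)_txt$, the replacement $1_sentiment, and the field name body_txt are all hypothetical and chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; it is not part of Solr.
class DestFieldNameSketch {

    // Applies the replacement string (which may reference capture groups
    // such as $1) to a source field name, using Matcher.replaceAll(String).
    static String resolveDest(String pattern, String replacement, String sourceField) {
        Matcher matcher = Pattern.compile(pattern).matcher(sourceField);
        return matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Assumed example values: every source field ending in "_txt" is
        // mapped to a destination field ending in "_sentiment".
        String dest = resolveDest("^(.*)_txt$", "$1_sentiment", "body_txt");
        System.out.println(dest);
    }
}
```

With these assumed values, the capture group holds "body", so the generated destination field name is body_sentiment. How the processor treats source fields that the pattern does not match is not covered by this sketch.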

If the resolved dest field already exists in the document, then the classification results for the source fields will be added to it.

In the example below:

  • Classification will be performed on the text field, and the result will be added to the text_sentiment field
 <updateRequestProcessorChain name="sentimentClassifier">
   <processor class="solr.processor.DocumentCategorizerUpdateProcessorFactory">
     <str name="modelFile">models/sentiment/model.onnx</str>
     <str name="vocabFile">models/sentiment/vocab.txt</str>
     <str name="source">text</str>
     <str name="dest">text_sentiment</str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>
 
Since:
10.0.0