java.lang.Object
- org.apache.solr.update.processor.UpdateRequestProcessorFactory
- - org.apache.solr.update.processor.OpenNLPExtractNamedEntitiesUpdateProcessorFactory

All Implemented Interfaces:

NamedListInitializedPlugin, SolrCoreAware
```
public class OpenNLPExtractNamedEntitiesUpdateProcessorFactory
extends UpdateRequestProcessorFactory
implements SolrCoreAware
```
Extracts named entities using an OpenNLP NER modelFile from the values found in any matching source field into a configured dest field, after first tokenizing the source text using the index analyzer on the configured analyzerFieldType, which must include solr.OpenNLPTokenizerFactory as the tokenizer. E.g.:
```
   <fieldType name="opennlp-en-tokenization" class="solr.TextField">
     <analyzer>
       <tokenizer class="solr.OpenNLPTokenizerFactory"
                  sentenceModel="en-sent.bin"
                  tokenizerModel="en-tokenizer.bin"/>
     </analyzer>
   </fieldType>
 
```
See the OpenNLP website for information on downloading pre-trained models. Note that in order to use model files larger than 1MB on SolrCloud, ZooKeeper server and client configuration is required.
The source field(s) can be configured in any of the following ways:
- One or more <str>
- An <arr> of <str>
- A <lst> containing FieldMutatingUpdateProcessorFactory style selector arguments
The dest field can be a single <str> containing the literal name of a destination field, or it may be a <lst> specifying a regex pattern and a replacement string. If the pattern + replacement option is used, the pattern will be matched against all fields matched by the source selector, and the replacement string (including any capture groups specified from the pattern) will be evaluated using Matcher.replaceAll(String) to generate the literal name of the destination field. Additionally, an occurrence of the string "{EntityType}" in the dest field specification, or in the replacement string, will be replaced with the entity type(s) returned for each entity by the OpenNLP NER model; as a result, if the model extracts more than one entity type, then more than one dest field will be populated.
If the resolved dest field already exists in the document, then the named entities extracted from the source fields will be added to it.
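The dest resolution rules described above can be sketched in plain Java. This is a minimal illustration, not the factory's actual code: the class name `DestFieldNameSketch`, the method `resolveDest`, and the choice to return `null` when the pattern does not match are all assumptions made for the example. Only `Matcher.replaceAll(String)` and the "{EntityType}" placeholder come from the description above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Solr.
final class DestFieldNameSketch {
    // Placeholder text named in the documentation above.
    static final String ENTITY_TYPE_TOKEN = "{EntityType}";

    /**
     * Resolves a literal destination field name.
     * pattern may be null, in which case dest is treated as a literal name.
     * Returns null (an assumption of this sketch) when the pattern does not match.
     */
    static String resolveDest(String sourceField, Pattern pattern, String dest, String entityType) {
        String resolved = dest;
        if (pattern != null) {
            Matcher m = pattern.matcher(sourceField);
            if (!m.find()) {
                return null;
            }
            // Capture groups such as $1 in dest are expanded here.
            resolved = m.replaceAll(dest);
        }
        // Literal substitution of the entity type placeholder.
        return resolved.replace(ENTITY_TYPE_TOKEN, entityType);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("^desc(.*)s$");
        System.out.println(resolveDest("descs", p, "key_desc$1_people", "person"));
        System.out.println(resolveDest("descriptions", p, "key_desc$1_people", "person"));
        System.out.println(resolveDest("summary", null, "summary_{EntityType}_s", "person"));
    }
}
```

With a model that emits several entity types, calling such a helper once per type would yield one destination name per type, which is why more than one dest field can be populated.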
In the example below:
- Named entities will be extracted from the text field and added to the people_s field
- Named entities will be extracted from both the title and subtitle fields and added into the titular_people field
- Named entities will be extracted from any field with a name ending in _txt -- except for notes_txt -- and added into the people_s field
- Named entities will be extracted from any field with a name beginning with "desc" and ending in "s" (e.g. "descs" and "descriptions") and added to a field prefixed with "key_", not ending in "s", and suffixed with "_people" (e.g. "key_desc_people" or "key_description_people")
- Named entities will be extracted from the summary field and added to the summary_person_s field, assuming that the modelFile only extracts entities of type "person".
```
 <updateRequestProcessorChain name="multiple-extract">
   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
     <str name="modelFile">en-test-ner-person.bin</str>
     <str name="analyzerFieldType">opennlp-en-tokenization</str>
     <str name="source">text</str>
     <str name="dest">people_s</str>
   </processor>
   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
     <str name="modelFile">en-test-ner-person.bin</str>
     <str name="analyzerFieldType">opennlp-en-tokenization</str>
     <arr name="source">
       <str>title</str>
       <str>subtitle</str>
     </arr>
     <str name="dest">titular_people</str>
   </processor>
   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
     <str name="modelFile">en-test-ner-person.bin</str>
     <str name="analyzerFieldType">opennlp-en-tokenization</str>
     <lst name="source">
       <str name="fieldRegex">.*_txt$</str>
       <lst name="exclude">
         <str name="fieldName">notes_txt</str>
       </lst>
     </lst>
     <str name="dest">people_s</str>
   </processor>
   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
     <str name="modelFile">en-test-ner-person.bin</str>
     <str name="analyzerFieldType">opennlp-en-tokenization</str>
     <lst name="source">
       <str name="fieldRegex">^desc(.*)s$</str>
     </lst>
     <lst name="dest">
       <str name="pattern">^desc(.*)s$</str>
       <str name="replacement">key_desc$1_people</str>
     </lst>
   </processor>
   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
     <str name="modelFile">en-test-ner-person.bin</str>
     <str name="analyzerFieldType">opennlp-en-tokenization</str>
     <str name="source">summary</str>
     <str name="dest">summary_{EntityType}_s</str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>
 
```
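To make the third processor's source selector more concrete, the following sketch filters a list of field names with an include regex and an exclusion set. It is only an approximation of the selection semantics: the class `SourceSelectorSketch` and its `select` method are hypothetical, and the use of `Matcher.find()` for the regex test is an assumption rather than a statement about how FieldMutatingUpdateProcessorFactory is implemented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not part of Solr.
final class SourceSelectorSketch {
    /** Keeps names that match the include regex and are not explicitly excluded. */
    static List<String> select(List<String> fieldNames, Pattern fieldRegex, Set<String> excludedNames) {
        return fieldNames.stream()
                .filter(name -> fieldRegex.matcher(name).find()) // assumed matching mode
                .filter(name -> !excludedNames.contains(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("body_txt", "notes_txt", "title", "abstract_txt");
        Pattern include = Pattern.compile(".*_txt$");
        Set<String> exclude = Collections.singleton("notes_txt");
        // Expected to keep body_txt and abstract_txt only.
        System.out.println(select(fields, include, exclude));
    }
}
```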
Since:

7.3.0

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.solr.update.processor.UpdateRequestProcessorFactory
  UpdateRequestProcessorFactory.RunAlways

Field Summary

Fields
Modifier and Type	Field	Description
`static String`	`ANALYZER_FIELD_TYPE_PARAM`
`static String`	`DEST_PARAM`
`static String`	`ENTITY_TYPE`
`static String`	`MODEL_PARAM`
`static String`	`PATTERN_PARAM`
`static String`	`REPLACEMENT_PARAM`
`static String`	`SOURCE_PARAM`

Constructor Summary

Constructors
Constructor Description

OpenNLPExtractNamedEntitiesUpdateProcessorFactory()

Method Summary

Modifier and Type	Method	Description
`UpdateRequestProcessor`	`getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next)`
`protected FieldMutatingUpdateProcessor.FieldNameSelector`	`getSourceSelector()`
`void`	`inform(SolrCore core)`
`void`	`init(org.apache.solr.common.util.NamedList<?> args)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail
- SOURCE_PARAM
```
public static final String SOURCE_PARAM
```
  See Also:
  
  Constant Field Values
- DEST_PARAM
```
public static final String DEST_PARAM
```
  See Also:
  
  Constant Field Values
- PATTERN_PARAM
```
public static final String PATTERN_PARAM
```
  See Also:
  
  Constant Field Values
- REPLACEMENT_PARAM
```
public static final String REPLACEMENT_PARAM
```
  See Also:
  
  Constant Field Values
- MODEL_PARAM
```
public static final String MODEL_PARAM
```
  See Also:
  
  Constant Field Values
- ANALYZER_FIELD_TYPE_PARAM
```
public static final String ANALYZER_FIELD_TYPE_PARAM
```
  See Also:
  
  Constant Field Values
- ENTITY_TYPE
```
public static final String ENTITY_TYPE
```
  See Also:
  
  Constant Field Values

Constructor Detail
- OpenNLPExtractNamedEntitiesUpdateProcessorFactory
```
public OpenNLPExtractNamedEntitiesUpdateProcessorFactory()
```

Method Detail

getSourceSelector

protected final FieldMutatingUpdateProcessor.FieldNameSelector getSourceSelector()

init

public void init(org.apache.solr.common.util.NamedList<?> args)

Specified by: init in interface NamedListInitializedPlugin

inform
```
public void inform(SolrCore core)
```
Specified by:

inform in interface SolrCoreAware

getInstance

public final UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                                SolrQueryResponse rsp,
                                                UpdateRequestProcessor next)

Specified by: getInstance in class UpdateRequestProcessorFactory
