Class TikaLanguageIdentifierUpdateProcessorFactory

  • All Implemented Interfaces:
    LangIdParams, NamedListInitializedPlugin, SolrCoreAware

    public class TikaLanguageIdentifierUpdateProcessorFactory
    extends UpdateRequestProcessorFactory
    implements SolrCoreAware, LangIdParams
    Identifies the language of a set of input fields using Tika's LanguageIdentifier. The tika-core-x.y.jar must be on the classpath.

    The processor's entry in the UpdateProcessorChain configuration accepts a number of parameters. Any of them may also be passed as HTTP parameters on the update request, in which case they override the configured defaults. The simplest possible processor configuration is:

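     <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
       <!-- comma-separated list of input fields whose text is analyzed -->
       <str name="langid.fl">title,text</str>
       <!-- field that receives the detected language code -->
       <str name="langid.langField">language_s</str>
     </processor>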
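    Because the langid.* parameters can be overridden per request, a client can change them without touching the processor configuration. The following Java sketch only builds such an update URL with the standard library; it does not send any documents. The base URL http://localhost:8983/solr/mycore, the chain name langid (selected with the update.chain request parameter, which assumes a Solr version that supports it), and the field name body are hypothetical placeholders, not values from this page.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class LangIdUpdateUrl {

    // Builds "<baseUrl>/update?k1=v1&k2=v2..." with URL-encoded keys and values.
    // Insertion order of the map is preserved in the query string.
    static URI buildUpdateUri(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + "/update?" + query);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("update.chain", "langid");          // hypothetical chain name
        params.put("langid.fl", "title,body");         // overrides the configured "title,text"
        params.put("langid.langField", "language_s");  // same output field as the config above
        params.put("commit", "true");

        // Hypothetical core URL; documents would be POSTed to this URI.
        URI uri = buildUpdateUri("http://localhost:8983/solr/mycore", params);
        System.out.println(uri);
    }
}
```

    The comma in title,body is percent-encoded as %2C; Solr decodes it back before the processor reads langid.fl. To actually index documents, the URI could be passed to java.net.http.HttpClient along with the document body.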
     
    See http://wiki.apache.org/solr/LanguageDetection
    Since:
    3.5