Package org.apache.solr.update.processor
Class TikaLanguageIdentifierUpdateProcessor
- java.lang.Object
  - org.apache.solr.update.processor.UpdateRequestProcessor
    - org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor
      - org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessor
- All Implemented Interfaces:
Closeable, AutoCloseable, LangIdParams
public class TikaLanguageIdentifierUpdateProcessor extends LanguageIdentifierUpdateProcessor
Identifies the language of a set of input fields using Tika's LanguageIdentifier. The tika-core-x.y.jar must be on the classpath.
- Since: 3.5
Field Summary
-
Fields inherited from class org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor
allMapFieldsSet, docIdField, enabled, enableMapping, enforceSchema, fallbackFields, fallbackValue, inputFields, langField, langPattern, langsField, langWhitelist, lcMap, mapFields, mapIndividual, mapIndividualFieldsSet, mapKeepOrig, mapLcMap, mapOverwrite, mapPattern, mapReplaceStr, maxFieldValueChars, maxTotalChars, overwrite, schema, threshold, tikaSimilarityPattern
-
Fields inherited from class org.apache.solr.update.processor.UpdateRequestProcessor
next
-
Fields inherited from interface org.apache.solr.update.processor.LangIdParams
DOCID_FIELD_DEFAULT, DOCID_LANGFIELD_DEFAULT, DOCID_LANGSFIELD_DEFAULT, DOCID_PARAM, DOCID_THRESHOLD_DEFAULT, ENFORCE_SCHEMA, FALLBACK, FALLBACK_FIELDS, FIELDS_PARAM, LANG_FIELD, LANG_WHITELIST, LANGS_FIELD, LANGUAGE_ID, LCMAP, MAP_ENABLE, MAP_FL, MAP_INDIVIDUAL, MAP_INDIVIDUAL_FL, MAP_KEEP_ORIG, MAP_LCMAP, MAP_OVERWRITE, MAP_PATTERN, MAP_PATTERN_DEFAULT, MAP_REPLACE, MAP_REPLACE_DEFAULT, MAX_FIELD_VALUE_CHARS, MAX_FIELD_VALUE_CHARS_DEFAULT, MAX_TOTAL_CHARS, MAX_TOTAL_CHARS_DEFAULT, OVERWRITE, THRESHOLD
-
-
Constructor Summary
Constructors
TikaLanguageIdentifierUpdateProcessor(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next)
-
Method Summary
Modifier and Type: protected List<DetectedLanguage>
Method: detectLanguage(SolrInputDocument doc)
Description: Detects language(s) from a string.
Methods inherited from class org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor
concatFields, getMappedField, isEnabled, normalizeLangCode, process, processAdd, resolveLanguage, resolveLanguage, setEnabled
-
Methods inherited from class org.apache.solr.update.processor.UpdateRequestProcessor
close, doClose, finish, processCommit, processDelete, processMergeIndexes, processRollback
Constructor Detail
-
TikaLanguageIdentifierUpdateProcessor
public TikaLanguageIdentifierUpdateProcessor(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next)
-
-
Method Detail
-
detectLanguage
protected List<DetectedLanguage> detectLanguage(SolrInputDocument doc)
Description copied from class: LanguageIdentifierUpdateProcessor
Detects language(s) from a string. Classes wishing to implement their own language detection module should override this method.
- Specified by: detectLanguage in class LanguageIdentifierUpdateProcessor
- Parameters: doc - The content to identify
- Returns: List of detected language(s) according to RFC-3066
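
The override pattern described above can be illustrated without Solr on the classpath. The following is a minimal sketch under stated assumptions, not Solr's implementation: `LanguageIdentifierBase`, the nested `DetectedLanguage`, `StopwordLanguageIdentifier`, and `detectCodes` are hypothetical stand-ins, the input is a plain `String` rather than a `SolrInputDocument`, and the stop-word heuristic is a toy substitute for a real detector such as Tika's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class DetectLanguageSketch {

    // Hypothetical stand-in for a detection result: a language code plus a certainty score.
    static final class DetectedLanguage {
        private final String langCode;
        private final double certainty;

        DetectedLanguage(String langCode, double certainty) {
            this.langCode = langCode;
            this.certainty = certainty;
        }

        String getLangCode() { return langCode; }
        double getCertainty() { return certainty; }
    }

    // Hypothetical stand-in for the abstract base processor.
    // Assumption: takes a String here; the documented method takes a SolrInputDocument.
    abstract static class LanguageIdentifierBase {
        protected abstract List<DetectedLanguage> detectLanguage(String content);
    }

    // Toy detector: scores each language by the fraction of tokens that are known stop words.
    static final class StopwordLanguageIdentifier extends LanguageIdentifierBase {
        private static final Set<String> EN = new HashSet<>(Arrays.asList("the", "and", "of", "is"));
        private static final Set<String> DE = new HashSet<>(Arrays.asList("der", "die", "und", "ist"));

        @Override
        protected List<DetectedLanguage> detectLanguage(String content) {
            List<DetectedLanguage> languages = new ArrayList<>();
            if (content == null || content.trim().isEmpty()) {
                return languages; // nothing to identify
            }
            String[] tokens = content.trim().toLowerCase(Locale.ROOT).split("\\s+");
            int en = 0;
            int de = 0;
            for (String token : tokens) {
                if (EN.contains(token)) en++;
                if (DE.contains(token)) de++;
            }
            if (en > 0) languages.add(new DetectedLanguage("en", (double) en / tokens.length));
            if (de > 0) languages.add(new DetectedLanguage("de", (double) de / tokens.length));
            // Most certain language first.
            languages.sort((a, b) -> Double.compare(b.getCertainty(), a.getCertainty()));
            return languages;
        }
    }

    // Hypothetical convenience helper: returns only the language codes, best match first.
    static List<String> detectCodes(String content) {
        List<String> codes = new ArrayList<>();
        for (DetectedLanguage lang : new StopwordLanguageIdentifier().detectLanguage(content)) {
            codes.add(lang.getLangCode());
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(detectCodes("the cat and the dog"));
        System.out.println(detectCodes("der Hund und die Katze"));
    }
}
```

In a real subclass of LanguageIdentifierUpdateProcessor, the same shape would apply: override detectLanguage, gather the text from the document, and return the detected languages as a list.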