Class SolrContentHandler
java.lang.Object
    org.xml.sax.helpers.DefaultHandler
        org.apache.solr.handler.extraction.SolrContentHandler

All Implemented Interfaces:
ExtractingParams, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class SolrContentHandler extends DefaultHandler implements ExtractingParams
The class responsible for handling Tika events and translating them into SolrInputDocuments. This class is not thread-safe. This class cannot be reused; you have to create a new instance per document!
Users may wish to override this class to provide their own functionality.
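Below is a minimal, Solr-free sketch of the one-instance-per-document rule, using only the JDK's SAX API. TextCollectingHandler and PerDocumentExtraction are hypothetical names. The handler only illustrates why per-document mutable state makes a SAX handler unsafe to share or reuse. It does not show how SolrContentHandler itself is implemented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a per-document handler. Like SolrContentHandler, it keeps
// mutable state for exactly one document, so it is neither thread-safe nor reusable.
class TextCollectingHandler extends DefaultHandler {
  private final StringBuilder text = new StringBuilder();

  @Override
  public void characters(char[] ch, int start, int length) {
    text.append(ch, start, length);
  }

  String text() {
    return text.toString();
  }
}

class PerDocumentExtraction {
  // Parses each document with a fresh handler; a handler is never shared across documents.
  static List<String> extractAll(List<String> xmlDocs) {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    List<String> results = new ArrayList<>();
    for (String xml : xmlDocs) {
      TextCollectingHandler handler = new TextCollectingHandler(); // new instance per document
      try {
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
      } catch (ParserConfigurationException | SAXException | IOException e) {
        throw new IllegalStateException("Could not parse document", e);
      }
      results.add(handler.text());
    }
    return results;
  }
}
```

Reusing one handler for both documents in this sketch would concatenate their text. This is the kind of state leakage the "new instance per document" rule guards against.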
Field Summary
Modifier and Type                                   Field
protected boolean                                   captureAttribs
protected StringBuilder                             catchAllBuilder
static String                                       contentFieldName
protected String                                    defaultField
protected org.apache.solr.common.SolrInputDocument  document
protected Map<String,StringBuilder>                 fieldBuilders
protected boolean                                   lowerNames
protected org.apache.tika.metadata.Metadata         metadata
protected org.apache.solr.common.params.SolrParams  params
protected IndexSchema                               schema
protected String                                    unknownFieldPrefix
Fields inherited from interface org.apache.solr.handler.extraction.ExtractingParams
CAPTURE_ATTRIBUTES, CAPTURE_ELEMENTS, DEFAULT_FIELD, EXTRACT_FORMAT, EXTRACT_ONLY, IGNORE_TIKA_EXCEPTION, LITERALS_OVERRIDE, LITERALS_PREFIX, LOWERNAMES, MAP_PREFIX, PASSWORD_MAP_FILE, RESOURCE_NAME, RESOURCE_PASSWORD, STREAM_TYPE, UNKNOWN_FIELD_PREFIX, XPATH_EXPRESSION
Constructor Summary
Constructor
SolrContentHandler(org.apache.tika.metadata.Metadata metadata, org.apache.solr.common.params.SolrParams params, IndexSchema schema)
Method Summary
Modifier and Type                          Method and Description
protected void                             addCapturedContent()
                                           Add the per-field captured content to the Solr document.
protected void                             addContent()
                                           Add the catch-all content to the content field.
protected void                             addField(String fname, String fval, String[] vals)
protected void                             addLiterals()
                                           Add the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX.
protected void                             addMetadata()
                                           Add any metadata to the document, using metadata as the source.
void                                       characters(char[] chars, int offset, int length)
void                                       endElement(String uri, String localName, String qName)
protected String                           findMappedName(String name)
                                           Get the name mapping.
void                                       ignorableWhitespace(char[] chars, int offset, int length)
                                           Treat the same as any other characters.
org.apache.solr.common.SolrInputDocument   newDocument()
                                           This is called by a consumer when it is ready to deal with a new SolrInputDocument.
void                                       startElement(String uri, String localName, String qName, Attributes attributes)
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
Field Detail
contentFieldName
public static final String contentFieldName
See Also: Constant Field Values
document
protected final org.apache.solr.common.SolrInputDocument document
metadata
protected final org.apache.tika.metadata.Metadata metadata
params
protected final org.apache.solr.common.params.SolrParams params
catchAllBuilder
protected final StringBuilder catchAllBuilder
schema
protected final IndexSchema schema
fieldBuilders
protected final Map<String,StringBuilder> fieldBuilders
captureAttribs
protected final boolean captureAttribs
lowerNames
protected final boolean lowerNames
unknownFieldPrefix
protected final String unknownFieldPrefix
defaultField
protected final String defaultField
Constructor Detail
SolrContentHandler
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata, org.apache.solr.common.params.SolrParams params, IndexSchema schema)
Method Detail
newDocument
public org.apache.solr.common.SolrInputDocument newDocument()
This is called by a consumer when it is ready to deal with a new SolrInputDocument. Overriding classes can use this hook to add or change whatever they deem fit for the document at that time. The base implementation adds the metadata as fields, allowing for potential remapping.
Returns:
The SolrInputDocument.
See Also:
addMetadata(), addCapturedContent(), addContent(), addLiterals()
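The override hook described above can be sketched without Solr. The sketch assumes a Map<String, List<String>> as a stand-in for SolrInputDocument. DocumentAssembler, SourceTaggingAssembler, the two-argument addField, the "content" field name and the hook order inside newDocument() are all hypothetical. The sketch shows only the pattern: call super.newDocument(), then add or change fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Solr-free analogue of the newDocument() hook; a Map stands in for
// SolrInputDocument. The hook order below is an assumption, not taken from Solr.
class DocumentAssembler {
  protected final Map<String, List<String>> document = new LinkedHashMap<>();
  protected final Map<String, String> metadata;
  protected final StringBuilder catchAllBuilder;

  DocumentAssembler(Map<String, String> metadata, String content) {
    this.metadata = metadata;
    this.catchAllBuilder = new StringBuilder(content);
  }

  // Simplified, hypothetical addField: appends one value to a multi-valued field.
  protected void addField(String name, String value) {
    document.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  protected void addMetadata() {
    metadata.forEach(this::addField); // each metadata entry becomes a field
  }

  protected void addContent() {
    addField("content", catchAllBuilder.toString()); // assumed content field name
  }

  public Map<String, List<String>> newDocument() {
    addMetadata();
    addContent();
    return document;
  }
}

// An overriding class uses the hook to add or change fields after the base implementation runs.
class SourceTaggingAssembler extends DocumentAssembler {
  SourceTaggingAssembler(Map<String, String> metadata, String content) {
    super(metadata, content);
  }

  @Override
  public Map<String, List<String>> newDocument() {
    Map<String, List<String>> doc = super.newDocument();
    addField("source", "upload"); // hypothetical extra field
    return doc;
  }
}
```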
addCapturedContent
protected void addCapturedContent()
Add the per-field captured content to the Solr document. The default implementation uses the fieldBuilders info.
addContent
protected void addContent()
Add the catch-all content to the content field. The default implementation uses the contentFieldName and the catchAllBuilder.
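The two flush steps, addCapturedContent and addContent, can be sketched with plain Map stand-ins for SolrInputDocument and for fieldBuilders. BuilderFlushSketch and its static helpers are hypothetical. "content" is only an assumed value for contentFieldName; its real value is listed under Constant Field Values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two flush steps: each per-field builder becomes a field of
// its own, and the catch-all builder becomes the value of the content field.
class BuilderFlushSketch {
  // Assumed value; the actual contentFieldName constant is on the Constant Field Values page.
  static final String CONTENT_FIELD_NAME = "content";

  static void addField(Map<String, List<String>> document, String name, String value) {
    document.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  // Copies the text captured for each field into the document.
  static void addCapturedContent(Map<String, List<String>> document,
                                 Map<String, StringBuilder> fieldBuilders) {
    for (Map.Entry<String, StringBuilder> entry : fieldBuilders.entrySet()) {
      addField(document, entry.getKey(), entry.getValue().toString());
    }
  }

  // Copies all accumulated text into the content field.
  static void addContent(Map<String, List<String>> document, StringBuilder catchAllBuilder) {
    addField(document, CONTENT_FIELD_NAME, catchAllBuilder.toString());
  }
}
```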
addLiterals
protected void addLiterals()
Add the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX.
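Here is a minimal sketch of prefix-based literal handling. It assumes a Map<String, String[]> in place of SolrParams and "literal." as the prefix value; check ExtractingParams.LITERALS_PREFIX for the actual constant. LiteralsSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every parameter whose name starts with the literals prefix becomes
// a field named after the rest of the key. Map<String, String[]> stands in for SolrParams.
class LiteralsSketch {
  static final String LITERALS_PREFIX = "literal."; // assumed prefix value

  static Map<String, List<String>> addLiterals(Map<String, String[]> params) {
    Map<String, List<String>> document = new LinkedHashMap<>();
    for (Map.Entry<String, String[]> param : params.entrySet()) {
      String key = param.getKey();
      if (!key.startsWith(LITERALS_PREFIX)) {
        continue; // not a literal, e.g. an unrelated request parameter
      }
      String fieldName = key.substring(LITERALS_PREFIX.length());
      List<String> values = document.computeIfAbsent(fieldName, k -> new ArrayList<>());
      for (String value : param.getValue()) {
        values.add(value); // multi-valued parameters become multi-valued fields
      }
    }
    return document;
  }
}
```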
addMetadata
protected void addMetadata()
Add any metadata to the document, using metadata as the source.
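Metadata names can be passed through a name mapping before their values are added as fields. The sketch below assumes Map stand-ins for Tika's Metadata and for SolrParams. It also assumes an "fmap." mapping prefix in the spirit of ExtractingParams.MAP_PREFIX. The exact prefix value and the real mapping rules of findMappedName are assumptions, and MetadataMappingSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each metadata name is run through findMappedName() before its
// values are added. The "fmap." prefix and the Map stand-ins are assumptions.
class MetadataMappingSketch {
  static final String MAP_PREFIX = "fmap."; // assumed mapping-parameter prefix

  // Returns the mapped field name if a mapping parameter exists, else the original name.
  static String findMappedName(String name, Map<String, String> params) {
    return params.getOrDefault(MAP_PREFIX + name, name);
  }

  static Map<String, List<String>> addMetadata(Map<String, String[]> metadata,
                                               Map<String, String> params) {
    Map<String, List<String>> document = new LinkedHashMap<>();
    for (Map.Entry<String, String[]> entry : metadata.entrySet()) {
      String fieldName = findMappedName(entry.getKey(), params);
      List<String> values = document.computeIfAbsent(fieldName, k -> new ArrayList<>());
      for (String value : entry.getValue()) {
        values.add(value);
      }
    }
    return document;
  }
}
```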
startElement
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class DefaultHandler
Throws:
SAXException
endElement
public void endElement(String uri, String localName, String qName) throws SAXException
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class DefaultHandler
Throws:
SAXException
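The startElement/endElement pair is where element-level capture can be tracked. The JDK-only sketch below assumes a set of element names to capture, loosely analogous to ExtractingParams.CAPTURE_ELEMENTS. CapturingHandler is hypothetical and much simpler than SolrContentHandler, which also has to consider options such as captureAttribs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: all text goes to a catch-all builder, and text inside a captured
// element also goes to that element's own builder.
class CapturingHandler extends DefaultHandler {
  final StringBuilder catchAllBuilder = new StringBuilder();
  final Map<String, StringBuilder> fieldBuilders = new LinkedHashMap<>();
  private final Set<String> captureElements;
  // How many captured elements of each name are currently open (handles nesting).
  private final Map<String, Integer> openDepth = new HashMap<>();

  CapturingHandler(Set<String> captureElements) {
    this.captureElements = captureElements;
  }

  // The default SAXParserFactory is not namespace-aware, so the element name arrives in qName.
  @Override
  public void startElement(String uri, String localName, String qName, Attributes attributes) {
    if (captureElements.contains(qName)) {
      fieldBuilders.computeIfAbsent(qName, k -> new StringBuilder());
      openDepth.merge(qName, 1, Integer::sum);
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if (captureElements.contains(qName)) {
      openDepth.merge(qName, -1, Integer::sum);
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    catchAllBuilder.append(ch, start, length);
    for (Map.Entry<String, Integer> open : openDepth.entrySet()) {
      if (open.getValue() > 0) {
        fieldBuilders.get(open.getKey()).append(ch, start, length);
      }
    }
  }

  static CapturingHandler parse(String xml, Set<String> captureElements) {
    CapturingHandler handler = new CapturingHandler(captureElements);
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse document", e);
    }
    return handler;
  }
}
```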
characters
public void characters(char[] chars, int offset, int length) throws SAXException
Specified by:
characters in interface ContentHandler
Overrides:
characters in class DefaultHandler
Throws:
SAXException
ignorableWhitespace
public void ignorableWhitespace(char[] chars, int offset, int length) throws SAXException
Treat the same as any other characters.
Specified by:
ignorableWhitespace in interface ContentHandler
Overrides:
ignorableWhitespace in class DefaultHandler
Throws:
SAXException
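Treating ignorableWhitespace the same as characters amounts to a one-line delegation. The sketch below uses the JDK's DefaultHandler, and WhitespacePreservingHandler is a hypothetical name. SAX parsers generally report ignorable whitespace only when they know the element content model, for example from a DTD. Otherwise that whitespace arrives through characters() anyway.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: both callbacks append into the same builder, so whitespace the
// parser classifies as ignorable is kept exactly like ordinary character data.
class WhitespacePreservingHandler extends DefaultHandler {
  private final StringBuilder text = new StringBuilder();

  @Override
  public void characters(char[] chars, int offset, int length) {
    text.append(chars, offset, length); // copy only the [offset, offset + length) slice
  }

  @Override
  public void ignorableWhitespace(char[] chars, int offset, int length) {
    characters(chars, offset, length); // treat the same as any other characters
  }

  String text() {
    return text.toString();
  }
}
```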