Class SolrContentHandler
- java.lang.Object
  - org.xml.sax.helpers.DefaultHandler
    - org.apache.solr.handler.extraction.SolrContentHandler
- All Implemented Interfaces:
ExtractingParams, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class SolrContentHandler extends DefaultHandler implements ExtractingParams
The class responsible for handling Tika events and translating them into SolrInputDocuments. This class is not thread-safe. This class cannot be reused; you have to create a new instance per document!
Users may wish to override this class to provide their own functionality.
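The one-instance-per-document rule follows from the handler accumulating state while it receives SAX events. The sketch below illustrates that pattern with JDK classes only. `TextCollectingHandler` and its `extract` helper are hypothetical names used for illustration; they are a simplified stand-in and not the Solr implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a per-document SAX handler; NOT SolrContentHandler itself.
class TextCollectingHandler extends DefaultHandler {
    // Accumulates all character data seen by this instance.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] chars, int offset, int length) {
        text.append(chars, offset, length);
    }

    String text() {
        return text.toString();
    }

    // Parses one XML string with a fresh handler, because state accumulates per instance.
    static String extract(String xml) {
        TextCollectingHandler handler = new TextCollectingHandler(); // new instance per document
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.text();
    }
}
```

Reusing a single handler for a second document would append the new text to the old buffer, which is why the helper creates a new instance on every call.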
-
-
Field Summary
Fields
Modifier and Type / Field / Description
protected boolean
captureAttribs
protected StringBuilder
catchAllBuilder
static String
contentFieldName
protected String
defaultField
protected org.apache.solr.common.SolrInputDocument
document
protected Map<String,StringBuilder>
fieldBuilders
protected boolean
lowerNames
protected org.apache.tika.metadata.Metadata
metadata
protected org.apache.solr.common.params.SolrParams
params
protected IndexSchema
schema
protected String
unknownFieldPrefix
-
Fields inherited from interface org.apache.solr.handler.extraction.ExtractingParams
CAPTURE_ATTRIBUTES, CAPTURE_ELEMENTS, DEFAULT_FIELD, EXTRACT_FORMAT, EXTRACT_ONLY, IGNORE_TIKA_EXCEPTION, LITERALS_OVERRIDE, LITERALS_PREFIX, LOWERNAMES, MAP_PREFIX, PASSWORD_MAP_FILE, RESOURCE_NAME, RESOURCE_PASSWORD, STREAM_TYPE, UNKNOWN_FIELD_PREFIX, XPATH_EXPRESSION
-
-
Constructor Summary
Constructors
Constructor / Description
SolrContentHandler(org.apache.tika.metadata.Metadata metadata, org.apache.solr.common.params.SolrParams params, IndexSchema schema)
-
Method Summary
All Methods / Instance Methods / Concrete Methods
Modifier and Type / Method / Description
protected void
addCapturedContent()
Add the per field captured content to the Solr Document.
protected void
addContent()
Add in the catch all content to the field.
protected void
addField(String fname, String fval, String[] vals)
protected void
addLiterals()
Add in the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX.
protected void
addMetadata()
Add in any metadata using metadata as the source.
void
characters(char[] chars, int offset, int length)
void
endElement(String uri, String localName, String qName)
protected String
findMappedName(String name)
Get the name mapping
void
ignorableWhitespace(char[] chars, int offset, int length)
Treat the same as any other characters
org.apache.solr.common.SolrInputDocument
newDocument()
This is called by a consumer when it is ready to deal with a new SolrInputDocument.
void
startElement(String uri, String localName, String qName, Attributes attributes)
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
-
-
-
-
Field Detail
-
contentFieldName
public static final String contentFieldName
- See Also:
- Constant Field Values
-
document
protected final org.apache.solr.common.SolrInputDocument document
-
metadata
protected final org.apache.tika.metadata.Metadata metadata
-
params
protected final org.apache.solr.common.params.SolrParams params
-
catchAllBuilder
protected final StringBuilder catchAllBuilder
-
schema
protected final IndexSchema schema
-
fieldBuilders
protected final Map<String,StringBuilder> fieldBuilders
-
captureAttribs
protected final boolean captureAttribs
-
lowerNames
protected final boolean lowerNames
-
unknownFieldPrefix
protected final String unknownFieldPrefix
-
defaultField
protected final String defaultField
-
-
Constructor Detail
-
SolrContentHandler
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata, org.apache.solr.common.params.SolrParams params, IndexSchema schema)
-
-
Method Detail
-
newDocument
public org.apache.solr.common.SolrInputDocument newDocument()
This is called by a consumer when it is ready to deal with a new SolrInputDocument. Overriding classes can use this hook to add in or change whatever they deem fit for the document at that time. The base implementation adds the metadata as fields, allowing for potential remapping.
- Returns:
- The SolrInputDocument.
- See Also:
addMetadata(), addCapturedContent(), addContent(), addLiterals()
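As a rough illustration of using newDocument() as an override hook, the sketch below models the pattern with JDK classes only. `SketchHandler`, `TaggingHandler`, and the `source_s` field are hypothetical; a `Map` stands in for SolrInputDocument, and only the metadata step is modeled, so this is not the Solr implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the handler; a Map plays the role of SolrInputDocument.
class SketchHandler {
    protected final Map<String, Object> document = new LinkedHashMap<>();
    protected final Map<String, String> metadata;

    SketchHandler(Map<String, String> metadata) {
        this.metadata = metadata;
    }

    // Copies every metadata entry into the document as a field.
    protected void addMetadata() {
        document.putAll(metadata);
    }

    // Base behavior in this sketch: add the metadata, then hand the document to the caller.
    Map<String, Object> newDocument() {
        addMetadata();
        return document;
    }
}

// A subclass uses the hook to add one extra field after the base work is done.
class TaggingHandler extends SketchHandler {
    TaggingHandler(Map<String, String> metadata) {
        super(metadata);
    }

    @Override
    Map<String, Object> newDocument() {
        Map<String, Object> doc = super.newDocument();
        doc.put("source_s", "sketch"); // hypothetical field name and value
        return doc;
    }
}
```

Calling `super.newDocument()` first keeps the base behavior intact, so the subclass only adds to the finished document.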
-
addCapturedContent
protected void addCapturedContent()
Add the per-field captured content to the Solr document. The default implementation uses the fieldBuilders info.
-
addContent
protected void addContent()
Add the catch-all content to the field. The default implementation uses the contentFieldName and the catchAllBuilder.
-
addLiterals
protected void addLiterals()
Add in the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX.
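A minimal sketch of prefix-based literal selection follows, using JDK classes only. It assumes the prefix value is "literal." (check ExtractingParams.LITERALS_PREFIX in your Solr version). A plain `Map` stands in for SolrParams, and the `LiteralParams` class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; a plain Map stands in for SolrParams.
class LiteralParams {
    // Assumed value of ExtractingParams.LITERALS_PREFIX; verify against your Solr version.
    static final String LITERALS_PREFIX = "literal.";

    // Returns the parameters that carry the prefix, keyed by the field name after the prefix.
    static Map<String, String> literalFields(Map<String, String> params) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = entry.getKey();
            if (name.startsWith(LITERALS_PREFIX)) {
                fields.put(name.substring(LITERALS_PREFIX.length()), entry.getValue());
            }
        }
        return fields;
    }
}
```

In this sketch a request parameter such as `literal.id=doc1` becomes the field `id` with value `doc1`, while parameters without the prefix are ignored.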
-
addMetadata
protected void addMetadata()
Add in any metadata using metadata as the source.
-
startElement
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
- Specified by:
startElement in interface ContentHandler
- Overrides:
startElement in class DefaultHandler
- Throws:
SAXException
-
endElement
public void endElement(String uri, String localName, String qName) throws SAXException
- Specified by:
endElement in interface ContentHandler
- Overrides:
endElement in class DefaultHandler
- Throws:
SAXException
-
characters
public void characters(char[] chars, int offset, int length) throws SAXException
- Specified by:
characters in interface ContentHandler
- Overrides:
characters in class DefaultHandler
- Throws:
SAXException
-
ignorableWhitespace
public void ignorableWhitespace(char[] chars, int offset, int length) throws SAXException
Treat the same as any other characters.
- Specified by:
ignorableWhitespace in interface ContentHandler
- Overrides:
ignorableWhitespace in class DefaultHandler
- Throws:
SAXException
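Treating ignorable whitespace like any other characters can be sketched as a simple delegation, using JDK classes only. `WhitespaceKeepingHandler` and its `demo` helper are hypothetical names; this shows the general technique and is not the Solr source.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that keeps ignorable whitespace instead of dropping it.
class WhitespaceKeepingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] chars, int offset, int length) {
        text.append(chars, offset, length);
    }

    // Delegate so that ignorable whitespace ends up in the same buffer as regular text.
    @Override
    public void ignorableWhitespace(char[] chars, int offset, int length) {
        characters(chars, offset, length);
    }

    String text() {
        return text.toString();
    }

    // Drives the handler by hand with the events a parser might emit.
    static String demo() {
        WhitespaceKeepingHandler handler = new WhitespaceKeepingHandler();
        char[] word = "a".toCharArray();
        char[] gap = "  ".toCharArray();
        handler.characters(word, 0, word.length);
        handler.ignorableWhitespace(gap, 0, gap.length);
        handler.characters(word, 0, word.length);
        return handler.text();
    }
}
```

Without the delegation, the inherited DefaultHandler method would do nothing and adjacent words could run together in the collected text.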
-
-