public class SolrContentHandler extends DefaultHandler implements ExtractingParams
The class responsible for handling Tika events and translating them into SolrInputDocuments.
This class is not thread-safe.
Users may wish to override this class to provide their own functionality.

| Modifier and Type | Field and Description |
|---|---|
| `protected boolean` | `captureAttribs` |
| `protected StringBuilder` | `catchAllBuilder` |
| `protected String` | `contentFieldName` |
| `protected Collection<String>` | `dateFormats` |
| `protected String` | `defaultField` |
| `protected SolrInputDocument` | `document` |
| `protected Map<String,StringBuilder>` | `fieldBuilders` |
| `protected boolean` | `lowerNames` |
| `protected org.apache.tika.metadata.Metadata` | `metadata` |
| `protected SolrParams` | `params` |
| `protected IndexSchema` | `schema` |
| `protected String` | `unknownFieldPrefix` |
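
The `catchAllBuilder` and `fieldBuilders` fields point to a buffering pattern: character data from SAX events is appended either to a per-field buffer or to a single catch-all buffer. The following is a minimal sketch of that pattern using only the JDK's SAX classes. It is not the SolrContentHandler source. The class name `CaptureSketchHandler`, its constructor, and the `parse` helper are hypothetical, and the routing rule (text inside a captured element goes to that element's buffer, everything else goes to the catch-all) is an assumption. Nested captured elements are not handled.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration; not the real SolrContentHandler.
class CaptureSketchHandler extends DefaultHandler {
    // Plays the role of catchAllBuilder: text outside captured elements.
    final StringBuilder catchAllBuilder = new StringBuilder();
    // Plays the role of fieldBuilders: one buffer per captured element name.
    final Map<String, StringBuilder> fieldBuilders = new HashMap<>();
    // Buffer that currently receives character data.
    private StringBuilder current = catchAllBuilder;

    CaptureSketchHandler(String... capturedElements) {
        for (String name : capturedElements) {
            fieldBuilders.put(name, new StringBuilder());
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Switch to the per-field buffer when a captured element opens.
        StringBuilder captured = fieldBuilders.get(qName);
        if (captured != null) {
            current = captured;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Fall back to the catch-all buffer when a captured element closes.
        if (fieldBuilders.containsKey(qName)) {
            current = catchAllBuilder;
        }
    }

    @Override
    public void characters(char[] chars, int offset, int length) {
        current.append(chars, offset, length);
    }

    // Hypothetical helper: parse an XML string with a fresh handler.
    static CaptureSketchHandler parse(String xml, String... captured) throws Exception {
        CaptureSketchHandler handler = new CaptureSketchHandler(captured);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CaptureSketchHandler h = parse("<doc><h1>Title</h1><p>Body text</p></doc>", "h1");
        System.out.println("h1 = " + h.fieldBuilders.get("h1"));
        System.out.println("catch-all = " + h.catchAllBuilder);
    }
}
```

Like the class documented here, the sketch keeps mutable buffers on the handler instance, so each parse should use its own handler.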
Fields inherited from interface ExtractingParams: BOOST_PREFIX, CAPTURE_ATTRIBUTES, CAPTURE_ELEMENTS, DEFAULT_FIELD, EXTRACT_FORMAT, EXTRACT_ONLY, IGNORE_TIKA_EXCEPTION, LITERALS_PREFIX, LOWERNAMES, MAP_PREFIX, RESOURCE_NAME, STREAM_TYPE, UNKNOWN_FIELD_PREFIX, XPATH_EXPRESSION

| Constructor and Description |
|---|
| `SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema)` |
| `SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema, Collection<String> dateFormats)` |

| Modifier and Type | Method and Description |
|---|---|
| `protected void` | `addCapturedContent()`: Add the per field captured content to the Solr Document. |
| `protected void` | `addContent()`: Add in the catch all content to the field. |
| `protected void` | `addField(String fname, String fval, String[] vals)` |
| `protected void` | `addLiterals()`: Add in the literals to the document using the `params` and the `ExtractingParams.LITERALS_PREFIX`. |
| `protected void` | `addMetadata()`: Add in any metadata using `metadata` as the source. |
| `void` | `characters(char[] chars, int offset, int length)` |
| `void` | `endElement(String uri, String localName, String qName)` |
| `protected String` | `findMappedName(String name)`: Get the name mapping. |
| `protected float` | `getBoost(String name)`: Get the value of any boost factor for the mapped name. |
| `SolrInputDocument` | `newDocument()`: This is called by a consumer when it is ready to deal with a new SolrInputDocument. |
| `void` | `startDocument()` |
| `void` | `startElement(String uri, String localName, String qName, Attributes attributes)` |
| `protected String` | `transformValue(String val, SchemaField schFld)`: Can be used to transform input values based on their `SchemaField`. This implementation only formats dates using the DateUtil. |
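
The date handling described for `transformValue` can be pictured as trying each configured pattern in turn and normalising the first successful parse. The sketch below shows that idea with JDK classes only. It does not call Solr's DateUtil, and the class `DateTransformSketch`, the method `transformDate`, the UTC default, the ISO-8601 output form, and the choice to return the input unchanged when no pattern matches are all assumptions made for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration; the real transformValue delegates to Solr's DateUtil.
class DateTransformSketch {

    // Try each pattern in order; return an ISO-8601 UTC instant on the first match.
    static String transformDate(String val, Collection<String> dateFormats) {
        for (String pattern : dateFormats) {
            SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
            fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: inputs are UTC
            fmt.setLenient(false);
            try {
                Date parsed = fmt.parse(val);
                return DateTimeFormatter.ISO_INSTANT.format(parsed.toInstant());
            } catch (ParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        // Assumption: leave unparseable values untouched for later validation.
        return val;
    }

    public static void main(String[] args) {
        // More specific patterns come first, because SimpleDateFormat.parse
        // accepts a matching prefix and ignores trailing text.
        Collection<String> formats = Arrays.asList("yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd");
        System.out.println(transformDate("2010-05-03", formats));
        System.out.println(transformDate("not a date", formats));
    }
}
```

A subclass that needs other conversions, for example trimming or unit normalisation keyed on the schema field, would override `transformValue` in the same spirit.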
Methods inherited from class DefaultHandler: endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning

protected SolrInputDocument document
protected Collection<String> dateFormats
protected org.apache.tika.metadata.Metadata metadata
protected SolrParams params
protected StringBuilder catchAllBuilder
protected IndexSchema schema
protected Map<String,StringBuilder> fieldBuilders
protected boolean captureAttribs
protected boolean lowerNames
protected String contentFieldName
protected String unknownFieldPrefix
protected String defaultField
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata,
SolrParams params,
IndexSchema schema)
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata,
SolrParams params,
IndexSchema schema,
Collection<String> dateFormats)
public SolrInputDocument newDocument()
This is called by a consumer when it is ready to deal with a new SolrInputDocument.
Returns: the SolrInputDocument.
See also: addMetadata(), addCapturedContent(), addContent(), addLiterals()

protected void addCapturedContent()
Add the per field captured content to the Solr Document, using the fieldBuilders info.

protected void addContent()
Add in the catch all content to the field, using the contentFieldName and the catchAllBuilder.

protected void addLiterals()
Add in the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX.

protected void addMetadata()
Add in any metadata using metadata as the source.

public void startDocument() throws SAXException
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class DefaultHandler
Throws: SAXException

public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
Specified by: startElement in interface ContentHandler
Overrides: startElement in class DefaultHandler
Throws: SAXException

public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Overrides: endElement in class DefaultHandler
Throws: SAXException

public void characters(char[] chars, int offset, int length) throws SAXException
Specified by: characters in interface ContentHandler
Overrides: characters in class DefaultHandler
Throws: SAXException

protected String transformValue(String val, SchemaField schFld)
Can be used to transform input values based on their SchemaField. This implementation only formats dates using the DateUtil.
Parameters:
val - The value to transform
schFld - The SchemaField

protected float getBoost(String name)
Get the value of any boost factor for the mapped name.
Parameters:
name - The name of the field to see if there is a boost specified
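
The name mapping and boost lookups can be read as prefixed parameter lookups with a fallback. The sketch below models the request parameters as a plain `Map` in place of `SolrParams`. The class `ParamLookupSketch` is hypothetical, the prefix strings `"fmap."` and `"boost."` are assumed values for `ExtractingParams.MAP_PREFIX` and `ExtractingParams.BOOST_PREFIX`, and the fallbacks (the unmapped name, a boost of 1.0) are assumptions as well.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of prefixed parameter lookups; not Solr code.
class ParamLookupSketch {
    // Assumed values of ExtractingParams.MAP_PREFIX and BOOST_PREFIX.
    static final String MAP_PREFIX = "fmap.";
    static final String BOOST_PREFIX = "boost.";

    // Return the mapped field name, or the original name when no mapping exists.
    static String findMappedName(Map<String, String> params, String name) {
        return params.getOrDefault(MAP_PREFIX + name, name);
    }

    // Return the boost configured for a field name, or 1.0 when none is set.
    static float getBoost(Map<String, String> params, String name) {
        String raw = params.get(BOOST_PREFIX + name);
        return raw == null ? 1.0f : Float.parseFloat(raw);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("fmap.author", "creator_s");
        params.put("boost.creator_s", "2.5");

        String mapped = findMappedName(params, "author");
        System.out.println(mapped + " boost=" + getBoost(params, mapped));
        System.out.println("title boost=" + getBoost(params, "title"));
    }
}
```

In this sketch the boost is looked up by the mapped name, which matches the description "boost factor for the mapped name" above.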