public interface ExtractingParams
Modifier and Type | Field and Description |
---|---|
static String | BOOST_PREFIX: The boost value for the name of the field. |
static String | CAPTURE_ATTRIBUTES: Capture attributes separately according to the name of the element, instead of just adding them to the string buffer. |
static String | CAPTURE_ELEMENTS: Capture the specified fields (and everything included below them that isn't captured by some other capture field) separately from the default. |
static String | DEFAULT_FIELD: Optional. |
static String | EXTRACT_FORMAT: Content output format if extractOnly is true. |
static String | EXTRACT_ONLY: Only extract and return the content, do not index it. |
static String | IGNORE_TIKA_EXCEPTION: If true, ignore TikaException (give up on extracting text but still index the metadata). |
static String | LITERALS_PREFIX: Pass in literal values to be added to the document, as in literal.myField=Foo |
static String | LOWERNAMES: Map all generated attribute names to field names with lowercase and underscores. |
static String | MAP_PREFIX: The param prefix for mapping Tika metadata to Solr fields. |
static String | RESOURCE_NAME: Optional. |
static String | STREAM_TYPE: The type of the stream. |
static String | UNKNOWN_FIELD_PREFIX: Optional. |
static String | XPATH_EXPRESSION: Restrict the extracted parts of a document to be indexed by passing in an XPath expression. |
static final String LOWERNAMES
static final String IGNORE_TIKA_EXCEPTION
static final String MAP_PREFIX
fmap.title=solr.title
In this example, the Tika "title" metadata value will be added to a Solr field named "solr.title".
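The mapping convention above can be sketched in plain Java. This is only an illustration of how the parameter names are formed: the class and method names are hypothetical, and the prefix value "fmap." is inferred from the example rather than read from the interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FieldMappingSketch {
    // Assumed value of MAP_PREFIX, inferred from the "fmap.title" example.
    static final String MAP_PREFIX = "fmap.";

    /** Builds one request parameter per Tika metadata name to Solr field mapping. */
    static Map<String, String> mappingParams(Map<String, String> tikaToSolr) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tikaToSolr.entrySet()) {
            // Parameter name is prefix + Tika name; the value is the Solr field.
            params.put(MAP_PREFIX + entry.getKey(), entry.getValue());
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("title", "solr.title");
        System.out.println(mappingParams(mapping)); // {fmap.title=solr.title}
    }
}
```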
static final String BOOST_PREFIX
fmap.title=solr.title
boost.solr.title=2.5
These parameters boost the solr.title field for this document by 2.5.
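A minimal sketch of pairing a mapping with a boost, assuming the prefixes are "fmap." and "boost." as the examples suggest. The helper name `mapAndBoost` is hypothetical. The point it illustrates is that the boost parameter is keyed by the Solr field name, not by the Tika metadata name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoostParamsSketch {
    // Assumed prefix values, inferred from the examples in this page.
    static final String MAP_PREFIX = "fmap.";
    static final String BOOST_PREFIX = "boost.";

    /** Maps a Tika metadata name to a Solr field and boosts that Solr field. */
    static Map<String, String> mapAndBoost(String tikaName, String solrField, float boost) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put(MAP_PREFIX + tikaName, solrField);
        // The boost is keyed by the mapped Solr field name.
        params.put(BOOST_PREFIX + solrField, Float.toString(boost));
        return params;
    }

    public static void main(String[] args) {
        System.out.println(mapAndBoost("title", "solr.title", 2.5f));
        // {fmap.title=solr.title, boost.solr.title=2.5}
    }
}
```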
static final String LITERALS_PREFIX
literal.myField=Foo
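As a rough illustration, literal values could be turned into a URL query string like this. The class name, the `source` field, and its value are made up for the example, and the prefix "literal." is inferred from the line above. The sketch uses the `URLEncoder.encode(String, Charset)` overload, which needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class LiteralParamsSketch {
    // Assumed value of LITERALS_PREFIX, inferred from "literal.myField=Foo".
    static final String LITERALS_PREFIX = "literal.";

    /** Encodes literal field values as a URL query string. */
    static String toQueryString(Map<String, String> literals) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : literals.entrySet()) {
            String name = URLEncoder.encode(LITERALS_PREFIX + entry.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8);
            query.add(name + "=" + value);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("myField", "Foo");
        literals.put("source", "annual report"); // hypothetical field and value
        System.out.println(toQueryString(literals));
        // literal.myField=Foo&literal.source=annual+report
    }
}
```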
static final String XPATH_EXPRESSION
All content that satisfies the XPath expression is passed to the SolrContentHandler.
See Tika's docs for what the extracted document looks like.
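To get a feel for what an XPath restriction selects, the sketch below evaluates an expression against a small XHTML-like document. It uses the JDK's own XPath engine as a stand-in, so it is an assumption that the same expression behaves identically inside the extraction pipeline, which may accept only a subset of XPath. The sample document and class name are invented, and namespaces are ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathRestrictSketch {
    /** Returns the text of the first node that matches the expression. */
    static String select(String xhtml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // With a String result, evaluate() yields the string value of the first match.
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><div>indexed text</div><p>skipped text</p></body></html>";
        System.out.println(select(xhtml, "/html/body/div")); // indexed text
    }
}
```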
See also: CAPTURE_ELEMENTS
static final String EXTRACT_ONLY
static final String EXTRACT_FORMAT
static final String CAPTURE_ATTRIBUTES
static final String CAPTURE_ELEMENTS
The capture field is based on the localName returned to the SolrContentHandler by Tika, not to be confused with the mapped field. The field name can then be mapped into the index schema.
For instance, a Tika document may look like:
<html> ... <body> <p>some text here. <div>more text</div></p> Some more text </body>
By passing in the p tag, you could capture all P tags separately from the rest of the text. Thus, in the example, the capture of the P tag would be: "some text here. more text"
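The capture behaviour described above can be imitated with the JDK's DOM API. This is a sketch of the semantics only, not the handler's actual implementation: the class name is hypothetical, and the sample document is the one from the text with the elided part dropped and a closing html tag added so that it parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CaptureSketch {
    /** Returns {captured text, remaining text} for the given element name. */
    static String[] capture(String xhtml, String elementName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        StringBuilder captured = new StringBuilder();
        NodeList matches = doc.getElementsByTagName(elementName);
        // The list is live: detaching a match removes it (and nested matches) from the list.
        while (matches.getLength() > 0) {
            Node match = matches.item(0);
            captured.append(match.getTextContent());
            match.getParentNode().removeChild(match);
        }
        String remainder = doc.getDocumentElement().getTextContent().trim();
        return new String[] {captured.toString().trim(), remainder};
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>some text here. <div>more text</div></p>"
                + " Some more text </body></html>";
        String[] result = capture(xhtml, "p");
        System.out.println(result[0]); // some text here. more text
        System.out.println(result[1]); // Some more text
    }
}
```

The text inside the nested div ends up in the captured value because it sits below the p element and no other capture field claims it.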
static final String STREAM_TYPE
static final String RESOURCE_NAME
static final String UNKNOWN_FIELD_PREFIX
static final String DEFAULT_FIELD