Class PhrasesIdentificationComponent
- All Implemented Interfaces:
AutoCloseable, SolrInfoBean, SolrMetricProducer, NamedListInitializedPlugin
A component that can be used in conjunction with QueryComponent to
identify and score "phrases" found in the input string, based on shingles in indexed fields.
The most common way to use this component is in conjunction with fields that use ShingleFilterFactory on both the index and query analyzers. An example
field type configuration would be something like this...
<fieldType name="phrases" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="7" outputUnigramsIfNoShingles="true" outputUnigrams="true"/>
</analyzer>
</fieldType>
...where the query analyzer's maxShingleSize="7" determines the
maximum possible phrase length that can be heuristically deduced, and the index
analyzer's maxShingleSize="3" determines the accuracy of the phrases identified. The
larger the indexed maxShingleSize, the higher the accuracy. Both analyzers must
include minShingleSize="2" outputUnigrams="true".
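To make the effect of the two maxShingleSize values concrete, the following is a minimal sketch in plain Java that enumerates the word n-grams a shingle filter would produce for a short token list. It is a simplification and not Lucene's ShingleFilter implementation: the class and method names (ShingleSketch, shingles) are hypothetical, tokens are joined with a single space, and minSize is assumed to be at least 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Solr or Lucene.
final class ShingleSketch {

    /**
     * Lists the shingles (word n-grams) of the given tokens with sizes from
     * minSize to maxSize, optionally preceded by each single token.
     * Assumes minSize >= 2, as required by the configuration above.
     */
    static List<String> shingles(List<String> tokens, int minSize, int maxSize,
                                 boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            // Grow the shingle from this position until maxSize or the end of input.
            for (int size = minSize;
                 size <= maxSize && start + size <= tokens.size();
                 size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("brown", "fox", "stole", "nuts");
        // Index-time settings from the example field type: shingles of 2 to 3 words.
        System.out.println(shingles(tokens, 2, 3, true));
        // Query-time settings allow up to 7 words, so the full 4-word input is a candidate.
        System.out.println(shingles(tokens, 2, 7, true));
    }
}
```

With the index-time limit of 3, the four-word sequence is never indexed as a single term, so a candidate phrase of that length can only be assessed through the shorter shingles it contains.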
With a field type like this, one or more fields can be specified (with weights) via a
phrases.fields param to request that this component identify possible phrases in the input
q param, or an alternative phrases.q override param. The identified
phrases will include their scores relative to each field specified, as well as an overall weighted score
based on the field weights provided by the client. Higher score values indicate greater
confidence in the phrase.
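As an illustration of the request params described above, here is a minimal sketch that assembles a URL query string using only the JDK. Several details are assumptions rather than facts taken from this page: that the component is enabled with a phrases=true param, that weights are written in field^weight form separated by spaces, and the field names title_shingles and body_shingles, which are hypothetical. The class PhrasesRequestSketch is likewise not part of Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Solr or SolrJ.
final class PhrasesRequestSketch {

    /** Collects the request params; insertion order is kept so the output is predictable. */
    static Map<String, String> phrasesParams(String input, String weightedFields) {
        Map<String, String> params = new LinkedHashMap<>();
        // Assumption: the component is switched on with a param named after it.
        params.put("phrases", "true");
        // Assumption: weights use the field^weight form, separated by spaces.
        params.put("phrases.fields", weightedFields);
        // phrases.q overrides the main q param as the text to analyze.
        params.put("phrases.q", input);
        return params;
    }

    /** URL-encodes the params into a query string such as a=1&b=2. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(
            phrasesParams("brown fox stole nuts", "title_shingles^2 body_shingles")));
    }
}
```

The resulting string would be appended to the URL of whichever request handler has this component registered; the handler path depends on the local configuration and is not shown here.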
NOTE: In a distributed request, this component uses a single phase (piggybacking on
the ShardRequest.PURPOSE_GET_TOP_IDS generated by QueryComponent if it is in use)
to collect all field and shingle stats. No "refinement" requests are used.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Nested Classes
- static final class: Model the data known about a single (candidate) Phrase, which may or may not be indexed
- static final class: Simple container for all request options and data this component needs to store in the Request Context
Nested classes/interfaces inherited from interface org.apache.solr.core.SolrInfoBean
SolrInfoBean.Category, SolrInfoBean.Group
-
Field Summary
Fields
- static final String COMPONENT_NAME: Name, also used as a request param to identify whether the user query concerns this component
- static final String PHRASE_ANALYSIS_FIELD
- static final String PHRASE_FIELDS
- static final String PHRASE_INDEX_MAXLEN
- static final String PHRASE_INPUT
- static final String PHRASE_QUERY_MAXLEN
- static final String PHRASE_SUMMARY_POST
- static final String PHRASE_SUMMARY_PRE
- static final int SHARD_PURPOSE: The only shard purpose that will cause this component to do work and return data during shard requests
Fields inherited from class org.apache.solr.handler.component.SearchComponent
solrMetricsContext, standard_components
Fields inherited from interface org.apache.solr.metrics.SolrMetricProducer
CATEGORY_ATTR, HANDLER_ATTR, NAME_ATTR, OPERATION_ATTR, PLUGIN_NAME_ATTR, RESULT_ATTR, TYPE_ATTR
-
Constructor Summary
Constructors
- PhrasesIdentificationComponent()
-
Method Summary
Methods
- int distributedProcess: Process for a distributed search.
- void finishStage: Called after all responses have been received for this stage.
- String getDescription: Simple one or two line description
- static int getMaxShingleSize(org.apache.lucene.analysis.Analyzer analyzer): Helper method, public for testing purposes only.
- void prepare: Prepare the response.
- void process: Process the request for this component
Methods inherited from class org.apache.solr.handler.component.SearchComponent
getCategory, getName, getSolrMetricsContext, handleResponses, initializeMetrics, modifyRequest, setName
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.solr.util.plugin.NamedListInitializedPlugin
init
Methods inherited from interface org.apache.solr.metrics.SolrMetricProducer
close
-
Field Details
-
SHARD_PURPOSE
public static final int SHARD_PURPOSE
The only shard purpose that will cause this component to do work and return data during shard requests.
-
COMPONENT_NAME
Name, also used as a request param to identify whether the user query concerns this component.
-
PHRASE_INPUT
-
PHRASE_FIELDS
-
PHRASE_ANALYSIS_FIELD
-
PHRASE_SUMMARY_PRE
-
PHRASE_SUMMARY_POST
-
PHRASE_INDEX_MAXLEN
-
PHRASE_QUERY_MAXLEN
-
Constructor Details
-
PhrasesIdentificationComponent
public PhrasesIdentificationComponent()
-
-
Method Details
-
prepare
Description copied from class: SearchComponent
Prepare the response. Guaranteed to be called before any SearchComponent SearchComponent.process(org.apache.solr.handler.component.ResponseBuilder) method. Called for every incoming request. The place to do initialization that is request dependent.
- Specified by: prepare in class SearchComponent
- Parameters: rb - The ResponseBuilder
- Throws: IOException - If there is a low-level I/O error.
-
distributedProcess
Description copied from class: SearchComponent
Process for a distributed search.
- Overrides: distributedProcess in class SearchComponent
- Returns: the next stage for this component
-
finishStage
Description copied from class: SearchComponent
Called after all responses have been received for this stage. Useful when different requests are sent to each shard.
- Overrides: finishStage in class SearchComponent
-
process
Description copied from class: SearchComponent
Process the request for this component.
- Specified by: process in class SearchComponent
- Parameters: rb - The ResponseBuilder
- Throws: IOException - If there is a low-level I/O error.
-
getDescription
Description copied from interface: SolrInfoBean
Simple one or two line description.
- Specified by: getDescription in interface SolrInfoBean
- Specified by: getDescription in class SearchComponent
-
getMaxShingleSize
public static int getMaxShingleSize(org.apache.lucene.analysis.Analyzer analyzer)
Helper method, public for testing purposes only.
Given an analyzer, inspects it to determine if:
- it is a TokenizerChain
- it contains exactly one instance of ShingleFilterFactory
If these conditions are met, then this method returns the maxShingleSize in effect for this analyzer, otherwise returns -1.
- Parameters: analyzer - An analyzer to inspect
- Returns: maxShingleSize if available
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
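The contract described above can be modelled with a small standalone sketch. It does not use or reproduce the Solr implementation: the analyzer is represented by a plain list of hypothetical FilterSpec objects, a null list stands in for "not a TokenizerChain", and the names MaxShingleSizeSketch and FilterSpec are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not Solr or Lucene types.
final class MaxShingleSizeSketch {

    /** Stand-in for one filter factory in an analysis chain. */
    static final class FilterSpec {
        final String factoryName;
        final int maxShingleSize; // only meaningful for shingle filters

        FilterSpec(String factoryName, int maxShingleSize) {
            this.factoryName = factoryName;
            this.maxShingleSize = maxShingleSize;
        }
    }

    /**
     * Returns the maxShingleSize of the single shingle filter in the chain,
     * or -1 if the chain is null or holds zero or several shingle filters.
     */
    static int maxShingleSize(List<FilterSpec> chain) {
        if (chain == null) {
            return -1; // models "analyzer is not a TokenizerChain"
        }
        int count = 0;
        int found = -1;
        for (FilterSpec f : chain) {
            if ("ShingleFilterFactory".equals(f.factoryName)) {
                count++;
                found = f.maxShingleSize;
            }
        }
        return count == 1 ? found : -1;
    }

    public static void main(String[] args) {
        List<FilterSpec> indexChain = Arrays.asList(
            new FilterSpec("LowerCaseFilterFactory", 0),
            new FilterSpec("ShingleFilterFactory", 3));
        System.out.println(maxShingleSize(indexChain));
    }
}
```

Returning -1 for anything other than exactly one shingle filter gives callers a single sentinel to check instead of having to reason about ambiguous chains.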
-