Class PhrasesIdentificationComponent
- java.lang.Object
-
- org.apache.solr.handler.component.SearchComponent
-
- org.apache.solr.handler.component.PhrasesIdentificationComponent
-
- All Implemented Interfaces:
AutoCloseable, SolrInfoBean, SolrMetricProducer, NamedListInitializedPlugin
public class PhrasesIdentificationComponent extends SearchComponent
A component that can be used in isolation, or in conjunction with QueryComponent, to identify & score "phrases" found in the input string, based on shingles in indexed fields.

The most common way to use this component is in conjunction with fields that use ShingleFilterFactory on both the index and query analyzers. An example field type configuration would be something like this...

    <fieldType name="phrases" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="7" outputUnigramsIfNoShingles="true" outputUnigrams="true"/>
      </analyzer>
    </fieldType>

...where the query analyzer's maxShingleSize="7" determines the maximum possible phrase length that can be heuristically deduced, and the index analyzer's maxShingleSize="3" determines the accuracy of phrases identified. The larger the indexed maxShingleSize, the higher the accuracy. Both analyzers must include minShingleSize="2" outputUnigrams="true".

With a field type like this, one or more fields can be specified (with weights) via a phrases.fields param to request that this component identify possible phrases in the input q param, or an alternative phrases.q override param. The identified phrases will include their scores relative to each field specified, as well as an overall weighted score based on the field weights provided by the client. Higher score values indicate a greater confidence in the Phrase.

NOTE: In a distributed request, this component uses a single phase (piggybacking on the ShardRequest.PURPOSE_GET_TOP_IDS generated by QueryComponent if it is in use) to collect all field & shingle stats. No "refinement" requests are used.

- WARNING: This API is experimental and might change in incompatible ways in the next release.
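To make the shingle sizes in the field type above more concrete, here is a minimal Java sketch of word shingling. It is not Lucene's ShingleFilter: the class ShingleSketch and the method shingles are hypothetical names introduced for illustration, and the sketch only approximates what minShingleSize, maxShingleSize and outputUnigrams="true" would emit for a short token list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a simplified stand-in for shingle generation,
// not the Lucene ShingleFilter implementation.
class ShingleSketch {

    /**
     * For each token position, emits the unigram (if requested) followed by
     * every word n-gram of size minSize..maxSize that starts at that position.
     */
    static List<String> shingles(List<String> tokens, int minSize, int maxSize,
                                 boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            for (int n = minSize; n <= maxSize && start + n <= tokens.size(); n++) {
                out.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Index-side settings from the example field type: sizes 2..3 plus unigrams.
        List<String> tokens = Arrays.asList("new", "york", "city", "hotels");
        System.out.println(shingles(tokens, 2, 3, true));
    }
}
```

With a maximum size of 3, no indexed shingle in this sketch is longer than three words, which fits the description above: candidate phrases longer than the indexed maxShingleSize cannot be looked up directly and have to be deduced from shorter shingles.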
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | PhrasesIdentificationComponent.Phrase | Model the data known about a single (candidate) Phrase -- which may or may not be indexed
static class | PhrasesIdentificationComponent.PhrasesContextData | Simple container for all request options and data this component needs to store in the Request Context
-
Nested classes/interfaces inherited from interface org.apache.solr.core.SolrInfoBean
SolrInfoBean.Category, SolrInfoBean.Group
-
-
Field Summary
Fields
Modifier and Type | Field | Description
static String | COMPONENT_NAME | Name, also used as a request param to identify whether the user query concerns this component
static String | PHRASE_ANALYSIS_FIELD |
static String | PHRASE_FIELDS |
static String | PHRASE_INDEX_MAXLEN |
static String | PHRASE_INPUT |
static String | PHRASE_QUERY_MAXLEN |
static String | PHRASE_SUMMARY_POST |
static String | PHRASE_SUMMARY_PRE |
static int | SHARD_PURPOSE | The only shard purpose that will cause this component to do work & return data during a shard request
-
Fields inherited from class org.apache.solr.handler.component.SearchComponent
solrMetricsContext, standard_components
-
-
Constructor Summary
Constructors
Constructor | Description
PhrasesIdentificationComponent() |
-
Method Summary
Modifier and Type | Method | Description
int | distributedProcess(ResponseBuilder rb) | Process for a distributed search.
void | finishStage(ResponseBuilder rb) | Called after all responses have been received for this stage.
String | getDescription() | Simple one or two line description
static int | getMaxShingleSize(org.apache.lucene.analysis.Analyzer analyzer) | Helper method, public for testing purposes only.
void | prepare(ResponseBuilder rb) | Prepare the response.
void | process(ResponseBuilder rb) | Process the request for this component
-
Methods inherited from class org.apache.solr.handler.component.SearchComponent
getCategory, getName, getSolrMetricsContext, handleResponses, initializeMetrics, modifyRequest, setName
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.solr.util.plugin.NamedListInitializedPlugin
init
-
Methods inherited from interface org.apache.solr.metrics.SolrMetricProducer
close
-
-
-
-
Field Detail
-
SHARD_PURPOSE
public static final int SHARD_PURPOSE
The only shard purpose that will cause this component to do work & return data during a shard request.
- See Also:
- Constant Field Values
-
COMPONENT_NAME
public static final String COMPONENT_NAME
Name, also used as a request param to identify whether the user query concerns this component.
- See Also:
- Constant Field Values
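As a rough illustration of how the request parameters named in the class description (q, phrases.fields, and the component name used as an enabling param) might be combined, here is a small Java sketch that assembles a URL query string using only the JDK. Several details are assumptions rather than facts taken from this page: that COMPONENT_NAME resolves to "phrases", that field weights use a field^weight syntax, and the field names title_phrases and body_phrases, which are purely hypothetical. The class PhrasesRequestSketch is likewise made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustration only: builds a query string with the JDK; no Solr client API is used.
class PhrasesRequestSketch {

    /** Joins the params as key=value pairs, URL-encoding both sides, in insertion order. */
    static String queryString(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
            .collect(Collectors.joining("&"));
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "brown fox lazy dog");
        // Assumption: the component is enabled via a param matching its name.
        params.put("phrases", "true");
        // Assumption: hypothetical field names, weighted with a field^weight syntax.
        params.put("phrases.fields", "title_phrases^2 body_phrases");
        System.out.println(queryString(params));
    }
}
```

The resulting string would be appended to whatever request handler path has this component registered; that path depends on the local configuration and is deliberately left out here.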
-
PHRASE_INPUT
public static final String PHRASE_INPUT
- See Also:
- Constant Field Values
-
PHRASE_FIELDS
public static final String PHRASE_FIELDS
- See Also:
- Constant Field Values
-
PHRASE_ANALYSIS_FIELD
public static final String PHRASE_ANALYSIS_FIELD
- See Also:
- Constant Field Values
-
PHRASE_SUMMARY_PRE
public static final String PHRASE_SUMMARY_PRE
- See Also:
- Constant Field Values
-
PHRASE_SUMMARY_POST
public static final String PHRASE_SUMMARY_POST
- See Also:
- Constant Field Values
-
PHRASE_INDEX_MAXLEN
public static final String PHRASE_INDEX_MAXLEN
- See Also:
- Constant Field Values
-
PHRASE_QUERY_MAXLEN
public static final String PHRASE_QUERY_MAXLEN
- See Also:
- Constant Field Values
-
-
Method Detail
-
prepare
public void prepare(ResponseBuilder rb) throws IOException
Description copied from class: SearchComponent
Prepare the response. Guaranteed to be called before any SearchComponent SearchComponent.process(org.apache.solr.handler.component.ResponseBuilder) method. Called for every incoming request.
The place to do initialization that is request dependent.
- Specified by:
prepare in class SearchComponent
- Parameters:
rb - The ResponseBuilder
- Throws:
IOException - If there is a low-level I/O error.
-
distributedProcess
public int distributedProcess(ResponseBuilder rb)
Description copied from class: SearchComponent
Process for a distributed search.
- Overrides:
distributedProcess in class SearchComponent
- Returns:
the next stage for this component
-
finishStage
public void finishStage(ResponseBuilder rb)
Description copied from class: SearchComponent
Called after all responses have been received for this stage. Useful when different requests are sent to each shard.
- Overrides:
finishStage in class SearchComponent
-
process
public void process(ResponseBuilder rb) throws IOException
Description copied from class: SearchComponent
Process the request for this component.
- Specified by:
process in class SearchComponent
- Parameters:
rb - The ResponseBuilder
- Throws:
IOException - If there is a low-level I/O error.
-
getDescription
public String getDescription()
Description copied from interface: SolrInfoBean
Simple one or two line description.
- Specified by:
getDescription in interface SolrInfoBean
- Specified by:
getDescription in class SearchComponent
-
getMaxShingleSize
public static int getMaxShingleSize(org.apache.lucene.analysis.Analyzer analyzer)
Helper method, public for testing purposes only.
Given an analyzer, inspects it to determine if:
- it is a TokenizerChain
- it contains exactly one instance of ShingleFilterFactory
If these conditions are met, then this method returns the maxShingleSize in effect for this analyzer, otherwise returns -1.
- Parameters:
analyzer - An analyzer to inspect
- Returns:
maxShingleSize if available
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
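The two conditions checked by getMaxShingleSize can be pictured with a small self-contained Java sketch. This is not the Solr implementation and uses no Solr or Lucene types: FilterSpec and MaxShingleSizeSketch are hypothetical stand-ins, a null chain stands in for "not a TokenizerChain", and returning -1 when maxShingleSize is not configured is a simplification made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustration only: models an analysis chain as a list of (factory name, args) pairs.
class MaxShingleSizeSketch {

    /** Hypothetical stand-in for one filter factory in an analysis chain. */
    static final class FilterSpec {
        final String factoryName;
        final Map<String, String> args;

        FilterSpec(String factoryName, Map<String, String> args) {
            this.factoryName = factoryName;
            this.args = args;
        }
    }

    /** Returns the configured maxShingleSize if exactly one shingle filter exists, else -1. */
    static int maxShingleSize(List<FilterSpec> chain) {
        if (chain == null) {
            return -1; // stands in for "the analyzer is not a TokenizerChain"
        }
        FilterSpec shingle = null;
        for (FilterSpec f : chain) {
            if ("ShingleFilterFactory".equals(f.factoryName)) {
                if (shingle != null) {
                    return -1; // more than one shingle filter
                }
                shingle = f;
            }
        }
        if (shingle == null) {
            return -1; // no shingle filter at all
        }
        // Simplification: treat a missing argument as "not available".
        String configured = shingle.args.get("maxShingleSize");
        return configured == null ? -1 : Integer.parseInt(configured);
    }

    public static void main(String[] args) {
        List<FilterSpec> chain = Arrays.asList(
            new FilterSpec("LowerCaseFilterFactory", Collections.<String, String>emptyMap()),
            new FilterSpec("ShingleFilterFactory",
                Collections.singletonMap("maxShingleSize", "3")));
        System.out.println(maxShingleSize(chain));
    }
}
```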
-
-