Class PhrasesIdentificationComponent
- java.lang.Object
  - org.apache.solr.handler.component.SearchComponent
    - org.apache.solr.handler.component.PhrasesIdentificationComponent
- All Implemented Interfaces: AutoCloseable, SolrInfoBean, SolrMetricProducer, NamedListInitializedPlugin
public class PhrasesIdentificationComponent extends SearchComponent
A component that can be used in isolation, or in conjunction with QueryComponent, to identify & score "phrases" found in the input string, based on shingles in indexed fields.

The most common way to use this component is in conjunction with fields that use ShingleFilterFactory on both the index and query analyzers. An example field type configuration would be something like this...

<fieldType name="phrases" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="7" outputUnigramsIfNoShingles="true" outputUnigrams="true"/>
  </analyzer>
</fieldType>
...where the query analyzer's maxShingleSize="7" determines the maximum possible phrase length that can be heuristically deduced, and the index analyzer's maxShingleSize="3" determines the accuracy of phrases identified. The larger the indexed maxShingleSize, the higher the accuracy. Both analyzers must include minShingleSize="2" outputUnigrams="true".

With a field type like this, one or more fields can be specified (with weights) via a phrases.fields param to request that this component identify possible phrases in the input q param, or an alternative phrases.q override param. The identified phrases will include their scores relative to each field specified, as well as an overall weighted score based on the field weights provided by the client. Higher score values indicate a greater confidence in the Phrase.

NOTE: In a distributed request, this component uses a single phase (piggybacking on the ShardRequest.PURPOSE_GET_TOP_IDS generated by QueryComponent if it is in use) to collect all field & shingle stats. No "refinement" requests are used.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
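As a rough illustration of the request parameters described above, the following sketch assembles a URL query string using only the JDK. It makes several assumptions that are not stated on this page: that the value of COMPONENT_NAME is "phrases" and that setting it to true enables the component, that weights in phrases.fields use the field^weight form, and that the field names title and body exist in the schema. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class PhrasesRequestSketch {

    /** Builds an encoded query string asking for phrase identification on the given input. */
    static String buildQueryString(String input, String weightedFields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("phrases", "true");                // assumption: COMPONENT_NAME is "phrases"
        params.put("phrases.fields", weightedFields); // assumption: "field^weight" syntax
        params.put("phrases.q", input);               // override for the regular q param
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical field names and weights.
        System.out.println(buildQueryString("brown fox jumped", "title^2 body^1"));
    }
}
```

The resulting string would be appended to whichever request handler has this component registered; the handler path and collection name depend on the deployment and are left out here.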
-
-
Nested Class Summary
Nested Classes

| Modifier and Type | Class | Description |
|---|---|---|
| static class | PhrasesIdentificationComponent.Phrase | Model the data known about a single (candidate) Phrase -- which may or may not be indexed |
| static class | PhrasesIdentificationComponent.PhrasesContextData | Simple container for all request options and data this component needs to store in the Request Context |

-
Nested classes/interfaces inherited from interface org.apache.solr.core.SolrInfoBean
SolrInfoBean.Category, SolrInfoBean.Group
-
-
Field Summary
Fields

| Modifier and Type | Field | Description |
|---|---|---|
| static String | COMPONENT_NAME | Name, also used as a request param to identify whether the user query concerns this component |
| static String | PHRASE_ANALYSIS_FIELD | |
| static String | PHRASE_FIELDS | |
| static String | PHRASE_INDEX_MAXLEN | |
| static String | PHRASE_INPUT | |
| static String | PHRASE_QUERY_MAXLEN | |
| static String | PHRASE_SUMMARY_POST | |
| static String | PHRASE_SUMMARY_PRE | |
| static int | SHARD_PURPOSE | The only shard purpose that will cause this component to do work & return data during a shard request |

-
Fields inherited from class org.apache.solr.handler.component.SearchComponent
solrMetricsContext, standard_components
-
-
Constructor Summary
Constructors

| Constructor | Description |
|---|---|
| PhrasesIdentificationComponent() | |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| int | distributedProcess(ResponseBuilder rb) | Process for a distributed search. |
| void | finishStage(ResponseBuilder rb) | Called after all responses have been received for this stage. |
| String | getDescription() | Simple one or two line description |
| static int | getMaxShingleSize(org.apache.lucene.analysis.Analyzer analyzer) | Helper method, public for testing purposes only. |
| void | prepare(ResponseBuilder rb) | Prepare the response. |
| void | process(ResponseBuilder rb) | Process the request for this component |

-
Methods inherited from class org.apache.solr.handler.component.SearchComponent
getCategory, getName, getSolrMetricsContext, handleResponses, initializeMetrics, modifyRequest, setName
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.solr.util.plugin.NamedListInitializedPlugin
init
-
Methods inherited from interface org.apache.solr.metrics.SolrMetricProducer
close
-
-
-
-
Field Detail
-
SHARD_PURPOSE
public static final int SHARD_PURPOSE
The only shard purpose that will cause this component to do work & return data during a shard request.
- See Also:
- Constant Field Values
-
COMPONENT_NAME
public static final String COMPONENT_NAME
Name, also used as a request param to identify whether the user query concerns this component.
- See Also:
- Constant Field Values
-
PHRASE_INPUT
public static final String PHRASE_INPUT
- See Also:
- Constant Field Values
-
PHRASE_FIELDS
public static final String PHRASE_FIELDS
- See Also:
- Constant Field Values
-
PHRASE_ANALYSIS_FIELD
public static final String PHRASE_ANALYSIS_FIELD
- See Also:
- Constant Field Values
-
PHRASE_SUMMARY_PRE
public static final String PHRASE_SUMMARY_PRE
- See Also:
- Constant Field Values
-
PHRASE_SUMMARY_POST
public static final String PHRASE_SUMMARY_POST
- See Also:
- Constant Field Values
-
PHRASE_INDEX_MAXLEN
public static final String PHRASE_INDEX_MAXLEN
- See Also:
- Constant Field Values
-
PHRASE_QUERY_MAXLEN
public static final String PHRASE_QUERY_MAXLEN
- See Also:
- Constant Field Values
-
-
Method Detail
-
prepare
public void prepare(ResponseBuilder rb) throws IOException
Description copied from class: SearchComponent
Prepare the response. Guaranteed to be called before any SearchComponent SearchComponent.process(org.apache.solr.handler.component.ResponseBuilder) method. Called for every incoming request.
The place to do initialization that is request dependent.
- Specified by: prepare in class SearchComponent
- Parameters: rb - The ResponseBuilder
- Throws: IOException - If there is a low-level I/O error.
-
distributedProcess
public int distributedProcess(ResponseBuilder rb)
Description copied from class: SearchComponent
Process for a distributed search.
- Overrides: distributedProcess in class SearchComponent
- Returns: the next stage for this component
-
finishStage
public void finishStage(ResponseBuilder rb)
Description copied from class: SearchComponent
Called after all responses have been received for this stage. Useful when different requests are sent to each shard.
- Overrides: finishStage in class SearchComponent
-
process
public void process(ResponseBuilder rb) throws IOException
Description copied from class: SearchComponent
Process the request for this component.
- Specified by: process in class SearchComponent
- Parameters: rb - The ResponseBuilder
- Throws: IOException - If there is a low-level I/O error.
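The ordering guarantee copied from SearchComponent (prepare runs on every component before process runs on any of them) can be pictured with a toy model. This is only a sketch: ToyComponent, named and handle are hypothetical names invented for illustration, they are not Solr types, and distributed stages are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {

    /** Hypothetical stand-in for a search component; not a Solr type. */
    interface ToyComponent {
        void prepare(List<String> log);
        void process(List<String> log);
    }

    /** Creates a component that records each lifecycle call under the given name. */
    static ToyComponent named(String name) {
        return new ToyComponent() {
            @Override public void prepare(List<String> log) { log.add(name + ".prepare"); }
            @Override public void process(List<String> log) { log.add(name + ".process"); }
        };
    }

    /** Runs one request: all prepare calls first, then all process calls. */
    static List<String> handle(List<ToyComponent> components) {
        List<String> log = new ArrayList<>();
        for (ToyComponent c : components) c.prepare(log);
        for (ToyComponent c : components) c.process(log);
        return log;
    }

    public static void main(String[] args) {
        // prints [query.prepare, phrases.prepare, query.process, phrases.process]
        System.out.println(handle(List.of(named("query"), named("phrases"))));
    }
}
```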
-
getDescription
public String getDescription()
Description copied from interface: SolrInfoBean
Simple one or two line description
- Specified by: getDescription in interface SolrInfoBean
- Specified by: getDescription in class SearchComponent
-
getMaxShingleSize
public static int getMaxShingleSize(org.apache.lucene.analysis.Analyzer analyzer)
Helper method, public for testing purposes only.
Given an analyzer, inspects it to determine if:
- it is a TokenizerChain
- it contains exactly one instance of ShingleFilterFactory
If these conditions are met, then this method returns the maxShingleSize in effect for this analyzer, otherwise returns -1.
- Parameters: analyzer - An analyzer to inspect
- Returns: maxShingleSize if available
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
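To make concrete what the maxShingleSize value reported by this method bounds, here is a minimal plain-Java sketch of word shingling. It does not use Lucene and is not how ShingleFilterFactory is implemented; token positions, filler tokens and separators of the real filter are not modeled, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ShingleSketch {

    /**
     * Returns the shingles of minSize..maxSize consecutive tokens starting at each
     * position, preceded by the single token itself when outputUnigrams is true.
     * Assumes minSize >= 2, matching the minShingleSize="2" used on this page.
     */
    static List<String> shingles(List<String> tokens, int minSize, int maxSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [brown, brown fox, brown fox jumped, fox, fox jumped, jumped]
        System.out.println(shingles(List.of("brown", "fox", "jumped"), 2, 3, true));
    }
}
```

With maxSize set to 3, no shingle longer than three words is ever produced, which is the limit the returned maxShingleSize describes for an analyzer.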
-
-