Class PhrasesIdentificationComponent.Phrase
- java.lang.Object
- org.apache.solr.handler.component.PhrasesIdentificationComponent.Phrase
- Enclosing class: PhrasesIdentificationComponent
public static final class PhrasesIdentificationComponent.Phrase extends Object
Models the data known about a single (candidate) Phrase, which may or may not be indexed.
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
-
-
Method Summary
Modifier and Type, Method, Description
-
static List<PhrasesIdentificationComponent.Phrase>
extractPhrases(String input, SchemaField analysisField, int maxIndexedPositionLength, int maxQueryPositionLength)
Factory method for constructing a list of Phrases given the specified input and using the analyzer for the specified field.
-
static List<org.apache.solr.common.util.NamedList<Object>>
formatShardResponse(List<PhrasesIdentificationComponent.Phrase> phrases)
Format the phrases suitable for returning in a shard response
-
long
getConjunctionDocCount(String field)
Returns the number of documents that contain all of the getIndividualIndexedTerms() that make up this Phrase, in the specified field.
-
org.apache.solr.common.util.NamedList<Object>
getDetails()
-
long
getDocFreq(String field)
Returns the number of documents that contain this (indexed) Phrase as a term in the specified field.
-
double
getFieldScore(String field)
Returns the score for this Phrase in this given field.
-
List<PhrasesIdentificationComponent.Phrase>
getIndexedSuperPhrases()
Returns all phrases larger than this phrase, which fully include this phrase, and are indexed.
-
List<PhrasesIdentificationComponent.Phrase>
getIndividualIndexedTerms()
Returns the list of "individual" (i.e. getPositionLength()==1) terms.
-
List<PhrasesIdentificationComponent.Phrase>
getLargestIndexedSubPhrases()
Returns the list of (overlapping) sub phrases that have the largest possible size based on the effective value of PhrasesIdentificationComponent.PhrasesContextData.maxIndexedPositionLength.
-
int
getOffsetEnd()
-
int
getOffsetStart()
-
int
getPositionEnd()
NOTE: positions start at '1'
-
int
getPositionLength()
-
BitSet
getPositionsBitSet()
Each set bit identifies a position filled by this Phrase
-
int
getPositionStart()
NOTE: positions start at '1'
-
CharSequence
getSubSequence()
The characters from the original input that correspond with this Phrase
-
double
getTotalScore()
Returns the overall score for this Phrase.
-
long
getTTF(String field)
Returns the total term frequency (TTF) of this (indexed) Phrase as a term in the specified field.
-
static void
populateScores(List<PhrasesIdentificationComponent.Phrase> phrases, Map<String,Double> fieldWeights, int maxIndexedPositionLength, int maxQueryPositionLength)
Public for testing purposes
-
static void
populateScores(PhrasesIdentificationComponent.PhrasesContextData contextData)
Uses the previously populated stats to populate each Phrase with its scores for the specified fields, and its overall (weighted) total score.
-
static void
populateStats(List<PhrasesIdentificationComponent.Phrase> phrases, Collection<String> fieldNames, SolrIndexSearcher searcher)
Populates the phrases with stats from the local index for the specified fields
-
static void
populateStats(List<PhrasesIdentificationComponent.Phrase> phrases, List<org.apache.solr.common.util.NamedList<Object>> shardData)
Populates the phrases with (merged) stats from a remote shard
-
String
toString()
-
-
-
Method Detail
-
extractPhrases
public static List<PhrasesIdentificationComponent.Phrase> extractPhrases(String input, SchemaField analysisField, int maxIndexedPositionLength, int maxQueryPositionLength)
Factory method for constructing a list of Phrases given the specified input and using the analyzer for the specified field. The maxIndexedPositionLength and maxQueryPositionLength provided *must* match the effective values used by the respective analyzers.
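The page does not describe how the candidates are derived from the analyzed input, so the following is only a toy illustration of the kind of data a list of candidate phrases carries: every contiguous run of tokens up to a maximum position length, with 1-based positions. The class `CandidateSpan`, its fields, and the inclusive end position are all assumptions made for this sketch; it is not the Solr class and uses no Solr analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a candidate phrase; NOT PhrasesIdentificationComponent.Phrase. */
final class CandidateSpan {
  final int positionStart; // 1-based, inclusive (assumption)
  final int positionEnd;   // 1-based, inclusive (assumption)
  final String text;

  CandidateSpan(int positionStart, int positionEnd, String text) {
    this.positionStart = positionStart;
    this.positionEnd = positionEnd;
    this.text = text;
  }

  int positionLength() {
    return positionEnd - positionStart + 1;
  }

  /** Enumerates every contiguous run of tokens that is at most maxPositionLength tokens long. */
  static List<CandidateSpan> enumerate(List<String> tokens, int maxPositionLength) {
    List<CandidateSpan> out = new ArrayList<>();
    for (int start = 1; start <= tokens.size(); start++) {
      int lastEnd = Math.min(tokens.size(), start + maxPositionLength - 1);
      for (int end = start; end <= lastEnd; end++) {
        // subList is 0-based and end-exclusive, hence the start - 1
        String text = String.join(" ", tokens.subList(start - 1, end));
        out.add(new CandidateSpan(start, end, text));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    for (CandidateSpan s : enumerate(Arrays.asList("brown", "fox", "jumped"), 2)) {
      System.out.println(s.positionStart + "-" + s.positionEnd + ": " + s.text);
    }
  }
}
```

With three tokens and a maximum length of 2, the sketch yields five candidates: three single terms and two two-term runs.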
-
formatShardResponse
public static List<org.apache.solr.common.util.NamedList<Object>> formatShardResponse(List<PhrasesIdentificationComponent.Phrase> phrases)
Format the phrases suitable for returning in a shard response
- See Also: populateStats(List,List)
-
populateStats
public static void populateStats(List<PhrasesIdentificationComponent.Phrase> phrases, List<org.apache.solr.common.util.NamedList<Object>> shardData)
Populates the phrases with (merged) stats from a remote shard
-
populateStats
public static void populateStats(List<PhrasesIdentificationComponent.Phrase> phrases, Collection<String> fieldNames, SolrIndexSearcher searcher) throws IOException
Populates the phrases with stats from the local index for the specified fields
- Throws: IOException
-
populateScores
public static void populateScores(PhrasesIdentificationComponent.PhrasesContextData contextData)
Uses the previously populated stats to populate each Phrase with its scores for the specified fields, and its overall (weighted) total score. This is not needed on shard requests.
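The page says the total score is "weighted" but does not give the formula. As one plausible reading only, the sketch below combines per-field scores into a weighted average using a field-weight map like the `fieldWeights` parameter of the other `populateScores` overload. The class `WeightedTotal` and the averaging formula are assumptions for illustration, not the component's actual computation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; the real scoring formula is not documented on this page. */
final class WeightedTotal {
  /** Weighted average of the field scores, skipping fields that have no score. */
  static double weightedTotal(Map<String, Double> fieldScores, Map<String, Double> fieldWeights) {
    double weighted = 0.0;
    double weightSum = 0.0;
    for (Map.Entry<String, Double> e : fieldWeights.entrySet()) {
      Double score = fieldScores.get(e.getKey());
      if (score == null) {
        continue; // no score recorded for this field
      }
      weighted += e.getValue() * score;
      weightSum += e.getValue();
    }
    return weightSum == 0.0 ? 0.0 : weighted / weightSum;
  }

  public static void main(String[] args) {
    Map<String, Double> scores = new LinkedHashMap<>();
    scores.put("title", 0.8);
    scores.put("body", 0.2);
    Map<String, Double> weights = new LinkedHashMap<>();
    weights.put("title", 3.0);
    weights.put("body", 1.0);
    System.out.println(weightedTotal(scores, weights)); // roughly 0.65
  }
}
```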
-
populateScores
public static void populateScores(List<PhrasesIdentificationComponent.Phrase> phrases, Map<String,Double> fieldWeights, int maxIndexedPositionLength, int maxQueryPositionLength)
Public for testing purposes
- See Also: populateScores(PhrasesIdentificationComponent.PhrasesContextData)
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
-
getDetails
public org.apache.solr.common.util.NamedList<Object> getDetails()
-
getSubSequence
public CharSequence getSubSequence()
The characters from the original input that correspond with this Phrase
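The relationship between the sub-sequence and the offset getters is not spelled out here. A minimal sketch, assuming the offsets are 0-based character offsets into the original input with an exclusive end (the convention of Lucene's `OffsetAttribute`), might look like this; `OffsetSlice` is a hypothetical helper.

```java
/** Hypothetical helper; assumes 0-based offsets with an exclusive end. */
final class OffsetSlice {
  /** Returns the characters of input from offsetStart (inclusive) to offsetEnd (exclusive). */
  static CharSequence slice(CharSequence input, int offsetStart, int offsetEnd) {
    return input.subSequence(offsetStart, offsetEnd);
  }

  public static void main(String[] args) {
    String input = "the brown fox";
    System.out.println(slice(input, 4, 13)); // prints "brown fox"
  }
}
```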
-
getIndividualIndexedTerms
public List<PhrasesIdentificationComponent.Phrase> getIndividualIndexedTerms()
Returns the list of "individual" (i.e. getPositionLength()==1) terms. NOTE: Indexed phrases of length 1 are the (sole) individual terms of themselves.
-
getLargestIndexedSubPhrases
public List<PhrasesIdentificationComponent.Phrase> getLargestIndexedSubPhrases()
Returns the list of (overlapping) sub phrases that have the largest possible size based on the effective value of PhrasesIdentificationComponent.PhrasesContextData.maxIndexedPositionLength. NOTE: Indexed phrases of length less than the max indexed length are the (sole) largest sub-phrases of themselves.
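To make "overlapping sub phrases of the largest possible size" concrete, the sketch below slides a window of `maxIndexedPositionLength` positions across a phrase's position range. The class `SubPhraseWindows`, the `int[] {start, end}` representation, and the inclusive 1-based positions are assumptions for illustration; the real method returns Phrase objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper modelling sub phrases as {start, end} position pairs (1-based, inclusive). */
final class SubPhraseWindows {
  static List<int[]> largestWindows(int positionStart, int positionEnd, int maxIndexedPositionLength) {
    List<int[]> out = new ArrayList<>();
    int length = positionEnd - positionStart + 1;
    if (length <= maxIndexedPositionLength) {
      // A phrase that already fits is its own (sole) largest sub phrase.
      out.add(new int[] {positionStart, positionEnd});
      return out;
    }
    // Otherwise emit every window of the maximum size, each shifted by one position.
    for (int s = positionStart; s + maxIndexedPositionLength - 1 <= positionEnd; s++) {
      out.add(new int[] {s, s + maxIndexedPositionLength - 1});
    }
    return out;
  }

  public static void main(String[] args) {
    for (int[] w : largestWindows(1, 4, 2)) {
      System.out.println(w[0] + "-" + w[1]);
    }
  }
}
```

For a phrase spanning positions 1 to 4 with a maximum indexed length of 2, the windows are 1-2, 2-3, and 3-4.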
-
getIndexedSuperPhrases
public List<PhrasesIdentificationComponent.Phrase> getIndexedSuperPhrases()
Returns all phrases larger than this phrase, which fully include this phrase, and are indexed. NOTE: A Phrase is never the super phrase of itself.
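The containment rule can be expressed purely in terms of position ranges. The following sketch assumes inclusive 1-based start and end positions and ignores the "is indexed" condition, which depends on index state; `SuperPhraseCheck` is a hypothetical helper, not part of the API.

```java
/** Hypothetical helper; models only the "larger and fully includes" part of the rule. */
final class SuperPhraseCheck {
  /** True if [candStart, candEnd] fully covers [start, end] and is strictly longer. */
  static boolean isSuperPhrase(int candStart, int candEnd, int start, int end) {
    boolean covers = candStart <= start && end <= candEnd;
    boolean larger = (candEnd - candStart) > (end - start);
    return covers && larger;
  }

  public static void main(String[] args) {
    System.out.println(isSuperPhrase(1, 3, 2, 3)); // true: 1-3 covers 2-3 and is longer
    System.out.println(isSuperPhrase(2, 3, 2, 3)); // false: a phrase is never its own super phrase
  }
}
```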
-
getPositionStart
public int getPositionStart()
NOTE: positions start at '1'
-
getPositionEnd
public int getPositionEnd()
NOTE: positions start at '1'
-
getPositionLength
public int getPositionLength()
-
getPositionsBitSet
public BitSet getPositionsBitSet()
Each set bit identifies a position filled by this Phrase
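As a sketch of what such a bit set might contain, the code below sets one bit per position between a start and end position, assuming both are 1-based and inclusive. `PositionBits` is a hypothetical helper built on `java.util.BitSet`; `BitSet.intersects` then gives a cheap overlap test between two phrases.

```java
import java.util.BitSet;

/** Hypothetical helper; assumes 1-based, inclusive start and end positions. */
final class PositionBits {
  static BitSet positions(int positionStart, int positionEnd) {
    BitSet bits = new BitSet();
    // BitSet.set(from, to) treats 'to' as exclusive, hence the + 1
    bits.set(positionStart, positionEnd + 1);
    return bits;
  }

  public static void main(String[] args) {
    BitSet a = positions(2, 4);
    BitSet b = positions(4, 5);
    System.out.println(a);               // prints {2, 3, 4}
    System.out.println(a.intersects(b)); // prints true: both fill position 4
  }
}
```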
-
getOffsetStart
public int getOffsetStart()
-
getOffsetEnd
public int getOffsetEnd()
-
getTotalScore
public double getTotalScore()
Returns the overall score for this Phrase. In the current implementation, the only guarantee made regarding the range of possible values is that 0 (or less) means it is not a good phrase.
- Returns: A numeric value indicating the confidence in this Phrase; higher numbers indicate higher confidence.
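Because the only documented guarantee is that a score of 0 or less marks a poor phrase, a caller can safely do little more than discard non-positive scores and compare the rest relative to each other. A minimal sketch of that filter, using a plain map from phrase text to total score as a stand-in for real Phrase objects (the `ScoreFilter` class is hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; the map stands in for Phrase objects and their total scores. */
final class ScoreFilter {
  /** Keeps only the phrases whose total score is strictly greater than 0. */
  static List<String> goodPhrases(Map<String, Double> totalScores) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Double> e : totalScores.entrySet()) {
      if (e.getValue() > 0.0) {
        out.add(e.getKey());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Double> scores = new LinkedHashMap<>();
    scores.put("brown fox", 0.42);
    scores.put("fox jumped", 0.0);
    scores.put("the brown", -1.0);
    System.out.println(goodPhrases(scores)); // prints [brown fox]
  }
}
```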
-
getFieldScore
public double getFieldScore(String field)
Returns the score for this Phrase in this given field. In the current implementation, the only guarantee made regarding the range of possible values is that 0 (or less) means it is not a good phrase.
- Returns: A numeric value indicating the confidence in this Phrase for this field; higher numbers indicate higher confidence.
-
getTTF
public long getTTF(String field)
Returns the total term frequency (TTF) of this (indexed) Phrase as a term in the specified field. NOTE: behavior of calling this method is undefined unless one of the populateStats methods has been called with this field.
-
getConjunctionDocCount
public long getConjunctionDocCount(String field)
Returns the number of documents that contain all of the getIndividualIndexedTerms() that make up this Phrase, in the specified field. NOTE: behavior of calling this method is undefined unless one of the populateStats methods has been called with this field.
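Conceptually, a conjunction document count is the size of the intersection of the document sets of the individual terms. The sketch below shows that idea with in-memory sets of document ids; `ConjunctionCount` and the set-based representation are assumptions for illustration and say nothing about how the component actually queries the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; each set holds the ids of the documents containing one term. */
final class ConjunctionCount {
  /** Counts the documents that appear in every one of the given sets. */
  static long conjunctionDocCount(List<Set<Integer>> docsPerTerm) {
    if (docsPerTerm.isEmpty()) {
      return 0;
    }
    Set<Integer> common = new HashSet<>(docsPerTerm.get(0));
    for (Set<Integer> docs : docsPerTerm.subList(1, docsPerTerm.size())) {
      common.retainAll(docs); // keep only documents that also contain this term
    }
    return common.size();
  }

  public static void main(String[] args) {
    List<Set<Integer>> docsPerTerm = new ArrayList<>();
    docsPerTerm.add(new HashSet<>(Arrays.asList(1, 2, 3))); // docs containing "brown"
    docsPerTerm.add(new HashSet<>(Arrays.asList(2, 3, 4))); // docs containing "fox"
    System.out.println(conjunctionDocCount(docsPerTerm)); // prints 2
  }
}
```

Note that this count can exceed the phrase's document frequency, since a document may contain all of the terms without containing them as an adjacent phrase.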
-
getDocFreq
public long getDocFreq(String field)
Returns the number of documents that contain this (indexed) Phrase as a term in the specified field. NOTE: behavior of calling this method is undefined unless one of the populateStats methods has been called with this field.
-
-