Package org.apache.solr.handler.tagger
Class Tagger
- java.lang.Object
-
- org.apache.solr.handler.tagger.Tagger
-
public abstract class Tagger extends Object
Tags the maximum string of words in a corpus. This is a callback-style API in which you implement tagCallback(int, int, Object). This class should be independently usable outside Solr.
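The callback style described above can be illustrated with a minimal, self-contained sketch. MiniTagger below is a hypothetical stand-in and not the real Tagger: it assumes plain substring matching against a small in-memory dictionary instead of Lucene Terms and a TokenStream, performs no tag reduction, and passes the matched dictionary entry as an opaque docIdsKey. Only the shape (an abstract tagCallback invoked from process()) mirrors the class documented here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a callback-style tagger; NOT the real Solr Tagger. */
abstract class MiniTagger {
    private final String input;
    private final List<String> dictionary;

    MiniTagger(String input, List<String> dictionary) {
        this.input = input;
        this.dictionary = dictionary;
    }

    /** Scans the input by end position, so reported end offsets never decrease. */
    void process() {
        for (int end = 1; end <= input.length(); end++) {
            for (String entry : dictionary) {
                int start = end - entry.length();
                if (start >= 0 && input.regionMatches(start, entry, 0, entry.length())) {
                    // Assumption: the entry itself serves as the opaque key.
                    tagCallback(start, end, entry);
                }
            }
        }
    }

    /** Invoked by process() for each match found. */
    protected abstract void tagCallback(int startOffset, int endOffset, Object docIdsKey);
}

class MiniTaggerDemo {
    /** Subclasses MiniTagger anonymously and collects "start-end:text" strings. */
    static List<String> collect(String text, List<String> dictionary) {
        List<String> found = new ArrayList<>();
        new MiniTagger(text, dictionary) {
            @Override
            protected void tagCallback(int startOffset, int endOffset, Object docIdsKey) {
                found.add(startOffset + "-" + endOffset + ":"
                        + text.substring(startOffset, endOffset));
            }
        }.process();
        return found;
    }

    public static void main(String[] args) {
        System.out.println(collect("I flew to New York", List.of("New York", "York")));
    }
}
```

With the real class, the subclass would be constructed with the Terms, Bits, TokenStream and TagClusterReducer arguments listed in the constructor summary, and the callback would typically resolve docIdsKey through lookupDocIds(Object).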
-
Constructor Summary
Constructors
Constructor    Description
Tagger(org.apache.lucene.index.Terms terms, org.apache.lucene.util.Bits liveDocs, org.apache.lucene.analysis.TokenStream tokenStream, TagClusterReducer tagClusterReducer, boolean skipAltTokens, boolean ignoreStopWords)
-
Method Summary
Modifier and Type    Method    Description
void    enableDocIdsCache(int initSize)
protected org.apache.lucene.util.IntsRef    lookupDocIds(Object docIdsKey)    Returns a sorted array of integer docIds given the corresponding key.
void    process()
protected abstract void    tagCallback(int startOffset, int endOffset, Object docIdsKey)    Invoked by process() for each tag found.
-
Constructor Detail
-
Tagger
public Tagger(org.apache.lucene.index.Terms terms, org.apache.lucene.util.Bits liveDocs, org.apache.lucene.analysis.TokenStream tokenStream, TagClusterReducer tagClusterReducer, boolean skipAltTokens, boolean ignoreStopWords) throws IOException
Throws:
IOException
-
Method Detail
-
enableDocIdsCache
public void enableDocIdsCache(int initSize)
-
process
public void process() throws IOException
Throws:
IOException
-
tagCallback
protected abstract void tagCallback(int startOffset, int endOffset, Object docIdsKey)
Invoked by process() for each tag found. endOffset is always >= the endOffset given in the previous call.
Parameters:
startOffset - The character offset of the original stream where the tag starts.
endOffset - One more than the character offset of the original stream where the tag ends.
docIdsKey - A reference to the matching docIds that can be resolved via lookupDocIds(Object).
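Because endOffset is one more than the last character of the tag, the pair (startOffset, endOffset) forms a half-open range that can be passed directly to String.substring. The sketch below shows this, along with a check of the documented ordering guarantee. TagRecorder and its onTag method are hypothetical names introduced only for illustration; in real code this logic would sit inside a tagCallback implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that consumes offsets as they would arrive in tagCallback. */
class TagRecorder {
    private final String original;
    private final List<String> texts = new ArrayList<>();
    private int lastEndOffset = 0;

    TagRecorder(String original) {
        this.original = original;
    }

    /** Records the tagged text; endOffset is exclusive, as documented. */
    void onTag(int startOffset, int endOffset) {
        // Documented guarantee: endOffset never decreases between calls.
        if (endOffset < lastEndOffset) {
            throw new IllegalStateException("endOffset went backwards: " + endOffset);
        }
        lastEndOffset = endOffset;
        // Half-open range [startOffset, endOffset) maps directly onto substring.
        texts.add(original.substring(startOffset, endOffset));
    }

    List<String> texts() {
        return texts;
    }

    public static void main(String[] args) {
        TagRecorder recorder = new TagRecorder("Visit San Francisco");
        recorder.onTag(6, 19); // covers characters 6..18
        System.out.println(recorder.texts());
    }
}
```

The length of a tag in characters is simply endOffset - startOffset, with no off-by-one adjustment.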
-
lookupDocIds
protected org.apache.lucene.util.IntsRef lookupDocIds(Object docIdsKey)
Returns a sorted array of integer docIds given the corresponding key.
Parameters:
docIdsKey - The lookup key.
Returns:
Not null
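Lucene's IntsRef exposes its data through the public fields ints, offset and length, so only the slice from ints[offset] up to ints[offset + length - 1] is valid. The sketch below uses a hypothetical stand-in, IntSlice, with the same three fields so that it compiles without Lucene on the classpath. It shows how a sorted slice such as the one returned here could be copied out or searched with a binary search.

```java
import java.util.Arrays;

class DocIdSlices {
    /** Hypothetical stand-in mirroring the ints/offset/length fields of IntsRef. */
    static final class IntSlice {
        final int[] ints;
        final int offset;
        final int length;

        IntSlice(int[] ints, int offset, int length) {
            this.ints = ints;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Copies only the valid part of the backing array. */
    static int[] toArray(IntSlice slice) {
        return Arrays.copyOfRange(slice.ints, slice.offset, slice.offset + slice.length);
    }

    /** Relies on the slice being sorted, as the method above documents. */
    static boolean contains(IntSlice slice, int docId) {
        return Arrays.binarySearch(
                slice.ints, slice.offset, slice.offset + slice.length, docId) >= 0;
    }

    public static void main(String[] args) {
        // The 99 values sit outside the valid slice and must be ignored.
        IntSlice slice = new IntSlice(new int[] {99, 2, 5, 8, 99}, 1, 3);
        System.out.println(Arrays.toString(toArray(slice)));
        System.out.println(contains(slice, 5));
    }
}
```

Reading the backing array without honouring offset and length is a common mistake with reference types of this kind, since the array may be shared or larger than the slice.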