Package org.apache.solr.handler.tagger
Class Tagger
- java.lang.Object
-
- org.apache.solr.handler.tagger.Tagger
-
public abstract class Tagger extends Object
Tags maximum string of words in a corpus. This is a callback-style API in which you implement tagCallback(int, int, Object). This class should be independently usable outside Solr.
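The callback style described above can be illustrated with a minimal, self-contained sketch. It does not use Solr or Lucene: MiniTagger, CallbackSketch and collectTags are hypothetical names, and the whitespace-based word lookup is a simplification assumed for illustration, not how Tagger matches terms. It only shows the shape of the contract: process() drives the scan and calls tagCallback once per tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CallbackSketch {
    /** Simplified, hypothetical stand-in for the callback pattern; not Solr's Tagger. */
    abstract static class MiniTagger {
        private static final Pattern WORD = Pattern.compile("\\S+");
        private final String text;
        private final Set<String> dictionary;

        MiniTagger(String text, Set<String> dictionary) {
            this.text = text;
            this.dictionary = dictionary;
        }

        /** Scans the text left to right and reports each word found in the dictionary. */
        void process() {
            Matcher m = WORD.matcher(text);
            while (m.find()) {
                if (dictionary.contains(m.group())) {
                    // Assumption: the matched word itself serves as the opaque key.
                    tagCallback(m.start(), m.end(), m.group());
                }
            }
        }

        /** Same parameter shape as the documented callback; endOffset is exclusive. */
        protected abstract void tagCallback(int startOffset, int endOffset, Object docIdsKey);
    }

    /** Runs the sketch and returns each tag formatted as "start-end:key". */
    static List<String> collectTags(String text, Set<String> dictionary) {
        List<String> tags = new ArrayList<>();
        new MiniTagger(text, dictionary) {
            @Override
            protected void tagCallback(int startOffset, int endOffset, Object docIdsKey) {
                tags.add(startOffset + "-" + endOffset + ":" + docIdsKey);
            }
        }.process();
        return tags;
    }

    public static void main(String[] args) {
        // Prints [6-12:Boston, 17-22:Paris]
        System.out.println(collectTags("visit Boston and Paris", Set.of("Boston", "Paris")));
    }
}
```

With the real class, the subclass would be constructed with the Terms, Bits, TokenStream and TagClusterReducer arguments listed in the constructor, which this sketch leaves out.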
-
-
Constructor Summary
Tagger(org.apache.lucene.index.Terms terms, org.apache.lucene.util.Bits liveDocs, org.apache.lucene.analysis.TokenStream tokenStream, TagClusterReducer tagClusterReducer, boolean skipAltTokens, boolean ignoreStopWords)
-
Method Summary
void enableDocIdsCache(int initSize)
protected org.apache.lucene.util.IntsRef lookupDocIds(Object docIdsKey)
Returns a sorted array of integer docIds given the corresponding key.
void process()
protected abstract void tagCallback(int startOffset, int endOffset, Object docIdsKey)
Invoked by process() for each tag found.
-
-
-
Constructor Detail
-
Tagger
public Tagger(org.apache.lucene.index.Terms terms, org.apache.lucene.util.Bits liveDocs, org.apache.lucene.analysis.TokenStream tokenStream, TagClusterReducer tagClusterReducer, boolean skipAltTokens, boolean ignoreStopWords) throws IOException
- Throws:
IOException
-
-
Method Detail
-
enableDocIdsCache
public void enableDocIdsCache(int initSize)
-
process
public void process() throws IOException
- Throws:
IOException
-
tagCallback
protected abstract void tagCallback(int startOffset, int endOffset, Object docIdsKey)
Invoked by process() for each tag found. endOffset is always >= the endOffset given in the previous call.
- Parameters:
startOffset - The character offset of the original stream where the tag starts.
endOffset - One more than the character offset of the original stream where the tag ends.
docIdsKey - A reference to the matching docIds that can be resolved via lookupDocIds(Object).
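Because endOffset is one past the last character of the tag, the pair works as a half-open range. The following sketch assumes the original input is still available as a String, which the class itself does not guarantee. OffsetSketch and taggedText are hypothetical names used only for illustration.

```java
class OffsetSketch {
    /**
     * Recovers the tagged text from the original input.
     * endOffset is exclusive, which matches the contract of String.substring.
     */
    static String taggedText(String original, int startOffset, int endOffset) {
        return original.substring(startOffset, endOffset);
    }

    public static void main(String[] args) {
        // Offsets 6 (inclusive) to 12 (exclusive) cover the word "Boston".
        System.out.println(taggedText("visit Boston and Paris", 6, 12));
    }
}
```

The tag length in characters is endOffset - startOffset, with no off-by-one adjustment.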
-
lookupDocIds
protected org.apache.lucene.util.IntsRef lookupDocIds(Object docIdsKey)
Returns a sorted array of integer docIds given the corresponding key.
- Parameters:
docIdsKey - The lookup key.
- Returns:
Not null
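Lucene's IntsRef exposes a backing array together with an offset and a length, so the valid docIds are only the slice ints[offset] through ints[offset + length - 1], not the whole array. The sketch below shows that slice handling with a hypothetical stand-in class (IntSlice) so that it runs without Lucene on the classpath. DocIdsSketch and toArray are likewise illustrative names.

```java
import java.util.Arrays;

class DocIdsSketch {
    /** Hypothetical stand-in mirroring the ints/offset/length layout of IntsRef. */
    static final class IntSlice {
        final int[] ints;
        final int offset;
        final int length;

        IntSlice(int[] ints, int offset, int length) {
            this.ints = ints;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Copies only the valid slice; entries outside it must be ignored. */
    static int[] toArray(IntSlice slice) {
        return Arrays.copyOfRange(slice.ints, slice.offset, slice.offset + slice.length);
    }

    public static void main(String[] args) {
        // The backing array holds unrelated values (99) outside the valid slice.
        IntSlice docIds = new IntSlice(new int[] {99, 3, 7, 12, 99}, 1, 3);
        System.out.println(Arrays.toString(toArray(docIds)));
    }
}
```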
-
-