Class Tagger

  • public abstract class Tagger
    extends Object
    Tags the maximum (longest) strings of words in a corpus. This is a callback-style API: you subclass Tagger and implement tagCallback(int, int, Object).

    This class should be independently usable outside Solr.
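    The callback-style shape can be illustrated without Solr or Lucene. The following is a minimal toy sketch, not Solr's implementation. The hypothetical ToyTagger class does a greedy, left-to-right longest match of whitespace-separated words against an in-memory Set<String> instead of reading Lucene Terms, and it passes the matched phrase as a stand-in docIdsKey. Only the pattern carries over to the real class: subclass, override tagCallback, then call process().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy stand-in for Tagger's callback-style shape. It does NOT use
// Lucene and is NOT Solr's tagging algorithm.
abstract class ToyTagger {
  private final String sourceText;
  private final Set<String> phrases;
  private final int maxWords;

  ToyTagger(String sourceText, Set<String> phrases, int maxWords) {
    this.sourceText = sourceText;
    this.phrases = phrases;
    this.maxWords = maxWords;
  }

  // Same parameter shape as Tagger.tagCallback; endOffset is exclusive.
  protected abstract void tagCallback(int startOffset, int endOffset, Object docIdsKey);

  // Greedy left-to-right longest match over whitespace-separated words.
  public void process() {
    // Collect the [start, end) character offsets of every word.
    List<int[]> words = new ArrayList<>();
    int i = 0;
    while (i < sourceText.length()) {
      while (i < sourceText.length() && Character.isWhitespace(sourceText.charAt(i))) i++;
      int start = i;
      while (i < sourceText.length() && !Character.isWhitespace(sourceText.charAt(i))) i++;
      if (i > start) words.add(new int[] {start, i});
    }
    int w = 0;
    while (w < words.size()) {
      int next = w + 1; // advance one word if nothing matches here
      // Try the longest candidate first and stop at the first dictionary hit.
      for (int n = Math.min(maxWords, words.size() - w); n >= 1; n--) {
        int s = words.get(w)[0];
        int e = words.get(w + n - 1)[1];
        // The candidate keeps the original spacing between its words.
        String candidate = sourceText.substring(s, e);
        if (phrases.contains(candidate)) {
          tagCallback(s, e, candidate); // the phrase itself serves as a toy key
          next = w + n;                 // skip past the tagged words
          break;
        }
      }
      w = next;
    }
  }
}

public class ToyTaggerDemo {
  static List<String> tag(String text, Set<String> phrases) {
    List<String> out = new ArrayList<>();
    new ToyTagger(text, phrases, 3) {
      @Override
      protected void tagCallback(int startOffset, int endOffset, Object docIdsKey) {
        out.add(startOffset + "-" + endOffset + ":" + text.substring(startOffset, endOffset));
      }
    }.process();
    return out;
  }

  public static void main(String[] args) {
    System.out.println(tag("I love new york city", Set.of("new york", "new york city", "love")));
  }
}
```

    Running main prints [2-6:love, 7-20:new york city]: starting at the word "new", the three-word phrase "new york city" is tried first and wins over the shorter "new york".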

    • Constructor Detail

      • Tagger

        public Tagger(org.apache.lucene.index.Terms terms,
                      org.apache.lucene.util.Bits liveDocs,
                      org.apache.lucene.analysis.TokenStream tokenStream,
                      TagClusterReducer tagClusterReducer,
                      boolean skipAltTokens,
                      boolean ignoreStopWords)
               throws IOException
    • Method Detail

      • enableDocIdsCache

        public void enableDocIdsCache(int initSize)
      • tagCallback

        protected abstract void tagCallback(int startOffset,
                                            int endOffset,
                                            Object docIdsKey)
        Invoked by process() for each tag found. endOffset is always >= the endOffset given in the previous call.
        Parameters:
        startOffset - The character offset of the original stream where the tag starts.
        endOffset - One more than the character offset of the original stream where the tag ends.
        docIdsKey - A reference to the matching docIds that can be resolved via lookupDocIds(Object).
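        A tagCallback override typically maps the offsets back to the input text. The sketch below is hypothetical: TagCollector and onTag are illustrative names, not part of Solr, and onTag simply takes the same parameters as tagCallback. It shows two consequences of the contract above. Because endOffset is one past the last character, text.substring(startOffset, endOffset) yields the tagged span with no adjustment, and because endOffset never decreases between calls, comparing it with the previous value can catch a violated assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: logic a tagCallback override might contain.
public class TagCollector {
  private final String text;
  private final List<String> tags = new ArrayList<>();
  private int lastEndOffset = -1;

  public TagCollector(String text) {
    this.text = text;
  }

  // Mirrors the tagCallback(int, int, Object) parameters.
  public void onTag(int startOffset, int endOffset, Object docIdsKey) {
    // Documented contract: endOffset is >= the previous call's endOffset.
    if (endOffset < lastEndOffset) {
      throw new IllegalStateException(
          "endOffset went backwards: " + endOffset + " < " + lastEndOffset);
    }
    lastEndOffset = endOffset;
    // endOffset is exclusive, so substring needs no "+ 1".
    tags.add(text.substring(startOffset, endOffset));
  }

  public List<String> tags() {
    return tags;
  }

  public static void main(String[] args) {
    TagCollector c = new TagCollector("Boston and New York");
    c.onTag(0, 6, null);   // "Boston"
    c.onTag(11, 19, null); // "New York"
    System.out.println(c.tags()); // [Boston, New York]
  }
}
```

        Equal endOffset values are allowed by the check, since two tags (for example "New York" and "York") can end at the same character.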
      • lookupDocIds

        protected org.apache.lucene.util.IntsRef lookupDocIds(Object docIdsKey)
        Returns a sorted array of integer docIds given the corresponding key.
        Parameters:
        docIdsKey - The lookup key.
        Returns:
        The docIds for the key; never null.
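        The returned org.apache.lucene.util.IntsRef exposes its data through the public fields ints, offset and length, where only ints[offset, offset + length) is valid. To stay runnable without Lucene, the sketch below takes those three fields as plain arguments (with a real IntsRef you would pass ref.ints, ref.offset, ref.length); DocIdsSlice and its methods are hypothetical names. Because the docIds are sorted, a membership test can use binary search over just the valid slice.

```java
import java.util.Arrays;

// Hypothetical helpers for consuming lookupDocIds(Object) results, written
// against IntsRef's public fields (ints, offset, length) passed as arguments.
public class DocIdsSlice {
  // Copies the valid region ints[offset, offset + length) into its own array.
  static int[] toArray(int[] ints, int offset, int length) {
    return Arrays.copyOfRange(ints, offset, offset + length);
  }

  // The docIds are sorted, so binary search over the slice is valid.
  static boolean containsDoc(int[] ints, int offset, int length, int docId) {
    return Arrays.binarySearch(ints, offset, offset + length, docId) >= 0;
  }

  public static void main(String[] args) {
    int[] backing = {99, 3, 8, 15, 42, 99}; // only indexes 1..4 are valid here
    System.out.println(Arrays.toString(toArray(backing, 1, 4))); // [3, 8, 15, 42]
    System.out.println(containsDoc(backing, 1, 4, 15)); // true
    System.out.println(containsDoc(backing, 1, 4, 99)); // false
  }
}
```

        Values outside the slice, such as the 99s above, are ignored: the copy and the search both stay within the offset and length bounds.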