Class Token

java.lang.Object
org.apache.lucene.util.AttributeImpl
org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl
org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl
org.apache.solr.spelling.Token
All Implemented Interfaces:
Appendable, CharSequence, Cloneable, org.apache.lucene.analysis.tokenattributes.CharTermAttribute, org.apache.lucene.analysis.tokenattributes.FlagsAttribute, org.apache.lucene.analysis.tokenattributes.OffsetAttribute, org.apache.lucene.analysis.tokenattributes.PayloadAttribute, org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute, org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute, org.apache.lucene.analysis.tokenattributes.TermFrequencyAttribute, org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute, org.apache.lucene.analysis.tokenattributes.TypeAttribute, org.apache.lucene.util.Attribute

@Deprecated public class Token extends org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl implements org.apache.lucene.analysis.tokenattributes.FlagsAttribute, org.apache.lucene.analysis.tokenattributes.PayloadAttribute
Deprecated.
A Token is an occurrence of a term from the text of a field. It consists of a term's text, the start and end offset of the term in the text of the field, and a type string.

The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser or to show matching text fragments in a KWIC (keyword-in-context) display.
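The offset-based re-association described above can be sketched in plain Java. This is a minimal illustration, not Solr code: Span is a hypothetical stand-in that carries only a term and its start and end offsets, and highlight is an assumed helper name. It relies only on the offsets being positions in the original source text, with the end offset exclusive.

```java
import java.util.List;

final class OffsetHighlightSketch {
    /** Hypothetical stand-in for a token: term text plus offsets into the source text. */
    static final class Span {
        final String term;
        final int startOffset;
        final int endOffset;

        Span(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Brackets each span's source range; assumes spans are sorted and do not overlap. */
    static String highlight(String source, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : spans) {
            // Copy the untouched text before the span, then the bracketed span itself.
            out.append(source, cursor, s.startOffset);
            out.append('[').append(source, s.startOffset, s.endOffset).append(']');
            cursor = s.endOffset;
        }
        out.append(source, cursor, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "The Quick brown fox";
        // The term may be normalized (lowercased here), but the offsets still
        // point at the original, unnormalized text.
        List<Span> hits = List.of(new Span("quick", 4, 9), new Span("fox", 16, 19));
        System.out.println(highlight(source, hits)); // prints: The [Quick] brown [fox]
    }
}
```

Because the highlighter reads from the source text rather than from the term, the original casing ("Quick") survives even though the term was normalized.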

The type is a string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example, an end-of-sentence marker token might be implemented with type "eos". The default token type is "word".

A Token can optionally have metadata (a.k.a. payload) in the form of a variable-length byte array. Use PostingsEnum.getPayload() to retrieve the payloads from the index.
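What goes into a payload is application-defined. As a minimal sketch, assuming an application wants to attach a per-token float weight, the bytes could be produced and read back as follows. The names encodeWeight and decodeWeight are hypothetical, and the sketch uses only the JDK; with the real class the resulting array would presumably be wrapped in a BytesRef and passed to setPayload.

```java
import java.nio.ByteBuffer;

final class PayloadBytesSketch {
    /** Encodes a float weight as a 4-byte, big-endian array (an assumed encoding). */
    static byte[] encodeWeight(float weight) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(weight).array();
    }

    /** Decodes a weight written by encodeWeight. */
    static float decodeWeight(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        byte[] payload = encodeWeight(2.5f);
        System.out.println(payload.length);        // prints: 4
        System.out.println(decodeWeight(payload)); // prints: 2.5
    }
}
```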

A few things to note:

  • clear() initializes all of the fields to default values. This differs from the behavior in Lucene 2.4, but should affect no one.
  • Because TokenStreams can be chained, one cannot assume that the Token's current type is correct.
  • The startOffset and endOffset represent the start and end offsets in the source text, so be careful when adjusting them.
  • When caching a reusable token, clone it. When injecting a cached token into a stream that can be reset, clone it again.
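The first half of the caching note (clone before caching) can be illustrated with a small JDK-only sketch. ReusableToken is a hypothetical stand-in for a mutable token that a producer overwrites on every step; it is not the Solr class, and cacheTerms is an assumed helper name. Without the clone, every cache entry aliases the same object and ends up holding the last value written.

```java
import java.util.ArrayList;
import java.util.List;

final class CloneBeforeCacheSketch {
    /** Hypothetical stand-in for a reusable, mutable token. */
    static final class ReusableToken implements Cloneable {
        String term = "";
        int startOffset;
        int endOffset;

        ReusableToken set(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            return this;
        }

        @Override
        public ReusableToken clone() {
            try {
                // A shallow copy is enough here: String is immutable, the rest are ints.
                return (ReusableToken) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Caches each produced token, optionally cloning first, and returns the cached terms. */
    static List<String> cacheTerms(boolean cloneFirst) {
        ReusableToken reusable = new ReusableToken();
        List<ReusableToken> cache = new ArrayList<>();
        int offset = 0;
        for (String word : new String[] {"solr", "lucene"}) {
            // The producer overwrites the same instance on every step.
            reusable.set(word, offset, offset + word.length());
            cache.add(cloneFirst ? reusable.clone() : reusable);
            offset += word.length() + 1;
        }
        List<String> terms = new ArrayList<>();
        for (ReusableToken t : cache) {
            terms.add(t.term);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(cacheTerms(true));  // prints: [solr, lucene]
        System.out.println(cacheTerms(false)); // prints: [lucene, lucene]
    }
}
```

The same reasoning applies in the other direction: handing a cached instance to a consumer that may mutate it would corrupt the cache, which is why the note recommends cloning again on injection.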
  • Field Summary

    Fields inherited from class org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl

    builder

    Fields inherited from interface org.apache.lucene.analysis.tokenattributes.TypeAttribute

    DEFAULT_TYPE
  • Constructor Summary

    Constructors
    Constructor
    Description
    Token()
    Deprecated.
    Constructs a Token with null text.
    Token(CharSequence text, int start, int end)
    Deprecated.
    Constructs a Token with the given term text, start and end offsets.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    clear()
    Deprecated.
    Resets the term text, payload, flags, positionIncrement, positionLength, startOffset, endOffset and token type to default.
    Token
    clone()
    Deprecated.
     
    void
    copyTo(org.apache.lucene.util.AttributeImpl target)
    Deprecated.
     
    boolean
    equals(Object obj)
    Deprecated.
     
    int
    getFlags()
    Deprecated.
    org.apache.lucene.util.BytesRef
    getPayload()
    Deprecated.
    int
    hashCode()
    Deprecated.
     
    void
    reflectWith(org.apache.lucene.util.AttributeReflector reflector)
    Deprecated.
     
    void
    setFlags(int flags)
    Deprecated.
    void
    setPayload(org.apache.lucene.util.BytesRef payload)
    Deprecated.

    Methods inherited from class org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl

    end, endOffset, getPositionIncrement, getPositionLength, getTermFrequency, setOffset, setPositionIncrement, setPositionLength, setTermFrequency, setType, startOffset, type

    Methods inherited from class org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl

    append, append, append, append, append, append, buffer, charAt, copyBuffer, getBytesRef, length, resizeBuffer, setEmpty, setLength, subSequence, toString

    Methods inherited from class org.apache.lucene.util.AttributeImpl

    reflectAsString

    Methods inherited from class java.lang.Object

    finalize, getClass, notify, notifyAll, wait, wait, wait

    Methods inherited from interface java.lang.CharSequence

    chars, codePoints, isEmpty
  • Constructor Details

    • Token

      public Token()
      Deprecated.
      Constructs a Token with null text.
    • Token

      public Token(CharSequence text, int start, int end)
      Deprecated.
      Constructs a Token with the given term text, start and end offsets. The type defaults to "word". NOTE: for better indexing speed, set the term text through the inherited char[] buffer methods (buffer, copyBuffer, resizeBuffer) instead.
      Parameters:
      text - term text
      start - start offset in the source text
      end - end offset in the source text
  • Method Details

    • getFlags

      public int getFlags()
      Deprecated.
      Specified by:
      getFlags in interface org.apache.lucene.analysis.tokenattributes.FlagsAttribute
      See Also:
      • FlagsAttribute
    • setFlags

      public void setFlags(int flags)
      Deprecated.
      Specified by:
      setFlags in interface org.apache.lucene.analysis.tokenattributes.FlagsAttribute
      See Also:
      • FlagsAttribute
    • getPayload

      public org.apache.lucene.util.BytesRef getPayload()
      Deprecated.
      Specified by:
      getPayload in interface org.apache.lucene.analysis.tokenattributes.PayloadAttribute
      See Also:
      • PayloadAttribute
    • setPayload

      public void setPayload(org.apache.lucene.util.BytesRef payload)
      Deprecated.
      Specified by:
      setPayload in interface org.apache.lucene.analysis.tokenattributes.PayloadAttribute
      See Also:
      • PayloadAttribute
    • clear

      public void clear()
      Deprecated.
      Resets the term text, payload, flags, positionIncrement, positionLength, startOffset, endOffset and token type to default.
      Overrides:
      clear in class org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl
    • equals

      public boolean equals(Object obj)
      Deprecated.
      Overrides:
      equals in class org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl
    • hashCode

      public int hashCode()
      Deprecated.
      Overrides:
      hashCode in class org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl
    • clone

      public Token clone()
      Deprecated.
      Overrides:
      clone in class org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl
    • copyTo

      public void copyTo(org.apache.lucene.util.AttributeImpl target)
      Deprecated.
      Overrides:
      copyTo in class org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl
    • reflectWith

      public void reflectWith(org.apache.lucene.util.AttributeReflector reflector)
      Deprecated.
      Overrides:
      reflectWith in class org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl