PatternTokenizer (Solr 3.6.1 API)

java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.Tokenizer
      - org.apache.solr.analysis.PatternTokenizer

All Implemented Interfaces:

Closeable
```
public final class PatternTokenizer
extends Tokenizer
```
This tokenizer uses regular-expression matching to break the input stream into tokens. It takes two arguments, "pattern" and "group":
- "pattern" is the regular expression.
- "group" selects which capturing group of each match is emitted as a token.

group=-1 (the default) is equivalent to "split": the tokens are the same as the output of String.split(java.lang.String), with empty tokens removed.
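
The split behavior can be approximated with `java.util.regex` alone. The following is a minimal sketch, not Solr's implementation: the class name `SplitModeSketch` and the method `splitTokens` are hypothetical, and the code only mirrors the documented rule (split on the pattern, drop empty tokens).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of Solr or Lucene.
class SplitModeSketch {

    // Emulates group = -1: split the input on the pattern, then drop empty pieces.
    static List<String> splitTokens(Pattern pattern, String input) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : pattern.split(input)) {
            if (piece.length() > 0) {   // zero-length tokens are never emitted
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A plain split would also yield "" before the first comma and between ",,".
        System.out.println(splitTokens(Pattern.compile(","), ",a,,b,")); // prints [a, b]
    }
}
```

The filter matters because `String.split` discards only trailing empty strings, while leading and interior empty strings are kept.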

Using group >= 0 emits the corresponding capturing group of each match as the token (group 0 is the entire match). For example, given:
```
  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
```
the output is two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input and group=1, the output is bbb and ccc (without the ' marks).
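
The example above can be reproduced with `java.util.regex.Matcher`. This is a sketch under stated assumptions: `GroupModeSketch` and `groupTokens` are hypothetical names, and the loop only imitates the documented behavior rather than showing how the tokenizer is implemented. The backslashes before the quotes in the listing are optional in a Java regex, so they are left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of Solr or Lucene.
class GroupModeSketch {

    // Emulates group >= 0: emit the chosen group of every match, skipping empty ones.
    static List<String> groupTokens(Pattern pattern, int group, String input) {
        List<String> tokens = new ArrayList<String>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            String text = matcher.group(group);   // null if the group did not participate
            if (text != null && text.length() > 0) {
                tokens.add(text);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern quoted = Pattern.compile("'([^']+)'");
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(quoted, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(quoted, 1, input)); // prints [bbb, ccc]
    }
}
```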

NOTE: This Tokenizer does not output tokens that are of zero length.

See Also:
java.util.regex.Pattern

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.AttributeFactory, AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.Tokenizer
  input

Constructor Summary

Constructors
Constructor and Description
`PatternTokenizer(Reader input, Pattern pattern, int group)` Creates a new PatternTokenizer that returns tokens from the given group (-1 for split behavior).

Method Summary

Methods
Modifier and Type | Method and Description

`void` | `end()`

`boolean` | `incrementToken()`

`void` | `reset(Reader input)`

- Methods inherited from class org.apache.lucene.analysis.Tokenizer
  close, correctOffset
- Methods inherited from class org.apache.lucene.analysis.TokenStream
  reset
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
- Methods inherited from class java.lang.Object
  clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Detail

PatternTokenizer

public PatternTokenizer(Reader input,
                Pattern pattern,
                int group)
                 throws IOException

Creates a new PatternTokenizer that returns tokens from the given group (-1 for split behavior).

Throws: IOException
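
This page does not say what happens when group is larger than the number of capturing groups in the pattern, so a caller may want to check the value before constructing the tokenizer. The sketch below is one way to do that with the JDK only; `GroupArgumentSketch` and `checkedGroup` are hypothetical names and are not part of the Solr API.

```java
import java.util.regex.Pattern;

// Hypothetical pre-flight check for the constructor's "group" argument.
class GroupArgumentSketch {

    // Returns group unchanged if it is -1 (split) or names an existing capturing group.
    static int checkedGroup(Pattern pattern, int group) {
        // groupCount() excludes group 0, which always denotes the whole match.
        int groupCount = pattern.matcher("").groupCount();
        if (group < -1 || group > groupCount) {
            throw new IllegalArgumentException(
                "group " + group + " is outside -1.." + groupCount
                + " for pattern " + pattern.pattern());
        }
        return group;
    }

    public static void main(String[] args) {
        Pattern quoted = Pattern.compile("'([^']+)'");
        int group = checkedGroup(quoted, 1);
        // The validated values would then be passed on, for example:
        //   new PatternTokenizer(reader, quoted, group)
        System.out.println(group); // prints 1
    }
}
```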

Method Detail
- incrementToken
```
public boolean incrementToken()
                       throws IOException
```
  Specified by: incrementToken in class TokenStream

  Throws: IOException
- end
```
public void end()
         throws IOException
```
  Overrides: end in class TokenStream

  Throws: IOException
- reset
```
public void reset(Reader input)
           throws IOException
```
  Overrides: reset in class Tokenizer

  Throws: IOException
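
The three methods above are normally used together: a consumer calls incrementToken() until it returns false, then end(), and may call reset(Reader) to reuse the instance on new input. The sketch below illustrates that call sequence with a JDK-only stand-in so that it runs without Solr on the classpath. `StandInPatternTokenizer`, its `term()` accessor, and the helpers `consume` and `tokenize` are all hypothetical; the real class exposes token text through Lucene attributes, and its internals may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in for illustration only; NOT the Solr class and not its implementation.
class StandInPatternTokenizer {
    private final Pattern pattern;
    private final int group;
    private String text;
    private Matcher matcher;
    private int splitStart;   // split mode: where the pending piece begins
    private boolean done;
    private String term;      // hypothetical replacement for Lucene's term attribute

    StandInPatternTokenizer(Reader input, Pattern pattern, int group) throws IOException {
        this.pattern = pattern;
        this.group = group;
        reset(input);
    }

    // Reads the new input fully and restarts matching from the beginning.
    void reset(Reader input) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        for (int n; (n = input.read(buf)) != -1; ) sb.append(buf, 0, n);
        text = sb.toString();
        matcher = pattern.matcher(text);
        splitStart = 0;
        done = false;
        term = null;
    }

    // Advances to the next non-empty token; returns false when input is exhausted.
    boolean incrementToken() throws IOException {
        if (done) return false;
        while (matcher.find()) {
            String candidate;
            if (group >= 0) {
                candidate = matcher.group(group);
            } else {   // split mode: text between the previous match and this one
                candidate = text.substring(splitStart, matcher.start());
                splitStart = matcher.end();
            }
            if (candidate != null && candidate.length() > 0) { term = candidate; return true; }
        }
        done = true;
        if (group < 0 && splitStart < text.length()) {   // trailing piece in split mode
            term = text.substring(splitStart);
            return true;
        }
        return false;
    }

    void end() throws IOException { /* the stand-in has no end-of-stream state to record */ }

    String term() { return term; }

    // Typical consumer loop: incrementToken() until false, then end().
    static List<String> consume(StandInPatternTokenizer t) throws IOException {
        List<String> out = new ArrayList<String>();
        while (t.incrementToken()) out.add(t.term());
        t.end();
        return out;
    }

    // Convenience wrapper that hides the checked exception for in-memory input.
    static List<String> tokenize(String input, Pattern pattern, int group) {
        try {
            return consume(new StandInPatternTokenizer(new StringReader(input), pattern, group));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Pattern quoted = Pattern.compile("'([^']+)'");
        StandInPatternTokenizer t =
            new StandInPatternTokenizer(new StringReader("aaa 'bbb' 'ccc'"), quoted, 1);
        System.out.println(consume(t));           // prints [bbb, ccc]
        t.reset(new StringReader("'x' y 'z'"));   // reuse the same instance
        System.out.println(consume(t));           // prints [x, z]
    }
}
```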
