PatternTokenizerFactory (Solr 3.6.1 API)

java.lang.Object
- org.apache.solr.analysis.BaseTokenizerFactory
- - org.apache.solr.analysis.PatternTokenizerFactory

All Implemented Interfaces:

TokenizerFactory
```
public class PatternTokenizerFactory
extends BaseTokenizerFactory
```
Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".
- "pattern" is the regular expression.
- "group" says which group to extract into tokens.
group=-1 (the default) is equivalent to "split". In this case, the tokens are the same as the output of String.split(java.lang.String), minus any empty tokens.
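
The split behavior can be approximated with plain `java.util.regex`. The following is a minimal sketch, not the tokenizer's actual implementation: the class name `SplitModeSketch`, the helper `splitTokens`, and the sample pattern and input are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; it only mimics the documented group=-1 behavior.
public class SplitModeSketch {

    // Split the input on the pattern and drop zero-length pieces,
    // mirroring "equivalent to split, without empty tokens".
    static List<String> splitTokens(String pattern, String input) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : Pattern.compile(pattern).split(input)) {
            if (piece.length() > 0) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The doubled comma yields an empty piece, which is discarded.
        System.out.println(splitTokens(",\\s*", "a, b,,c")); // prints [a, b, c]
    }
}
```

Unlike the real tokenizer, this sketch returns plain strings and does not track token offsets.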

Using group >= 0 selects the matching group as the token. For example, if you have:
```
  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
```
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output is bbb and ccc (no ' marks).
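
The group selection described above can be reproduced with `java.util.regex.Matcher`. This is a sketch under assumptions: `GroupModeSketch` and `groupTokens` are hypothetical names, and the code illustrates the documented behavior rather than the tokenizer's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; it only mimics the documented group >= 0 behavior.
public class GroupModeSketch {

    // Collect the requested capture group from every match, skipping
    // groups that did not participate or matched zero-length text.
    static List<String> groupTokens(String pattern, int group, String input) {
        List<String> tokens = new ArrayList<String>();
        Matcher matcher = Pattern.compile(pattern).matcher(input);
        while (matcher.find()) {
            String text = matcher.group(group);
            if (text != null && text.length() > 0) {
                tokens.add(text);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String pattern = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        // Group 0 is the whole match, so the quote marks are kept.
        System.out.println(groupTokens(pattern, 0, input)); // prints ['bbb', 'ccc']
        // Group 1 is the text inside the quotes.
        System.out.println(groupTokens(pattern, 1, input)); // prints [bbb, ccc]
    }
}
```

In a Java string literal the single quotes need no backslash, so the pattern is written as `'([^']+)'`.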

NOTE: This Tokenizer does not output tokens that are of zero length.
```
 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
   </analyzer>
 </fieldType>
```
Since:

solr1.2

Version:

$Id:$

See Also:
PatternTokenizer

Field Summary

Fields
Modifier and Type	Field and Description
`protected Map<String,String>`	`args` The init args
`protected int`	`group`
`static String`	`GROUP`
`protected Version`	`luceneMatchVersion` the luceneVersion arg
`protected Pattern`	`pattern`
`static String`	`PATTERN`

Fields inherited from class org.apache.solr.analysis.BaseTokenizerFactory
log

Constructor Summary

Constructors
Constructor and Description

PatternTokenizerFactory()

Method Summary

Methods
Modifier and Type	Method and Description
`protected void`	`assureMatchVersion()` Can be called from `TokenizerFactory.create(java.io.Reader)` or `TokenFilterFactory.create(org.apache.lucene.analysis.TokenStream)` to inform the user that this factory requires a `luceneMatchVersion`.
`Tokenizer`	`create(Reader in)` Split the input using configured pattern
`Map<String,String>`	`getArgs()`
`protected boolean`	`getBoolean(String name, boolean defaultVal)`
`protected boolean`	`getBoolean(String name, boolean defaultVal, boolean useDefault)`
`protected int`	`getInt(String name)`
`protected int`	`getInt(String name, int defaultVal)`
`protected int`	`getInt(String name, int defaultVal, boolean useDefault)`
`protected CharArraySet`	`getSnowballWordSet(ResourceLoader loader, String wordFiles, boolean ignoreCase)` Same as `getWordSet(ResourceLoader, String, boolean)`, except the input is in snowball format.
`protected CharArraySet`	`getWordSet(ResourceLoader loader, String wordFiles, boolean ignoreCase)`
`static List<Token>`	`group(Matcher matcher, String input, int group)` Deprecated.
`void`	`init(Map<String,String> args)` Require a configured pattern
`static List<Token>`	`split(Matcher matcher, String input)` Deprecated.
`protected void`	`warnDeprecated(String message)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.solr.analysis.TokenizerFactory
getArgs

Field Detail

PATTERN
```
public static final String PATTERN
```
See Also:
Constant Field Values

GROUP
```
public static final String GROUP
```
See Also:
Constant Field Values

pattern
```
protected Pattern pattern
```

group
```
protected int group
```

args
```
protected Map<String,String> args
```
The init args

luceneMatchVersion
```
protected Version luceneMatchVersion
```
the luceneVersion arg

Constructor Detail

PatternTokenizerFactory
```
public PatternTokenizerFactory()
```

Method Detail

init
```
public void init(Map<String,String> args)
```
Require a configured pattern

Specified by:

init in interface TokenizerFactory

create
```
public Tokenizer create(Reader in)
```
Split the input using configured pattern

split
```
@Deprecated
public static List<Token> split(Matcher matcher,
                           String input)
```
Deprecated.

This behaves just like String.split(), but returns a list of Tokens rather than an array of strings. NOTE: This method is not used in 1.4.

group

```
@Deprecated
public static List<Token> group(Matcher matcher,
                           String input,
                           int group)
```

Deprecated.

Create tokens from the matches in a matcher. NOTE: This method is not used in 1.4.

getArgs
```
public Map<String,String> getArgs()
```

assureMatchVersion
```
protected final void assureMatchVersion()
```
This method can be called from TokenizerFactory.create(java.io.Reader) or TokenFilterFactory.create(org.apache.lucene.analysis.TokenStream) to inform the user that this factory requires a luceneMatchVersion.

warnDeprecated

```
protected final void warnDeprecated(String message)
```

getInt
```
protected int getInt(String name)
```

getInt

```
protected int getInt(String name,
         int defaultVal)
```

getInt

```
protected int getInt(String name,
         int defaultVal,
         boolean useDefault)
```

getBoolean

```
protected boolean getBoolean(String name,
                 boolean defaultVal)
```

getBoolean

```
protected boolean getBoolean(String name,
                 boolean defaultVal,
                 boolean useDefault)
```

getWordSet

```
protected CharArraySet getWordSet(ResourceLoader loader,
                      String wordFiles,
                      boolean ignoreCase)
                           throws IOException
```

Throws:

IOException

getSnowballWordSet

```
protected CharArraySet getSnowballWordSet(ResourceLoader loader,
                              String wordFiles,
                              boolean ignoreCase)
                                   throws IOException
```

Same as getWordSet(ResourceLoader, String, boolean), except the input is in snowball format.

Throws:

IOException
