java.lang.Object
- org.apache.solr.schema.SimplePreAnalyzedParser

All Implemented Interfaces: PreAnalyzedField.PreAnalyzedParser

public final class SimplePreAnalyzedParser
extends Object
implements PreAnalyzedField.PreAnalyzedParser

Simple plain text format parser for PreAnalyzedField.

Serialization format

The format of the serialization is as follows:

 content ::= version (stored)? tokens
 version ::= digit+ " "
 ; stored field value - any "=" inside must be escaped!
 stored ::= "=" text "="
 tokens ::= (token ((" ") + token)*)*
 token ::= text ("," attrib)*
 attrib ::= name '=' value
 name ::= text
 value ::= text
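
As an illustration of the `version ::= digit+ " "` rule, the following minimal sketch separates the version prefix from the rest of a serialized value. The class and method names (`VersionPrefixSketch`, `prefixEnd`, `version`, `body`) are hypothetical and are not part of Solr; this is not the parser's actual implementation.

```java
final class VersionPrefixSketch {

    /** Returns the index just past the "digit+ ' '" prefix, or -1 if the prefix is absent. */
    static int prefixEnd(String content) {
        int i = 0;
        while (i < content.length()
                && content.charAt(i) >= '0' && content.charAt(i) <= '9') {
            i++;
        }
        // At least one digit is required, followed by exactly one space.
        if (i == 0 || i >= content.length() || content.charAt(i) != ' ') {
            return -1;
        }
        return i + 1;
    }

    /** Parses the numeric version (hypothetical helper). */
    static int version(String content) {
        int end = prefixEnd(content);
        if (end < 0) {
            throw new IllegalArgumentException("missing version prefix");
        }
        return Integer.parseInt(content.substring(0, end - 1));
    }

    /** Returns everything after the version prefix: the optional stored part and the tokens. */
    static String body(String content) {
        int end = prefixEnd(content);
        if (end < 0) {
            throw new IllegalArgumentException("missing version prefix");
        }
        return content.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(version("1 one two three"));
        System.out.println(body("1 one two three"));
    }
}
```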

Special characters in "text" values can be escaped with the backslash character (\). The following escape sequences are recognized:

 "\ " - literal space character
 "\," - literal , character
 "\=" - literal = character
 "\\" - literal \ character
 "\n" - newline
 "\r" - carriage return
 "\t" - horizontal tab

Please note that Unicode escape sequences (e.g. \u0001) are not supported.
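
The escape sequences listed above can be applied when producing "text" values. The sketch below is one possible helper, not Solr code; the class name `TextEscaperSketch` and the method `escape` are hypothetical.

```java
final class TextEscaperSketch {

    /**
     * Escapes a term, attribute name, or attribute value using the
     * escape sequences from the list above. Hypothetical helper.
     */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case ' ':  sb.append("\\ ");  break; // literal space
                case ',':  sb.append("\\,");  break; // literal comma
                case '=':  sb.append("\\=");  break; // literal equals sign
                case '\\': sb.append("\\\\"); break; // literal backslash
                case '\n': sb.append("\\n");  break; // newline
                case '\r': sb.append("\\r");  break; // carriage return
                case '\t': sb.append("\\t");  break; // horizontal tab
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: New\ York\,\ NY
        System.out.println(escape("New York, NY"));
    }
}
```

Because Unicode escapes are not supported, the sketch passes all other characters through unchanged.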

Supported attribute names

The following token attributes are supported, and identified with short symbolic names:

 i - position increment (integer)
 s - token offset, start position (integer)
 e - token offset, end position (integer)
 t - token type (string)
 f - token flags (hexadecimal integer)
 p - payload (bytes in hexadecimal format; whitespace is ignored)
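
To show how these short names combine in a single token, the following sketch writes one token with all six attributes. It is an illustration only: `TokenAttributeSketch`, `toHex`, and `token` are hypothetical names, the term and type are assumed to contain no characters that need escaping, and the use of lowercase hex digits without a prefix is an assumption.

```java
final class TokenAttributeSketch {

    /** Encodes payload bytes as hex digits (lowercase is an assumption). */
    static String toHex(byte[] payload) {
        StringBuilder sb = new StringBuilder(payload.length * 2);
        for (byte b : payload) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /**
     * Builds "term,i=..,s=..,e=..,t=..,f=..,p=..".
     * The term and type must already be escaped by the caller.
     */
    static String token(String term, int posInc, int start, int end,
                        String type, int flags, byte[] payload) {
        return term
                + ",i=" + posInc                      // position increment (decimal)
                + ",s=" + start                       // start offset (decimal)
                + ",e=" + end                         // end offset (decimal)
                + ",t=" + type                        // token type (string)
                + ",f=" + Integer.toHexString(flags)  // flags (hexadecimal)
                + ",p=" + toHex(payload);             // payload (hex bytes)
    }

    public static void main(String[] args) {
        byte[] payload = {0x0a, (byte) 0xff};
        // prints: one,i=1,s=0,e=3,t=word,f=1a,p=0aff
        System.out.println(token("one", 1, 0, 3, "word", 0x1a, payload));
    }
}
```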

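Token offsets are tracked and implicitly added to the token stream. The implicit start and end offsets count only the term text and whitespace, and exclude the characters occupied by token attributes. As the examples below show, explicit s and e attributes take precedence over the implicit values.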
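
The implicit offset rule can be sketched as a single pass that advances a counter on term characters and spaces and ignores everything from a token's first comma up to the next space. This is a simplified illustration, not the parser's implementation: `ImplicitOffsetsSketch` and `implicitOffsets` are hypothetical names, the input is assumed to be the token part only (after the version prefix and any stored part), and escape sequences are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class ImplicitOffsetsSketch {

    /** Returns one {start, end} pair per token, ignoring explicit s/e attributes. */
    static List<int[]> implicitOffsets(String tokens) {
        List<int[]> result = new ArrayList<>();
        int pos = 0;               // counts term characters and spaces only
        int start = -1;            // start of the open token, or -1 if none
        boolean inAttribs = false; // true between a token's first ',' and the next space
        for (int i = 0; i < tokens.length(); i++) {
            char c = tokens.charAt(i);
            if (c == ' ') {
                if (start >= 0) {
                    result.add(new int[] {start, pos}); // close the open token
                }
                start = -1;
                inAttribs = false;
                pos++;                                  // whitespace is counted
            } else if (inAttribs) {
                // attribute characters do not advance the offset
            } else if (c == ',') {
                if (start < 0) {
                    start = pos;                        // token with an empty term
                }
                inAttribs = true;
            } else {
                if (start < 0) {
                    start = pos;
                }
                pos++;                                  // term character
            }
        }
        if (start >= 0) {
            result.add(new int[] {start, pos});
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] o : implicitOffsets("one  two   three")) {
            System.out.println(o[0] + "-" + o[1]);
        }
    }
}
```

For the input `one  two   three` the sketch yields 0-3, 5-8, and 11-16, which agrees with the second example stream below.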

Example token streams

 1 one two three
 - version 1
 - stored: 'null'
 - tok: '(term=one,startOffset=0,endOffset=3)'
 - tok: '(term=two,startOffset=4,endOffset=7)'
 - tok: '(term=three,startOffset=8,endOffset=13)'
 1 one  two   three
 - version 1
 - stored: 'null'
 - tok: '(term=one,startOffset=0,endOffset=3)'
 - tok: '(term=two,startOffset=5,endOffset=8)'
 - tok: '(term=three,startOffset=11,endOffset=16)'
 1 one,s=123,e=128,i=22  two three,s=20,e=22
 - version 1
 - stored: 'null'
 - tok: '(term=one,positionIncrement=22,startOffset=123,endOffset=128)'
 - tok: '(term=two,positionIncrement=1,startOffset=5,endOffset=8)'
 - tok: '(term=three,positionIncrement=1,startOffset=20,endOffset=22)'
 1 \ one\ \,,i=22,a=\, two\=

 \n,\ =\   \
 - version 1
 - stored: 'null'
 - tok: '(term= one ,,positionIncrement=22,startOffset=0,endOffset=6)'
 - tok: '(term=two=


 ,positionIncrement=1,startOffset=7,endOffset=15)'
 - tok: '(term=\,positionIncrement=1,startOffset=17,endOffset=18)'
 1 ,i=22 ,i=33,s=2,e=20 ,
 - version 1
 - stored: 'null'
 - tok: '(term=,positionIncrement=22,startOffset=0,endOffset=0)'
 - tok: '(term=,positionIncrement=33,startOffset=2,endOffset=20)'
 - tok: '(term=,positionIncrement=1,startOffset=2,endOffset=2)'
 1 =This is the stored part with \=
 \n    \t escapes.=one two three
 - version 1
 - stored: 'This is the stored part with =
 \n    \t escapes.'
 - tok: '(term=one,startOffset=0,endOffset=3)'
 - tok: '(term=two,startOffset=4,endOffset=7)'
 - tok: '(term=three,startOffset=8,endOffset=13)'
 1 ==
 - version 1
 - stored: ''
 - (no tokens)
 1 =this is a test.=
 - version 1
 - stored: 'this is a test.'
 - (no tokens)
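
The examples above can be reproduced by assembling the three parts of the grammar in order: version, optional stored value, tokens. The sketch below is a hypothetical helper (`SerializedValueSketch.serialize` is not part of Solr). It escapes only "=" in the stored part, as the grammar comment requires, and assumes each token string has already been escaped and formatted by the caller.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SerializedValueSketch {

    /**
     * Builds "version [=stored=] token token ...".
     * Pass null for stored to omit the stored part entirely.
     */
    static String serialize(int version, String stored, List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        sb.append(version).append(' ');
        if (stored != null) {
            // Any "=" inside the stored value must be escaped.
            sb.append('=').append(stored.replace("=", "\\=")).append('=');
        }
        sb.append(String.join(" ", tokens));
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: 1 one two three
        System.out.println(serialize(1, null, Arrays.asList("one", "two", "three")));
        // prints: 1 =this is a test.=
        System.out.println(serialize(1, "this is a test.", Collections.emptyList()));
    }
}
```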

Constructor Summary

Constructor	Description
`SimplePreAnalyzedParser()`	

Method Summary

Modifier and Type	Method	Description
`PreAnalyzedField.ParseResult`	`parse(Reader reader, org.apache.lucene.util.AttributeSource parent)`	Parse input.
`String`	`toFormattedString(org.apache.lucene.document.Field f)`	Format a field so that the resulting String is valid for parsing with `PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource)`.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - SimplePreAnalyzedParser
```
public SimplePreAnalyzedParser()
```
- Method Detail
  - parse
```
public PreAnalyzedField.ParseResult parse(Reader reader,
                                          org.apache.lucene.util.AttributeSource parent)
                                   throws IOException
```
    Description copied from interface: PreAnalyzedField.PreAnalyzedParser
    
    Parse input.
    
    Specified by:
    
    parse in interface PreAnalyzedField.PreAnalyzedParser
    
    Parameters:
    
    reader - input to read from
    
    parent - parent who will own the resulting states (tokens with attributes)
    
    Returns:
    
    parse result, with possibly null stored and/or states fields.
    
    Throws:
    
    IOException - if a parsing error or IO error occurs
  - toFormattedString
```
public String toFormattedString(org.apache.lucene.document.Field f)
                         throws IOException
```
    Description copied from interface: PreAnalyzedField.PreAnalyzedParser
    
    Format a field so that the resulting String is valid for parsing with PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource).
    
    Specified by:
    
    toFormattedString in interface PreAnalyzedField.PreAnalyzedParser
    
    Parameters:
    
    f - field instance
    
    Returns:
    
    formatted string
    
    Throws:
    
    IOException - If there is a low-level I/O error.
