public final class SimplePreAnalyzedParser extends Object implements PreAnalyzedField.PreAnalyzedParser
Simple plain text format parser for PreAnalyzedField.
The format of the serialization is as follows:
content ::= version (stored)? tokens
version ::= digit+ " "
; stored field value - any "=" inside must be escaped!
stored ::= "=" text "="
tokens ::= (token ((" ") + token)*)*
token ::= text ("," attrib)*
attrib ::= name '=' value
name ::= text
value ::= text
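As a rough illustration of the grammar above, the following sketch splits serialized content into its version, optional stored value, and raw token text. It is not the Solr implementation: the class `PreAnalyzedEnvelope` and its `split` method are hypothetical names, and the stored value is returned still escaped. A backslash is assumed to protect the character that follows it, so an escaped `=` does not end the stored part.

```java
/** Illustrative only: splits serialized content into version, stored value and token text. */
final class PreAnalyzedEnvelope {
    final int version;
    final String stored; // raw (still escaped) stored value, or null when absent
    final String tokens; // raw token text, possibly empty

    private PreAnalyzedEnvelope(int version, String stored, String tokens) {
        this.version = version;
        this.stored = stored;
        this.tokens = tokens;
    }

    static PreAnalyzedEnvelope split(String content) {
        // version ::= digit+ " "
        int i = 0;
        while (i < content.length() && content.charAt(i) >= '0' && content.charAt(i) <= '9') {
            i++;
        }
        if (i == 0 || i >= content.length() || content.charAt(i) != ' ') {
            throw new IllegalArgumentException("missing version prefix");
        }
        int version = Integer.parseInt(content.substring(0, i));
        int pos = i + 1; // skip the single space after the version

        // stored ::= "=" text "=" (optional)
        String stored = null;
        if (pos < content.length() && content.charAt(pos) == '=') {
            int end = pos + 1;
            while (end < content.length() && content.charAt(end) != '=') {
                if (content.charAt(end) == '\\') {
                    end++; // skip the escaped character, e.g. "\="
                }
                end++;
            }
            if (end >= content.length()) {
                throw new IllegalArgumentException("unterminated stored value");
            }
            stored = content.substring(pos + 1, end);
            pos = end + 1;
        }
        // whatever remains is the token stream
        return new PreAnalyzedEnvelope(version, stored, content.substring(pos));
    }
}
```

For instance, `PreAnalyzedEnvelope.split("1 ==")` yields version 1, an empty stored value, and no token text, while input without a leading `=` leaves `stored` as `null`.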
Special characters in "text" values can be escaped using the escape character \ . The following escape sequences are recognized:
"\ " - literal space character
"\," - literal , character
"\=" - literal = character
"\\" - literal \ character
"\n" - newline
"\r" - carriage return
"\t" - horizontal tab
Please note that Unicode sequences (e.g. \u0001) are not supported.
The following token attribute names are supported:
i - position increment (integer)
s - token offset, start position (integer)
e - token offset, end position (integer)
t - token type (string)
f - token flags (hexadecimal integer)
p - payload (bytes in hexadecimal format; whitespace is ignored)
Token offsets are tracked and implicitly added to the token stream. The start and end offsets consider only the term text and whitespace, and exclude the space taken by token attributes.
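The implicit offset rule can be sketched as follows. This is a simplified model, not the Solr implementation: `OffsetSketch`, `Tok`, `parse`, `splitUnescaped` and `unescape` are hypothetical names, the input is the token part only (after the version and stored value), and silently skipping empty or malformed attributes is an assumption. The running offset advances by the unescaped term length and by each separator space, never by attribute text; explicit `s` and `e` values replace the offsets of that one token without affecting the tokens that follow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of implicit offset tracking; not the Solr implementation. */
final class OffsetSketch {

    /** One parsed token: term text, resolved offsets, and the raw attribute values. */
    static final class Tok {
        String term;
        int start;
        int end;
        final Map<String, String> attrs = new LinkedHashMap<>();
    }

    static List<Tok> parse(String tokens) {
        List<Tok> result = new ArrayList<>();
        int offset = 0; // implicit offset: advanced by term text and whitespace only
        int i = 0;
        int n = tokens.length();
        while (i < n) {
            if (tokens.charAt(i) == ' ') { // unescaped separator space
                offset++;
                i++;
                continue;
            }
            int from = i;
            while (i < n && tokens.charAt(i) != ' ') {
                if (tokens.charAt(i) == '\\' && i + 1 < n) {
                    i++; // an escaped character stays inside the token
                }
                i++;
            }
            List<String> parts = splitUnescaped(tokens.substring(from, i), ',');
            Tok t = new Tok();
            t.term = unescape(parts.get(0));
            t.start = offset;
            t.end = offset + t.term.length();
            offset = t.end; // attribute text never advances the implicit offset
            for (String attrib : parts.subList(1, parts.size())) {
                List<String> kv = splitUnescaped(attrib, '=');
                if (kv.size() != 2) {
                    continue; // assumption: skip empty or malformed attributes
                }
                t.attrs.put(unescape(kv.get(0)), unescape(kv.get(1)));
            }
            // explicit s/e attributes override the offsets of this token only
            if (t.attrs.containsKey("s")) t.start = Integer.parseInt(t.attrs.get("s"));
            if (t.attrs.containsKey("e")) t.end = Integer.parseInt(t.attrs.get("e"));
            result.add(t);
        }
        return result;
    }

    /** Splits on separators that are not preceded by a backslash; pieces stay escaped. */
    static List<String> splitUnescaped(String s, char sep) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                cur.append(c).append(s.charAt(++i)); // copy the escape pair untouched
            } else if (c == sep) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    /** Resolves the escape sequences listed above; other escaped characters are kept literally. */
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(++i);
                out.append(next == 'n' ? '\n' : next == 'r' ? '\r' : next == 't' ? '\t' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under this model, a token written as `one,s=123,e=128` reports offsets 123 to 128, while the token after it still starts from the implicit position derived from the term text and spaces seen so far.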
Example token streams:

1 one two three
 - version 1
 - stored: 'null'
 - tok: '(term=one,startOffset=0,endOffset=3)'
 - tok: '(term=two,startOffset=4,endOffset=7)'
 - tok: '(term=three,startOffset=8,endOffset=13)'

1 one  two   three
 - version 1
 - stored: 'null'
 - tok: '(term=one,startOffset=0,endOffset=3)'
 - tok: '(term=two,startOffset=5,endOffset=8)'
 - tok: '(term=three,startOffset=11,endOffset=16)'

1 one,s=123,e=128,i=22  two three,s=20,e=22
 - version 1
 - stored: 'null'
 - tok: '(term=one,positionIncrement=22,startOffset=123,endOffset=128)'
 - tok: '(term=two,positionIncrement=1,startOffset=5,endOffset=8)'
 - tok: '(term=three,positionIncrement=1,startOffset=20,endOffset=22)'

1 \ one\ \,,i=22,a=\, two\= \n,\ =\ \
 - version 1
 - stored: 'null'
 - tok: '(term= one ,,positionIncrement=22,startOffset=0,endOffset=6)'
 - tok: '(term=two= ,positionIncrement=1,startOffset=7,endOffset=15)'
 - tok: '(term=\,positionIncrement=1,startOffset=17,endOffset=18)'

1 ,i=22 ,i=33,s=2,e=20 ,
 - version 1
 - stored: 'null'
 - tok: '(term=,positionIncrement=22,startOffset=0,endOffset=0)'
 - tok: '(term=,positionIncrement=33,startOffset=2,endOffset=20)'
 - tok: '(term=,positionIncrement=1,startOffset=2,endOffset=2)'

1 =This is the stored part with \= \n \t escapes.=one two three
 - version 1
 - stored: 'This is the stored part with = \n \t escapes.'
 - tok: '(term=one,startOffset=0,endOffset=3)'
 - tok: '(term=two,startOffset=4,endOffset=7)'
 - tok: '(term=three,startOffset=8,endOffset=13)'

1 ==
 - version 1
 - stored: ''
 - (no tokens)

1 =this is a test.=
 - version 1
 - stored: 'this is a test.'
 - (no tokens)
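The escape sequences recognized by the format, as used in the examples above, can be sketched as a pair of helper methods. This is a minimal illustration under stated assumptions, not the code Solr uses: `PreAnalyzedEscapes`, `escape` and `unescape` are hypothetical names, and keeping a trailing lone backslash or an unrecognized escaped character as-is is an assumption.

```java
/** Illustrative helpers for the escape sequences of the serialization format. */
final class PreAnalyzedEscapes {

    /** Escapes the characters that have a special meaning in "text" values. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case ' ':  out.append("\\ ");  break;
                case ',':  out.append("\\,");  break;
                case '=':  out.append("\\=");  break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Reverses escape(); Unicode sequences such as \u0001 are not interpreted. */
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != '\\' || i == raw.length() - 1) {
                out.append(c); // ordinary character, or a trailing lone backslash (assumption: kept)
                continue;
            }
            char next = raw.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                default:  out.append(next); // "\ ", "\,", "\=", "\\"; other characters kept literally
            }
        }
        return out.toString();
    }
}
```

With these helpers, the term ` one ,` (leading space, trailing space and comma) escapes to `\ one\ \,`, which is the form the first token takes in the escaped example above.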
Constructor and Description |
---|
SimplePreAnalyzedParser() |
Modifier and Type | Method and Description |
---|---|
PreAnalyzedField.ParseResult | parse(Reader reader, AttributeSource parent) Parse input. |
String | toFormattedString(Field f) Format a field so that the resulting String is valid for parsing with PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource). |
public PreAnalyzedField.ParseResult parse(Reader reader, AttributeSource parent) throws IOException
Description copied from interface: PreAnalyzedField.PreAnalyzedParser
Parse input.
Specified by: parse in interface PreAnalyzedField.PreAnalyzedParser
Parameters:
reader - input to read from
parent - parent who will own the resulting states (tokens with attributes)
Throws:
IOException - if a parsing error or IO error occurs

public String toFormattedString(Field f) throws IOException
Description copied from interface: PreAnalyzedField.PreAnalyzedParser
Format a field so that the resulting String is valid for parsing with PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource).
Specified by: toFormattedString in interface PreAnalyzedField.PreAnalyzedParser
Parameters:
f - field instance
Throws:
IOException - If there is a low-level I/O error.

Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.