Package org.apache.solr.schema
Class SimplePreAnalyzedParser
- java.lang.Object
  - org.apache.solr.schema.SimplePreAnalyzedParser
- All Implemented Interfaces:
  PreAnalyzedField.PreAnalyzedParser
public final class SimplePreAnalyzedParser extends Object implements PreAnalyzedField.PreAnalyzedParser
Simple plain text format parser for PreAnalyzedField.
Serialization format
The format of the serialization is as follows:
content ::= version (stored)? tokens
version ::= digit+ " "
; stored field value - any "=" inside must be escaped!
stored ::= "=" text "="
tokens ::= (token ((" ") + token)*)*
token ::= text ("," attrib)*
attrib ::= name '=' value
name ::= text
value ::= text
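The top-level `content` production can be illustrated with a small composer. This is a minimal sketch, not part of Solr: the class name `PreAnalyzedContentSketch` and the method `compose` are hypothetical, and the sketch assumes the stored value and the tokens have already been escaped as this section describes.

```java
import java.util.List;

/** Hypothetical helper (not part of Solr) that assembles a string following the grammar above. */
final class PreAnalyzedContentSketch {

    /**
     * Builds "version (stored)? tokens".
     *
     * @param version       format version, written as digits followed by one space
     * @param escapedStored stored value with "=" already escaped, or null for no stored part
     * @param escapedTokens tokens (term text plus optional attributes), already escaped
     */
    static String compose(int version, String escapedStored, List<String> escapedTokens) {
        if (version < 0) {
            // The grammar only allows digits, so a sign is not representable.
            throw new IllegalArgumentException("version must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        sb.append(version).append(' ');
        if (escapedStored != null) {
            // The stored part is delimited by "=" on both sides.
            sb.append('=').append(escapedStored).append('=');
        }
        // Tokens are separated by a single space in this sketch.
        sb.append(String.join(" ", escapedTokens));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(compose(1, "this is a test.", List.of()));
        // prints: 1 =this is a test.=
    }
}
```

The grammar allows more than one space between tokens; the sketch always emits exactly one.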
Special characters in "text" values can be escaped using the escape character \ . The following escape sequences are recognized:
"\ " - literal space character
"\," - literal , character
"\=" - literal = character
"\\" - literal \ character
"\n" - newline
"\r" - carriage return
"\t" - horizontal tab
Please note that Unicode sequences (e.g. \u0001) are not supported.
Supported attribute names
The following token attributes are supported, and identified with short symbolic names:
i - position increment (integer)
s - token offset, start position (integer)
e - token offset, end position (integer)
t - token type (string)
f - token flags (hexadecimal integer)
p - payload (bytes in hexadecimal format; whitespace is ignored)
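To show how the escape sequences and the attribute names combine into a single `token`, here is a minimal sketch. It is not Solr code: the class `PreAnalyzedTokenSketch` and its methods are hypothetical. It also makes assumptions the source does not state: the attribute order (s, e, i, t, f, p) is arbitrary, and hexadecimal values are written in lowercase without a `0x` prefix.

```java
/** Hypothetical serializer (not part of Solr) for one token in the format described here. */
final class PreAnalyzedTokenSketch {

    /** Escapes the special characters listed in the escape table; other characters are copied. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case ' ':  sb.append("\\ ");  break;
                case ',':  sb.append("\\,");  break;
                case '=':  sb.append("\\=");  break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Renders bytes as hexadecimal digits, two per byte (lowercase is an assumption). */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Builds text ("," name "=" value)*; a null argument means the attribute is omitted. */
    static String token(String term, Integer start, Integer end, Integer posInc,
                        String type, Integer flags, byte[] payload) {
        StringBuilder sb = new StringBuilder(escape(term));
        if (start != null)   sb.append(",s=").append(start);
        if (end != null)     sb.append(",e=").append(end);
        if (posInc != null)  sb.append(",i=").append(posInc);
        if (type != null)    sb.append(",t=").append(escape(type));
        if (flags != null)   sb.append(",f=").append(Integer.toHexString(flags));
        if (payload != null) sb.append(",p=").append(hex(payload));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(token("one", 123, 128, 22, null, null, null));
        // prints: one,s=123,e=128,i=22
    }
}
```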
Token offsets are tracked and implicitly added to the token stream - the start and end offsets consider only the term text and whitespace, and exclude the space taken by token attributes.
Example token streams
1 one two three
 - version 1
 - stored: 'null'
 - tok: '(term=one,startOffset=0,endOffset=3)'
 - tok: '(term=two,startOffset=4,endOffset=7)'
 - tok: '(term=three,startOffset=8,endOffset=13)'

1 one  two   three
 - version 1
 - stored: 'null'
 - tok: '(term=one,startOffset=0,endOffset=3)'
 - tok: '(term=two,startOffset=5,endOffset=8)'
 - tok: '(term=three,startOffset=11,endOffset=16)'

1 one,s=123,e=128,i=22  two three,s=20,e=22
 - version 1
 - stored: 'null'
 - tok: '(term=one,positionIncrement=22,startOffset=123,endOffset=128)'
 - tok: '(term=two,positionIncrement=1,startOffset=5,endOffset=8)'
 - tok: '(term=three,positionIncrement=1,startOffset=20,endOffset=22)'

1 \ one\ \,,i=22,a=\, two\= \n,\ =\ \
 - version 1
 - stored: 'null'
 - tok: '(term= one ,,positionIncrement=22,startOffset=0,endOffset=6)'
 - tok: '(term=two= ,positionIncrement=1,startOffset=7,endOffset=15)'
 - tok: '(term=\,positionIncrement=1,startOffset=17,endOffset=18)'

1 ,i=22 ,i=33,s=2,e=20 ,
 - version 1
 - stored: 'null'
 - tok: '(term=,positionIncrement=22,startOffset=0,endOffset=0)'
 - tok: '(term=,positionIncrement=33,startOffset=2,endOffset=20)'
 - tok: '(term=,positionIncrement=1,startOffset=2,endOffset=2)'

1 =This is the stored part with \= \n \t escapes.=one two three
 - version 1
 - stored: 'This is the stored part with = \n \t escapes.'
 - tok: '(term=one,startOffset=0,endOffset=3)'
 - tok: '(term=two,startOffset=4,endOffset=7)'
 - tok: '(term=three,startOffset=8,endOffset=13)'

1 ==
 - version 1
 - stored: ''
 - (no tokens)

1 =this is a test.=
 - version 1
 - stored: 'this is a test.'
 - (no tokens)
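The implicit offsets in these examples can be reproduced with a short sketch of the tracking rule: term text and whitespace advance the offset, attribute text does not. This is one reading of the rule stated in this section, not the parser's actual code. The class `ImplicitOffsetSketch` is hypothetical, it takes only the token part (after the version prefix), it ignores escape sequences, and it does not apply explicit `s`/`e` overrides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration (not part of Solr) of implicit offset tracking. */
final class ImplicitOffsetSketch {

    /** Returns one {start, end} pair per token in the given token string. */
    static List<int[]> offsets(String tokens) {
        List<int[]> result = new ArrayList<>();
        int offset = 0; // running offset: counts term characters and spaces only
        int i = 0;
        int n = tokens.length();
        while (i < n) {
            if (tokens.charAt(i) == ' ') {
                // Whitespace between tokens advances the offset.
                offset++;
                i++;
                continue;
            }
            int start = offset;
            // Term text runs up to the first comma or space.
            while (i < n && tokens.charAt(i) != ' ' && tokens.charAt(i) != ',') {
                offset++;
                i++;
            }
            result.add(new int[] {start, offset});
            // Attributes (",name=value") are skipped without advancing the offset.
            while (i < n && tokens.charAt(i) != ' ') {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] pair : offsets("one two three")) {
            System.out.println(Arrays.toString(pair));
        }
        // prints [0, 3], then [4, 7], then [8, 13], one pair per line
    }
}
```

With two spaces before `two` and three spaces before `three`, the same method returns 0-3, 5-8 and 11-16, which are the offsets listed for the second example.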
Constructor Summary
Constructor: SimplePreAnalyzedParser()
Method Summary
Modifier and Type: PreAnalyzedField.ParseResult
Method: parse(Reader reader, org.apache.lucene.util.AttributeSource parent)
Description: Parse input.

Modifier and Type: String
Method: toFormattedString(org.apache.lucene.document.Field f)
Description: Format a field so that the resulting String is valid for parsing with PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource).
Method Detail
parse
public PreAnalyzedField.ParseResult parse(Reader reader, org.apache.lucene.util.AttributeSource parent) throws IOException
Description copied from interface: PreAnalyzedField.PreAnalyzedParser
Parse input.
- Specified by:
parse in interface PreAnalyzedField.PreAnalyzedParser
- Parameters:
reader - input to read from
parent - parent who will own the resulting states (tokens with attributes)
- Returns:
parse result, with possibly null stored and/or states fields.
- Throws:
IOException - if a parsing error or IO error occurs
toFormattedString
public String toFormattedString(org.apache.lucene.document.Field f) throws IOException
Description copied from interface: PreAnalyzedField.PreAnalyzedParser
Format a field so that the resulting String is valid for parsing with PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource).
- Specified by:
toFormattedString in interface PreAnalyzedField.PreAnalyzedParser
- Parameters:
f - field instance
- Returns:
formatted string
- Throws:
IOException - If there is a low-level I/O error.