Class CSVParser
Because CSV appears in many different dialects, the parser supports many configuration
settings by allowing the specification of a CSVStrategy.
Parsing of a csv-string having tabs as separators, '"' as an optional value encapsulator, and comments starting with '#':
String[][] data =
    new CSVParser(new StringReader("a\tb\nc\td"), new CSVStrategy('\t', '"', '#')).getAllValues();
Parsing of a csv-string in Excel CSV format:
String[][] data =
    new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY).getAllValues();
The internal parser state is completely covered by the strategy and the reader state.
See the package documentation for more details.
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected static final int TT_EOF
    Token (which can have content) when end of file is reached.
protected static final int TT_EORECORD
    Token with content when end of a line is reached.
protected static final int TT_INVALID
    Token has no valid content, i.e. is in its initialized state.
protected static final int TT_TOKEN
    Token with content, at beginning or in the middle of a line.
-
Constructor Summary
Constructors (Constructor, Description):
CSVParser(Reader input)
    CSV parser using the default CSVStrategy.
CSVParser(Reader input, CSVStrategy strategy)
    Customized CSV parser using the given CSVStrategy.
-
Method Summary
Methods (Modifier and Type, Method, Description):
String[][] getAllValues()
    Parses the CSV according to the given strategy and returns the content as an array of records (where each record is an array of single values).
String[] getLine()
    Parses from the current point in the stream until the end of the current line.
int getLineNumber()
    Returns the current line number in the input stream.
CSVStrategy getStrategy()
    Obtains the specified CSVStrategy.
protected org.apache.solr.internal.csv.CSVParser.Token nextToken()
    Convenience method for nextToken(null).
protected org.apache.solr.internal.csv.CSVParser.Token nextToken(org.apache.solr.internal.csv.CSVParser.Token tkn)
    Returns the next token.
String nextValue()
    Parses the CSV according to the given strategy and returns the next csv-value as string.
protected int unicodeEscapeLexer(int c)
    Decodes Unicode escapes.
-
Field Details
-
TT_INVALID
protected static final int TT_INVALID
Token has no valid content, i.e. is in its initialized state.
-
TT_TOKEN
protected static final int TT_TOKEN
Token with content, at beginning or in the middle of a line.
-
TT_EOF
protected static final int TT_EOF
Token (which can have content) when end of file is reached.
-
TT_EORECORD
protected static final int TT_EORECORD
Token with content when end of a line is reached.
-
-
Constructor Details
-
CSVParser
CSV parser using the default CSVStrategy.
- Parameters:
input - a Reader containing "csv-formatted" input
-
CSVParser
Customized CSV parser using the given CSVStrategy.
- Parameters:
input - a Reader containing "csv-formatted" input
strategy - the CSVStrategy used for CSV parsing
-
-
Method Details
-
getAllValues
Parses the CSV according to the given strategy and returns the content as an array of records (where each record is an array of single values).
The returned content starts at the current parse position in the stream.
- Returns:
- matrix of records x values ('null' when end of file)
- Throws:
IOException - on parse error or input read failure
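To make the "records x values" shape concrete, here is a minimal, self-contained sketch. It does not call CSVParser at all: it uses a naive `String.split`, ignores encapsulators and comments, and the class and method names (`RecordMatrixSketch`, `splitAll`) are hypothetical. It only illustrates the kind of `String[][]` layout described above, under the assumption that each input line forms one record.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RecordMatrixSketch {
    /**
     * Naive illustration only: splits the input into lines and each line into values.
     * Unlike a real CSV parser, it does not handle encapsulated values or comments.
     */
    static String[][] splitAll(String input, char delimiter) {
        String[] lines = input.split("\n");
        String[][] records = new String[lines.length][];
        String delimiterRegex = Pattern.quote(String.valueOf(delimiter));
        for (int i = 0; i < lines.length; i++) {
            // A limit of -1 keeps trailing empty values.
            records[i] = lines[i].split(delimiterRegex, -1);
        }
        return records;
    }

    public static void main(String[] args) {
        String[][] data = splitAll("a\tb\nc\td", '\t');
        // Outer index selects the record, inner index selects the value.
        System.out.println(Arrays.deepToString(data)); // prints [[a, b], [c, d]]
    }
}
```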
-
nextValue
Parses the CSV according to the given strategy and returns the next csv-value as string.
- Returns:
- next value in the input stream ('null' when end of file)
- Throws:
IOException - on parse error or input read failure
-
getLine
Parses from the current point in the stream until the end of the current line.
- Returns:
- array of values until end of line ('null' when end of file has been reached)
- Throws:
IOException - on parse error or input read failure
-
getLineNumber
public int getLineNumber()
Returns the current line number in the input stream.
ATTENTION: if your CSV contains multi-line values, the returned number does not correspond to the record number.
- Returns:
- current line number
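The difference between line number and record number can be shown with a small, self-contained sketch. It is not the parser's implementation: the class `LineVsRecordSketch` and its `count` method are hypothetical, and the sketch assumes '"' is the value encapsulator and '\n' the line separator. It counts physical lines and logical records separately, so a value containing a line break increases only the line count.

```java
class LineVsRecordSketch {
    /**
     * Counts physical lines and logical records in CSV text where '"' encapsulates values.
     * Returns an array {lines, records}. Illustration only, not the library's logic.
     */
    static int[] count(String csv) {
        int lines = 0;
        int records = 0;
        boolean inQuotes = false;   // true while inside an encapsulated value
        boolean lineOpen = false;   // characters seen since the last line break
        boolean recordOpen = false; // characters seen since the last record end
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '\n') {
                lines++;
                lineOpen = false;
                if (!inQuotes) {
                    // A line break outside an encapsulated value ends the record.
                    records++;
                    recordOpen = false;
                }
            } else {
                lineOpen = true;
                recordOpen = true;
                if (c == '"') {
                    inQuotes = !inQuotes;
                }
            }
        }
        if (lineOpen) lines++;
        if (recordOpen) records++;
        return new int[] {lines, records};
    }

    public static void main(String[] args) {
        // The first record contains a value that spans two physical lines.
        int[] result = count("a,\"x\ny\"\nb,c\n");
        System.out.println("lines=" + result[0] + " records=" + result[1]); // prints lines=3 records=2
    }
}
```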
-
nextToken
Convenience method for nextToken(null).
- Throws:
IOException
-
nextToken
protected org.apache.solr.internal.csv.CSVParser.Token nextToken(org.apache.solr.internal.csv.CSVParser.Token tkn) throws IOException
Returns the next token. A token corresponds to a term, a record change or an end-of-file indicator.
- Parameters:
tkn - an existing Token object to reuse. The caller is responsible for initializing the Token.
- Returns:
- the next token found
- Throws:
IOException - on stream access error
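The three kinds of token (term, record change, end of file) can be illustrated with a simplified, self-contained tokenizer. This is a sketch under stated assumptions, not the parser's code: the class `TokenTypeSketch`, its `Type` enum and its `tokenize` method are hypothetical stand-ins for the TT_TOKEN, TT_EORECORD and TT_EOF constants, and the sketch ignores encapsulators, comments and escapes.

```java
import java.util.ArrayList;
import java.util.List;

class TokenTypeSketch {
    /** Hypothetical stand-ins for TT_TOKEN, TT_EORECORD and TT_EOF. */
    enum Type { TOKEN, EORECORD, EOF }

    static final class Token {
        final Type type;
        final String content;

        Token(Type type, String content) {
            this.type = type;
            this.content = content;
        }

        @Override
        public String toString() {
            return type + "(" + content + ")";
        }
    }

    /** Splits the input into tokens, classifying each by what terminated it. */
    static List<Token> tokenize(String input, char delimiter) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder content = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == delimiter) {
                // Value at the beginning or in the middle of a line.
                tokens.add(new Token(Type.TOKEN, content.toString()));
                content.setLength(0);
            } else if (c == '\n') {
                // Value terminated by the end of a line.
                tokens.add(new Token(Type.EORECORD, content.toString()));
                content.setLength(0);
            } else {
                content.append(c);
            }
        }
        // The last token is terminated by end of input and may carry content.
        tokens.add(new Token(Type.EOF, content.toString()));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a,b\nc", ',')); // prints [TOKEN(a), EORECORD(b), EOF(c)]
    }
}
```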
-
unicodeEscapeLexer
Decodes Unicode escapes. Interpretation of "\\uXXXX" escape sequences where XXXX is a hexadecimal number.
- Parameters:
c - current char, which is discarded because it is the "\\" of "\\uXXXX"
- Returns:
- the decoded character
- Throws:
IOException - on wrong unicode escape sequence or read error
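The decoding step can be sketched as follows. This is a simplified, self-contained illustration, not the method's actual implementation: the class `UnicodeEscapeSketch` and its methods are hypothetical, it works on a String instead of a Reader, and it reports malformed escapes with an unchecked IllegalArgumentException rather than the IOException documented above.

```java
class UnicodeEscapeSketch {
    /** Decodes the four hex digits that follow a backslash and 'u', starting at index pos. */
    static char decodeEscape(String text, int pos) {
        if (pos + 4 > text.length()) {
            throw new IllegalArgumentException("incomplete unicode escape at index " + pos);
        }
        int code = 0;
        for (int i = pos; i < pos + 4; i++) {
            int digit = Character.digit(text.charAt(i), 16);
            if (digit < 0) {
                throw new IllegalArgumentException("invalid hex digit at index " + i);
            }
            code = code * 16 + digit;
        }
        return (char) code;
    }

    /** Replaces every backslash-u escape in the text with the character it denotes. */
    static String decodeAll(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length() && text.charAt(i + 1) == 'u') {
                // Skip the backslash and the 'u', then decode the four hex digits.
                out.append(decodeEscape(text, i + 2));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java string literal below contains a literal backslash followed by u0041 and u0042.
        System.out.println(decodeAll("\\u0041\\u0042")); // prints AB
    }
}
```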
-
getStrategy
Obtains the specified CSVStrategy. The returned strategy should not be modified.
- Returns:
- strategy currently being used
-