Class CSVParser
- java.lang.Object
-
- org.apache.solr.internal.csv.CSVParser
-
public class CSVParser extends Object
Parses CSV files according to the specified configuration. Because CSV appears in many different dialects, the parser supports many configuration settings by allowing the specification of a CSVStrategy.
Parsing of a csv-string having tabs as separators, '"' as an optional value encapsulator, and comments starting with '#':
String[][] data = (new CSVParser(new StringReader("a\tb\nc\td"), new CSVStrategy('\t','"','#'))).getAllValues();
Parsing of a csv-string in Excel CSV format:
String[][] data = (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
Internal parser state is completely covered by the strategy and the reader-state.
See the package documentation for more details.
-
-
Field Summary
Fields
protected static int TT_EOF
Token (which can have content) when end of file is reached.
protected static int TT_EORECORD
Token with content when end of a line is reached.
protected static int TT_INVALID
Token has no valid content, i.e. is in its initialized state.
protected static int TT_TOKEN
Token with content, at beginning or in the middle of a line.
-
Constructor Summary
Constructors
CSVParser(Reader input)
CSV parser using the default CSVStrategy.
CSVParser(Reader input, CSVStrategy strategy)
Customized CSV parser using the given CSVStrategy.
-
Method Summary
Methods
String[][] getAllValues()
Parses the CSV according to the given strategy and returns the content as an array of records (where each record is an array of single values).
String[] getLine()
Parses from the current point in the stream until the end of the current line.
int getLineNumber()
Returns the current line number in the input stream.
CSVStrategy getStrategy()
Obtain the specified CSV Strategy.
protected org.apache.solr.internal.csv.CSVParser.Token nextToken()
Convenience method for nextToken(null).
protected org.apache.solr.internal.csv.CSVParser.Token nextToken(org.apache.solr.internal.csv.CSVParser.Token tkn)
Returns the next token.
String nextValue()
Parses the CSV according to the given strategy and returns the next csv-value as a string.
protected int unicodeEscapeLexer(int c)
Decodes Unicode escapes.
-
-
-
Field Detail
-
TT_INVALID
protected static final int TT_INVALID
Token has no valid content, i.e. is in its initialized state.
- See Also:
- Constant Field Values
-
TT_TOKEN
protected static final int TT_TOKEN
Token with content, at beginning or in the middle of a line.
- See Also:
- Constant Field Values
-
TT_EOF
protected static final int TT_EOF
Token (which can have content) when end of file is reached.
- See Also:
- Constant Field Values
-
TT_EORECORD
protected static final int TT_EORECORD
Token with content when end of a line is reached.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
CSVParser
public CSVParser(Reader input)
CSV parser using the default CSVStrategy.
- Parameters:
input
- a Reader containing "csv-formatted" input
-
CSVParser
public CSVParser(Reader input, CSVStrategy strategy)
Customized CSV parser using the given CSVStrategy.
- Parameters:
input
- a Reader containing "csv-formatted" input
strategy
- the CSVStrategy used for CSV parsing
-
-
Method Detail
-
getAllValues
public String[][] getAllValues() throws IOException
Parses the CSV according to the given strategy and returns the content as an array of records (where each record is an array of single values). The returned content starts at the current parse-position in the stream.
- Returns:
- matrix of records x values ('null' when end of file)
- Throws:
IOException
- on parse error or input read-failure
-
nextValue
public String nextValue() throws IOException
Parses the CSV according to the given strategy and returns the next csv-value as a string.
- Returns:
- next value in the input stream ('null' when end of file)
- Throws:
IOException
- on parse error or input read-failure
-
getLine
public String[] getLine() throws IOException
Parses from the current point in the stream until the end of the current line.
- Returns:
- array of values until end of line ('null' when end of file has been reached)
- Throws:
IOException
- on parse error or input read-failure
-
getLineNumber
public int getLineNumber()
Returns the current line number in the input stream.
ATTENTION: if your CSV contains multiline values, the returned number does not correspond to the record number.
- Returns:
- current line number
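The caveat above can be illustrated without the parser itself. The following is a minimal, standalone sketch and not the library's implementation: the class and method names are hypothetical, and it assumes '\n' line terminators and a '"' encapsulator. It counts all line breaks in an input and, separately, only those that fall outside an encapsulated value and therefore end a record.

```java
// Hypothetical helper, for illustration only; not part of CSVParser.
final class LineVersusRecordSketch {

    /** Counts every '\n' in the input, including those inside encapsulated values. */
    static int countLineBreaks(String csv) {
        int count = 0;
        for (int i = 0; i < csv.length(); i++) {
            if (csv.charAt(i) == '\n') {
                count++;
            }
        }
        return count;
    }

    /** Counts only the '\n' characters outside an encapsulated value, i.e. record ends. */
    static int countRecordBreaks(String csv, char encapsulator) {
        int count = 0;
        boolean insideValue = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == encapsulator) {
                // A doubled encapsulator toggles twice, leaving the state unchanged.
                insideValue = !insideValue;
            } else if (c == '\n' && !insideValue) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The second record contains a value that spans two physical lines.
        String csv = "id,note\n1,\"first\nsecond\"\n2,plain\n";
        System.out.println(countLineBreaks(csv));        // prints 4
        System.out.println(countRecordBreaks(csv, '"')); // prints 3
    }
}
```

Once a multiline value has been read, the two counters diverge, which is why a line number should not be used as a record index.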
-
nextToken
protected org.apache.solr.internal.csv.CSVParser.Token nextToken() throws IOException
Convenience method for nextToken(null).
- Throws:
IOException
-
nextToken
protected org.apache.solr.internal.csv.CSVParser.Token nextToken(org.apache.solr.internal.csv.CSVParser.Token tkn) throws IOException
Returns the next token. A token corresponds to a term, a record change or an end-of-file indicator.
- Parameters:
tkn
- an existing Token object to reuse. The caller is responsible for initializing the Token.
- Returns:
- the next token found
- Throws:
IOException
- on stream access error
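The token-reuse contract described above can be sketched as follows. This is a simplified, standalone illustration and not the parser's code: the Token class, its reset() method, the numeric values of the type codes, and the readAll helper are all assumptions made for the example, and the sketch handles neither encapsulators nor comments. It shows a caller initializing one Token before each call and dispatching on the three outcomes (term, record change, end of file).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the real CSVParser or its Token.
final class TokenReuseSketch {
    // Assumed type codes; this page does not state the real constant values.
    static final int TT_INVALID = -1;
    static final int TT_TOKEN = 0;
    static final int TT_EOF = 1;
    static final int TT_EORECORD = 2;

    /** A type code plus a reusable content buffer. */
    static final class Token {
        int type = TT_INVALID;
        final StringBuilder content = new StringBuilder();

        /** Returns the token to its initialized state so it can be reused. */
        Token reset() {
            type = TT_INVALID;
            content.setLength(0);
            return this;
        }
    }

    /** Reads one unquoted value; allocates a Token only when the caller passes null. */
    static Token nextToken(Reader in, char delimiter, Token tkn) throws IOException {
        if (tkn == null) {
            tkn = new Token();
        }
        int c;
        while ((c = in.read()) != -1) {
            if (c == delimiter) {
                tkn.type = TT_TOKEN;      // value at the beginning or middle of a line
                return tkn;
            }
            if (c == '\n') {
                tkn.type = TT_EORECORD;   // value that ends a line
                return tkn;
            }
            tkn.content.append((char) c);
        }
        tkn.type = TT_EOF;                // end of input; may still carry content
        return tkn;
    }

    /** Collects all records while reusing a single Token instance. */
    static List<List<String>> readAll(String csv, char delimiter) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = new ArrayList<>();
        Token tkn = new Token();
        try (Reader in = new StringReader(csv)) {
            while (true) {
                nextToken(in, delimiter, tkn.reset()); // caller initializes before reuse
                if (tkn.type == TT_EOF) {
                    if (tkn.content.length() > 0 || !current.isEmpty()) {
                        current.add(tkn.content.toString());
                        records.add(current);
                    }
                    return records;
                }
                current.add(tkn.content.toString());
                if (tkn.type == TT_EORECORD) {
                    records.add(current);
                    current = new ArrayList<>();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll("a\tb\nc\td", '\t')); // prints [[a, b], [c, d]]
    }
}
```

Reusing one token avoids allocating an object per value. The cost is that the caller must copy the content out before the next call, as readAll does with toString().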
-
unicodeEscapeLexer
protected int unicodeEscapeLexer(int c) throws IOException
Decodes Unicode escapes. Interprets "\\uXXXX" escape sequences, where XXXX is a hexadecimal number.
- Parameters:
c
- current char, which is discarded because it is the "\\" of "\\uXXXX"
- Returns:
- the decoded character
- Throws:
IOException
- on wrong unicode escape sequence or read error
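As a rough illustration of the decoding step, the standalone sketch below reads the remainder of an escape from a Reader positioned just after the backslash. It is not the library's implementation: the class and method names are hypothetical, and the choice to accept only ASCII hex digits and to reject short or malformed sequences with an IOException is an assumption modelled on the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, for illustration only; not part of CSVParser.
final class UnicodeEscapeSketch {

    /** Returns the value of an ASCII hex digit, or -1 for anything else (including end of stream). */
    private static int hexValue(int c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * Decodes the rest of an escape sequence. The backslash has already been
     * consumed by the caller, so the reader is expected to deliver 'u' followed
     * by exactly four hex digits.
     */
    static int decodeAfterBackslash(Reader in) throws IOException {
        if (in.read() != 'u') {
            throw new IOException("expected 'u' after backslash");
        }
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int digit = hexValue(in.read());
            if (digit < 0) {
                throw new IOException("malformed unicode escape sequence");
            }
            value = value * 16 + digit; // shift previous digits left by one hex place
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        System.out.println((char) decodeAfterBackslash(new StringReader("u0041"))); // prints A
    }
}
```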
-
getStrategy
public CSVStrategy getStrategy()
Obtain the specified CSV Strategy. This should not be modified.
- Returns:
- strategy currently being used
-
-