Class CSVParser

java.lang.Object
org.apache.solr.internal.csv.CSVParser

public class CSVParser extends Object
Parses CSV files according to the specified configuration.

Because CSV comes in many dialects, the parser supports a wide range of configuration settings, which are specified through a CSVStrategy.

Parsing a CSV string that uses tabs as separators, '"' as an optional value encapsulator, and comments starting with '#':

  String[][] data =
   (new CSVParser(new StringReader("a\tb\nc\td"), new CSVStrategy('\t','"','#'))).getAllValues();
 

Parsing a CSV string in Excel CSV format:

  String[][] data =
   (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
 

The internal state of the parser is fully captured by the strategy and the state of the reader.

See the package documentation for more details.

  • Field Summary

    Fields
    Modifier and Type           Field        Description
    protected static final int  TT_EOF       Token (which can have content) when end of file is reached.
    protected static final int  TT_EORECORD  Token with content when end of a line is reached.
    protected static final int  TT_INVALID   Token has no valid content, i.e. is in its initialized state.
    protected static final int  TT_TOKEN     Token with content, at beginning or in the middle of a line.
  • Constructor Summary

    Constructors
    Constructor                                    Description
    CSVParser(Reader input)                        CSV parser using the default CSVStrategy.
    CSVParser(Reader input, CSVStrategy strategy)  Customized CSV parser using the given CSVStrategy.
  • Method Summary

    Modifier and Type          Method                          Description
    String[][]                 getAllValues()                  Parses the CSV according to the given strategy and returns the content as an array of records (where each record is an array of single values).
    String[]                   getLine()                       Parses from the current point in the stream until the end of the current line.
    int                        getLineNumber()                 Returns the current line number in the input stream.
    CSVStrategy                getStrategy()                   Returns the CSVStrategy specified for this parser.
    protected CSVParser.Token  nextToken()                     Convenience method for nextToken(null).
    protected CSVParser.Token  nextToken(CSVParser.Token tkn)  Returns the next token.
    String                     nextValue()                     Parses the CSV according to the given strategy and returns the next CSV value as a string.
    protected int              unicodeEscapeLexer(int c)       Decodes Unicode escapes.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • TT_INVALID

      protected static final int TT_INVALID
      Token has no valid content, i.e. is in its initialized state.
    • TT_TOKEN

      protected static final int TT_TOKEN
      Token with content, at beginning or in the middle of a line.
    • TT_EOF

      protected static final int TT_EOF
      Token (which can have content) when end of file is reached.
    • TT_EORECORD

      protected static final int TT_EORECORD
      Token with content when end of a line is reached.
  • Constructor Details

    • CSVParser

      public CSVParser(Reader input)
      CSV parser using the default CSVStrategy.
      Parameters:
      input - a Reader containing "csv-formatted" input
    • CSVParser

      public CSVParser(Reader input, CSVStrategy strategy)
      Customized CSV parser using the given CSVStrategy.
      Parameters:
      input - a Reader containing "csv-formatted" input
      strategy - the CSVStrategy used for CSV parsing
  • Method Details

    • getAllValues

      public String[][] getAllValues() throws IOException
      Parses the CSV according to the given strategy and returns the content as an array of records (where each record is an array of single values).

      The returned content starts at the current parse position in the stream.

      Returns:
      matrix of records x values, or null when end of file has been reached
      Throws:
      IOException - on parse error or input read-failure
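      The documented behaviour suggests that getAllValues() amounts to calling getLine() until it returns null. The stand-alone sketch below shows that loop under this assumption; RecordSource, AllValuesSketch and fromList are hypothetical names used only for illustration, and the code neither calls nor reproduces Solr's implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a getLine()-style source: returns the next
// record, or null once end of file has been reached.
interface RecordSource {
    String[] getLine() throws IOException;
}

final class AllValuesSketch {
    // Collects every remaining record. Returns null (not an empty array) when
    // no record is left, matching the documented "null when end of file".
    static String[][] getAllValues(RecordSource source) throws IOException {
        List<String[]> records = new ArrayList<>();
        String[] values;
        while ((values = source.getLine()) != null) {
            records.add(values);
        }
        return records.isEmpty() ? null : records.toArray(new String[0][]);
    }

    // Hypothetical helper that serves pre-split records from a list.
    static RecordSource fromList(List<String[]> rows) {
        Iterator<String[]> it = rows.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = Arrays.asList(new String[] {"a", "b"}, new String[] {"c", "d"});
        String[][] all = getAllValues(fromList(rows));
        System.out.println(Arrays.deepToString(all)); // prints [[a, b], [c, d]]
    }
}
```

      Returning null rather than an empty matrix lets a caller distinguish "nothing left to read" from a stream that still holds records.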
    • nextValue

      public String nextValue() throws IOException
      Parses the CSV according to the given strategy and returns the next CSV value as a string.
      Returns:
      next value in the input stream, or null when end of file has been reached
      Throws:
      IOException - on parse error or input read-failure
    • getLine

      public String[] getLine() throws IOException
      Parses from the current point in the stream until the end of the current line.
      Returns:
      array of values until the end of the line, or null when end of file has been reached
      Throws:
      IOException - on parse error or input read-failure
    • getLineNumber

      public int getLineNumber()
      Returns the current line number in the input stream.

      ATTENTION: if the CSV contains multi-line values, the returned number does not correspond to the record number.

      Returns:
      current line number
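      To see why line and record numbers can diverge, the stand-alone sketch below counts physical lines and logical records in a string. It assumes '"' as the encapsulator, "" as an escaped quote inside a value, and '\n' as the line end; LineVsRecordSketch and countLinesAndRecords are hypothetical names, and the sketch does not call getLineNumber().

```java
final class LineVsRecordSketch {
    // Counts physical lines (newline characters) and logical records in CSV text.
    // Sketch assumptions: '"' encapsulates values, "" is an escaped quote, and
    // '\n' ends a line. A final record without a trailing '\n' is not counted.
    static int[] countLinesAndRecords(String csv) {
        int lines = 0;
        int records = 0;
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // an escaped "" toggles twice: net no change
            } else if (c == '\n') {
                lines++;
                if (!inQuotes) {
                    records++; // only a newline outside quotes ends a record
                }
            }
        }
        return new int[] {lines, records};
    }

    public static void main(String[] args) {
        // The quoted note spans two physical lines but belongs to one record.
        String csv = "id,note\n1,\"first line\nsecond line\"\n2,plain\n";
        int[] counts = countLinesAndRecords(csv);
        System.out.println("lines=" + counts[0] + " records=" + counts[1]); // prints lines=4 records=3
    }
}
```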
    • nextToken

      protected org.apache.solr.internal.csv.CSVParser.Token nextToken() throws IOException
      Convenience method for nextToken(null).
      Throws:
      IOException
    • nextToken

      protected org.apache.solr.internal.csv.CSVParser.Token nextToken(org.apache.solr.internal.csv.CSVParser.Token tkn) throws IOException
      Returns the next token.

      A token corresponds to a term, a record change or an end-of-file indicator.

      Parameters:
      tkn - an existing Token object to reuse. The caller is responsible for initializing the Token.
      Returns:
      the next token found
      Throws:
      IOException - on stream access error
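      The token model and the reuse contract can be illustrated with a stand-alone sketch. TokenSketch, its nested Token class, reset(), and the numeric TT_* values are hypothetical; the sketch handles only ',' separators and '\n' line ends, with no encapsulators, escapes, or comments. In line with the documented TT_EOF, the last value of the input is returned with type TT_EOF together with its content.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class TokenSketch {
    // Illustrative token types mirroring the documented TT_* meanings.
    static final int TT_INVALID = -1;  // initialized state, no valid content
    static final int TT_TOKEN = 0;     // value at start or middle of a line
    static final int TT_EOF = 1;       // end of input, content may be present
    static final int TT_EORECORD = 2;  // value that ends the current line

    // Hypothetical token holder; reset() restores the initialized state.
    static final class Token {
        int type = TT_INVALID;
        final StringBuilder content = new StringBuilder();

        Token reset() {
            type = TT_INVALID;
            content.setLength(0);
            return this;
        }
    }

    private final Reader in;

    TokenSketch(Reader in) {
        this.in = in;
    }

    // Reads one value, assuming ',' separators, '\n' line ends and no quoting.
    // Pass a token that is already reset(), or null to allocate a new one.
    Token nextToken(Token tkn) throws IOException {
        if (tkn == null) {
            tkn = new Token();
        }
        int c;
        while ((c = in.read()) != -1) {
            if (c == ',') {
                tkn.type = TT_TOKEN;
                return tkn;
            }
            if (c == '\n') {
                tkn.type = TT_EORECORD;
                return tkn;
            }
            tkn.content.append((char) c);
        }
        tkn.type = TT_EOF;
        return tkn;
    }

    public static void main(String[] args) throws IOException {
        TokenSketch lexer = new TokenSketch(new StringReader("a,b\nc"));
        Token tkn = new Token();
        do {
            lexer.nextToken(tkn.reset()); // reuse one Token for every call
            System.out.println(tkn.type + " '" + tkn.content + "'");
        } while (tkn.type != TT_EOF);
        // prints: 0 'a', then 2 'b', then 1 'c' (one per line)
    }
}
```

      Reusing a single Token avoids allocating a new object per value; the price is that the caller must reset it before each call.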
    • unicodeEscapeLexer

      protected int unicodeEscapeLexer(int c) throws IOException
      Decodes Unicode escapes.

      Interprets "\\uXXXX" escape sequences, where XXXX is a four-digit hexadecimal number.

      Parameters:
      c - the current character, which is discarded because it is the "\\" of "\\uXXXX"
      Returns:
      the decoded character
      Throws:
      IOException - on a malformed Unicode escape sequence or a read error
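      A stand-alone sketch of decoding such an escape from a Reader, assuming the caller has already consumed the backslash (as the description of c above suggests). decodeUnicodeEscape and hexValue are hypothetical names, and rejecting non-ASCII digits and end of input with an IOException is an assumption of the sketch rather than documented Solr behaviour.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class UnicodeEscapeSketch {
    // Hypothetical decoder, called after the caller has consumed the backslash.
    // Expects 'u' followed by exactly four hex digits; returns the char value.
    static int decodeUnicodeEscape(Reader in) throws IOException {
        if (in.read() != 'u') {
            throw new IOException("expected 'u' after backslash");
        }
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int digit = hexValue(in.read()); // -1 from read() at EOF is rejected too
            if (digit < 0) {
                throw new IOException("malformed unicode escape sequence");
            }
            value = (value << 4) | digit;
        }
        return value;
    }

    // Returns the value of an ASCII hex digit, or -1 for anything else.
    private static int hexValue(int ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // The reader holds the text after the backslash: "u0041" encodes 'A'.
        Reader in = new StringReader("u0041,next");
        System.out.println((char) decodeUnicodeEscape(in)); // prints A
        System.out.println((char) in.read());              // prints , (reading resumes after the escape)
    }
}
```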
    • getStrategy

      public CSVStrategy getStrategy()
      Returns the CSVStrategy specified for this parser. The returned strategy should not be modified.
      Returns:
      strategy currently being used