CSVReader

We assume that you have already learned what is described in:

If you want to find the right Reader for your purposes, see Readers Comparison.

Short Summary

CSVReader reads data from flat files.

Component: CSVReader
Data source: flat file
Input ports: 0-1
Output ports: 1-2
Each to all outputs [ 1)]: no
Different to different outputs [ 2)]: no
Transformation: no
Transf. req.: no
Java: no
CTL: no

[ 1)] Sending each data record to every connected output port

[ 2)] Sending data records to output ports according to Return Values of Transformations

Abstract

CSVReader reads data from flat files such as CSV (comma-separated values) files and other delimited, fixed-length, or mixed text files. The component can read a single file as well as a collection of files placed on a local disk or remotely. Remote files are accessible via the HTTP, HTTPS, FTP, or SFTP protocols. Using this component, ZIP and TAR archives of flat files can be read. Reading data from stdin (the console), an input port, or a dictionary is also supported.

Parsed data records are sent to the first output port. The component also has an optional logging port for obtaining detailed information about incorrect records. Incorrect records, together with information about the incorrect value, its location, and the error message, are sent out through this error port only if Data Policy is set to controlled and a proper Writer (e.g., Trash or CSVWriter) is connected to port 1.
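The split between correct and incorrect records described above can be illustrated with a minimal sketch. This is not CloverETL/CloverDX code; it is a hypothetical Python illustration of the controlled data policy, where valid records go to output port 0 and rejected records, with their position and error message, go to the error port.

```python
import csv
import io

# Hypothetical sketch (not the component's implementation) of the
# "controlled" data policy: valid records are collected for output port 0,
# invalid ones are routed to an error channel like the optional port 1.
data = io.StringIO("id,amount\n1,10\n2,not-a-number\n3,30\n")

good, errors = [], []
for record_id, row in enumerate(csv.DictReader(data), start=1):
    try:
        good.append({"id": int(row["id"]), "amount": int(row["amount"])})
    except ValueError as exc:
        # mirror the error port: record position, raw data, and error message
        errors.append({"recordID": record_id, "data": row, "error": str(exc)})

print(good)    # records that would be sent to output port 0
print(errors)  # records that would be sent to the error port 1
```

With a strict policy the first invalid record would instead abort the run, and with lenient it would be silently skipped.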

Icon

Ports

Port type: Input
Number: 0
Required: no
Description: for Input Port Reading
Metadata: include specific byte/cbyte/string field

Port type: Output
Number: 0
Required: yes
Description: for correct data records
Metadata: any [ 1)]

Port type: Output
Number: 1
Required: no
Description: for incorrect data records
Metadata: specific structure, see table below

[ 1)] Metadata on output port 0 can use Autofilling Functions

The optional logging port for incorrect records must use the following metadata structure: the record contains exactly four fields (named arbitrarily) of the given types, in the following order:

Table 53.3. Error Metadata for CSVReader

Field number: 0
Field name: recordID
Data type: integer
Description: position of the erroneous record in the dataset (record numbering starts at 1)

Field number: 1
Field name: fieldID
Data type: integer
Description: position of the erroneous field in the record (1 stands for the first field, i.e., the one with index 0)

Field number: 2
Field name: data
Data type: string | byte | cbyte
Description: erroneous record in raw form (including delimiters)

Field number: 3
Field name: error
Data type: string | byte | cbyte
Description: error message - detailed information about this error
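The four-field structure above can be modeled directly. The following is a hypothetical sketch, not part of the product API; the field names are arbitrary (as the text notes), and only the order and types matter.

```python
from typing import NamedTuple

# Hypothetical model of the error-port record: four fields of the given
# types, in the given order; the names themselves are arbitrary.
class ErrorRecord(NamedTuple):
    recordID: int   # position of the erroneous record (1-based)
    fieldID: int    # position of the erroneous field (1-based)
    data: str       # erroneous record in raw form, including delimiters
    error: str      # detailed error message

rec = ErrorRecord(
    recordID=7,
    fieldID=2,
    data="7;abc;xyz",
    error="cannot parse field 2 as integer",
)
print(rec.recordID, rec.fieldID)
```

A Writer such as Trash or CSVWriter attached to port 1 would receive records of exactly this shape.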

CSVReader Attributes

Basic

File URL (required)
  Path to the data source (flat file, console, input port, dictionary) to be read; see Supported File URL Formats for Readers.

Charset
  Character encoding of input records (character encoding does not apply to byte fields if the record type is fixed).
  Possible values: ISO-8859-1 (default) | <other encodings>

Data policy
  Specifies how to handle misformatted or incorrect data; see Data Policy.
  Possible values: strict (default) | controlled | lenient

Trim strings
  Specifies whether leading and trailing whitespace should be removed from strings before setting them to data fields; see Trimming Data below.
  Possible values: default | true | false

Quoted strings
  Fields that contain a special character (comma, newline, or double quote) must be enclosed in quotes (only a single or double quote is accepted as the quote character). If true, such special characters inside a quoted string are not treated as delimiters, and the quotes are removed.
  Possible values: false (default) | true

Quote character
  Specifies which kind of quotes is permitted in Quoted strings.
  Possible values: both (default) | " | '

Advanced

Skip leading blanks
  Specifies whether to skip leading whitespace (e.g., blanks) before setting input strings to data fields. If not explicitly set (i.e., left at the default value), the value of the Trim strings attribute is used. See Trimming Data.
  Possible values: default | true | false

Skip trailing blanks
  Specifies whether to skip trailing whitespace (e.g., blanks) before setting input strings to data fields. If not explicitly set (i.e., left at the default value), the value of the Trim strings attribute is used. See Trimming Data.
  Possible values: default | true | false

Number of skipped records
  How many records/rows to skip from the source file(s); see Selecting Input Records.
  Possible values: 0 (default) - N

Max number of records
  How many records to read from the source file(s); all records are read by default. See Selecting Input Records.
  Possible values: 1 - N

Number of skipped records per source
  How many records/rows to skip from each source file. By default, the value of the Skip source rows record property in output port 0 metadata is used. If the value in metadata differs from the value of this attribute, the Number of skipped records per source value is applied, as it has higher priority. See Selecting Input Records.
  Possible values: 0 (default) - N

Max number of records per source
  How many records/rows to read from each source file; all records from each file are read by default. See Selecting Input Records.
  Possible values: 1 - N

Max error count
  Maximum number of tolerated error records in input file(s); applicable only if the controlled Data Policy is set.
  Possible values: 0 (default) - N

Treat multiple delimiters as one
  If set to true, a field delimited by a repeated delimiter character is interpreted as being delimited by a single delimiter.
  Possible values: false (default) | true

Incremental file [ 1)]
  Name of the file storing the incremental key, including its path. See Incremental Reading.

Incremental key [ 1)]
  Variable storing the position of the last read record. See Incremental Reading.

Verbose
  By default, less comprehensive error notification is provided and performance is slightly higher. If switched to true, more detailed information is provided at the cost of some performance.
  Possible values: false (default) | true

Parser
  By default, the most appropriate parser is applied. Alternatively, the parser for processing data may be set explicitly. If an improper one is set, an exception is thrown and the graph fails. See Data Parsers.
  Possible values: auto (default) | <other>

Deprecated

Skip first line
  By default, the first line is not skipped. If switched to true (e.g., if the first line contains a header), it is skipped.
  Possible values: false (default) | true

[ 1)] Either both or neither of these attributes must be specified
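The effect of the Quoted strings and Trim strings attributes can be illustrated with a short sketch. This uses Python's standard csv module rather than the component itself, purely to show the semantics: with quoting enabled, a delimiter inside quotes is not treated as a field separator and the quotes are removed; trimming then strips leading and trailing whitespace per field.

```python
import csv
import io

# Illustrative sketch (not the component) of "Quoted strings" = true with
# a double-quote quote character: the comma inside the quotes is kept as
# data, not treated as a delimiter, and the quotes themselves are removed.
raw = io.StringIO('name,note\n"Doe, John","  likes , commas  "\n')

rows = list(csv.reader(raw, delimiter=",", quotechar='"'))
header, record = rows[0], rows[1]
print(record[0])  # quotes removed, inner comma preserved

# "Trim strings" = true would additionally strip whitespace on each field:
trimmed = [field.strip() for field in record]
print(trimmed[1])
```

With Quoted strings left at false, the same line would be split into three fields and the quote characters would remain part of the data.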

Advanced Description

Note

Choosing org.jetel.data.parser.SimpleDataParser while using Quoted strings will cause the Quoted strings attribute to be ignored.

Tips & Tricks

General examples