Three components allow you to part the incoming data flow and distribute the records among different output files. These components are the following: CSVWriter, XLSDataWriter and StructuredDataWriter.
If you want to part the data flow and write the records into different output files depending on a key value, you must specify the key for such a partition (Partition key). It has the form of a sequence of incoming record field names separated by semicolon.
In addition to this, you can also select only some incoming records. This can be done by using a lookup table (Partition lookup table). The records whose Partition key equals the values of lookup table key are saved to the specified output files, those whose key values do not equal to lookup table key values are either saved to the file specified in the Partition unassigned file name attribute or discarded (if no Partition unassigned file name is specified).
Remember that if all incoming records are assigned to the values of lookup table, the file for unassigned records will be empty (even if it is defined).
Such lookup table will also serve to group together selected data records into different output files and give them the names. The Partition output fields attribute must be specified. It is a sequence of lookup table fields separated by semicolon.
The File URL value will only serve as the base name for the output file names. Such base name is concatenated with distinguishing names or numbers. If some partitioning is specified (if Partition key is defined), hash signs can be used in File URL as placeholder(s) for distinguishing names or numbers. These hash signs must only be used in the file name part of File URL.
Important | |
---|---|
You need to differentiate between hash sign and dollar sign usage.
|
The hash(es) can be located in any place of this file part of
File URL, even in its middle. For example:
path/output#.xls
(in case of the output XLS
file). If no hash is contained in File URL,
distinguishing names or numbers will be appended to the end of the
file base name.
If Partition file tag is set to
Number file tag
, output files are numbered and the
count of hashes used in File URL means the count
of digits for these distinguishing numbers. This is the default value
of Partition file tag. Thus,
###
can go from 000
to
999
.
If Partition file tag is set to
Key file tag
, single hash must be used in
File URL.at most. Distinguishing names are
used.
These distinguishing names will be created as follows:
If the Partition key attribute (or the
Partition output fields attribute) is of the
following form: field1;field2;...;fieldN
and the
values of these fields are the following:
valueofthefield1
,
valueofthefield2
, ...,
valueofthefieldN
, all the values of the fields are
converted to strings and concatenated. The resulting strings will have
the following form:
valueofthefield1valueofthefield2...valueofthefieldN
.
Such resulting strings are used as distinguishing names and each of
them is inserted to the File URL into the place
marked with hash. Or appended to the end of File
URL if no hash is used in File
URL.
For example, if firstname;lastname
is the
Partition key (or Partition output
fields), you can have the output files as follows:
path/outjohnsmith.xls
,
path/outmarksmith.xls
,
path/outmichaelgordon.xls
, etc. (if
File URL is
path/out#.xls
and Partition file
tag is set to Key file tag
).
Or path/out01.xls
,
path/out02.xls
. etc. (if File
URL is path/out##.xls
and
Partition file tag is set to Number
file tag
).
In XLSDataWriter and CSVWriter, there is another attribute: Exclude fields.
It is a sequence of field names separated by semicolon that should not be written to the output. It can be used when the same fields serve as a part of Partition key.
If you are partitioning data using any of these two writers and Partition file tag is set to Key file tag, values of Partition key are written to the names of these files. At the same time, the same values should be written to corresponding output file.
In order to avoid the files whose names would contain the same values as those written in them, you can select the fields that will be excluded when writing to these files. You need to choose the Exclude fields attribute.
These fields will only be part of file or sheet names, but will not be written to the contents of these files.
Subsequently, when you will read these files, you will be able to use an autofilling function (source_name
for CloudConnectReader or XLSDataReader,
or sheet_name
for XLSDataReader)
to get such value from either file name or sheet name
(when you have previously set Sheet name to $<field name>
).
In other words, when you have files created using Partition key set to City
and the output files are London.txt
, Stockholm.txt
, etc., you can get these values
(London
, Stockholm
, etc.) from these names. The City
field values do not need to be contained in these files.
Note | |
---|---|
If you want to use the value of a field as the path to an existing file, type the following as the File URL attribute in Writer:
This way, if the value of the field used for partitioning is |