Partitioning Output into Different Output Files

Three components allow you to part the incoming data flow and distribute the records among different output files. These components are the following: CSVWriter, XLSDataWriter and StructuredDataWriter.

If you want to part the data flow and write the records into different output files depending on a key value, you must specify the key for such a partition (Partition key). It has the form of a sequence of incoming record field names separated by semicolon.

In addition to this, you can also select only some incoming records. This can be done by using a lookup table (Partition lookup table). The records whose Partition key equals the values of lookup table key are saved to the specified output files, those whose key values do not equal to lookup table key values are either saved to the file specified in the Partition unassigned file name attribute or discarded (if no Partition unassigned file name is specified).

Remember that if all incoming records are assigned to the values of lookup table, the file for unassigned records will be empty (even if it is defined).

Such lookup table will also serve to group together selected data records into different output files and give them the names. The Partition output fields attribute must be specified. It is a sequence of lookup table fields separated by semicolon.

The File URL value will only serve as the base name for the output file names. Such base name is concatenated with distinguishing names or numbers. If some partitioning is specified (if Partition key is defined), hash signs can be used in File URL as placeholder(s) for distinguishing names or numbers. These hash signs must only be used in the file name part of File URL.

[Important]Important

You need to differentiate between hash sign and dollar sign usage.

  • Hash sign

    Hash sigh should be used when each of multiple output files only contains records correspoding to the value of specified Partition key.

  • Dollar sign

    Dollar sigh should be used when each of multiple output files contains only a specified number of records based on the Records per file attribute.

The hash(es) can be located in any place of this file part of File URL, even in its middle. For example: path/output#.xls (in case of the output XLS file). If no hash is contained in File URL, distinguishing names or numbers will be appended to the end of the file base name.

If Partition file tag is set to Number file tag, output files are numbered and the count of hashes used in File URL means the count of digits for these distinguishing numbers. This is the default value of Partition file tag. Thus, ### can go from 000 to 999.

If Partition file tag is set to Key file tag, single hash must be used in File URL.at most. Distinguishing names are used.

These distinguishing names will be created as follows:

If the Partition key attribute (or the Partition output fields attribute) is of the following form: field1;field2;...;fieldN and the values of these fields are the following: valueofthefield1, valueofthefield2, ..., valueofthefieldN, all the values of the fields are converted to strings and concatenated. The resulting strings will have the following form: valueofthefield1valueofthefield2...valueofthefieldN. Such resulting strings are used as distinguishing names and each of them is inserted to the File URL into the place marked with hash. Or appended to the end of File URL if no hash is used in File URL.

For example, if firstname;lastname is the Partition key (or Partition output fields), you can have the output files as follows:

In XLSDataWriter and CSVWriter, there is another attribute: Exclude fields.

It is a sequence of field names separated by semicolon that should not be written to the output. It can be used when the same fields serve as a part of Partition key.

If you are partitioning data using any of these two writers and Partition file tag is set to Key file tag, values of Partition key are written to the names of these files. At the same time, the same values should be written to corresponding output file.

In order to avoid the files whose names would contain the same values as those written in them, you can select the fields that will be excluded when writing to these files. You need to choose the Exclude fields attribute.

These fields will only be part of file or sheet names, but will not be written to the contents of these files.

Subsequently, when you will read these files, you will be able to use an autofilling function (source_name for CloudConnectReader or XLSDataReader, or sheet_name for XLSDataReader) to get such value from either file name or sheet name (when you have previously set Sheet name to $<field name>).

In other words, when you have files created using Partition key set to City and the output files are London.txt, Stockholm.txt, etc., you can get these values (London, Stockholm, etc.) from these names. The City field values do not need to be contained in these files.

[Note]Note

If you want to use the value of a field as the path to an existing file, type the following as the File URL attribute in Writer:

//#

This way, if the value of the field used for partitioning is path/to/my/file/filename.txt, it will be assigned to the output file as its name. For this reason, the output file will be located in path/to/my/file and its name will be filename.txt.