
Manifest File

A manifest file lists the source data files to download and confirms the completeness and integrity of an upload batch. It specifies which files belong to the batch, which entity each file corresponds to, and the timestamp of the export (this is important for incremental load). It can also contain hashes and the number of rows to verify file integrity.

CSV Downloader processes only the source files that are referenced in the manifest file. A source data file that is not mentioned in the manifest file is not processed. This lets you list in the manifest only those source files that you want to load. For example, you can export some entities monthly and other entities daily.

When CSV Downloader reads the manifest file, it processes either all or none of the source files. It cannot process some source files from the manifest file and not process the rest.

Create a new manifest file for each load. Upload it as the last file after all the source files from a specific batch have been uploaded.
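
The upload order described above can be sketched as follows. This is a minimal illustration, not part of CSV Downloader: `upload_batch` and the `upload` callable are hypothetical names standing in for whatever client you use to copy files to your storage.

```python
def upload_batch(data_files, manifest_file, upload):
    """Upload every source file first, then the manifest as the last file."""
    for path in data_files:
        upload(path)       # hypothetical upload function supplied by the caller
    upload(manifest_file)  # the manifest goes last, once the batch is complete


# Demonstration with a list standing in for the remote storage.
uploaded = []
upload_batch(["account.csv", "user.csv"], "manifest_1468493700.csv", uploaded.append)
print(uploaded)  # ['account.csv', 'user.csv', 'manifest_1468493700.csv']
```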

Though you can have multiple manifest files, we recommend that you have one manifest file per load batch. Data files within one manifest file are processed together. Having multiple manifest files may slow down performance.

Provide Your Own Manifest Files

The optional ‘generate_manifests’ parameter in the configuration file (see ‘Set optional parameters for your manifest file’ in CSV Downloader) specifies whether CSV Downloader generates a manifest file or uses the manifest file that you provide. By default, the ‘generate_manifests’ parameter is not set, which is treated as ‘false’ and means that you have to provide your own manifest files.

When creating your own manifest files, make sure that their names and structure meet the requirements that are described in this section.

If you want CSV Downloader to generate the manifest files for you, set the ‘generate_manifests’ parameter to ‘true’ and see ‘Generate a manifest file’ in CSV Downloader.

File Name Format

The name of a manifest file defines the order in which the manifest files will be processed.

Use the Default Format

The default format of the manifest file name is the following:

```
manifest_{time(%s)}.csv
```

When resolved, the name of a manifest file may look like the following:

```
manifest_1468493700.csv
```

If you want to use the default name format for your manifest, you can use it right away without setting any additional parameters.
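
If you create the manifest yourself, a name in the default format can be produced with a few lines of Python. This is a minimal sketch; the helper name `default_manifest_name` is hypothetical, and it assumes that `%s` resolves to the UNIX timestamp in whole seconds, as in the resolved example above.

```python
import time


def default_manifest_name(export_time=None):
    """Return a file name in the default manifest_{time(%s)}.csv format."""
    # Assumption: %s stands for the UNIX timestamp in whole seconds.
    ts = int(time.time() if export_time is None else export_time)
    return f"manifest_{ts}.csv"


print(default_manifest_name(1468493700))  # manifest_1468493700.csv
```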

Customize the Format

If you want to customize the file name format, define your format and set the ‘manifest’ parameter in the configuration file to it (see ‘Set optional parameters for your manifest file’ in CSV Downloader).

You can use the following keywords in the file name:

sequence: Include ‘sequence’ in the file names to ensure that they are loaded in the correct order. If you use ‘sequence’, all manifest file names must contain a sequence number, and the numbers must follow one another without gaps (1, 2, 3, and so on). For a given manifest, CSV Downloader always expects the last sequence number plus 1; otherwise, you will receive an error.
regex: If a file name has a changing part, use ‘regex’ to match that part so that the files can be processed. For more information, see https://ruby-doc.org/core-2.1.1/Regexp.html.
time: If ‘sequence’ is not present, manifest files are sorted by time. ‘time’ can be set as a UNIX timestamp ( {time(%s)} ) or as any date-time format of the YYYYMMDD kind (for example, {time(%Y-%m-%d-%H-%M-%S)} ). For more information about the format tags, see http://ruby-doc.org/core-2.2.0/Time.html#method-i-strftime.

Examples:

To get the following manifest file name:
```
manifest_1.20180217104924.csv
```
the file name format may look like the following:
```
manifest_{sequence}.{time(%Y%m%d%H%M%S)}.csv
```
To get the following manifest file name:
```
manifest-datafeed_pot_1.20160905140015.csv
```
the file name definition may look like the following (notice how ‘\’ is escaped with another ‘\’):
```
{regex(manifest-datafeed_pot_\\d+)}.{time(%Y%m%d%H%M%S)}.csv
```
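
The two examples above can be reproduced on the producer side with a short Python sketch. The helper name `custom_manifest_name` is hypothetical. The regular expression is also an assumption: it is a Python approximation of how the second format might match a file name, not the expression that CSV Downloader builds internally.

```python
import re
from datetime import datetime


def custom_manifest_name(sequence, exported_at):
    """Build a name for the manifest_{sequence}.{time(%Y%m%d%H%M%S)}.csv format."""
    return f"manifest_{sequence}.{exported_at.strftime('%Y%m%d%H%M%S')}.csv"


print(custom_manifest_name(1, datetime(2018, 2, 17, 10, 49, 24)))
# manifest_1.20180217104924.csv

# Assumed equivalent of {regex(manifest-datafeed_pot_\\d+)}.{time(%Y%m%d%H%M%S)}.csv:
# the regex part as written, followed by a 14-digit time stamp.
pattern = re.compile(r"manifest-datafeed_pot_\d+\.\d{14}\.csv")
matches = pattern.fullmatch("manifest-datafeed_pot_1.20160905140015.csv") is not None
print(matches)  # True
```

In a Python raw string the backslash is written once; the doubled backslash is needed only in the configuration file.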

File Structure

The manifest file is a text file delimited with vertical bars ( | ).

The manifest file can have the following columns:

Name	Mandatory?	Description
file_url	yes	The path to the source data file. Examples: for source files on S3: `s3://bucket/folder/account.1515628800.csv`; for source files on SFTP, WebDAV, Google Cloud Storage, or OneDrive (do not include the root directory in the path): `/folder/account.1515628800.csv`
timestamp	yes	The UNIX timestamp representing the time when the source data file was uploaded to storage.
feed	yes	The name of the entity (table) to download data from. The name must match the name of the entity in the feed file (see Feed File).
feed_version	no	The version in the feed file that the source data file is connected to. The version must match the version of the entity in the feed file (see Feed File). NOTE: You can have only one version of the same entity in one manifest file.
num_rows	no	The number of rows in the source data file. Use 'num_rows' to verify the integrity of an upload batch. If you want CSV Downloader to skip the verification, enter 'unknown' in this column.
md5	no	The MD5 checksum of the source data file. If you want CSV Downloader to skip the MD5 check, enter 'unknown' in this column.
export_type	no	The load mode used for loading the source data file to the database. If not set or set to 'inc', incremental load is used. If set to 'full', full load is used. If set to 'delete', CSV Downloader deletes the data from ADS based on the primary key that is set by the 'hub' parameter in the configuration file for CSV Downloader. In this case, the source CSV file must contain a header with the table columns that generate the primary key. The source file may or may not contain other columns (the feed file is ignored in this case, and only the columns generating the primary key are verified). If the primary key contains more than one column (that is, the 'hub' parameter contains more than one column name), the column names in the header must be specified in the same order as they are specified in the 'hub' parameter. The names of the columns that generate the primary key are case-sensitive: the column names in the header must be specified in the same case as they are specified in the 'hub' parameter.
target_predicate	no	The field used for partial full load (see 'Set partial full load mode for the entities' in CSV Downloader). NOTE: You can also use this field as a reference field for defining the partition of the source tables (see the 'drop_source_partition' parameter in ADS Integrator).
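
The optional 'num_rows' and 'md5' values can be computed before you write the manifest. The following is a minimal sketch with hypothetical helper names. It rests on two assumptions that this guide does not state: that the checksum is taken over the file bytes exactly as stored, and that the header row may or may not count towards 'num_rows' (hence the `has_header` switch). Verify both against your own setup.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 checksum of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_rows(path, has_header=True):
    """Count lines in an uncompressed text file, optionally excluding the header.

    Assumption: one record per line (no line breaks inside quoted fields).
    """
    with open(path, "rb") as f:
        lines = sum(1 for _ in f)
    return max(lines - 1, 0) if has_header else lines
```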

File Example

```
file_url|timestamp|feed|feed_version|num_rows|md5|export_type
s3://bucket/folder/account.1515628800.csv.gz|1515628800|Account|1.0|3|366513286293c4b369bc7fafca23ddde|inc
s3://bucket/folder/user.1515628800.txt.gz|1515628800|User|1.0|0|unknown|full
s3://bucket/folder/product.1515628800.txt.gz|1515628800|Product|1.0|0|unknown|full
s3://bucket/folder/facts.1.1515628800.txt.gz|1515628800|Facts|1.2|15444|5d0a290ca7fc8d4dc7dd9cdd0dd15f96|inc
s3://bucket/folder/facts.2.1515628800.txt.gz|1515628800|Facts|1.2|52755|ba63d9912e49fa4f4b2e0797d3fcfa41|inc
```
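
A manifest with this structure can be written with Python's standard `csv` module. This is a minimal sketch: `render_manifest` is a hypothetical helper, and the choice of a plain newline as the line terminator is an assumption, because the guide specifies only the vertical-bar delimiter.

```python
import csv
import io

COLUMNS = ["file_url", "timestamp", "feed", "feed_version",
           "num_rows", "md5", "export_type"]


def render_manifest(rows):
    """Render manifest rows as text delimited with vertical bars."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="|",
                            lineterminator="\n")  # assumed line ending
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {"file_url": "s3://bucket/folder/account.1515628800.csv.gz",
     "timestamp": 1515628800, "feed": "Account", "feed_version": "1.0",
     "num_rows": 3, "md5": "366513286293c4b369bc7fafca23ddde",
     "export_type": "inc"},
    {"file_url": "s3://bucket/folder/user.1515628800.txt.gz",
     "timestamp": 1515628800, "feed": "User", "feed_version": "1.0",
     "num_rows": 0, "md5": "unknown", "export_type": "full"},
]
manifest_text = render_manifest(rows)
print(manifest_text, end="")
```

To create the file, write `manifest_text` to a file whose name follows your manifest name format.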
