Manifest File

A manifest file lists the source data files to download and confirms the completeness and integrity of an upload batch. It describes which files belong to the batch, which entity each file corresponds to, and the timestamp of the export (this is important for incremental load). It can also contain hashes and row counts to verify file integrity.

CSV Downloader processes only the source files that are referenced in the manifest file. A source data file that is not mentioned in the manifest file is not processed. This lets you list in the manifest only those source files that you want processed. For example, you can export some entities monthly and other entities daily.

When CSV Downloader reads the manifest file, it processes either all or none of the source files. It cannot process some source files from the manifest file and not process the rest.

Create a new manifest file for each load. Upload it as the last file after all the source files from a specific batch have been uploaded.

Though you can have multiple manifest files, we recommend that you have one manifest file per load batch. Data files within one manifest file are processed together. Having multiple manifest files may slow down processing.
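The upload order described above can be sketched in Python as follows. This is only an illustration and not part of CSV Downloader: the 'upload_batch' function and the 'upload' callable are hypothetical stand-ins for whatever client you use to write files to your storage.

```python
def upload_batch(data_files, manifest_file, upload):
    """Upload every source data file first, then the manifest as the last file.

    'upload' is a hypothetical callable that sends one file to storage.
    """
    uploaded = []
    for path in data_files:
        upload(path)
        uploaded.append(path)
    # The manifest goes last, so that it is never visible in storage
    # before all the files that it references.
    upload(manifest_file)
    uploaded.append(manifest_file)
    return uploaded
```

Uploading the manifest last means that a manifest found in storage always refers to a complete batch.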

Provide Your Own Manifest Files

The optional 'generate_manifests' parameter in the configuration file (see 'Set optional parameters for your manifest file' in CSV Downloader) specifies whether to generate a manifest file or to use the manifest file that you provide. If the 'generate_manifests' parameter is not set, it defaults to 'false', which means that you have to provide your own manifest files.

When creating your own manifest files, make sure that their names and structure meet the requirements that are described in this section.

If you want CSV Downloader to generate the manifest files for you, set the 'generate_manifests' parameter to 'true' and see 'Generate a manifest file' in CSV Downloader.

File Name Format

The name of a manifest file defines the order in which the manifest files will be processed.

Use the Default Format

The default format of the manifest file name is the following:

manifest_{time(%s)}.csv

When resolved, the name of a manifest file may look like the following:

manifest_1468493700.csv

If you want to use the default name format for your manifest, you can use it right away without setting any additional parameters.
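If you create the manifest files yourself, a name in the default format can be produced as in the following Python sketch. The 'default_manifest_name' function is a hypothetical helper and not part of CSV Downloader. It assumes that {time(%s)} resolves to the UNIX timestamp in whole seconds.

```python
import time


def default_manifest_name(epoch_seconds=None):
    """Return a file name in the default manifest_{time(%s)}.csv format."""
    if epoch_seconds is None:
        # Use the current time when no timestamp is given.
        epoch_seconds = time.time()
    return f"manifest_{int(epoch_seconds)}.csv"


print(default_manifest_name(1468493700))  # manifest_1468493700.csv
```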

Customize the Format

If you want to customize the file name format, define your format and set the 'manifest' parameter in the configuration file to your custom format (see 'Set optional parameters for your manifest file' in CSV Downloader).

You can use the following keywords in the file name:

  • sequence: Include 'sequence' in the file names to ensure that the files are loaded in the correct order. If you use 'sequence', every manifest file name must contain a sequence number, and the numbers must be consecutive (1, 2, 3, and so on): for a given manifest, CSV Downloader always expects the last sequence number + 1. Otherwise, you receive an error.
  • regex: If a file name has a changing part, use 'regex' to match it so that the files can be processed. For more information, see https://ruby-doc.org/core-2.1.1/Regexp.html.
  • time: If 'sequence' is not present, manifest files are sorted by time. 'time' can be set as a UNIX timestamp ( {time(%s)} ) or as a formatted date and time (for example, {time(%Y-%m-%d-%H-%M-%S)} ). For more information about the format tags, see http://ruby-doc.org/core-2.2.0/Time.html#method-i-strftime.
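To check on your side that a set of manifest names satisfies the 'sequence' rule before you upload them, you could use something like the following Python sketch. It is not how CSV Downloader is implemented. It assumes that the names follow the manifest_{sequence}.{time(%Y%m%d%H%M%S)}.csv format, and 'order_by_sequence' and 'last_processed' are hypothetical names.

```python
import re

# Assumed name format: manifest_{sequence}.{time(%Y%m%d%H%M%S)}.csv
NAME_PATTERN = re.compile(r"^manifest_(\d+)\.(\d{14})\.csv$")


def order_by_sequence(names, last_processed=0):
    """Return the names sorted by sequence number.

    Raises ValueError if a name does not match the assumed format or if
    the sequence numbers do not continue from last_processed + 1 without gaps.
    """
    parsed = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected manifest name: {name}")
        parsed.append((int(match.group(1)), name))
    parsed.sort()

    expected = last_processed + 1
    for sequence, name in parsed:
        if sequence != expected:
            raise ValueError(
                f"expected sequence {expected}, found {sequence} in {name}"
            )
        expected += 1
    return [name for _, name in parsed]
```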

Examples:

  • To get the following manifest file name:

    manifest_1.20180217104924.csv

    the file name format may look like the following:

    manifest_{sequence}.{time(%Y%m%d%H%M%S)}.csv
  • To get the following manifest file name:

    manifest-datafeed_pot_1.20160905140015.csv

    the file name definition may look like the following (notice that the backslash is escaped with another backslash):

    {regex(manifest-datafeed_pot_\\d+)}.{time(%Y%m%d%H%M%S)}.csv
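The two examples above can be reproduced with the following Python sketch, which is meant only to show how the keywords resolve. It relies on Python's strftime directives, which match the Ruby tags used here (%Y, %m, %d, %H, %M, %S). The 'resolve_name' and 'parse_pot_time' functions are hypothetical helpers and not part of CSV Downloader.

```python
import re
from datetime import datetime


def resolve_name(sequence, exported_at):
    """Build a name for the manifest_{sequence}.{time(%Y%m%d%H%M%S)}.csv format."""
    return f"manifest_{sequence}.{exported_at.strftime('%Y%m%d%H%M%S')}.csv"


# Python equivalent of {regex(manifest-datafeed_pot_\\d+)}.{time(%Y%m%d%H%M%S)}.csv.
# In a raw Python string, the backslash does not need to be doubled.
POT_PATTERN = re.compile(r"manifest-datafeed_pot_\d+\.(\d{14})\.csv")


def parse_pot_time(name):
    """Return the time encoded in a matching file name, or None if it does not match."""
    match = POT_PATTERN.fullmatch(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


print(resolve_name(1, datetime(2018, 2, 17, 10, 49, 24)))
print(parse_pot_time("manifest-datafeed_pot_1.20160905140015.csv"))
```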

File Structure

The manifest file is a text file delimited with vertical bars ( | ).

The manifest file can have the following columns:

  • file_url (mandatory): The path to the source data file.
    Examples:
      • Source files on S3:
        s3://bucket/folder/account.1515628800.csv
      • Source files on SFTP, WebDAV, Google Cloud Storage, or OneDrive (do not include the root directory in the path):
        /folder/account.1515628800.csv
  • timestamp (mandatory): The UNIX timestamp representing the time when the source data file was uploaded to storage.
  • feed (mandatory): The name of the entity (table) to download data from. The name must match the name of the entity in the feed file (see Feed File).
  • feed_version (optional): The version in the feed file that the source data file is connected to. The version must match the version of the entity in the feed file (see Feed File).
    NOTE: You can have only one version of the same entity in one manifest file.
  • num_rows (optional): The number of rows in the source data file. Use 'num_rows' to verify the integrity of an upload batch. If you want CSV Downloader to skip the verification, enter 'unknown' in this column.
  • md5 (optional): The MD5 checksum of the source data file. If you want CSV Downloader to skip the MD5 check, enter 'unknown' in this column.
  • export_type (optional): The load mode used for loading the source data file to the database.
      • If not set or set to 'inc', incremental load is used.
      • If set to 'full', full load is used.
      • If set to 'delete', CSV Downloader deletes the data from ADS based on the defined primary key (see the 'hub' parameter in CSV Downloader).
        NOTE: The source CSV file must contain the header with the table columns that generate a primary key, and may or may not contain other columns. The names of the primary key columns are case-sensitive.
  • target_predicate (optional): The field used for partial full load (see 'Set partial full load mode for the entities' in CSV Downloader).
    NOTE: You can also use this field as a reference field for defining the partition of the source tables (see the 'drop_source_partition' parameter in ADS Integrator).
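Before you upload a batch, you may want to confirm that the 'num_rows' and 'md5' values you are about to write match the files. The following Python sketch shows one way to do that. It rests on two assumptions that this section does not confirm: that the MD5 checksum is computed over the file bytes as they are stored, and that 'num_rows' excludes a single header line. Adjust 'header_lines' if your files differ. The 'verify_entry' function is a hypothetical helper.

```python
import hashlib


def verify_entry(data, num_rows="unknown", md5="unknown", header_lines=1):
    """Compare file content (bytes) with the manifest values.

    Returns a list of problems. An empty list means that every check
    passed or was skipped because the value is 'unknown'.
    """
    problems = []

    if md5 != "unknown":
        actual_md5 = hashlib.md5(data).hexdigest()
        if actual_md5 != md5.lower():
            problems.append(f"md5 mismatch: expected {md5}, got {actual_md5}")

    if num_rows != "unknown":
        # Assumption: the header line is not counted as a data row.
        actual_rows = len(data.splitlines()) - header_lines
        if actual_rows != int(num_rows):
            problems.append(
                f"num_rows mismatch: expected {num_rows}, got {actual_rows}"
            )

    return problems


sample = b"id,name\n1,a\n2,b\n3,c\n"
print(verify_entry(sample, num_rows="3", md5=hashlib.md5(sample).hexdigest()))
```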

File Example

file_url|timestamp|feed|feed_version|num_rows|md5|export_type
s3://bucket/folder/account.1515628800.csv.gz|1515628800|Account|1.0|3|366513286293c4b369bc7fafca23ddde|inc
s3://bucket/folder/user.1515628800.txt.gz|1515628800|User|1.0|0|unknown|full
s3://bucket/folder/product.1515628800.txt.gz|1515628800|Product|1.0|0|unknown|full
s3://bucket/folder/facts.1.1515628800.txt.gz|1515628800|Facts|1.2|15444|5d0a290ca7fc8d4dc7dd9cdd0dd15f96|inc
s3://bucket/folder/facts.2.1515628800.txt.gz|1515628800|Facts|1.2|52755|ba63d9912e49fa4f4b2e0797d3fcfa41|inc
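A file in this layout can be written and read back with Python's standard 'csv' module by setting the delimiter to a vertical bar. The following sketch also checks the rules from the File Structure section: the mandatory columns, the allowed 'export_type' values, and one version per entity. The 'write_manifest', 'read_manifest', and 'validate_manifest' functions are hypothetical helpers and not part of CSV Downloader. The sketch assumes that no field contains a vertical bar or a quotation mark.

```python
import csv
import io

COLUMNS = ["file_url", "timestamp", "feed", "feed_version",
           "num_rows", "md5", "export_type"]
MANDATORY = ("file_url", "timestamp", "feed")
EXPORT_TYPES = {"", "inc", "full", "delete"}  # "" means the value is not set


def write_manifest(entries):
    """Return the manifest text for a list of dictionaries keyed by column name."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="|", lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def read_manifest(text):
    """Parse manifest text into a list of dictionaries, one per data file."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


def validate_manifest(rows):
    """Return a list of problems found in the parsed rows."""
    problems = []
    versions = {}
    for number, row in enumerate(rows, start=1):
        for column in MANDATORY:
            if not row.get(column):
                problems.append(f"row {number}: missing {column}")
        export_type = row.get("export_type") or ""
        if export_type not in EXPORT_TYPES:
            problems.append(f"row {number}: unknown export_type {export_type}")
        feed, version = row.get("feed"), row.get("feed_version")
        if feed and version:
            # Only one version of the same entity is allowed per manifest.
            seen = versions.setdefault(feed, version)
            if seen != version:
                problems.append(
                    f"row {number}: {feed} uses versions {seen} and {version}"
                )
    return problems
```

Running 'validate_manifest' on your parsed manifest before the upload can catch a missing mandatory value or a mixed 'feed_version' early.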