Supported File URL Formats for Writers

The File URL attribute may be defined using the URL File Dialog.

The URLs shown below can also contain placeholders: the dollar sign ($) or the hash sign (#).

Important

Dollar signs and hash signs serve different purposes:

  • Use the dollar sign when the output is split into multiple files, each containing at most the number of records set in the Records per file attribute.

  • Use the hash sign when the output is split into multiple files, each containing only the records that correspond to one value of the specified Partition key.
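The difference between the two placeholders can be illustrated with a small Java sketch. It assumes that a run of dollar signs is replaced by a zero-padded file counter whose width equals the number of $ characters, and that each hash sign is replaced by the Partition key value of the records in that file. PlaceholderDemo, expandDollar, and expandHash are hypothetical names, not part of CloudConnect.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of File URL placeholder expansion; not the CloudConnect implementation.
public class PlaceholderDemo {

    private static final Pattern DOLLAR_RUN = Pattern.compile("\\$+");

    // Assumption: each run of '$' becomes the file counter, zero-padded
    // to as many digits as there are '$' characters in the run.
    public static String expandDollar(String url, int fileIndex) {
        Matcher m = DOLLAR_RUN.matcher(url);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String number = String.format("%0" + m.group().length() + "d", fileIndex);
            m.appendReplacement(sb, Matcher.quoteReplacement(number));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Assumption: every '#' becomes the Partition key value of the records
    // written to that file. Archive URLs that use '#' as an entry
    // separator are not handled by this sketch.
    public static String expandHash(String url, String partitionKeyValue) {
        return url.replace("#", partitionKeyValue);
    }

    public static void main(String[] args) {
        // Splitting by Records per file: this sketch numbers files from 0.
        for (int i = 0; i < 3; i++) {
            System.out.println(expandDollar("./data-out/orders$$.txt", i));
        }
        // Partitioning by key: one file per distinct key value.
        for (String country : new String[] {"CZ", "US"}) {
            System.out.println(expandHash("./data-out/orders_#.txt", country));
        }
    }
}
```

With these assumptions, the first loop prints orders00.txt through orders02.txt and the second prints orders_CZ.txt and orders_US.txt, all under ./data-out/.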

Note

Hash signs in the URL examples in this section separate a compressed file (zip, gz) from its contents. They are not placeholders!
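To show how such a hash sign differs from a placeholder, the following Java sketch splits an archive URL into the archive path and the entry inside it. It assumes URLs of the form zip:(archive path)#entry; ArchiveUrlDemo, ArchiveUrl, and parse are hypothetical names, not a CloudConnect API.

```java
// Hypothetical parser for archive URLs of the assumed form "zip:(<archive>)#<entry>".
public class ArchiveUrlDemo {

    public static final class ArchiveUrl {
        public final String type;        // e.g. "zip"
        public final String archivePath; // text inside the parentheses
        public final String entry;       // text after the separating '#'

        ArchiveUrl(String type, String archivePath, String entry) {
            this.type = type;
            this.archivePath = archivePath;
            this.entry = entry;
        }
    }

    public static ArchiveUrl parse(String url) {
        int open = url.indexOf(":(");
        // lastIndexOf keeps a nested archive URL intact inside archivePath.
        int close = url.lastIndexOf(")#");
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Not an archive URL: " + url);
        }
        return new ArchiveUrl(
                url.substring(0, open),
                url.substring(open + 2, close),
                url.substring(close + 2));
    }

    public static void main(String[] args) {
        ArchiveUrl u = parse("zip:(./data-out/orders.zip)#orders.csv");
        // The '#' here selects an entry; it is never replaced by a key value.
        System.out.println(u.type + " | " + u.archivePath + " | " + u.entry);
    }
}
```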

Important

To ensure graph portability, forward slashes must be used when defining the path in URLs (even on Microsoft Windows).
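If paths are assembled in Java code, Windows separators can be normalized before the path is placed in a URL. The PortablePath helper below is a hypothetical illustration, not a CloudConnect function.

```java
// Hypothetical helper: normalizes a path to the forward-slash form
// recommended for portable File URLs.
public class PortablePath {

    public static String toPortable(String path) {
        // Replace every Windows backslash separator with a forward slash.
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        String windowsPath = "data-out\\reports\\orders.txt";
        System.out.println(toPortable(windowsPath)); // data-out/reports/orders.txt
    }
}
```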

Here are some examples of possible URLs for Writers:

Writing to Local Files

Note

Although CloudConnect can read data from a .tar file, writing to a .tar file is not supported.

Writing to Remote Files

Writing to Output Port

Writing to Console

Using Proxy in Writers

Writing to Dictionary

Legend:

1): The discrete processing type uses a byte array to store data.

2): The stream processing type uses an output stream that must be created (from Java code) before the graph is run.
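The difference between the two processing types can be sketched in plain Java. Here a HashMap stands in for the graph dictionary, and DictionaryModes, writeDiscrete, and writeStream are hypothetical names; this is a simplified model, not the CloudConnect dictionary API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the two dictionary processing types.
// The Map stands in for a graph dictionary; it is not the CloudConnect API.
public class DictionaryModes {

    // Discrete: the written data ends up in the entry as a byte array.
    public static void writeDiscrete(Map<String, Object> dictionary, String key, String data) {
        dictionary.put(key, data.getBytes(StandardCharsets.UTF_8));
    }

    // Stream: the caller must put an OutputStream into the entry before
    // the "graph" runs; the writer only writes into that stream.
    public static void writeStream(Map<String, Object> dictionary, String key, String data) {
        Object entry = dictionary.get(key);
        if (!(entry instanceof OutputStream)) {
            throw new IllegalStateException("No output stream prepared for key: " + key);
        }
        try {
            OutputStream out = (OutputStream) entry;
            out.write(data.getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> dictionary = new HashMap<>();

        writeDiscrete(dictionary, "discreteOut", "id;name\n");
        byte[] stored = (byte[]) dictionary.get("discreteOut");

        // The stream must exist before writing starts.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        dictionary.put("streamOut", sink);
        writeStream(dictionary, "streamOut", "id;name\n");

        System.out.println(stored.length + " bytes stored, " + sink.size() + " bytes streamed");
    }
}
```

The design point the sketch mirrors is ownership: in discrete mode the writer produces the byte array itself, while in stream mode the calling Java code owns the stream and must supply it up front.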

Sandbox Resource as Data Source

A sandbox resource, whether it is in a shared, local, or partitioned sandbox, is specified in a component's fileURL attribute as a so-called sandbox URL, like this:

sandbox://data/path/to/file/file.dat

where "data" is the code of the sandbox and "path/to/file/file.dat" is the path to the resource from the sandbox root. The URL is evaluated by CloudConnect Server during graph execution, and the component (reader or writer) obtains the opened stream from the server. This may be a stream to a local file or to another remote resource. Thus, a graph does not have to run on the node that has local access to the resource. A graph may use several sandbox resources, each of which may reside on a different node. In such cases, CloudConnect Server chooses the node with the most local resources to minimize remote streams.
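The structure of a sandbox URL and the node choice described above can be modeled in Java. The sketch parses the URL with java.net.URI, treating the authority part as the sandbox code, and picks the node that holds the most of the graph's resources locally. SandboxUrlDemo, parse, and chooseNode are hypothetical, and the selection rule is a simplified model of the server's behavior, not its actual algorithm.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the CloudConnect Server API.
public class SandboxUrlDemo {

    // Returns {sandboxCode, pathFromSandboxRoot} for a sandbox:// URL.
    public static String[] parse(String sandboxUrl) {
        URI uri = URI.create(sandboxUrl);
        if (!"sandbox".equals(uri.getScheme()) || uri.getAuthority() == null) {
            throw new IllegalArgumentException("Not a sandbox URL: " + sandboxUrl);
        }
        String path = uri.getPath();
        // Drop the leading '/' so the path is relative to the sandbox root.
        String relative = path.startsWith("/") ? path.substring(1) : path;
        return new String[] {uri.getAuthority(), relative};
    }

    // Simplified model: each list element names the node that holds one
    // sandbox resource locally; the node holding the most resources wins.
    // On a tie, the node that first reached the highest count is kept.
    // Returns null for an empty list.
    public static String chooseNode(List<String> resourceNodes) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (String node : resourceNodes) {
            int count = counts.merge(node, 1, Integer::sum);
            if (best == null || count > counts.get(best)) {
                best = node;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] parts = parse("sandbox://data/path/to/file/file.dat");
        System.out.println("sandbox code: " + parts[0]); // data
        System.out.println("path: " + parts[1]);         // path/to/file/file.dat
        System.out.println(chooseNode(List.of("node1", "node2", "node2"))); // node2
    }
}
```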

The sandbox URL has a specific use in parallel data processing. When a sandbox URL pointing to a resource in a partitioned sandbox is used, that part of the graph (or phase) runs in parallel, according to the node allocation specified by the list of partitioned sandbox locations. Thus, each worker has its own local sandbox resource. CloudConnect Server evaluates the sandbox URL on each worker and provides the component with an open stream to the local resource.
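How a single sandbox URL maps to one local resource per worker can be sketched as follows. The worker names and partition root directories are invented for illustration, and PartitionedSandboxDemo and resolvePerWorker are hypothetical; in practice, this resolution is performed by CloudConnect Server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of partitioned sandbox resolution; not CloudConnect code.
public class PartitionedSandboxDemo {

    // For each worker, joins that worker's local partition root with the
    // resource path taken from the sandbox URL.
    public static Map<String, String> resolvePerWorker(Map<String, String> partitionRoots,
                                                       String pathFromSandboxRoot) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : partitionRoots.entrySet()) {
            String root = e.getValue().endsWith("/") ? e.getValue() : e.getValue() + "/";
            resolved.put(e.getKey(), root + pathFromSandboxRoot);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Invented partition locations, one per worker node.
        Map<String, String> roots = new LinkedHashMap<>();
        roots.put("worker1", "/srv/sandboxes/data-part1");
        roots.put("worker2", "/srv/sandboxes/data-part2/");

        // The same sandbox URL path yields a different local file on each worker.
        resolvePerWorker(roots, "path/to/file/file.dat")
                .forEach((worker, file) -> System.out.println(worker + " -> " + file));
    }
}
```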