The File URL attribute may be defined using the URL File Dialog.
Important | |
---|---|
To ensure graph portability, forward slashes must be used when defining the path in URLs (even on Microsoft Windows). |
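As a small illustration of the note above, a path typed with backslashes can be normalized before it is stored in a graph. This is only a sketch; the function name to_portable_path is hypothetical and is not part of CloudConnect.

```python
def to_portable_path(path: str) -> str:
    """Replace Windows backslashes with forward slashes.

    Hypothetical helper: it only rewrites separators and does not
    check that the result is a valid File URL.
    """
    return path.replace("\\", "/")


if __name__ == "__main__":
    print(to_portable_path(r"C:\data\input\filename.txt"))
```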
The following are examples of possible URLs for Readers:
/path/filename.txt
Reads specified file.
/path1/filename1.txt;/path2/filename2.txt
Reads two specified files.
/path/filename?.txt
Reads all files satisfying the mask.
/path/*
Reads all files in specified directory.
zip:(/path/file.zip)
Reads the first file compressed in the file.zip file.
zip:(/path/file.zip)#innerfolder/filename.txt
Reads specified file compressed in the file.zip file.
gzip:(/path/file.gz)
Reads the first file compressed in the file.gz file.
tar:(/path/file.tar)#innerfolder/filename.txt
Reads specified file archived in the file.tar file.
zip:(/path/file??.zip)#innerfolder?/filename.*
Reads all files from the compressed zipped file(s) that satisfy the specified mask. Wild cards (? and *) may be used in the compressed file names, inner folder and inner file names.
tar:(/path/file????.tar)#innerfolder??/filename*.txt
Reads all files from the archive file(s) that satisfy the specified mask. Wild cards (? and *) may be used in the compressed file names, inner folder and inner file names.
gzip:(/path/file*.gz)
Reads all files, each of which has been gzipped into a file that satisfies the specified mask. Wild cards may be used in the compressed file names.
tar:(gzip:/path/file.tar.gz)#innerfolder/filename.txt
Reads specified file compressed in the file.tar.gz file. Note that although CloudConnect can read data from a .tar file, writing to .tar files is not supported.
tar:(gzip:/path/file??.tar.gz)#innerfolder?/filename*.txt
Reads all files from the gzipped tar archive file(s) that satisfy the specified mask. Wild cards (? and *) may be used in the compressed file names, inner folder and inner file names.
zip:(zip:(/path/name?.zip)#innerfolder/file.zip)#innermostfolder?/filename*.txt
Reads all files satisfying the file mask from all paths satisfying the path mask from all compressed files satisfying the specified zip mask. Wild cards (? and *) may be used in the outer compressed files, innermost folder and innermost file names. They cannot be used in the inner folder and inner zip file names.
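The nesting in the archive URLs above can be easier to follow when a URL is taken apart layer by layer. The following is a minimal sketch, not CloudConnect's own parser: the names ArchiveUrl and parse_archive_url are hypothetical, and the grammar is inferred only from the examples listed here (a scheme, an optional parenthesized source, and an optional entry after #).

```python
from dataclasses import dataclass
from typing import Optional, Union

# Archive schemes that appear in the examples above.
ARCHIVE_SCHEMES = ("zip", "gzip", "tar")


@dataclass
class ArchiveUrl:
    scheme: str                        # "zip", "gzip" or "tar"
    source: Union["ArchiveUrl", str]   # nested archive URL or plain path/URL
    entry: Optional[str]               # text after '#', or None if absent


def _closing_paren(text: str) -> int:
    """Return the index of the ')' matching the '(' at text[0]."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")


def parse_archive_url(url: str) -> Union[ArchiveUrl, str]:
    """Split an archive URL into layers; non-archive URLs are returned as-is."""
    for scheme in ARCHIVE_SCHEMES:
        prefix = scheme + ":"
        if not url.startswith(prefix):
            continue
        rest = url[len(prefix):]
        if rest.startswith("("):
            end = _closing_paren(rest)
            inner, tail = rest[1:end], rest[end + 1:]
        else:
            # Form without parentheses, as in tar:(gzip:/path/file.tar.gz)
            inner, tail = rest, ""
        if tail and not tail.startswith("#"):
            raise ValueError(f"unexpected text after ')': {tail!r}")
        entry = tail[1:] if tail else None
        return ArchiveUrl(scheme, parse_archive_url(inner), entry)
    return url


if __name__ == "__main__":
    print(parse_archive_url(
        "zip:(zip:(/path/name?.zip)#innerfolder/file.zip)"
        "#innermostfolder?/filename*.txt"
    ))
```

The sketch only separates the layers; it does not expand wild cards or open any file.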
ftp://username:password@server/path/filename.txt
Reads specified filename.txt file on remote server connected via ftp protocol using username and password.
sftp://username:password@server/path/filename.txt
Reads specified filename.txt file on remote server connected via sftp protocol using username and password.
http://server/path/filename.txt
Reads specified filename.txt file on remote server connected via http protocol.
https://server/path/filename.txt
Reads specified filename.txt file on remote server connected via https protocol.
zip:(ftp://username:password@server/path/file.zip)#innerfolder/filename.txt
Reads specified filename.txt file compressed in the file.zip file on remote server connected via ftp protocol using username and password.
zip:(http://server/path/file.zip)#innerfolder/filename.txt
Reads specified filename.txt file compressed in the file.zip file on remote server connected via http protocol.
tar:(ftp://username:password@server/path/file.tar)#innerfolder/filename.txt
Reads specified filename.txt file archived in the file.tar file on remote server connected via ftp protocol using username and password.
zip:(zip:(ftp://username:password@server/path/name.zip)#innerfolder/file.zip)#innermostfolder/filename.txt
Reads specified filename.txt file compressed in the file.zip file that is also compressed in the name.zip file on remote server connected via ftp protocol using username and password.
gzip:(http://server/path/file.gz)
Reads the first file compressed in the file.gz file on remote server connected via http protocol.
http://server/filename*.dat
Reads all files from WebDAV server which satisfy specified mask (only * is supported).
http://access_key_id:secret_access_key@bucketname.s3.amazonaws.com/filename*.out
Reads all files which satisfy specified mask (only * is supported) from Amazon S3 web storage service from given bucket using access key ID and secret access key.
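To make the parts of the remote URLs above explicit, the following sketch splits one into protocol, credentials, server and path using Python's standard urllib.parse module. It illustrates the URL layout only; the function name describe_remote_url is hypothetical, and nothing here reflects how CloudConnect itself connects to the server.

```python
from urllib.parse import urlsplit


def describe_remote_url(url: str) -> dict:
    """Return the components of a remote file URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,   # None when the URL has no credentials
        "password": parts.password,
        "server": parts.hostname,
        "path": parts.path,           # may still contain the * wild card
    }


if __name__ == "__main__":
    for example in (
        "ftp://username:password@server/path/filename.txt",
        "http://access_key_id:secret_access_key@bucketname.s3.amazonaws.com/filename*.out",
    ):
        print(describe_remote_url(example))
```

Note that urlsplit treats ? as the start of a query string, so this sketch suits masks that use only *, which is the only wild card the WebDAV and Amazon S3 examples support.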
port:$0.FieldName:discrete
Data from each record field selected for input port reading are read as a single input file.
port:$0.FieldName:source
URL addresses, i.e., values of field selected for input port reading, are loaded in and parsed.
port:$0.FieldName:stream
Input port field values are concatenated and processed as input file(s); null values are replaced by eof.
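The three port URLs above share one layout: an input port number, a field name and a processing type. The sketch below parses that layout and models one possible reading of the stream type, in which a null value ends the current file. The names parse_port_url and split_stream are hypothetical, and the handling of null is an assumption drawn from the description above, not a statement of how the Reader is implemented.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# port:$<port number>.<field name>:<processing type>
PORT_URL = re.compile(r"^port:\$(\d+)\.(\w+):(discrete|source|stream)$")


class PortUrl(NamedTuple):
    port: int
    field: str
    mode: str


def parse_port_url(url: str) -> PortUrl:
    """Split a port URL into port number, field name and processing type."""
    match = PORT_URL.match(url)
    if match is None:
        raise ValueError(f"not a port URL: {url!r}")
    return PortUrl(int(match.group(1)), match.group(2), match.group(3))


def split_stream(values: Iterable[Optional[str]]) -> List[str]:
    """Concatenate field values; a None value closes the current file.

    Assumption: this models "null values are replaced by eof".
    """
    files: List[str] = []
    current: List[str] = []
    for value in values:
        if value is None:
            files.append("".join(current))
            current = []
        else:
            current.append(value)
    if current:
        files.append("".join(current))
    return files


if __name__ == "__main__":
    print(parse_port_url("port:$0.FieldName:stream"))
    print(split_stream(["a,1\n", "b,2\n", None, "c,3\n"]))
```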
-
Reads data from stdin after start of the graph. When you want to stop reading, press Ctrl+Z.
http:(direct:)//seznam.cz
Without proxy.
http:(proxy://user:password@212.93.193.82:443)//seznam.cz
Proxy setting for http protocol.
ftp:(proxy://user:password@proxyserver:1234)//seznam.cz
Proxy setting for ftp protocol.
sftp:(proxy://66.11.122.193:443)//user:password@server/path/file.dat
Proxy setting for sftp protocol.
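In the proxy examples above, the proxy specification sits in parentheses between the protocol and the rest of the URL. The sketch below separates the two, assuming only the layout visible in those examples; the name split_proxy is hypothetical and the returned dictionary keys are chosen for illustration.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# <protocol>:(<proxy specification>)//<rest of the URL>
PROXIED = re.compile(r"^(?P<protocol>[a-z]+):\((?P<proxy>[^)]*)\)(?P<rest>//.*)$")


def split_proxy(url: str) -> Tuple[str, Optional[dict]]:
    """Return (target URL without the proxy part, proxy settings or None)."""
    match = PROXIED.match(url)
    if match is None:
        return url, None                      # no proxy part present
    target = f"{match['protocol']}:{match['rest']}"
    if match["proxy"] == "direct:":
        return target, None                   # explicit "without proxy"
    proxy = urlsplit(match["proxy"])
    return target, {
        "host": proxy.hostname,
        "port": proxy.port,
        "user": proxy.username,               # None when not given
        "password": proxy.password,
    }


if __name__ == "__main__":
    print(split_proxy("http:(proxy://user:password@212.93.193.82:443)//seznam.cz"))
    print(split_proxy("http:(direct:)//seznam.cz"))
```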
dict:keyName:discrete 1)
Reads data from dictionary.
dict:keyName:source 1)
Reads data from dictionary in the same way as the discrete processing type, but expects that the dictionary values are input file URLs. The data from this input passes to the Reader.
Legend:
1): Reader finds out the type of source value from the dictionary and creates a readable channel for the parser. Reader supports the following types of sources: InputStream, byte[], ReadableByteChannel, CharSequence, CharSequence[], List<CharSequence>, List<byte[]>, ByteArrayOutputStream.
A sandbox resource, whether it is in a shared, local or partitioned sandbox, is specified in the graph under the fileURL attributes as a so-called sandbox URL like this:
sandbox://data/path/to/file/file.dat
where "data" is the code of the sandbox and "path/to/file/file.dat" is the path to the resource from the sandbox root. The URL is evaluated by CloudConnect Server during graph execution, and a component (reader or writer) obtains the opened stream from the server. This may be a stream to a local file or to some other remote resource. Thus, a graph does not have to run on the node which has local access to the resource. There may be more sandbox resources used in the graph and each of them may be on a different node. In such cases, CloudConnect Server chooses the node with the most local resources to minimize remote streams.
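The sandbox URL layout and the node choice described above can be sketched as follows. This is an illustration under stated assumptions, not the server's algorithm: the names parse_sandbox_url and pick_node are hypothetical, and the locations mapping (sandbox code to the node holding it) stands in for information that only CloudConnect Server has.

```python
from collections import Counter
from typing import Dict, List, Tuple


def parse_sandbox_url(url: str) -> Tuple[str, str]:
    """Split sandbox://<code>/<path> into (sandbox code, path from root)."""
    prefix = "sandbox://"
    if not url.startswith(prefix):
        raise ValueError(f"not a sandbox URL: {url!r}")
    code, sep, path = url[len(prefix):].partition("/")
    if not code or not sep or not path:
        raise ValueError(f"sandbox code or path missing: {url!r}")
    return code, path


def pick_node(urls: List[str], locations: Dict[str, str]) -> str:
    """Return the node that holds the most of the referenced resources.

    locations is a hypothetical mapping: sandbox code -> node name.
    """
    per_node = Counter(locations[parse_sandbox_url(u)[0]] for u in urls)
    return per_node.most_common(1)[0][0]


if __name__ == "__main__":
    print(parse_sandbox_url("sandbox://data/path/to/file/file.dat"))
    graph_urls = [
        "sandbox://data/path/to/file/file.dat",
        "sandbox://data/other.dat",
        "sandbox://logs/run.log",
    ]
    print(pick_node(graph_urls, {"data": "node1", "logs": "node2"}))
```

Counting resources per node is the simplest reading of "the node with the most local resources"; the real server may weigh other factors.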
The sandbox URL has a specific use for parallel data processing. When the sandbox URL with the resource in a partitioned sandbox is used, that part of the graph/phase runs in parallel, according to the node allocation specified by the list of partitioned sandbox locations. Thus, each worker has its own local sandbox resource. CloudConnect Server evaluates the sandbox URL on each worker and provides an open stream to a local resource to the component.