Use COPY FROM S3 to Load Data
The COPY FROM S3 command allows you to load CSV files and Apache Parquet files from an Amazon S3 bucket to a Vertica table.
To copy data from the local client, see Use COPY FROM LOCAL to Load Data.
Prerequisites
For you to be able to use COPY FROM S3, it must be enabled on your Data Warehouse instance. Enabling COPY FROM S3 is done by GoodData Support.
Send a request to GoodData Support to enable COPY FROM S3 on your Data Warehouse instance. In the request, include the ID of your Data Warehouse instance and the names of the S3 buckets from which you want to copy data.
Limitations
COPY FROM S3 does not support an EXCEPTIONS clause. COPY FROM S3 stores rejected data and exceptions as tables. This means that you cannot specify EXCEPTIONS and REJECTED DATA files the same way you would do when using COPY FROM LOCAL. For example, the following command syntax would work for COPY FROM LOCAL but would fail for COPY FROM S3:
COPY users_table FROM LOCAL 'input.csv' EXCEPTIONS 'exceptions.log' REJECTED DATA 'rejected_data.log';
The rejected data table is created as LOCAL TEMPORARY. The table is dropped once the session is closed.
You cannot run multiple COPY commands within a single statement.
Use COPY FROM S3
Steps:
Set the credentials for your S3 bucket. The credentials must include the access key, secret key, and region.
SET SESSION AUTOCOMMIT TO off; ALTER SESSION SET AWSAuth='<access_key>:<secret_key>'; ALTER SESSION SET AWSRegion='<region>';
SET SESSION AUTOCOMMIT TO off; ALTER SESSION SET AWSAuth='<access_key>:<secret_key>', AWSRegion='<region>';
Disable auto-commit mode for the JDBC driver. To do so, set AUTO_COMMIT to ‘false’.
Run COPY FROM S3. For CSV files:
COPY <table_name> FROM 's3://<bucket>/<file>' [REJECTED DATA AS TABLE <table_name>];
For Parquet files:
COPY <table_name> FROM 's3://<bucket>/<file>' PARQUET [REJECTED DATA AS TABLE <table_name>];
Examples
Correct:
COPY users FROM 's3://user_data/users_all.csv';
COPY users FROM 's3://user_data/users_all.csv' REJECTED DATA AS TABLE users_rejected_table;
COPY users FROM 's3://user_data/users_all.parquet' PARQUET;
COPY users FROM 's3://user_data/users_all.parquet' PARQUET REJECTED DATA AS TABLE users_rejected_table;
Incorrect: Cannot use a file to store rejected data
COPY users FROM 's3://user_data/users_all.csv' REJECTED DATA 'users_rejected.csv';
Incorrect: Cannot use an EXCEPTIONS clause
COPY users FROM 's3://user_data/users_all.csv' EXCEPTIONS 'exceptions.log';