Use COPY FROM S3 to Load Data

The COPY FROM S3 command allows you to load CSV files and Apache Parquet files from an Amazon S3 bucket to a Vertica table.

Prerequisites

For you to be able to use COPY FROM S3, it must be enabled on your Data Warehouse instance. Enabling COPY FROM S3 is done by GoodData Support.

Send a request to GoodData Support to enable COPY FROM S3 on your Data Warehouse instance. In the request, include the ID of your Data Warehouse instance and the names of the S3 buckets from which you want to copy data.

Limitations

  • COPY FROM S3 does not support an EXCEPTIONS clause. COPY FROM S3 stores rejected data and exceptions as tables. This means that you cannot specify EXCEPTIONS and REJECTED DATA files the same way you would do when using COPY FROM LOCAL. For example, the following command syntax would work for COPY FROM LOCAL but would fail for COPY FROM S3:

    COPY users_table FROM LOCAL 'input.csv' EXCEPTIONS 'exceptions.log' REJECTED DATA 'rejected_data.log';
    

    The rejected data table is created as LOCAL TEMPORARY. The table is dropped once the session is closed.

  • You cannot run multiple COPY commands within a single statement.

Use COPY FROM S3

Steps:

  1. Set the credentials for your S3 bucket. The credentials must include the access key, secret key, and region.

    SET SESSION AUTOCOMMIT TO off;
    ALTER SESSION SET AWSAuth='<access_key>:<secret_key>';
    ALTER SESSION SET AWSRegion='<region>';
    
    SET SESSION AUTOCOMMIT TO off;
    ALTER SESSION SET AWSAuth='<access_key>:<secret_key>', AWSRegion='<region>';
    
  2. Disable auto-commit mode for the JDBC driver. To do so, set AUTO_COMMIT to ‘false’.

  3. Run COPY FROM S3. For CSV files:

    COPY <table_name> FROM 's3://<bucket>/<file>' [REJECTED DATA AS TABLE <table_name>];
    

    For Parquet files:

    COPY <table_name> FROM 's3://<bucket>/<file>' PARQUET [REJECTED DATA AS TABLE <table_name>];
    

Examples

Correct:

COPY users FROM 's3://user_data/users_all.csv';
COPY users FROM 's3://user_data/users_all.csv' REJECTED DATA AS TABLE users_rejected_table;
COPY users FROM 's3://user_data/users_all.parquet' PARQUET;
COPY users FROM 's3://user_data/users_all.parquet' PARQUET REJECTED DATA AS TABLE users_rejected_table;

 

Incorrect: Cannot use a file to store rejected data

COPY users FROM 's3://user_data/users_all.csv' REJECTED DATA 'users_rejected.csv';

 

Incorrect: Cannot use an EXCEPTIONS clause

COPY users FROM 's3://user_data/users_all.csv' EXCEPTIONS 'exceptions.log';