Date Dimension Loader

Date Dimension Loader is a utility that supports the data preparation and distribution pipeline (see Data Preparation and Distribution Pipeline). Date Dimension Loader loads date dimension data to date datasets in one or more workspaces based on the criteria that you have defined (where the date dimension data is stored and what workspaces to load it to).

For more information about date dimensions, see Manage Custom Date Dimensions.

How Date Dimension Loader Works

When Date Dimension Loader runs, it reads date dimension data and loads it to the workspaces according to the criteria that you have defined in queries:

  • The date dimension data to read
  • The location where the date dimension data is stored (a data warehouse or an S3 bucket)
  • The workspaces to load the date dimensions to

Configuration File

Date Dimension Loader does not require any parameters in the configuration file.

Schedule Parameters

When scheduling Date Dimension Loader (see Phases of Building the Data Pipeline -> Production Implementation), provide the parameters from this section in the schedule.

General Parameters

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

gd_encoded_params (JSON; mandatory; not secure; default: n/a)

The parameters coding the queries and the date dimension data

process_mode (string; optional; not secure; default: client_id)

The identifier of the workspaces to load the date dimensions to

Possible values:

  • client_id: The workspaces are identified by their client IDs (see Use Automated Data Distribution). In this case, you must also enter the domain parameter (see further in this section).
  • project_id: The workspaces are identified by their workspace IDs (see Find the Workspace ID).

The value of this parameter defines what columns you provide in your queries.

domain (string; mandatory: see the description; not secure; default: n/a)

The name of the domain that the workspaces belong to

This parameter is mandatory only when process_mode is set to client_id. Otherwise, do not use it.

columns_to_delete (string; optional; not secure; default: identifier)

The columns to delete from the table with the date dimension data before the data is loaded to the date datasets

If you want to provide more than one column name, separate them with a comma.

If this parameter is not specified, only the identifier column is deleted (see Date Dimension Data Structure). Explicitly specify this parameter if the table with the date dimension data contains extra columns in addition to the identifier column and the standard date-related columns (for example, a column with notes).

Example: identifier,description
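
For illustration, a minimal set of general schedule parameters might look like the following sketch (acme is a placeholder domain name; the other values reuse the examples from the descriptions above):

process_mode=client_id
domain=acme
columns_to_delete=identifier,description
gd_encoded_params=<the encoded queries and date dimension data; see Queries and Date Dimension Data as Schedule Parameters>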

Data Warehouse-Specific Parameters

If your date dimension data is stored in the same data warehouse as the queries (such as Snowflake, Redshift, or BigQuery), provide the following parameters as schedule parameters (see Configure Schedule Parameters) to point to the table in the data warehouse where the date dimension data is stored.

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

dd_table_name (string; mandatory; not secure; default: n/a)

The name of the table in the data warehouse that holds the date dimension data

dd_filter_columns (string; optional; not secure; default: n/a)

The columns in the table that holds the date dimension data (see the dd_table_name parameter earlier in this section) by which the data in the table should be filtered

If you want to provide more than one column name, separate them with a comma.

Example: identifier

dd_order_column (string; optional; not secure; default: date.day.yyyy_mm_dd)

The columns in the table that holds the date dimension data (see the dd_table_name parameter earlier in this section) by which the data in the table should be sorted
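
For illustration, pointing Date Dimension Loader to a hypothetical table named dd_data could look like the following in the schedule (dd_filter_columns reuses the example above, and dd_order_column repeats its default value):

dd_table_name=dd_data
dd_filter_columns=identifier
dd_order_column=date.day.yyyy_mm_dd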

Queries

The queries define the criteria for processing the date dimension data: where to read the date dimension data from and what workspaces to load it to.

Query Structure

You can specify the queries as a CSV file or a database table with the following columns:

dd_id (string; mandatory; default: n/a)

The title of the date dataset in the workspace where the date dimension data will be loaded to

data_product_id (string; mandatory: see the description; default: n/a)

(Only when process_mode is set to client_id; otherwise, use project_id - see further in this section) Use either data_product_id alone or a combination of data_product_id and client_id to specify the workspaces to load the date dimensions to.

  • To load the date dimensions to all the workspaces within a data product, specify only data_product_id and do not specify client_id (leave the client_id column empty).
  • To load the date dimensions to a specific workspace, specify both the client_id of the workspace and the data_product_id of the data product that this workspace belongs to.

NOTE: If you use data_product_id and client_id to specify the workspaces, do not use project_id.

client_id (string; mandatory: see the description; default: n/a)

(Only when process_mode is set to client_id) The client ID of the workspace to load the date dimensions to; used together with data_product_id (see the data_product_id column earlier in this section)

project_id (string; mandatory: see the description; default: n/a)

(Only when process_mode is set to project_id; otherwise, use data_product_id and client_id - see earlier in this section) The workspace ID of the workspace where the date dimensions should be loaded to

NOTE: If you use project_id to specify the workspaces, do not use data_product_id and client_id.

identifier (string; mandatory: see the description; default: n/a)

(Only when the date dimension data is stored in a table in a data warehouse) The identifier of the date dataset where the date dimension data will be loaded to

NOTE: Specify the identifier column only when the date dimension data is stored in a table in a data warehouse. If the date dimension data is stored as a CSV file in an S3 bucket, specify the file column instead (see further in this section).

Do not specify both identifier and file.

file (string; mandatory: see the description; default: n/a)

(Only when the date dimension data is stored as a CSV file in an S3 bucket) The name of the CSV file in the S3 bucket that holds the data to load to the date dataset

One CSV file must contain data for one date dataset.

NOTE: Specify the file column only when the date dimension data is stored as a CSV file in an S3 bucket. If the date dimension data is stored in a table in a data warehouse, specify the identifier column instead (see earlier in this section).

Do not specify both identifier and file.

x__timestamp (timestamp; optional; default: n/a)

(For incremental processing only) The timestamp when the last successful incremental load completed

When Date Dimension Loader runs successfully, it stores the greatest value of the x__timestamp column from the query table in the workspace's metadata. The next time it runs, Date Dimension Loader retrieves this value from the workspace's metadata and takes from the query table only the records whose x__timestamp value is greater than the stored timestamp; only those records are loaded to the workspace. For example, if the stored timestamp is 2021-11-03 10:22:34, only the records with a later x__timestamp are processed in the next run.

The incremental processing works similarly to the incremental load mode with the x__timestamp column (see Naming Convention for Output Stage Objects).

 

Here is an example of a query that points to date dimension data stored in a table in a data warehouse (notice the identifier column) and uses a combination of data product IDs and client IDs to specify the workspaces to load the date dimensions to (notice the data_product_id and client_id columns):

dd_id                 | data_product_id | client_id | identifier | x__timestamp
createdate.dataset.dt | default         | p3489     | createdate | 2021-11-03 10:22:34
expiredate.dataset.dt | default         | a9800     | expiredate | 2021-12-07 10:22:34
updatedate.dataset.dt | testing         |           | updatedate | 2021-12-07 10:22:34

 

Here is an example of a query that points to date dimension data stored as CSV files in an S3 bucket (notice the file column) and uses workspace IDs to specify the workspaces to load the date dimensions to (notice the project_id column):

dd_id                 | project_id                       | file                | x__timestamp
createdate.dataset.dt | op8swkuuh6ccmh9paaynje4zdfu39u63 | createdate_data.csv | 2021-11-03 10:22:34
expiredate.dataset.dt | op8swkuuh6ccmh9paaynje4zdfu39u63 | expiredate_data.csv | 2021-12-07 10:22:34
updatedate.dataset.dt | ebyg33vqnls7kp13f1u0nkov2pxf8dbt | updatedate_data.csv | 2021-12-07 10:22:34

Query Source

Date Dimension Loader can read the queries from any supported data warehouse or an S3 bucket (see Types of Input Data Sources).

  • If you want to store the queries in an S3 bucket, specify the queries in a CSV file.
  • If you want to store the queries in a data warehouse, specify the queries as an SQL query to this data warehouse.

Example: The queries stored in an S3 bucket

{
  "s3_client": {
    "bucket": "date_dimensions",
    "accessKey": "123456789",
    "secretKey": "secret",
    "region": "us-east-1",
    "serverSideEncryption": "true"
  },
  "input_source": {
    "type": "s3",
    "file": "dd_query.csv”,
  }
}

Example: The queries stored in ADS

{
  "ads_client": {
    "username": "john.doe@address.com",
    "password": "secret",
    "ads_id": "rtmmgjsqc4zmf64egtu6l6xv2xhxempi"
  },
  "input_source": {
    "type": "ads",
    "query": "SELECT dd_id, data_product_id, client_id, identifier FROM dd_query_table"
  }
}

Date Dimension Data

The date dimension data contains date-related calendar data. For more information about the date dimensions and their structure, see Manage Custom Date Dimensions.

Date Dimension Data Structure

You can specify the date dimension data as a CSV file or a database table. This file or table must have the mandatory date-related columns (for more information, see “Prepare a CSV File with Calendar Requirements” in Manage Custom Date Dimensions) and one additional column, identifier, that contains the identifier of the date dataset in the workspace where the data will be loaded to. The records with the same identifier in the identifier column are treated as belonging to the same date dataset.

The file or table can have more columns, if needed (for example, a column with notes).

 

Here is an example of the date dimension data:

identifier | date.day.yyyy_mm_dd | date.day.uk.dd_mm_yyyy | date.day.us.mm_dd_yyyy | date.day.eu.dd_mm_yyyy | <other date related columns>
createdate | 1900-01-01          | 01/01/1900             | 01/01/1900             | 01-01-1900             | ...
createdate | 1900-01-02          | 02/01/1900             | 01/02/1900             | 02-01-1900             | ...
createdate | 1900-01-03          | 03/01/1900             | 01/03/1900             | 03-01-1900             | ...
expiredate | 1900-01-01          | 01/01/1900             | 01/01/1900             | 01-01-1900             | ...
expiredate | 1900-01-02          | 02/01/1900             | 01/02/1900             | 02-01-1900             | ...
expiredate | 1900-01-03          | 03/01/1900             | 01/03/1900             | 03-01-1900             | ...
updatedate | 1900-01-01          | 01/01/1900             | 01/01/1900             | 01-01-1900             | ...
updatedate | 1900-01-02          | 02/01/1900             | 01/02/1900             | 02-01-1900             | ...
updatedate | 1900-01-03          | 03/01/1900             | 01/03/1900             | 03-01-1900             | ...

Date Dimension Data Source

Date Dimension Loader can read the date dimension data from any supported data warehouse or an S3 bucket (see Types of Input Data Sources).

  • If you want to store the date dimension data in an S3 bucket, specify the date dimension data in a CSV file.
  • If you want to store the date dimension data in a data warehouse, specify the date dimension data as an SQL query to this data warehouse.

How you specify the source of the date dimension data depends on the following:

  • Whether the queries and the date dimension data are stored in the same source or in two different sources
  • Whether the source is a data warehouse or an object storage system

 

The following are possible scenarios of how you can set up the source of the date dimension data:

  • The date dimension data and the queries are stored in the same data warehouse. In this case, the source of the date dimension data is a table in the same data warehouse where the queries are stored. To point to this table, use the data warehouse-specific parameters in the Date Dimension Loader schedule. Date Dimension Loader will obtain the other necessary information about the data warehouse from the query source.

    • Example: The date dimension data and the queries are stored in ADS. The source of the queries is specified by the following:

      {
        "ads_client": {
          "username": "john.doe@address.com",
          "password": "secret",
          "ads_id": "rtmmgjsqc4zmf64egtu6l6xv2xhxempi"
        },
        "input_source": {
          "type": "ads",
          "query": "SELECT dd_id, data_product_id, client_id, identifier FROM dd_query_table"
        }
      }
      

      The source of the date dimension data is specified by the following parameters in the schedule (see also Schedule Examples): 
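
      For example, assuming a hypothetical table name dd_data:

      dd_table_name=dd_data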

    • Example: The date dimension data and the queries are stored in Snowflake. The source of the queries is specified by the following:

      {
        "snowflake_client": {
          "connection": {
            "url": "jdbc:snowflake://acme.snowflakecomputing.com",
            "authentication": {
              "basic": {
                "userName": "GOODDATA_INTEGRATION",
                "password": "secret"
              }
            },
            "database": "ACME",
            "schema": "ANALYTICS",
            "warehouse": "GOODDATA_INTEGRATION"
          }
        },
        "input_source": {
          "type": "snowflake",
          "query": "SELECT dd_id, data_product_id, client_id, identifier FROM dd_query_table"
        }
      }
      

      The source of the date dimension data is specified by the following parameters in the schedule (see also Schedule Examples): 
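
      For example, assuming a hypothetical table name dd_data:

      dd_table_name=dd_data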

  • The date dimension data and the queries are stored in the same S3 bucket. In this case, the source of the date dimension data is a folder with one or more CSV files in the same S3 bucket where the queries are stored. To point to this folder, add the folder parameter under input_source in the query source, and set it to the path to the folder with the CSV files. Date Dimension Loader will obtain the other necessary information about the S3 bucket from the query source.

    {
      "s3_client": {
        "bucket": "date_dimensions",
        "accessKey": "123456789",
        "secretKey": "${s3_secret_key}",
        "region": "us-east-1",
        "serverSideEncryption": "true"
      },
      "input_source": {
        "type": "s3",
        "file": "input_dd_data/date_dimension_Params.csv”,
        "folder": "input_dd_data"
      }
    }
    
  • The date dimension data is stored in an S3 bucket, and the queries are stored in a data warehouse. In this case, the date dimension data is stored in a folder with one or more CSV files in an S3 bucket, and the queries are stored in a table in a data warehouse. To point to the folder where the date dimension data is stored, add the input data source for the S3 bucket (see Types of Input Data Sources). In this source, replace input_source with data_input_source, then replace the file parameter under data_input_source with the folder parameter, and set it to the path to the folder with the CSV files.

    • Example: The date dimension data is stored in an S3 bucket, and the queries are stored in ADS

      {
        "ads_client": {
          "username": "john.doe@address.com",
          "password": "secret",
          "ads_id": "rtmmgjsqc4zmf64egtu6l6xv2xhxempi"
        },
        "input_source": {
          "type": "ads",
          "query": "SELECT dd_id, data_product_id, client_id, file FROM dd_query_table"
        },
        "s3_client": {
          "bucket": "date_dimensions",
          "accessKey": "123456789",
          "secretKey": "secret",
          "region": "us-east-1",
          "serverSideEncryption": "true"
        },
        "data_input_source": {
          "type": "s3",
          "folder": "/input_dd_data/"
        }
      }
      
    • Example: The date dimension data is stored in an S3 bucket, and the queries are stored in Snowflake

      {
        "snowflake_client": {
          "connection": {
            "url": "jdbc:snowflake://acme.snowflakecomputing.com",
            "authentication": {
              "basic": {
                "userName": "GOODDATA_INTEGRATION",
                "password": "secret"
              }
            },
            "database": "ACME",
            "schema": "ANALYTICS",
            "warehouse": "GOODDATA_INTEGRATION"
          }
        },
        "input_source": {
          "type": "snowflake",
          "query": "SELECT dd_id, data_product_id, client_id, file FROM dd_query_table"
        },
        "s3_client": {
          "bucket": "date_dimensions",
          "accessKey": "123456789",
          "secretKey": "secret",
          "region": "us-east-1",
          "serverSideEncryption": "true"
        },
        "data_input_source": {
          "type": "s3",
          "folder": "/input_dd_data/"
        }
      }
      

Queries and Date Dimension Data as Schedule Parameters

When providing the queries and date dimension data as schedule parameters, you have to code them using the gd_encoded_params parameter (see General Parameters).

To code the queries and date dimension data, follow the instructions in Specifying Complex Parameters. Once done, you should have the gd_encoded_params parameter coding the queries and the date dimension data source, plus reference parameters encoding sensitive information that you will add as secure parameters.

Example: The date dimension data is stored in an S3 bucket (sensitive information present: the secret key), and the queries are stored in ADS (sensitive information present: the ADS password). Both are coded by the gd_encoded_params parameter, with reference parameters encoding the secret key and the ADS password.

The following will become the value of the gd_encoded_params parameter:

{
  "ads_client": {
    "username": "john.doe@address.com",
    "password": "${ads_client_password}",
    "ads_id": "rtmmgjsqc4zmf64egtu6l6xv2xhxempi"
  },
  "input_source": {
    "type": "ads",
    "query": "SELECT dd_id, data_product_id, client_id, file FROM dd_query_table"
  },
  "s3_client": {
    "bucket": "date_dimensions",
    "accessKey": "123456789",
    "secretKey": "${s3_client_secret_key}",
    "region": "us-east-1",
    "serverSideEncryption": "true"
  },
  "data_input_source": {
    "type": "s3",
    "folder": "/input_dd_data/"
  }
}

In addition, you have to add the reference parameters ads_client_password and s3_client_secret_key as secure parameters.
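
For example, the schedule could then contain the following secure parameters (the values are placeholders for the actual ADS password and S3 secret key):

ads_client_password=secret
s3_client_secret_key=secret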

Schedule Examples

Example 1: The date dimension data is stored in an S3 bucket, and the queries are stored in ADS.
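
A sketch of such a schedule, assuming process_mode is set to client_id and acme is a placeholder domain name (the gd_encoded_params value is the JSON shown in Queries and Date Dimension Data as Schedule Parameters):

process_mode=client_id
domain=acme
gd_encoded_params=<the JSON value coding the ADS query source and the S3 data source>
ads_client_password=<secure parameter>
s3_client_secret_key=<secure parameter>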

 

Example 2: The date dimension data and the queries are stored in Snowflake.
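
A sketch of such a schedule, assuming process_mode is set to client_id, acme is a placeholder domain name, dd_data is a hypothetical table with the date dimension data, and snowflake_password is a hypothetical reference parameter for the Snowflake password (the gd_encoded_params value codes the Snowflake query source shown in Date Dimension Data Source):

process_mode=client_id
domain=acme
dd_table_name=dd_data
gd_encoded_params=<the JSON value coding the Snowflake query source>
snowflake_password=<secure parameter>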