When you need to load data to a workspace with a specific logical data model, you have to map all datasets in the workspace’s logical data model to the objects in the output stage.
The names of the output stage objects must follow the specific naming convention at the dataset level and the LDM field level.
You can use the SQL diff to generate SQL commands for creating or updating corresponding output stage objects. The generated SQL commands are compliant with the output stage object naming convention.
Dataset Level Mapping
If the LDM element ID of a dataset is the following:
then Automated Data Distribution (ADD) expects the following name of the source database object for this dataset in the output stage:
<output_stage_prefix><dataset_name> - for the data load table case
<output_stage_prefix>_deleted_<dataset_name> - for the delete side table case (only if you are using the data deletion, see Delete Data from Datasets in Automated Data Distribution)
For example, ADD expects that the database object person in the output stage will be used for populating data into the dataset with the LDM element ID of dataset.person.
LDM Field Level Mapping
(Applies to all LDM element types except for labels and references.)
If the LDM element ID of an LDM field is the following:
then ADD expects the following column name in the mapped database object of the output stage:
The following table shows the mapping of <element_type> to <prefix_type>, and their meaning:
prefix_type element_type Element Type Meaning a attr attribute cp attr connection point (anchor) f fact fact d date date dimension r reference l label
If the LDM element ID of a label is the following:
then ADD expects the following column name in the source database in the mapped database object of the output stage:
- If dataset.<dataset1_name> is a dataset referenced from dataset.<dataset2_name>, ADD expects the following:
- The source column for this reference exists in the database object <dataset2_name>.
- The name of the source column is r__<dataset1_name>.
In the following table, columns represent particular datasets and the corresponding database objects in Data Warehouse.
In a cell, the first line (if present) indicates the LDM element ID, and the second line indicates the corresponding Data Warehouse database object / column.
Typically, only the last section of the LDM element ID is used to map Data Warehouse database objects to the LDM datasets. This is true when the second section of the LDM element ID matches the Data Warehouse object that it maps to. For example, the LDM fact fact.person.age in the dataset dataset.person becomes the Data Warehouse column f__age.
However, if the Data Warehouse database object and the LDM dataset do not match, the last two sections of the LDM element become a part of the Data Warehouse column name. For example, the LDM fact fact.spouse.age in the dataset dataset.person becomes the Data Warehouse column f__spouse__age.
The situation when the last section of the LDM element ID is not unique in the dataset usually happens when LDM elements are moved between datasets and the LDM element ID is not changed. To resolve such conflicts, include the LDM dataset name in the Data Warehouse column name.
Delete Side Table
For more information about the delete side table, see Delete Data from Datasets in Automated Data Distribution.
The delete side table is an optional entity in the ADD output stage. The delete side table adheres to the same naming convention as the data load table. The API resource for SQL diff does not generate SQL commands for the delete side table due to a large number of possible variations.
The delete side table is used only in incremental load mode (the x__timestamp column is present in the data load table). When loading data in incremental mode, the delete side table must be incremental itself (that is, must have the x__timestamp column).
In full load mode, the delete side table is not used and gets ignored.
CREATE TABLE "out_deleted_contacts" ( "a__branch" VARCHAR(256), "x__timestamp" TIMESTAMP ENCODING RLE, "x__client_id" VARCHAR(128) ENCODING RLE ); --if Vertica optimization is required, then: 1) replace ");" with ""; 2) replace "--verticaOpt " with "". --Align the usage of x__client_id and x__timestamp columns in the statements below with their real presence in the table. --verticaOpt ) --verticaOpt order by "x__client_id", "x__timestamp" --verticaOpt segmented by hash("a__branch","x__client_id","x__timestamp") all nodes;
outis the output stage prefix.
deletedindicates that the table is the delete side table.
contactsis the name of the dataset from which you want to delete data.
Special Columns in Output Stage Tables
In addition to the standard columns mapped to LDM elements in the output stage tables, there are the following optional columns that, when present, influence ADD behavior:
- The x__client_id column enables data distribution from a single output stage table into multiple workspaces based on the values in this column.
When data is loaded to a particular workspace, only the records with the value in the x__client_id column equal to the Client ID workspace parameter are taken from the output stage table and are loaded into the corresponding dataset in the workspace.
- The x__timestamp column is relevant only for incremental loads.
The records with the same value in the x__timestamp column make up a group of records that was created in the output stage during run of a particular ADD process. Between two groups of records, the group that was created later has a greater value in the x__timestamp column than the group of records that was created earlier.
When data is loaded to a particular dataset and the load completes successfully, ADD stores to dataset’s metadata the greatest value in the x__timestamp column of the corresponding output stage table. Next time ADD retrieves the last greatest value from dataset’s metadata, and only those records whose value of the x__timestamp column is greater than the timestamp stored in the corresponding dataset's metadata are taken from the output stage table and are loaded into the corresponding dataset in the workspace.
- The x__deleted column enables the data deletion functionality on a single output stage table (see Delete Data from Datasets in Automated Data Distribution).
Read more about ADD: Automated Data Distribution
Read how to use ADD: Use Automated Data Distribution