Naming Convention for Output Stage Objects

When you need to load data to a workspace with a specific logical data model, you have to map all datasets in the workspace’s logical data model to the objects in the output stage.

The names of the output stage objects must follow the specific naming convention at the dataset level and the LDM field level.

Dataset Level Mapping

If the LDM element ID of a dataset is the following:

dataset.<dataset_name>

then Automated Data Distribution (ADD) expects the following name of the source database object for this dataset in the output stage:

For example, ADD expects that the database object person in the output stage will be used for populating data into the dataset with the LDM element ID of dataset.person.

LDM Field Level Mapping

  • (Applies to all LDM element types except for labels and references.)  If the LDM element ID of an LDM field is the following:

    <element_type>.<dataset_name>.<element_name>
    

    then ADD expects the following column name in the mapped database object of the output stage:

    <prefix_type>__<element_name>
    

    The following table shows the mapping of <element_type> to <prefix_type>, and their meaning:

    prefix_typeelement_typeElement Type Meaning
    aattrattribute
    cpattrconnection point (anchor)
    ffactfact
    ddatedate dimension
    r reference
    l label
  • If the LDM element ID of a label is the following:

    label.<dataset_name>.<attribute_name>.<label_name>
    

    then ADD expects the following column name in the source database in the mapped database object of the output stage:

    l__<dataset_name>__<attribute_name>__<label_name>
    
  • If dataset.<dataset1_name> is a dataset referenced from dataset.<dataset2_name>, ADD expects the following:

    • The source column for this reference exists in the database object <dataset2_name>.
    • The name of the source column is r__<dataset1_name>.

Example:

In the following table, columns represent particular datasets and the corresponding database objects in Data Warehouse.

In a cell, the first line (if present) indicates the LDM element ID, and the second line indicates the corresponding Data Warehouse database object / column.

dataset.state

State

dataset.Customer

Customer

dataset.product

Product

dataset.invoice

Invoice

dataset.invoiceitem

InvoiceItem

attr.state.stateid

cp__stateid

attr.customer.customerid

cp__customerid

attr.product.productid

cp__productid

attr.invoice.invoiceid

cp__invoiceid

fact.invoiceitem.quantity

f__quantity

label.state.stateid.abbrev

l__stateid__abbrev

 

r__state
 

 

r__customer

fact.invoiceitem.price

f__price

label.state.stateid.name

l__stateid__name
  

 

d__invoice

 

r__product

attr.state.region

a__region
   

 

r__invoice

Conflict Resolution

Typically, only the last section of the LDM element ID is used to map Data Warehouse database objects to the LDM datasets. This is true when the second section of the LDM element ID matches the Data Warehouse object that it maps to. For example, the LDM fact fact.person.age in the dataset dataset.person becomes the Data Warehouse column f__age.

However, if the Data Warehouse database object and the LDM dataset do not match, the last two sections of the LDM element become a part of the Data Warehouse column name. For example, the LDM fact fact.spouse.age in the dataset dataset.person becomes the Data Warehouse column f__spouse__age.

The situation when the last section of the LDM element ID is not unique in the dataset usually happens when LDM elements are moved between datasets and the LDM element ID is not changed. To resolve such conflicts, include the LDM dataset name in the Data Warehouse column name.

Delete Side Table

The delete side table is an optional entity in the ADD output stage. The delete side table adheres to the same naming convention as the data load table. The API resource for SQL diff does not generate SQL commands for the delete side table due to many possible variations.

The delete side table is used only in incremental load mode (the x__timestamp column is present in the data load table). When loading data in incremental mode, the delete side table must be incremental itself (that is, must have the x__timestamp column).

In full load mode, the delete side table is not used and gets ignored.

Example:

CREATE TABLE "out_deleted_contacts" (
"a__branch" VARCHAR(256),
"x__timestamp" TIMESTAMP ENCODING RLE,
"x__client_id" VARCHAR(128) ENCODING RLE
);
--if Vertica optimization is required, then: 1) replace ");" with ""; 2) replace "--verticaOpt " with "".
--Align the usage of x__client_id and x__timestamp columns in the statements below with their real presence in the table.
--verticaOpt )
--verticaOpt order by "x__client_id", "x__timestamp"
--verticaOpt segmented by hash("a__branch","x__client_id","x__timestamp") all nodes;

In "out_deleted_contacts":

  • out is the output stage prefix.
  • deleted indicates that the table is the delete side table.
  • contacts is the name of the dataset from which you want to delete data.

Special Columns in Output Stage Tables

In addition to the standard columns mapped to LDM elements in the output stage tables, there are the following optional columns that, when present, influence ADD behavior:

  • The x__client_id column enables data distribution from a single output stage table into multiple workspaces based on the values in this column. When data is loaded to a particular workspace, only the records with the value in the x__client_id column equal to the Client ID workspace parameter are taken from the output stage table and are loaded into the corresponding dataset in the workspace.
  • The x__timestamp column is relevant only for incremental loads. The records with the same value in the x__timestamp column make up a group of records that was created in the output stage during run of a particular ADD process. Between two groups of records, the group that was created later has a greater value in the x__timestamp column than the group of records that was created earlier. When data is loaded to a particular dataset and the load completes successfully, ADD stores to dataset’s metadata the greatest value in the x__timestamp column of the corresponding output stage table. Next time ADD retrieves the last greatest value from dataset’s metadata, and only those records whose value of the x__timestamp column is greater than the timestamp stored in the corresponding dataset’s metadata are taken from the output stage table and are loaded into the corresponding dataset in the workspace.
  • The x__deleted column enables the data deletion functionality on a single output stage table (see Delete Data from Datasets in Automated Data Distribution).

 

Read more about ADD: Automated Data Distribution

Read how to use ADD: Use Automated Data Distribution