Naming Convention for Output Stage Objects in Automated Data Distribution v2 for Data Warehouses

When you need to load data to a workspace with a specific logical data model (LDM), you have to map all datasets in the workspace LDM to the objects in your data warehouse Output Stage.

The names of the Output Stage objects must follow the specific naming convention at the dataset level and the LDM field level.

To generate SQL commands for creating or updating the Output Stage, see Create the Output Stage based on Your Logical Data Model.

Contents:

Dataset Level Mapping

If the identifier of a dataset is the following:

dataset.<dataset_name>

then Automated Data Distribution (ADD) v2 for data warehouses expects the following name of the source database object for this dataset in the Output Stage:

<output_stage_prefix>_<dataset_name>

For example, ADD v2 expects that the database object out_person in the Output Stage (out_ is the Output Stage prefix) will be used for populating data into the LDM dataset with the identifier of dataset.person.

LDM Field Level Mapping

The following table shows the mapping of element_type to prefix_type:

prefix_typeelement_typeDescription
aattrattribute
cpattrconnection point (anchor)
ffactfact
ddatedate dimension
r
reference
l
label

Attributes, Connection Points, and Facts

If the identifier of an LDM field is the following:

<element_type>.<dataset_name>.<element_name>

then ADD v2 expects the following column name in the mapped database object in the Output Stage:

<prefix_type>__<element_name>

Attribute Labels

If the identifier of an attribute label is the following:

label.<dataset_name>.<attribute_name>.<label_name>

then ADD v2 expects the following column name in the mapped database object in the Output Stage:

l__<attribute_name>__<label_name>

References

If dataset.<dataset1_name> is a dataset referenced from dataset.<dataset2_name>, ADD v2 expects the following:

  • The source column for this reference exists in the database object <dataset2_name>.
  • The name of the source column is r__<dataset1_name>.

Example

In the following table:

  • The columns represent particular datasets and the corresponding database objects in the data warehouse.
  • In a cell, the first line (if present) indicates the identifier of the object, and the second line indicates the corresponding database object/column.

dataset.state

State

dataset.customer

Customer

dataset.product

Product

dataset.invoice

Invoice

dataset.invoiceitem

InvoiceItem

attr.state.stateid

cp__stateid

attr.customer.customerid

cp__customerid

attr.product.productid

cp__productid

attr.invoice.invoiceid

cp__invoiceid

fact.invoiceitem.quantity
f__quantity

label.state.stateid.abbrev

l__stateid__abbrev

r__state


r__customer

fact.invoiceitem.price

f__price

label.state.stateid.name

l__stateid__name



d__invoice

r__product

attr.state.region

a__region




r__invoice

Conflict Resolution

Typically, only the last section of an LDM element identifier is used to map database objects in the Output Stage to the LDM datasets. This is true when the second section of the identifier matches the database object that it maps to. For example, the LDM fact fact.person.age in the dataset dataset.person becomes the column f__age in the corresponding database object.

However, if the database object and the LDM dataset do not match, the last two sections of the LDM element identifier become a part of the column name. For example, the LDM fact fact.spouse.age in the dataset dataset.person becomes the column f__spouse__age.

Special Columns in Output Stage Tables

In addition to the standard columns mapped to the LDM elements in the Output Stage database objects, there are the following optional columns that, when present, influence ADD v2 behavior:

  • The x__client_id column enables data distribution from a single Output Stage object into multiple workspaces based on the values in this column.
    When data is loaded to a particular workspace, only the records with the value in the x__client_id column equal to the workspace client ID are loaded into the corresponding dataset in the workspace. For more information about the client ID, see Automated Data Distribution v2 for Data Warehouses and Set Up Automated Data Distribution v2 for Data Warehouses.
  • The x__timestamp column is relevant only for incremental loads (see Load Modes in Automated Data Distribution v2 for Data Warehouses).
    The records with the same value in the x__timestamp column make up a group of records that was created in the Output Stage during run of a particular ADD v2 process. Between two groups of records, the group that was created later has a greater value in the x__timestamp column than the group of records that was created earlier.
    When data is loaded to a particular dataset and the load completes successfully, ADD v2 stores to the dataset’s metadata the greatest value in the x__timestamp column of the corresponding Output Stage table. Next time, ADD v2 retrieves the last greatest value from the dataset’s metadata, and only those records whose value of the x__timestamp column is greater than the timestamp stored in the corresponding dataset's metadata are taken from the Output Stage table and are loaded into the corresponding dataset in the workspace.
  • The x__deleted column enables the data deletion functionality on a single Output Stage table (see Load Modes in Automated Data Distribution v2 for Data Warehouses).
Powered by Atlassian Confluence and Scroll Viewport.