You are viewing our older product's guide. Click here for the documentation of GoodData Cloud, our latest and most advanced product.

Naming Convention for Output Stage Objects

When you need to load data to a workspace with a specific logical data model, you have to map all datasets in the workspace’s logical data model to the objects in the output stage.

The names of the output stage objects must follow the specific naming convention at the dataset level and the LDM field level.

You can use the SQL diff to generate SQL commands for creating or updating corresponding output stage objects. The generated SQL commands are compliant with the output stage object naming convention.

Dataset Level Mapping

If the LDM element ID of a dataset is the following:

dataset.<dataset_name>

then Automated Data Distribution (ADD) expects the following name of the source database object for this dataset in the output stage:

for the data load table case
```
<output_stage_prefix><dataset_name>
```
for the delete side table case (only if you are using the data deletion, see Delete Data from Datasets in Automated Data Distribution)
```
<output_stage_prefix>_deleted_<dataset_name>
```

For example, ADD expects that the database object person in the output stage will be used for populating data into the dataset with the LDM element ID of dataset.person.

LDM Field Level Mapping

(Applies to all LDM element types except for labels and references.) If the LDM element ID of an LDM field is the following:
```
<element_type>.<dataset_name>.<element_name>
```
then ADD expects the following column name in the mapped database object of the output stage:
```
<prefix_type>__<element_name>
```
The following table shows the mapping of <element_type> to <prefix_type>, and their meaning:
prefix_type element_type Element Type Meaning
a attr attribute
cp attr connection point (anchor)
f fact fact
d date date dimension
r reference
l label
If the LDM element ID of a label is the following:
```
label.<dataset_name>.<attribute_name>.<label_name>
```
then ADD expects the following column name in the source database in the mapped database object of the output stage:
```
l__<dataset_name>__<attribute_name>__<label_name>
```
If dataset.<dataset1_name> is a dataset referenced from dataset.<dataset2_name>, ADD expects the following:
- The source column for this reference exists in the database object <dataset2_name>.
- The name of the source column is r__<dataset1_name>.

prefix_type	element_type	Element Type Meaning
a	attr	attribute
cp	attr	connection point (anchor)
f	fact	fact
d	date	date dimension
r		reference
l		label

Example:

In the following table, columns represent particular datasets and the corresponding database objects in Data Warehouse.

In a cell, the first line (if present) indicates the LDM element ID, and the second line indicates the corresponding Data Warehouse database object / column.

dataset.state State	dataset.Customer Customer	dataset.product Product	dataset.invoice Invoice	dataset.invoiceitem InvoiceItem
attr.state.stateid cp__stateid	attr.customer.customerid cp__customerid	attr.product.productid cp__productid	attr.invoice.invoiceid cp__invoiceid	fact.invoiceitem.quantity f__quantity
label.state.stateid.abbrev l__stateid__abbrev	r__state		r__customer	fact.invoiceitem.price f__price
label.state.stateid.name l__stateid__name			d__invoice	r__product
attr.state.region a__region				r__invoice

Conflict Resolution

Typically, only the last section of the LDM element ID is used to map Data Warehouse database objects to the LDM datasets. This is true when the second section of the LDM element ID matches the Data Warehouse object that it maps to. For example, the LDM fact fact.person.age in the dataset dataset.person becomes the Data Warehouse column f__age.

However, if the Data Warehouse database object and the LDM dataset do not match, the last two sections of the LDM element become a part of the Data Warehouse column name. For example, the LDM fact fact.spouse.age in the dataset dataset.person becomes the Data Warehouse column f__spouse__age.

The situation when the last section of the LDM element ID is not unique in the dataset usually happens when LDM elements are moved between datasets and the LDM element ID is not changed. To resolve such conflicts, include the LDM dataset name in the Data Warehouse column name.

Delete Side Table

For more information about the delete side table, see Delete Data from Datasets in Automated Data Distribution.

The delete side table is an optional entity in the ADD output stage. The delete side table adheres to the same naming convention as the data load table. The API resource for SQL diff does not generate SQL commands for the delete side table due to many possible variations.

The delete side table is used only in incremental load mode (the x__timestamp column is present in the data load table). When loading data in incremental mode, the delete side table must be incremental itself (that is, must have the x__timestamp column).

In full load mode, the delete side table is not used and gets ignored.

Example:

CREATE TABLE "out_deleted_contacts" (
"a__branch" VARCHAR(256),
"x__timestamp" TIMESTAMP ENCODING RLE,
"x__client_id" VARCHAR(128) ENCODING RLE
);
--if Vertica optimization is required, then: 1) replace ");" with ""; 2) replace "--verticaOpt " with "".
--Align the usage of x__client_id and x__timestamp columns in the statements below with their real presence in the table.
--verticaOpt )
--verticaOpt order by "x__client_id", "x__timestamp"
--verticaOpt segmented by hash("a__branch","x__client_id","x__timestamp") all nodes;

In "out_deleted_contacts":

out is the output stage prefix.
deleted indicates that the table is the delete side table.
contacts is the name of the dataset from which you want to delete data.

Special Columns in Output Stage Tables

In addition to the standard columns mapped to LDM elements in the output stage tables, there are the following optional columns that, when present, influence ADD behavior:

The x__client_id column enables data distribution from a single output stage table into multiple workspaces based on the values in this column. When data is loaded to a particular workspace, only the records with the value in the x__client_id column equal to the Client ID workspace parameter are taken from the output stage table and are loaded into the corresponding dataset in the workspace.
The x__timestamp column is relevant only for incremental loads. The records with the same value in the x__timestamp column make up a group of records that was created in the output stage during run of a particular ADD process. Between two groups of records, the group that was created later has a greater value in the x__timestamp column than the group of records that was created earlier. When data is loaded to a particular dataset and the load completes successfully, ADD stores to dataset’s metadata the greatest value in the x__timestamp column of the corresponding output stage table. Next time ADD retrieves the last greatest value from dataset’s metadata, and only those records whose value of the x__timestamp column is greater than the timestamp stored in the corresponding dataset’s metadata are taken from the output stage table and are loaded into the corresponding dataset in the workspace.
The x__deleted column enables the data deletion functionality on a single output stage table (see Delete Data from Datasets in Automated Data Distribution).

Read more about ADD: Automated Data Distribution
Read how to use ADD: Use Automated Data Distribution

Delete Data from Datasets in Automated Data Distribution

Time Zones in Automated Data Distribution