Naming Convention for Source Files in Automated Data Distribution v2 for Object Storage Services
When you need to load data to a workspace with a specific logical data model (LDM), you have to map all datasets in the workspace LDM to the source files in your object storage service.
The names of the source files and of the columns in them must follow a specific naming convention at the dataset level and at the LDM field level.
Dataset Level Mapping
Review the requirements for the names of the source files in GoodData-S3 Integration Details or GoodData-Azure Blob Storage Integration Details.
LDM Field Level Mapping
The following table shows the mapping of element_type to prefix_type:
prefix_type | element_type | Description |
---|---|---|
a | attr | attribute |
cp | attr | connection point (anchor) |
f | fact | fact |
d | date | date dimension |
r | reference | |
l | label | |
Attributes, Connection Points, and Facts
If the identifier of an LDM field is the following:
<element_type>.<dataset_name>.<element_name>
then Automated Data Distribution (ADD) v2 for object storage services expects the following column name in the mapped source file:
<prefix_type>__<element_name>
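The rule above can be sketched in a few lines of Python. This is only an illustration, not part of ADD v2: the function name `field_column_name` and the `is_connection_point` flag are hypothetical. The flag is needed because, according to the mapping table, attributes and connection points share the element type attr, so the identifier alone does not tell them apart.

```python
# Prefixes taken from the mapping table; connection points are handled
# separately because they use the same element type ("attr") as attributes.
PREFIX_BY_ELEMENT_TYPE = {"attr": "a", "fact": "f"}


def field_column_name(identifier: str, is_connection_point: bool = False) -> str:
    """Return the expected source column for an attribute, connection point, or fact."""
    # Identifier format: <element_type>.<dataset_name>.<element_name>
    element_type, _dataset_name, element_name = identifier.split(".")
    if element_type not in PREFIX_BY_ELEMENT_TYPE:
        raise ValueError(f"unsupported element type: {element_type!r}")
    if is_connection_point:
        if element_type != "attr":
            raise ValueError("only attributes can be connection points")
        prefix = "cp"
    else:
        prefix = PREFIX_BY_ELEMENT_TYPE[element_type]
    return f"{prefix}__{element_name}"


print(field_column_name("attr.state.region"))
print(field_column_name("attr.state.stateid", is_connection_point=True))
print(field_column_name("fact.invoiceitem.price"))
```

The sketch ignores the dataset name, which matches the common case where the identifier and the source file refer to the same dataset.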
Attribute Labels
If the identifier of an attribute label is the following:
label.<dataset_name>.<attribute_name>.<label_name>
then ADD v2 expects the following column name in the mapped source file:
l__<attribute_name>__<label_name>
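As a minimal sketch of the label rule, the following hypothetical helper (`label_column_name` is not an ADD v2 function) splits the identifier into its four sections and keeps the last two:

```python
def label_column_name(identifier: str) -> str:
    """Return the expected source column for an attribute label."""
    # Identifier format: label.<dataset_name>.<attribute_name>.<label_name>
    element_type, _dataset_name, attribute_name, label_name = identifier.split(".")
    if element_type != "label":
        raise ValueError(f"not a label identifier: {identifier!r}")
    return f"l__{attribute_name}__{label_name}"


print(label_column_name("label.state.stateid.abbrev"))
```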
References
If dataset.<dataset1_name> is a dataset referenced from dataset.<dataset2_name>, ADD v2 expects the following:
- The source column for this reference exists in the corresponding source file.
- The name of the source column is r__<dataset1_name>.
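The reference rule can be illustrated the same way. The helper below is hypothetical and only derives the column name from the identifier of the referenced dataset:

```python
def reference_column_name(referenced_dataset: str) -> str:
    """Return the expected reference column for a referenced dataset identifier."""
    prefix = "dataset."
    if not referenced_dataset.startswith(prefix):
        raise ValueError(f"not a dataset identifier: {referenced_dataset!r}")
    # Drop the "dataset." part and prepend the reference prefix.
    return "r__" + referenced_dataset[len(prefix):]


print(reference_column_name("dataset.state"))
```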
Example
In the following table:
- The columns represent particular datasets and the corresponding source files.
- In a cell, the identifier of the object (if present) comes first, followed by the corresponding column in the source file.
dataset.state State | dataset.customer Customer | dataset.product Product | dataset.invoice Invoice | dataset.invoiceitem InvoiceItem |
---|---|---|---|---|
attr.state.stateid cp__stateid | attr.customer.customerid cp__customerid | attr.product.productid cp__productid | attr.invoice.invoiceid cp__invoiceid | fact.invoiceitem.quantity f__quantity |
label.state.stateid.abbrev l__stateid__abbrev | r__state | | r__customer | fact.invoiceitem.price f__price |
label.state.stateid.name l__stateid__name | | | d__invoice | r__product |
attr.state.region a__region | | | | r__invoice |
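To make the example concrete, the sketch below checks a file header against the columns listed for the State dataset. It assumes the source file is a CSV file whose first row holds the column names. The sample data and the `missing_columns` helper are made up for illustration. Whether a missing column is actually an error depends on your setup and is not covered here.

```python
import csv
import io

# Columns listed for dataset.state in the example table.
EXPECTED_STATE_COLUMNS = [
    "cp__stateid",
    "l__stateid__abbrev",
    "l__stateid__name",
    "a__region",
]


def missing_columns(csv_text: str, expected: list[str]) -> list[str]:
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in expected if name not in header]


# Hypothetical file content with the region column left out.
sample = "cp__stateid,l__stateid__abbrev,l__stateid__name\n1,CA,California\n"
print(missing_columns(sample, EXPECTED_STATE_COLUMNS))
```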
Conflict Resolution
Typically, only the last section of an LDM element identifier is used to map the source files in your object storage service to the LDM datasets. This is true when the second section of the identifier matches the dataset that the source file maps to. For example, the LDM fact fact.person.age in the dataset dataset.person becomes the column f__age in the corresponding source file.
However, if the second section of the identifier does not match the dataset, the last two sections of the LDM element identifier become part of the column name. For example, the LDM fact fact.spouse.age in the dataset dataset.person becomes the column f__spouse__age.
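The two cases can be sketched for facts as follows. The function is hypothetical and assumes the dataset name is passed without the dataset. prefix (for example, person for dataset.person):

```python
def fact_column_name(identifier: str, dataset_name: str) -> str:
    """Return the expected fact column, applying the conflict resolution rule."""
    # Identifier format: fact.<second_section>.<element_name>
    element_type, second_section, element_name = identifier.split(".")
    if element_type != "fact":
        raise ValueError(f"not a fact identifier: {identifier!r}")
    if second_section == dataset_name:
        # Identifier and dataset match: only the last section is used.
        return f"f__{element_name}"
    # Mismatch: the last two sections become part of the column name.
    return f"f__{second_section}__{element_name}"


print(fact_column_name("fact.person.age", "person"))
print(fact_column_name("fact.spouse.age", "person"))
```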
Special Columns in Source Files
In addition to the standard columns mapped to the LDM elements in the source files, there are the following optional columns that, when present, influence ADD v2 behavior:
- The x__client_id column enables data distribution from a single source file into multiple workspaces based on the values in this column. When data is loaded to a particular workspace, only the records with the value in the x__client_id column equal to the workspace client ID are loaded into the corresponding dataset in the workspace. For more information about the client ID, see Automated Data Distribution v2 for Object Storage Services and Set Up Automated Data Distribution v2 for Object Storage Services.
- The x__deleted column enables the data deletion functionality on a single file (see Load Modes in Automated Data Distribution v2 for Object Storage Services).
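The effect of the x__client_id column can be pictured with a small filter. This is a conceptual sketch of the selection described above, not how ADD v2 is implemented. It assumes a CSV source file, and the client IDs and the `rows_for_client` helper are invented for the example.

```python
import csv
import io


def rows_for_client(csv_text: str, client_id: str) -> list[dict[str, str]]:
    """Keep only the records whose x__client_id equals the workspace client ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["x__client_id"] == client_id]


# Hypothetical source file shared by two workspaces.
sample = (
    "cp__customerid,x__client_id\n"
    "1,client_a\n"
    "2,client_b\n"
    "3,client_a\n"
)
selected = rows_for_client(sample, "client_a")
print([row["cp__customerid"] for row in selected])
```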