Page tree
Skip to end of metadata
Go to start of metadata

In CloudConnect Designer, when performing field mappings from the metadata of your GD Dataset Writer component, you may be required to select the primary identifier of some attributes. This identifier is a label of the attribute.

Realistically, this task should belong in the LDM Modeler, as it specifies a change to the logical data model. In a future release, this selection will become part of the LDM specification.

For now, this article describes how to manage these changes through the Choose a primary identifier of the following attributes dialog in the GD Dataset Writer for the following cases:

  • You have one dataset
  • You have two more datasets connected through a connection point.

Below are two short examples.

Before you begin, please check your data for the primary and foreign keys. These keys must be used as the primary identification, which functions like a primary key; it stores only unique values. More information is provided below.

Case 1: one dataset

Example Model

Here is a simple example, which contains a single dataset.

This dataset contains the following fields. Note that the Employee ID attribute contains three attribute labels: First Name, Last Name, and Identification number.

 

Here is some sample data in a CSV file:

ETL Designer - mapping fields in GD Dataset Writer

When you are mapping fields in GD Dataset Writer, the following dialog appears:

In the above dialog, you must select the primary attribute label, which describes the Employee ID attribute.

Behavior variations

Depending on the field you select, the behavior varies. In the examples below, a simple report has been created to sum salary values for the Employee ID attribute, the label of which has been changed in each of the following reports.

 

1. Select Employee ID (label.employee.employeeid) as the primary identifier

Secondary-click the Employee ID column to change the attribute label.

Primary identifier = Employee ID*

 

Primary identifier = Last Name

 

Primary identifier = First Name

 

2. Select First name (label.employee.employeeid.firstname) as the primary identifier

Secondary-click the Employee ID column to change the attribute label.

Primary identifier = Employee ID

 

Primary identifier = First Name

 

Primary identifier = Last Name

 

In Example 2, only four rows are displayed. The primary identifier is retrieving only the unique values, storing only the last data associated with each value. These facts are summed together in the displayed metric.

The primary identifier behaves like a primary key; it stores only the unique values and store only the last one.

For example, in the source data, there are three rows in which the value for First Name is “Jan”. In the generated report, their salaries are summed together and displayed in a single row as a unique value.

This behavior is consistent whenever you select the primary identifier and have repetitive values in multiple rows.

 

Case 2: two datasets

Example Model

In this example, two datasets are connected through a connection point:

In this example, you must map the fields for both datasets. Since they are connected through a connection point, you must specify in the Employee dataset the foreign key-like reference in the Work dataset.

The field-level information on these datasets is displayed below.

Field mapping for Employee dataset

 

Field mapping for Work dataset

CSV files

Example data for each dataset is listed below.

employee.csv

 

work.csv

ETL Designer - mapping fields in GD Dataset Writer

When you are mapping fields in the GD Dataset Writer, the following dialog appears for each dataset.

Employee Dataset Writer dialog

Field mapping dialog - Select the foreign key (the primary key is the Work ID connection point in the Work dataset)

 

For the Employee dataset, the Work ID functions like the foreign key for the dataset, since it is the connection point for the Work dataset.

You must select the referenced label (foreign key) and also select the primary identifier of the Employee ID attribute:

Selecting the referenced label and the primary key

 

Then, you must select the label that identifies the attribute.

 

Work Dataset Writer dialog

For the Work dataset, you must identify the primary attribute label.

Field mapping - primary attribute label

 

You must select the label that identifies the attribute.

Behavior variations

Depending on the field you select, the behavior varies when data is uploaded to the GoodData platform.

1. Select Employee ID (label.employee.employeeid) as the primary identifier in Employee dataset

Behaviors are the same as in the first case. See above.

 

2. Select First name (label.employee.employeeid.firstname) as the primary identifier in Employee dataset

Work dataset with Employee ID

 

Work dataset with First Name

 

Work dataset with Last Name

 

Here again, values are summed based on unique values for the label identifying the first name. So, due to the multiple instances of “Jan,” the reported value is the sum of all salaries for all instances of Jan.

 

3. Select First name (label.employee.employeeid.firstname) as the primary identifier in Employee dataset and Title (label.work.workid.title) as the primary identifier in Work Dataset (and also in Employee dataset writer as a reference)

(empty value) generated by using Work ID as primary key and foreign key

 

The (empty value) for the Work ID is generated because the field Work ID is used as the connection point (primary key and reference key).

employee.csv

 

work.csv

 

To correct this use case, you must change the employee.csv:

employee.csv with worktitle column

 

In the above, the worktitle column has been added, as a reference to the person’s job Title.

 

When the above CSVs are reloaded, the following values are displayed in your reports:

Employee dataset with Work ID = worktitle

 

Employee dataset with Work ID = First Name

 

Employee dataset with Work ID = Last Name

What It Means

Example 1


Employee.csv

employee id,fname,lname,idnumber,salary,workid
1,Martin,Cacky,65456,1,1
2,Jan,Cisar,3543,2,2
3,Jan,Vlcek,34535,3,2
4,Jan,Nanista,354354,4,3
5,Michal,Manak,74,5,4
6,Pert,Stedry,121,6,5


Work.csv

work id,title,url
1,alfa,www.alfa.cz
2,beta,www.beta.cz
3,beta,www.beta.sk
4,gamma,www.gamma.net
5,delta,www.delta.com

 

As you can see, work and employee are now connected through the primary key, Work ID. In this example, the work ID is represented directly by the work id attribute.

When performing the mapping in the GD Dataset Writer component, you must choose “Work ID” as the primary identifier:

Field mapping Work ID in Work dataset

 

Select the same name in the field mapping dialog for the Employee dataset:

Mapping Work ID in Employee dataset

Example 2


Employee.csv

employee id,fname,lname,idnumber,salary,workid
1,Martin,Cacky,65456,1,alfa
2,Jan,Cisar,3543,2,beta
3,Jan,Vlcek,34535,3,beta
4,Jan,Nanista,354354,4,beta
5,Michal,Manak,74,5,gamma
6,Pert,Stedry,121,6,delta


Work.csv

work id,title,url
1,alfa,www.alfa.cz
2,beta,www.beta.cz
3,beta,www.beta.sk
4,gamma,www.gamma.net
5,delta,www.delta.com

 

As you can see, Work.csv and Employee.csv are connected through primary key workid. In this example, this primary key is identified as the title of the work.

So you must choose “Title” as the primary identifier:

Choose primary identifier for Work ID in Work dataset

 

Also, select the same name in the field mapping dialog for the Employee dataset:

Select field mapping for Work ID reference in Employee dataset

 

This reference in the Employee dataset must be the same as in the Work dataset. Otherwise, your data will not be connected together and (empty values) are displayed.

Conclusion

  1. Be careful while mapping fields in GD Dataset Writer.
  2. Remember to have some attributes with at least one attached label.
  3. Before selecting the primary identifier, check your source data to verify the values for primary and foreign keys.

In some future release, this feature will be included in the LDM Modeler.

  • No labels