This example shows how to load three connected datasets together. The HR example contains three hierarchically connected datasets: Department -> Employee -> Salary.
The CloudConnect graph loads all three datasets.
Because the datasets are connected, they must be loaded in sequence rather than in parallel: first the Department dataset, then Employee, and finally Salary. CloudConnect uses so-called phases to execute different parts of the graph sequentially. You can assign a phase to a connected branch of the graph by right-clicking a component and selecting the corresponding popup menu item.
The phase is an integer. Components with a lower phase number execute before components with a higher phase number. Note that a component's phase is shown as a small number in the top left corner of the component's rectangle.
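The ordering rule for phases can be sketched in a few lines. This is a minimal model of phase-based scheduling, not CloudConnect's engine: the component names and the phase numbers 0, 1, and 2 are illustrative assumptions. Components are sorted by phase and grouped, so every component in one group finishes before the next group starts.

```python
from itertools import groupby

# Hypothetical components of the HR graph, each tagged with a phase number.
# Names and phase values are illustrative, not CloudConnect identifiers.
components = [
    ("Load Salary", 2),
    ("Load Department", 0),
    ("Load Employee", 1),
]


def run_by_phase(components):
    """Return (phase, [component names]) pairs in execution order.

    Lower phases come first; components sharing a phase are grouped
    together because they may run in parallel with each other.
    """
    ordered = sorted(components, key=lambda c: c[1])
    schedule = []
    for phase, group in groupby(ordered, key=lambda c: c[1]):
        schedule.append((phase, [name for name, _ in group]))
    return schedule


for phase, names in run_by_phase(components):
    print(f"phase {phase}: {', '.join(names)}")
```

Regardless of the order in which the components are listed, Department is scheduled before Employee, and Employee before Salary.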
The Department dataset contains only one attribute, Department ID, with one additional label, Name. In fact, this attribute has two textual labels: Department ID and Name. To load the data correctly, the GoodData platform needs to know which of these two labels uniquely identifies each Department record. Let's illustrate this with a simple example. Assume that we want to load the following employee records into the GoodData platform:
Table 12.1. Employee records
| Employee ID | Employee Name | Department ID | Department Name | Salary |
|---|---|---|---|---|
| 1 | John Simons | SW | Sales | $170k |
| 2 | Jeff Nicholson | SE | Sales | $180k |
| 3 | Sarah Robinson | MKTG | Marketing | $220k |
The platform needs to break these records down into two attributes and one fact:
- The Department attribute is created from the Department ID and Department Name columns.
- The Employee attribute is created from the Employee ID and Employee Name columns.
- The Salary fact is created from the Salary column.
Now let's look more closely at the Department attribute. The GoodData platform needs to designate one of the Department's columns as primary. Each distinct value of the primary column identifies one record of the attribute. If we choose Department ID as the primary column, we end up with three Department records identified by the values SW, SE, and MKTG. If we choose Department Name instead, we end up with only two Department records, identified by the values Sales and Marketing, because SW and SE share the name Sales.
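The effect of the primary-column choice can be reproduced with the Department columns of Table 12.1. The sketch below is a simplified model, not the platform's loader: it groups rows by the value of whichever column is treated as primary, so rows sharing that value collapse into a single record. The function name `department_records` is hypothetical.

```python
# Department columns from Table 12.1.
rows = [
    {"Department ID": "SW", "Department Name": "Sales"},
    {"Department ID": "SE", "Department Name": "Sales"},
    {"Department ID": "MKTG", "Department Name": "Marketing"},
]


def department_records(rows, primary):
    """Group rows into attribute records keyed by the chosen primary column.

    Every distinct value of the primary column becomes one record; rows that
    share a value are merged into that record.
    """
    records = {}
    for row in rows:
        records.setdefault(row[primary], []).append(row)
    return records


by_id = department_records(rows, "Department ID")
by_name = department_records(rows, "Department Name")
print(sorted(by_id))    # ['MKTG', 'SE', 'SW']
print(sorted(by_name))  # ['Marketing', 'Sales']
```

With Department ID as primary there are three records. With Department Name as primary, the SW and SE rows both fall under Sales, leaving two.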
The Field mapping dialog of the GD Dataset Writer component asks you to identify the primary label for every attribute that has more than one label.
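The rule the Field mapping dialog enforces can be expressed as a simple check. The dictionary below is a hypothetical representation of a field mapping, invented for illustration; it is not CloudConnect's internal format. The check flags every attribute that has more than one label but no valid primary label selected.

```python
# Hypothetical field mapping: each attribute lists its labels and the label
# chosen as primary (None if nothing has been chosen yet).
mapping = {
    "Department": {"labels": ["Department ID", "Department Name"], "primary": "Department ID"},
    "Employee": {"labels": ["Employee ID", "Employee Name"], "primary": None},
}


def missing_primary_labels(mapping):
    """Return attributes with more than one label but no valid primary label."""
    problems = []
    for attribute, spec in mapping.items():
        labels = spec["labels"]
        if len(labels) > 1 and spec.get("primary") not in labels:
            problems.append(attribute)
    return problems


print(missing_primary_labels(mapping))  # ['Employee']
```

An attribute with a single label needs no choice, which is why the check only applies when there are two or more labels.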
The Employee dataset references the Department dataset in the project's data model. As we saw earlier, the Department dataset has one attribute with two labels, Department ID and Name, so there are two ways to reference a Department record from an Employee record. CloudConnect needs to know which label you choose. It first asks you for this label when you create the Employee metadata, so that it can give the field that references the Department the right name ( → and select the Employee dataset).
Selecting the correct Department label to reference from the Employee records is also very important in the Field mapping dialog of the Employee's GD Dataset Wizard.
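To see why the referenced label matters, the sketch below looks up, for each employee from Table 12.1, which Department records carry the same value in the chosen label. It is a simplified model that does not reproduce how the GoodData platform actually resolves references, and `matching_departments` is a hypothetical name. Referencing by Department ID matches exactly one record per employee. Referencing by Name matches both SW and SE for the two Sales employees.

```python
# Department records and Employee rows from Table 12.1.
departments = [
    {"Department ID": "SW", "Department Name": "Sales"},
    {"Department ID": "SE", "Department Name": "Sales"},
    {"Department ID": "MKTG", "Department Name": "Marketing"},
]

employees = [
    {"Employee ID": 1, "Department ID": "SW", "Department Name": "Sales"},
    {"Employee ID": 2, "Department ID": "SE", "Department Name": "Sales"},
    {"Employee ID": 3, "Department ID": "MKTG", "Department Name": "Marketing"},
]


def matching_departments(employees, departments, reference_label):
    """For each Employee ID, list the Department IDs whose reference_label matches."""
    matches = {}
    for emp in employees:
        value = emp[reference_label]
        matches[emp["Employee ID"]] = [
            d["Department ID"] for d in departments if d[reference_label] == value
        ]
    return matches


print(matching_departments(employees, departments, "Department ID"))
# {1: ['SW'], 2: ['SE'], 3: ['MKTG']}
print(matching_departments(employees, departments, "Department Name"))
# {1: ['SW', 'SE'], 2: ['SW', 'SE'], 3: ['MKTG']}
```

In this model, a reference label whose values are not unique leaves some employees pointing at more than one Department record.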