Create a Relationship between Datasets

A relationship between two datasets allows you to use information from one dataset to slice the data in the other dataset. Creating a relationship requires a primary key in one of those datasets and a reference (foreign key) in the second dataset. You cannot create a relationship between two datasets if none of them has the primary key set. This article will explain how create the relationship between your datasets. 

Contents:

Logical Data Model Dataset Types

There are three types of datasets in the logical data model (LDM).

Attribute Datasets

Attributes are data that is used to categorize the values that result from measures. In almost all use cases, this data is a non-calculable value. An attribute dataset is a dataset that consists of attributes to describe the facts in your logical data model. Generally, the attribute dataset will contain the data that you want to use to help analyze the data in a fact dataset. Only an attribute can be used as a primary key and a reference key. The product.csv file is an example data that can be used to create an attribute dataset.

Fact Datasets

A fact (also known as a measure) is a numerical piece of data, which in a business environment is used to measure a business process. A collection of facts that measure the same business process are stored in a single data unit called a fact dataset (also known as a fact table). A fact dataset contains individual fact values and pointers to associated attributes used as context for the fact data. The order_line.csv file is an example of data that be used to create a fact table.

Date Dataset

The date dataset is a predefined collection of attributes that you can use to slice to the data in a fact dataset by the lowest level of the date hierarchy (day) to the highest level (year). The order_lines.csv file contains the date column, which automatically imports into the LDM Modeler as a date dataset. 

Set the Primary Key

A good primary key should be a unique column of data that you can use to help analyze your data. For example, the Product ID column in the product.csv example file is a great primary key.

To decide what dataset must have the primary key, decide what data you want to analyze (slice) and what data you want to use to help analyze. The dataset with the data that you want to use to help analyze the other data is the one that must have a primary key.

If you did not set a primary key for your dataset at the time of import, you will need to set the primary key.

Only an attribute can be set as a primary key.

  1. Select More... in the dataset that will have the primary key and then select Set primary key.
  2. In the primary key dialog, select the attribute that should become the primary key.
  3. Click Set key.
    The dialog closes, and the primary key is set. In the following example, notice how the icon for the Employee ID attribute is a key to signify it is the primary key:

    You can now connect the dataset with the primary key to another dataset.
  4. Select the blue dot on the right border of the dataset with the primary key and drag the arrow that appears to connect the dataset to a different dataset. The following examples demonstrate this process:


    The relationship is created and the logical data model should now show a relationship between the two datasets.

    • The direction of the arrow determines which dataset's data can be used to analyze (slice) the data from the other dataset.

    • In this example, a reference to the Employee ID attribute has been added to the Payments dataset. This reference indicates that the Employee ID field is a foreign key in the Payments dataset, with the Employee ID values from the Employee dataset as the key values.

Manage Field Mapping in a Dataset

When you add a reference key to a dataset, you must update the logical data model mapping so that the reference key uses the correct source column. If an attribute exists with the same name as the reference key in the dataset that you have added the reference key to, you must remove that attribute. Every dataset in your LDM must have each field (a fact or an attribute) unambiguously mapped to a column in a source table (represented by a table in your data warehouse or a single CSV file).  The dataset itself must be mapped to this source table.

During data load, the data from the columns in the source table will be loaded to the corresponding facts or attributes in the dataset.

Steps:

  1. Select the dataset, click More...View details.

    The dataset details dialog opens that lists all the dataset facts, attributes, and attribute labels.
  2. Click the Load configuration tab.
  3. Review the current mapping and update it, if needed.
  4. When you are done with updating, click Save changes.
  5. Publish your Logical Data Model.
Powered by Atlassian Confluence and Scroll Viewport.