Create a Relationship between Datasets

A relationship between two datasets allows you to use information from one dataset to slice the data in the other dataset.

This article will explain how create the relationship between your datasets.

Contents:

Relationship Properties

Primary Key

Creating a relationship requires a primary key in one of the datasets. You cannot create a relationship between two datasets if none of them has the primary key set. For more information about primary keys, see Connection Points in Logical Data Models.

A good primary key should be a unique column of data that you can use to help analyze your data. For example, the Product ID column in the product.csv example file is a great primary key.

If you did not set a primary key for your datasets at the time of import, you will need to set it while creating a relationship between the datasets. To decide what dataset must have the primary key, decide what data you want to analyze (slice) and what data you want to use to help analyze. The dataset with the data that you want to use to help analyze the other data is the one that must have a primary key.

Only an attribute can be set as a primary key. A primary key must contain only one attribute.

Reference

When a relationship is created, a reference is set in the other (target) dataset. The reference is an attribute in the target dataset that is associated with the primary key of the originating dataset.

When deciding on how to set the reference, you can choose one of the following options:

  • Create a new attribute in the target dataset based on the primary key from the originating dataset, and make it a reference.
    This new attribute in the target dataset is populated by references to the primary key values from the originating dataset. Each value in the reference attribute must have a corresponding value of the primary key in the originating dataset.
  • Reuse an existing attribute from the target dataset, and make it a reference.
    Once the relationship is created, the primary key from the originating dataset is merged with the attribute in the target dataset that you made the reference.

Relationship Type

A relationship between two datasets can be one of the following types:

  • A 1:N (one-to-many) relationship, which is a one-directional relationship between the datasets when the data from one dataset can be used to analyze the data from the other dataset, but not the other way round. This is the default type of the relationship.
  • An M:N (many-to-many) relationship that allows you to use the data from either dataset to analyze the data from the other dataset. For more information about them, see Many-to-Many in Logical Data Models.

Types of Datasets in a Logical Data Model

There are three types of datasets in the logical data model (LDM).

  • Attribute datasets. Attributes are data that is used to categorize the values that result from measures. In almost all use cases, this data is a non-calculable value. An attribute dataset is a dataset that consists of attributes to describe the facts in your LDM. Generally, the attribute dataset will contain the data that you want to use to help analyze the data in a fact dataset. Only an attribute can be used as a primary key and a reference. The product.csv file is an example data that can be used to create an attribute dataset.
  • Fact datasets. A fact (also known as a measure) is a numerical piece of data, which in a business environment is used to measure a business process. A collection of facts that measure the same business process is stored in a single data unit called a fact dataset (also known as a fact table). A fact dataset contains individual fact values and pointers to associated attributes used as context for the fact data. The order_line.csv file is an example of data that be used to create a fact table.
  • Date dataset. The Date dataset is a predefined collection of attributes that you can use to slice to the data in a fact dataset by the lowest level of the date hierarchy (day) to the highest level (year). The order_lines.csv file contains the date column, which automatically imports into the LDM Modeler as a date dataset.

Create a Relationship between Datasets

Steps:

  1. Select the originating dataset (the dataset that you want to create the relationship from).
  2. Click the blue dot on the right border of the dataset, and drag the arrow that appears to connect the dataset to the other dataset.


    The following dialog opens:

    This dialog does not appear when you are creating a relationship between a regular dataset and a Date dataset. Such relationship is created right away as a 1:N relationship, and a reference to the Date dataset is added to the regular dataset as a new field.

  3. On the left, click Primary key not set, and select the attribute that should become the primary key.

    If the dataset already as the primary key set, it will be pre-selected in the dropdown. You cannot edit the primary key for the dataset in this dialog. To update the dataset, see Set or Update the Primary Key in a Dataset.

  4. On the right, choose how to set up a reference in the other dataset, as a new field or as an existing attribute.
    For the purpose of this tutorial, try using the default option, Connect as new field.


  5. Select the type of the relationship.
    For the purpose of this tutorial, try using the default option, 1:N.


  6. Click Connect.
    The relationship is created.
    • The direction of the arrow determines which dataset's data can be used to analyze (slice) the data from the other dataset.
    • The primary key has been set in the originating dataset.
      In the following example, notice how the icon for the Employee ID attribute is a key to signify it is the primary key:
    • A reference to the primary key has been added to the target dataset.
      In following example, a reference to the Employee ID attribute has been added to the Payments dataset. This reference indicates that the Employee ID field is a foreign key in the Payments dataset, with the Employee ID values from the Employee dataset as the key values.

Now, update the mapping for the reference set in the target dataset.

Update Field Mapping in a Dataset

When you add a reference to a dataset, you must update the LDM mapping so that the reference uses the correct source column. If an attribute exists with the same name as the reference in the dataset that you have added the reference to, you must remove that attribute from the dataset. Every dataset in your LDM must have each field (a fact or an attribute) unambiguously mapped to a column in a source table (represented by a table in your data warehouse or a single CSV file). The dataset itself must be mapped to this source table.

During data load, the data from the columns in the source table will be loaded to the corresponding facts or attributes in the dataset.

Steps:

  1. Select the dataset, and click More... -> View details.

    The dataset details dialog opens that lists all the dataset facts, attributes, and attribute labels.
  2. Click the Load configuration tab, and set the source column for the reference.
  3. Review the other mapping and update it, if needed.
  4. When you are done with updating, click Save changes.
  5. Publish the LDM (see Publish your Logical Data Model).

Set or Update the Primary Key in a Dataset

Steps:

  1. Select the dataset, and click More... -> Set primary key.
  2. Select the attribute that should become the primary key, and click Set key.

    The dialog closes, and the primary key is set.

To remove the primary key from the dataset, de-select all the attributes, and click Set key.

You cannot remove the primary key from the dataset that has a relationship to another dataset. To remove the primary key, delete the relationship first.

Powered by Atlassian Confluence and Scroll Viewport.