Create a Logical Data Model from CSV Files

For project administrators only

This tutorial guides you through the process of creating a logical data model (LDM) in your workspace from CSV files. A newly created workspace does not have an LDM therefore you are going to create an LDM from scratch.

For the purpose of this tutorial, you are going to create a simple LDM with three datasets and publish it to your workspace.

You create an LDM in the LDM Modeler. To do so, perform the following steps:

  1. Prepare CSV files.
  2. Add a dataset with attributes.
  3. Create a dataset with facts and a Date dataset.
  4. Publish the LDM.

If you do not have CSV files, you can create an LDM manually (see Create a Logical Data Model Manually).

Contents:

Prepare CSV Files

Make sure that your CSV files meet the requirements described in CSV File Requirements.

In this tutorial, you are going to use two CSV files, product.csv and order_line.csv. Download them to your machine.

  • product.csv (download as .CSV)

    product_id,product_name,category
    150,Skirt,Clothing
    210,Artego,Electronics
    240,PortaCode,Electronics
    420,Elentrix,Outdoor
    430,Integres,Outdoor
    120,"Polo Shirt",Clothing
    110,Pullover,Clothing
    310,Applica,Home
    330,Optique,Home
  • order_line.csv (download as .CSV)

    order_line_id,order_id,order_status,date,product_id,price,quantity
    10668-9VYN74-2,10668-9VYN74,Cancelled,2016-07-12,210,100.33,1.00
    10697-7GBN87-1,10697-7GBN87,Delivered,2017-12-05,310,39.35,1.00
    10907-0TPH53-3,10907-0TPH53,Delivered,2015-03-28,220,50.40,1.00
    10778-7INQ32-1,10778-7INQ32,Delivered,2018-02-08,230,78.40,1.00
    10240-3SBQ40-3,10240-3SBQ40,Delivered,2017-05-07,160,21.49,1.00
    10356-7ZBU60-1,10356-7ZBU60,Cancelled,2015-03-01,140,21.49,1.00
    10700-0ACT49-1,10700-0ACT49,Returned,2019-02-20,150,21.82,1.00
    10175-0YRN35-3,10175-0YRN35,Delivered,2015-04-11,150,19.15,3.00
    10152-6HOB25-1,10152-6HOB25,Cancelled,2016-03-04,140,29.67,1.00

Add a Dataset with Attributes

You are going to create your first dataset from the product.csv file. This CSV file contains only attributes. Attributes are data that is to be used in grouping or segmenting the values resulting from the computed functions (for example, countries, names, IDs). In almost all use cases, this data is non-numerical. For more information about the attributes, see Attributes in Logical Data Models.

You are also going to set one of the attributes in the dataset as a primary key. The primary key defines the unique identifier for a row of data in a file and serves as a connection point that allows you to connect this dataset to another dataset. In other words, you will be able to create a relation between the dataset that you are going to create from the product.csv file and another dataset that you will be creating later in this tutorial. A relation between two datasets allows you to use information from one dataset to slice the data in the other dataset. The dataset with the data that you want to use to help analyze the other data is the one that must have a primary key. In this tutorial, this dataset is the one created from the product.csv file. For more information about primary keys, see Primary and Foreign Keys and Connection Points in Logical Data Models.

Steps:

  1. Go to the LDM Modeler (to learn how to access the LDM Modeler, click here).
    The blank canvas area opens.
  2. Drag the product.csv file and drop it in the blank canvas area.
    The file preview opens.

    This preview shows the data from your file. Based on the column names and the contents of those columns in the CSV file, the column names and the types of the data are suggested for the dataset that will be created from this file.
    For example, the product_name column is detected as an attribute based on the values in the column (non-numerical data) and will be named Product Name in the dataset.
  3. Notice that the product_id column is detected as a measure (or a fact; see Facts in Logical Data Models). This is because it contains numbers. By default, numbers are auto-detected to be measures.
    However, the product_id column contains the identifiers of the products. Those are non-measurable descriptive data, and therefore must be an attribute. Because the product IDs are unique, you can use it as a primary key.
  4. Change the data type for the product_id column from Measure to Primary key.

    If you do not set the primary key now, you will be able to do it after the dataset has been created (see Update a Logical Data Model).

  5. Review the suggested column names and the data types for the other columns. These columns are correctly detected to be attributes.
  6. Click Import.
    The CSV file is imported, and the Product dataset is added.

    Notice the following:
    • The dataset is color-coded with orange. This is because the dataset contains only attributes and no facts.
    • The Product Id attribute is marked as the primary key. This means that you will be able to create a relation between this dataset and other datasets in the logical data model.
    You are now going to add another dataset from the order_line.csv file.

Create a Dataset with Facts and a Date Dataset

You are going to create another dataset from the order_line.csv file. This CSV file contains attributes, facts, and dates. Facts are numerical pieces of data, which in a business environment is used to measure a business process (see Facts in Logical Data Models).

In an LDM, dates are managed through a separate object, the Date dataset (see Dates in the Logical Data Model). Therefore, after you have imported the order_line.csv file, not one but two datasets will be added to your LDM: one Date dataset for the dates, and the other one with the rest of information. These two datasets will be automatically connected with a relation.

You are also going to establish a relation between the dataset created from the order_line.csv file and the Product dataset that you created earlier.

Steps:

  1. Drag the order_line.csv CSV file and drop it in the canvas area.
    The file preview opens.


    Notice the following:
    • The order_line_id, order_id, and order_status columns are correctly detected as attributes.
    • The date column is correctly detected as dates in the yyyy-MM-dd format and will be converted to a separate Date dataset.
    • The price and quantity columns are correctly detected as measures (or facts).
  2. Notice that the product_id column is detected as a measure. This is because it contains numbers. However, this column contains the product identifiers and therefore must be an attribute.
    Earlier in this tutorial, when creating the Product dataset, you set the Product Id attribute to be the primary key of this dataset. Now, you can set the product_id column in the order_line.csv file as a reference and thus create a relation between these two datasets.
  3. Change the data type for the product_id column from Measure to Reference.
    When prompted, select Product as the dataset to set the reference to.
  4. Click Import.
    The CSV file is imported, and the Order Line dataset and the Date dataset are added.

    Notice the following:
    • The Order Line dataset is color-coded with green. This is because the dataset contains facts.
    • The Date dataset is color-coded with blue. This is because this is a date dimension dataset.
    • A relation is created between the Date dataset and the Order Line dataset. This is because the dates were extracted from the order_line.csv file.
      The direction of the arrow determines which dataset's data can be analyzed (sliced) by the data from the other dataset. The relation between the Date dataset and the Order Line dataset allows you to slice the data from the Order Line dataset (an incoming relation to the Order Line dataset) by date (an outgoing relation from the Date dataset).
    • A relation is created between the Product and the Order Line dataset. This is because you connected these datasets by the Product Id primary key in the Product dataset.
      The direction of the arrow determines which dataset's data can be analyzed (sliced) by the data from the other dataset. The relation between the Product dataset and the Order Line dataset allows you to slice the data from the Order Line dataset (an incoming relation to the Order Line dataset) by product (an outgoing relation from the Product dataset).
    • References to the ProductId attribute and the Date dataset have been added to the Order Line dataset. These references indicate that the Product Id field and the Date dataset are foreign keys in the Order Line dataset.
  5. Click the list button on the left.

    The LDM is switched to the list view.

    This view shows you what datasets your LDM has (the Date dataset is listed separately) and statuses of those datasets. At this point, all your datasets are in the status of Not published yet, because you have not yet published the LDM.
  6. To return to the graph view, click the graph button.

Your LDM is ready. You can now publish it.

Publish the LDM

When you publish an LDM, the LDM Modeler asks you to choose the mode of publishing. You can choose one of the following:

  • Preserve data
    The LDM is published only if publishing does not delete any data. This is useful when you edit an existing LDM and do not want to break anything by deleting objects or relations unintentionally.
  • Overwrite
    The LDM is published regardless of whether any data is going to be deleted.

In this tutorial, you are creating an LDM from scratch, and either publishing mode will work well because there is nothing to break at this point. You are going to use the default option, Preserve data.

The LDM Modeler also asks you whether in addition to creating the datasets you want to upload data from the CSV files to those datasets. In this tutorial, you are focusing on creating the LDM and not loading data, therefore you are not going to load the data. For more information about loading data from CSV file to datasets, see Load CSV Files to a Workspace with the LDM Modeler.

Steps:

  1. Click Publish.
    You are prompted to choose the mode of publishing. The Preserve data mode is selected by default.
  2. Keep the Preserve data mode, and de-select the Upload data from imported CSV files checkbox.
  3. Click Publish.
    The publishing process starts. When the publishing completes, you see a message that the LDM has been published. Close this message.
  4. Click the list button on the left.

    The LDM is switched to the dataset view.

    Notice that the statuses of all datasets have changed. All your datasets are now in the status of Published, because you have published the LDM.

    You may also notice the CSV Template and Update from file buttons for each dataset. You use these buttons for loading data from a CSV file to a dataset directly in the LDM Modeler. For more information about loading data from a CSV file to a dataset, see Update Data in a Dataset in the Logical Data Model.

  5. To return to the graph view, click the graph button.

After you have published the LDM, create the Output Stage (see Create the Output Stage based on Your Logical Data Model).


Powered by Atlassian Confluence and Scroll Viewport.