Update Data in a Dataset from a CSV File

For project administrators only

In a typical workflow, when your logical data model (LDM) is stable, you would automate data loading. To do so, set up data load processes (see Deploy a Process for Automated Data Distribution v2 or Deploy a Process for a Data Pipeline Brick) and schedule them to load data regularly (see Scheduling a Process).

In some cases, however, you can benefit from updating data in a specific dataset from a CSV file. For example, you are building the LDM and you need to verify the LDM structure by creating insights.

Contents:

Check the Field Mapping in the Dataset

Each field (fact or attribute) in the dataset must be unambiguously mapped to a column in the CSV file. During data load, the data from a column in the CSV file will be loaded to the corresponding fact or attribute in the dataset. For more information about the mapping, see Mapping between a Logical Data Model and the Data Source.

For example, here is how the Product Orders dataset is mapped to the product_orders.csv file.

order_id,order_status,product_id,price,quantity
10668-9VYN74-2,Cancelled,210,100.33,1.00
10697-7GBN87-1,Delivered,310,39.35,1.00
10907-0TPH53-3,Delivered,220,50.40,1.00
10778-7INQ32-1,Delivered,230,78.40,1.00
10240-3SBQ40-3,Delivered,160,21.49,1.00
10356-7ZBU60-1,Cancelled,140,21.49,1.00
10700-0ACT49-1,Returned,150,21.82,1.00
10175-0YRN35-3,Delivered,150,19.15,3.00
10152-6HOB25-1,Cancelled,140,29.67,1.00

Steps:

  1. In the LDM Modeler (for how to access the LDM Modeler, click here), locate the dataset where you want to update data, click More... , then click View details.
    The dataset details dialog opens.
  2. Switch to the Load configuration tab and check Source Column.
  3. Close the dataset details dialog.
    You are now going to download the CSV template.

Download the CSV Template

The CSV template is a CSV file with a single header row representing the names of the source columns that are expected in a CSV file from which you will be loading data to this dataset. The CSV template also defines the delimiter (a comma, by default).

Steps:

  1. Click Load on the left.

    The LDM is switched to the list view.
  2. Locate the dataset where you want to update the data, and click CSV Template for this dataset.

    A CSV template file is generated.
  3. Follow the instructions in your browser to save the generated file.

Generate a CSV File with the New Data

Use the CSV template to generate a CSV file with the correct column names and fill it with the data to upload to the dataset.

The data from the CSV file will be uploaded to the dataset in full load mode (the existing data will be overwritten), therefore make sure that the CSV file contains all the data that you want to be in the dataset (not only an incremental update).

The order of the columns in the CSV file is not important.

The CSV file can have more columns than the fields in the dataset. Those extra columns in the CSV file would not have any fields in the dataset mapped and will be ignored at data load.

Load the Data from the CSV File to the Dataset

Once you have the CSV file ready, load the data from this file to the dataset.

Steps:

  1. In the LDM Modeler, click Load on the left to switch to the list view.
  2. Locate the dataset where you want to update the data, and click Update from file for this dataset.

    You are prompted to browse for the CSV file.
  3. Select the CSV file, and click Import.
    The loading process starts. When the loading completes, you see a message that the data has been uploaded. Close this message.

You can immediately start analyzing the data. To do so, click your username in the top right corner, and select Analyze data. You are redirected to Analytical Designer, where you can create insights from your data. For more information, see Create Insights.

Powered by Atlassian Confluence and Scroll Viewport.