Load Your Own Snowflake Data into GoodData Workspace

In this tutorial, you will learn how to load your own data from a Snowflake instance into your GoodData workspace. This tutorial expands on the Loading Sample Data from Snowflake into GoodData guide.

To load data into your workspace, you will perform the following tasks:

  1. Open a new workspace
  2. Create a data load
  3. Set up your data source
  4. Create an output stage
  5. Create a logical data model (LDM)
  6. Start the load

To complete the process, you will execute several API calls using our gray pages, our UI for APIs. You will be asked to provide details such as your IDs for both Snowflake and GoodData.

Log in to your Snowflake instance with the account that you plan to use with GoodData. Ensure that your Snowflake instance can be accessed by GoodData. For more information about the required privileges, see GoodData-Snowflake Integration Details.
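The exact set of required privileges is listed in GoodData-Snowflake Integration Details. As an illustrative sketch only (the role, warehouse, database, schema, and user names below are placeholders, not names this tutorial defines), granting read access to the integration user might look like:

    -- Hypothetical names: role GOODDATA_ROLE, warehouse ANALYTICS_WH,
    -- database ACME, schema ANALYTICS, user GOODDATA_USER.
    -- Adjust all of them to your environment.
    GRANT USAGE ON WAREHOUSE ANALYTICS_WH TO ROLE GOODDATA_ROLE;
    GRANT USAGE ON DATABASE ACME TO ROLE GOODDATA_ROLE;
    GRANT USAGE ON SCHEMA ACME.ANALYTICS TO ROLE GOODDATA_ROLE;
    -- The integration reads from the output-stage views, so SELECT is needed:
    GRANT SELECT ON ALL VIEWS IN SCHEMA ACME.ANALYTICS TO ROLE GOODDATA_ROLE;
    GRANT ROLE GOODDATA_ROLE TO USER GOODDATA_USER;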

Contents:

Open a New GoodData Workspace

Ensure that you are logged into your GoodData account. This tutorial presumes that your GoodData domain is secure.gooddata.com.

  • GoodData Free users
    use the link you received in your GoodData Free confirmation email, for example: https://free123ab.na.gooddata.com.
  • GoodData Growth users
    use the link you received in your GoodData confirmation email, for example: https://yourcompanyname.na.gooddata.com.
  • White-label customers
    use your own white-label domain

For the purpose of this tutorial, you will work with a new workspace (also known as a project).

To create a workspace:

 GoodData Free users - click to expand

Your GoodData account comes with five data-ready workspaces. For this tutorial, select any empty workspace.

Once you select a workspace to work with, you can easily rename it in the Manage section:

 GoodData Growth users - click to expand

Your GoodData Growth account allows you to create ten or more workspaces (projects). To create a workspace, you must have the authorization token. For more details, see Find the Project Authorization Token.

Steps:

  1. Click the Add workspace button on your welcome screen.

  2. Enter the name of your workspace and your authorization token.

  3. Click Create.

    Your workspace opens.

To return to your welcome screen and create another workspace, click your name in the upper right corner, click Account, then click Active Project.

You can easily rename the workspace (project) in the Manage section:

 GoodData Enterprise users - click to expand

To create a new project, also called workspace, you must have the authorization token. For more details, see Find the Project Authorization Token.

If you do not have an authorization token, contact GoodData Support.

This procedure assumes that your domain is https://secure.gooddata.com.

If you are a white-labeled customer, replace secure.gooddata.com with your white-labeled domain in the procedure steps when needed. GoodData Free and Growth users, use the domain that you received in your introduction email, such as https://free123ab.na.gooddata.com.

Steps:

  1. Go to https://secure.gooddata.com/gdc/projects.
    The page for creating a project opens.
  2. In the Title field, enter the name for the new project.

  3. In the Authorization Token field, enter your authorization token.

    Do not enter any information for the summary.

    Leave the other project settings at their defaults.

  4. Click Submit.
    The project/workspace is created and the page with the project's URL opens.
    The project/workspace is immediately available on the GoodData Portal.

Create Data Loading Process

In this step, you will create a data loading process that moves data from your Snowflake instance into your GoodData workspace. This process is called Automated Data Distribution (ADD), and it can be deployed to multiple GoodData workspaces.

Ensure that you have the following information ready:

  • project ID (Find the Project ID)
  • Snowflake username and password
  • Snowflake database, warehouse, and schema

Steps:

  1. Go to the Data Integration Console and click the Projects tab.
  2. Click the workspace that you want to use.
    The workspace's page opens.

  3. Click Deploy Process.
    The Deploy process to a project screen appears.
  4. Click Create Data Source.
    The New Data Source screen appears.
  5. Fill in the following fields.
    • Provider: select Snowflake
    • Connection URL: enter the URL to connect your Snowflake account:

jdbc:snowflake://acmecorporation.snowflakecomputing.com
    • Your Snowflake username and password
    • Your Snowflake database, warehouse, and schema
  6. Click VALIDATE CONNECTION.
  7. When your connection is validated, fill in the remaining two fields:
    • Output Stage Prefix: for the purpose of this tutorial, enter out_
    • Data Source Name: enter any name you like
  8. Click Create.
  9. On the next screen, enter your Process Name of choice.
  10. Click Deploy.
    When the process ends, the following screen appears:
  11. Click the Metadata tab and make a note of the Data Source ID. You will need it in the next step.

Ensure that the user you configured in the data source has all necessary privileges. To learn more, see GoodData-Snowflake Integration Details.

Create an Output Stage

The Output Stage (OS) is a set of views created in your Snowflake schema that serve as a source for loading data into your GoodData workspace.

To help you with creating an OS, we use an API that inspects all tables within the schema defined in your data source and suggests a SQL DDL script to create the views. You will execute those SQL DDLs on top of your schema.

Ensure that you have the following information ready:

  • Data Source ID

     If you did not note down the Data Source ID in the previous section...
    Follow these steps:
    1. Click your name in the top right corner and select Data Integration Console.
    2. Click the Projects tab.
    3. Click the relevant Project Name.
      The list of your processes appears.
    4. Click the process you are using and select the Metadata tab.

      Find the Data Source ID at the bottom of the window.

To create the output stage:

  1. Go to the page for generating the Output Stage:

    https://secure.gooddata.com/gdc/dataload/dataSources/{Data_Source_id}/generateOutputStage

    where {Data_Source_id} is the identifier of the Data Source that you created at the previous step (Create Data Loading Process).

  2. Click Generate output stage.

    The link for polling for status of the Output Stage generation displays.
  3. Poll (click the link repeatedly) for the status until the page returns SQL DDLs to create GoodData-compatible views.
Here is an example of the SQL DDLs:

    USE DATABASE "ACME";
    USE SCHEMA "ANALYTICS";
    --------------------------------------------------
    -- out_orders --
    --------------------------------------------------
    CREATE OR REPLACE VIEW "out_orders" AS SELECT
    
            "id" AS "cp__id",
            "customerid" AS "r__customers",
            "description" AS "a__description",
            "price" AS "f__price",
            "modified_at" AS "d__modified_at"
    
            , x__timestamp AS x__timestamp /* Remove this line if it's not incremental load */
            , x__client_id AS x__client_id /* Remove this line if loading into all client workspaces */
            , x__deleted AS x__deleted /* Remove this line if inserting only */
    
    FROM "orders";
  4. Copy the returned SQL DDLs and paste them into your Snowflake SQL client.
  5. Review the suggested SQL DDLs and modify them, if needed.
    Note: For the purpose of this tutorial, you can delete all , x__ lines.

    Review the table at the end of this section to ensure the columns in your output stage comply with GoodData conventions.

  6. Execute the SQL DDLs.
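The "poll until done" pattern in step 3 can also be automated instead of clicking the link repeatedly. Below is a minimal sketch; the `fetch` callable is a placeholder for an authenticated HTTP GET against the polling link (GoodData authentication is not shown here), and all names are illustrative:

```python
import time

def poll_until_ready(fetch, interval=2.0, max_attempts=30):
    """Call fetch() repeatedly until it returns a non-None result.

    fetch stands in for an authenticated HTTP GET against the polling
    link returned by the generateOutputStage call.
    """
    for _ in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError("output stage generation did not finish in time")

# Example with a stubbed fetch that "finishes" on the third call:
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    return "SQL DDLs" if calls["n"] >= 3 else None

print(poll_until_ready(fake_fetch, interval=0.0))  # prints: SQL DDLs
```

In a real script, `fetch` would return the response body once the API reports the generation is finished and `None` while it is still running.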

The views must comply with the following naming conventions of the GoodData Output Stage:

  • Attribute: prefix a__, column name format a__{attribute_name}, example: a__name
  • Fact: prefix f__, column name format f__{fact_name}, example: f__price
  • Primary key (connection point): prefix cp__, column name format cp__{attribute_name}, example: cp__id
  • Foreign key (reference): prefix r__, column name format r__{referenced_table_name}, example: r__orders
  • Date: prefix d__, column name format d__{date_dimension_name}, example: d__orderdate
    Note: If you want to use a shared date dimension for more tables/views, use the same {date_dimension_name}.
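Putting these conventions together, a hand-written output-stage view for a hypothetical products table (the table and column names here are illustrative, not part of this tutorial's sample data) could look like:

    -- Hypothetical source table "products" with columns
    -- id, name, category_id, unit_cost, created_at.
    CREATE OR REPLACE VIEW "out_products" AS SELECT
        "id"          AS "cp__id",          -- primary key (connection point)
        "name"        AS "a__name",         -- attribute
        "category_id" AS "r__categories",   -- foreign key to the categories view
        "unit_cost"   AS "f__unit_cost",    -- fact
        "created_at"  AS "d__created_at"    -- date dimension
    FROM "products";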

Create a Logical Data Model from the Output Stage

Before you load data into your workspace, you need a logical data model (LDM) to determine how the data are handled and displayed.

The LDM enables a layer of abstraction between the information that a GoodData user accesses and the method that is used to store data. We make the process of creating a workspace LDM easy by reading the output stage (we recognize the type of LDM element based on Output Stage view and column names) and generating the model in a JSON format, so you can keep track of changes that are being made using your favorite version control system.

Then, you load the model and apply it to your workspace(s). In this guide, the process of creating an LDM is split into several steps, which brings better flexibility and greater opportunities to customize the model at any step.

Ensure that you have the following information ready:

  • Project ID
  • Data Source ID

Steps:

  1. Go to the page for generating the logical data model:

    https://secure.gooddata.com/gdc/dataload/dataSources/{Data_Source_id}/generateModel

    where {Data_Source_id} is the identifier of the Data Source.
    The link for polling for status of the LDM generation appears.

  2. Poll (click the link repeatedly) for the status until the gray page returns the JSON structure describing the logical data model.
    Note:
    Review warnings if they appear and modify your output stage SQL.
    Here is an example of the JSON structure:

    {
      "projectModelView": {
        "model":
          {
            "projectModel": {
              "datasets": [ {
                "dataset": {
                  "identifier": "dataset.orders",
                  "title": "orders",
                  "anchor": {
                    "attribute": {
                      "identifier": "attr.orders.id",
                      "title": "id",
                      "folder": "orders",
                      "labels": [ {
                        "label": {
                          "identifier": "label.orders.id",
                          "title": "id",
                          "type": "GDC.text",
                          "dataType": "VARCHAR (128)"
                        }
                      } ]
                    }
                  },
                  "attributes": [ {
                    "attribute": {
                      "identifier": "attr.orders.description",
                      "title": "description",
                      "folder": "orders",
                      "labels": [ {
                        "label": {
                          "identifier": "label.orders.description",
                          "title": "description",
                          "type": "GDC.text",
                          "dataType": "VARCHAR (128)"
                        }
                      } ]
                    }
                  } ],
                  "facts": [ {
                    "fact": {
                      "identifier" : "fact.orders.price",
                      "title": "price",
                      "folder": "orders",
                      "dataType": "DECIMAL (12, 2)"
                    }
                  } ],
                  "production": true
                }
              }
            ]
          }
        }
      }
    }
  3. Scroll down to the bottom of the page with the JSON until the following form appears:

  4. Enter the project ID of your workspace and select the Preserve data option.

  5. Click Publish.
    The polling screen appears.

  6. When the process finishes successfully, the JSON structure appears. The logical data model is created within your workspace.
    If an error screen appears, review the errors and modify the output stage SQL. Then run the process again.

You can review the logical data model in the Manage section.

Steps:

  1. Go to your workspace.
  2. Click Manage in the top navigation tab.
  3. On the Data tab, click Model.
    The LDM outline appears.
    Example:

    Note: Depending on the complexity and makeup of your data, your actual LDM diagram will look different.

Create and Run a New Schedule

To ensure your GoodData analytics always uses the most up-to-date data, you can create a schedule to automate data loads between your Snowflake instance and your GoodData workspace. For the purpose of this Getting Started tutorial, you will create a manual schedule.

  1. Go to the Data Integration Console and click the Projects tab.
  2. Select the project that you used in the previous step.
  3. Click Create new schedule.
    The new schedule screen appears.
  4. Select the process name.
  5. In the Runs dropdown, set the frequency of execution to manually.
  6. Leave everything else intact.
  7. Click Schedule.
    The schedule is saved and opens for your preview.
    You are now going to manually run the scheduled process.
  8. Click Run.
  9. Confirm Run.

    The schedule is queued for execution and is run as platform resources are available.
    The process may take some time to complete.
  10. If the schedule fails with errors, fix the errors, and run the schedule again. Repeat until the process finishes with a status of OK, which means that the ADD process has loaded the data to your workspace.
  11. (Optional) In the Runs dropdown, set the frequency of execution to whatever schedule fits your business needs. Click Save.
    The schedule is saved.

Next Steps

In this tutorial, you successfully:

  • Set up the connection between your Snowflake instance and your GoodData workspace
  • Created and scheduled a data load (albeit in manual mode)
  • Created a logical data model

Now that your data is in your GoodData workspace, open Analytical Designer and start analyzing.

For a comprehensive advanced guide on data warehouse integration options, review the Integrate Data Warehouses Directly to GoodData based on Your DW Schema section.