Load Your Own Redshift Data into GoodData Workspace

In this tutorial, you will learn how to load your own data from a Redshift cluster into your GoodData workspace. This tutorial expands on the Loading Sample Data from Redshift into GoodData guide.

To load data into your workspace, you will perform the following tasks:

  1. Open a new GoodData workspace
  2. Create a data source
  3. Create an output stage
  4. Create a logical data model (LDM)
  5. Create a data load process
  6. Create and run a schedule

Prerequisites

Log in to your Redshift cluster with the account that you plan to use with GoodData. Ensure that this user has all the necessary privileges and that GoodData can access your Redshift cluster. For more information about the required privileges, see GoodData-Redshift Integration Details.

Ensure that you have the following information ready:

  • Redshift username and password
  • Redshift database and schema
  • Workspace ID, also known as project ID (see Find the Project ID)

Open a New GoodData Workspace

Ensure that you are logged into your GoodData account.

This tutorial presumes that your GoodData domain is secure.gooddata.com. If you use a different domain, substitute it as follows:

  • GoodData Free users
    Use the link you received in your GoodData Free confirmation email, for example: https://free123ab.na.gooddata.com.
  • GoodData Growth users
    Use the link you received in your GoodData confirmation email, for example: https://yourcompanyname.na.gooddata.com.
  • White-label customers
    Use your own white-label domain.

For the purpose of this tutorial, you will work with a new workspace (also known as a project).

To create a workspace:

GoodData Free users

Your GoodData account comes with five data-ready workspaces. For this tutorial, select any empty workspace.

Once you select a workspace, you can rename it in the Manage section.

GoodData Growth users

Your GoodData Growth account allows you to create ten or more workspaces (projects). To create a workspace, you must have the authorization token. For more details, see Find the Project Authorization Token.

Steps:

  1. Click the Add workspace button on your welcome screen.

  2. Enter the name of your workspace and your authorization token.

  3. Click Create.

    Your workspace opens.

To return to your welcome screen and create another workspace, click your name in the upper right corner, click Account, then click Active Project.

You can rename the workspace (project) in the Manage section.

GoodData Enterprise users

To create a new project (also called a workspace), you must have an authorization token. For more details, see Find the Project Authorization Token.

If you do not have an authorization token, contact GoodData Support.

This procedure assumes that your domain is https://secure.gooddata.com.

If you are a white-labeled customer, replace secure.gooddata.com with your white-labeled domain in the procedure steps when needed. GoodData Free and Growth users, use the domain that you received in your introduction email, such as https://free123ab.na.gooddata.com.

Steps:

  1. Go to https://secure.gooddata.com/gdc/projects.
    The page for creating a project opens.
  2. In the Title field, enter the name for the new project.

  3. In the Authorization Token field, enter your authorization token.

    Do not enter any information for the summary.

    Leave the other project settings at their defaults.

  4. Click Submit.
    The project/workspace is created and the page with the project's URL opens.
    The project/workspace is immediately available on the GoodData Portal.
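
The domain substitution described in this procedure can be sketched in a few lines of Python. This is only an illustration and not part of any GoodData tooling: the function name project_creation_url is hypothetical, and only the /gdc/projects path and the example domains are taken from this tutorial.

```python
from urllib.parse import urlunsplit

DEFAULT_DOMAIN = "secure.gooddata.com"  # the domain this tutorial presumes


def project_creation_url(domain: str = DEFAULT_DOMAIN) -> str:
    """Return the address of the project-creation page for a GoodData domain.

    Accepts a bare host name or a full link copied from a confirmation email.
    """
    host = domain.strip()
    # Drop a leading scheme so both "host" and "https://host/" are accepted.
    for scheme in ("https://", "http://"):
        if host.lower().startswith(scheme):
            host = host[len(scheme):]
    host = host.strip("/")
    if not host or "/" in host or " " in host:
        raise ValueError(f"not a valid domain: {domain!r}")
    return urlunsplit(("https", host, "/gdc/projects", "", ""))


print(project_creation_url())
print(project_creation_url("https://free123ab.na.gooddata.com/"))
```

White-label customers would pass their own domain instead of the default.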

Create a Data Source

To connect your Redshift cluster and your GoodData workspace, follow these steps:

  1. Click your name in the top right corner, select Data Integration Console, then click the Data sources tab.
  2. Click Redshift as your data warehouse. Alternatively, click Create data source in the bottom left corner.
    The connection parameter screen appears.
  3. Fill in the required fields.
  4. Click Test connection. If the connection succeeds, a green confirmation message appears.
  5. Click Save.
    The screen with your connection details appears.
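
Before filling in the connection form, it can help to collect the connection details in one place and check that none are missing. The sketch below is a hypothetical helper, not a GoodData or AWS API. It assumes that a JDBC-style URL of the form jdbc:redshift://host:port/database is useful to you (for example, in your SQL client), and it uses 5439, the Redshift default port. The host name and database shown are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass

REDSHIFT_DEFAULT_PORT = 5439  # default port of an Amazon Redshift cluster


@dataclass(frozen=True)
class RedshiftConnection:
    """Connection details gathered in the Prerequisites section.

    The password is deliberately left out so it never ends up in logs.
    """
    host: str
    database: str
    schema: str
    username: str
    port: int = REDSHIFT_DEFAULT_PORT

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are empty."""
        required = ("host", "database", "schema", "username")
        return [name for name in required if not getattr(self, name).strip()]

    def jdbc_url(self) -> str:
        """Build a JDBC-style URL, refusing to do so if anything is missing."""
        missing = self.missing_fields()
        if missing:
            raise ValueError("missing connection details: " + ", ".join(missing))
        return f"jdbc:redshift://{self.host}:{self.port}/{self.database}"


# Placeholder values for illustration only.
conn = RedshiftConnection(
    host="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    database="dev",
    schema="public",
    username="gooddata_user",
)
print(conn.jdbc_url())
```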

In the next section, you will create an output stage to prepare your data for the logical data model.

Create an Output Stage

The Output Stage is a set of views created in your Redshift schema that serve as a source for loading data into your GoodData workspace.

To create the output stage:

  1. On your Connection screen, click Create in the Create output stage gray tab.

    The Create output stage window appears.
  2. The GoodData engine analyzes your data structure and suggests queries for you to execute in your data warehouse.

    Note the Output Stage naming conventions option.
  3. Click Copy to clipboard and paste the queries into your Redshift SQL client.
  4. In your Redshift client, review the suggested SQL DDL statements and modify them as needed to fit your data and comply with GoodData naming conventions.
  5. Execute the SQL DDL statements.
  6. Close the Create output stage window to return to your Connection screen.
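
To give a sense of what an output stage view looks like, the following sketch generates a CREATE OR REPLACE VIEW statement that exposes source columns under new names. It is not the query that GoodData suggests for your data. The schema, table, and view names are placeholders, and the cp__, a__, and f__ column prefixes are an assumption about the naming conventions; verify them against the naming conventions documentation before using them.

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def output_stage_view_ddl(schema: str, view: str, source_table: str,
                          columns: dict[str, str]) -> str:
    """Build DDL for a view that renames source columns.

    `columns` maps each source column name to the name it should have
    in the output stage view.
    """
    if not columns:
        raise ValueError("at least one column is required")
    select_list = ",\n".join(
        f"    {quote_ident(src)} AS {quote_ident(alias)}"
        for src, alias in columns.items()
    )
    return (
        f"CREATE OR REPLACE VIEW {quote_ident(schema)}.{quote_ident(view)} AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {quote_ident(schema)}.{quote_ident(source_table)};"
    )


# Hypothetical table and column names; the prefixes are assumed conventions.
print(output_stage_view_ddl(
    schema="analytics",
    view="out_orders",
    source_table="orders",
    columns={"order_id": "cp__order_id", "status": "a__status", "amount": "f__amount"},
))
```

Review generated statements in your SQL client before executing them, just as you would review the queries suggested in the Create output stage window.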

Create a Logical Data Model from the Output Stage

Before you load data into your workspace, you need a logical data model (LDM) to determine how the data are handled and displayed.

The LDM provides a layer of abstraction between the information that a GoodData user accesses and how the data are stored. In this step, you use the Output Stage (the views and their column names) to create a logical data model.

Then, you load the model and apply it to your workspace(s).
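
As a rough illustration of how view and column names can carry modeling information, the sketch below groups column names by prefix. It is not GoodData's actual algorithm. The prefix-to-role mapping is an assumption about the Output Stage naming conventions, and the column names are hypothetical.

```python
from __future__ import annotations

# Assumed prefix conventions; check the naming conventions documentation.
PREFIX_ROLES = {
    "cp__": "connection point",
    "a__": "attribute",
    "f__": "fact",
    "d__": "date",
    "r__": "reference",
    "l__": "label",
}


def classify_column(column: str) -> tuple[str, str]:
    """Return (role, name without prefix) for one output stage column."""
    for prefix, role in PREFIX_ROLES.items():
        if column.startswith(prefix):
            return role, column[len(prefix):]
    return "unclassified", column


def sketch_model(view_columns: list[str]) -> dict[str, list[str]]:
    """Group the columns of one view by their assumed modeling role."""
    model: dict[str, list[str]] = {}
    for column in view_columns:
        role, name = classify_column(column)
        model.setdefault(role, []).append(name)
    return model


print(sketch_model(["cp__order_id", "a__status", "f__amount", "notes"]))
```

Columns that match no prefix end up under "unclassified", which is a useful signal that a view may need renaming before you publish the model.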

Steps:

  1. On your Connection screen, click Publish into workspace.
  2. Enter or select the workspace into which you want to publish your logical data model.

  3. Click Select.
  4. On the screen that appears, select the Preserve data option.
  5. Click Publish.

    If your logical data model is published successfully, a confirmation message appears.

    Note: If publishing LDM fails, you will see an error message prompting you to make necessary corrections.
  6. Click the Visit data load page link.
    The Data Load Process screen opens within the Data Integration Console page. Proceed to the next section to load data from the warehouse into your GoodData workspace.

Review and update your LDM

While in the Data Integration Console, click Model data in the top navigation bar to open the LDM Modeler, where you can review and update your logical data model. For details, see Update a Logical Data Model.

Note: The LDM diagram that you see depends on the complexity and makeup of your data.

Create a Data Load Process

In this step, you will create a data load that takes care of moving data from your Redshift cluster into your GoodData workspace. This process is called Automated Data Distribution (ADD) and it can be deployed to multiple GoodData workspaces.

Note: The following steps presume that you have successfully published your logical data model and are continuing directly to create a data load process. You can also start creating a data load process at any time by clicking Create data load process in the Data process tab.

To continue after publishing the logical data model, follow these steps:

  1. Click Deploy Process.

    The Deploy process to a project screen appears. ADD and the data source you created are preselected.
  2. On the next screen, enter your Process Name of choice.
  3. Click Deploy.
    When the process ends, the following screen appears:

Create and Run a New Schedule

To ensure that your GoodData analytics always use the most up-to-date data, you can create a schedule to automate data loads between your Redshift cluster and your GoodData workspace. For the purpose of this Getting Started tutorial, you create a manual schedule.

  1. Go to the Data Integration Console and click the Projects tab.
  2. Select the workspace that you used in the previous step.
  3. Click New schedule.
    The new schedule screen appears.
  4. Select the process name.
  5. In the Runs dropdown, set the frequency of execution to manually.
  6. Leave everything else intact.
  7. Click Schedule.
    The schedule is saved and opens for your preview.
    You are now going to manually run the scheduled process.
  8. Click Run.
  9. Confirm Run.
     
    The schedule is queued for execution and is run as platform resources are available.
    The process may take some time to complete.
  10. If the schedule fails with errors, fix the errors, and run the schedule again. Repeat until the process finishes with a status of OK, which means that the ADD process has loaded the data to your workspace.
  11. (Optional) In the Runs dropdown, set the frequency of execution to whatever schedule fits your business needs. Click Save.
    The schedule is saved.
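
The run, fix, and rerun cycle in step 10 follows a simple retry pattern, sketched below. In this tutorial you perform the cycle by hand in the Data Integration Console. The run_once and fix_errors callables are hypothetical stand-ins for "click Run and read the status" and "correct the reported problem"; they do not correspond to any GoodData API. Only the OK status comes from this tutorial.

```python
from typing import Callable


def run_until_ok(run_once: Callable[[], str],
                 fix_errors: Callable[[str], None],
                 max_attempts: int = 3) -> int:
    """Run a schedule until it reports OK and return the attempts used.

    After each failed run, `fix_errors` receives the failing status so the
    problem can be corrected before the next attempt.
    """
    for attempt in range(1, max_attempts + 1):
        status = run_once()
        if status == "OK":
            return attempt
        fix_errors(status)
    raise RuntimeError(f"schedule still failing after {max_attempts} attempts")


# Simulated run: the first execution fails, the second one succeeds.
statuses = iter(["ERROR", "OK"])
fixed = []
attempts = run_until_ok(lambda: next(statuses), fixed.append)
print(attempts, fixed)
```

Capping the number of attempts keeps a persistent error, such as a missing privilege, from turning into an endless loop.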

Summary and Next Steps

In this tutorial, you successfully:

  • Set up the connection between your Redshift cluster and your GoodData workspace
  • Created and scheduled a data load (albeit in manual mode)
  • Created a logical data model

Now that your data are in your GoodData workspace, you can either: