Integrate Data Warehouse with Your Workspace

As a GoodData Free plan user, you can create a logical data model and set up the process that integrates your GoodData workspace with your data warehouse.

In this tutorial, you will work with our sample data and perform the following tasks:

  1. Create a data source
    Connect your GoodData workspace to your data warehouse.
  2. Create a logical data model (LDM)
    Use the structure of your warehouse data to create a 'map' that arranges the data in your workspace.
  3. Load the data from the warehouse to your GoodData workspace
  4. Create a data load schedule

The following data warehouses are supported directly:

  • Amazon Redshift
  • Snowflake
  • Google BigQuery

You will need an empty GoodData workspace. If you do not know how to create one, see Create a Workspace (Project).

Import Sample Data to a Data Warehouse

Download and review the sample CSV file. We suggest that you examine the data first to help you understand how GoodData processes the rows and columns.

The import method is slightly different for each of the supported warehouses. Click on the one that applies to your setup and proceed.

  • Redshift

    Log in to your Redshift cluster with the account that you plan to use with GoodData. Ensure that the user you configured in the data source has all necessary privileges and that your Redshift cluster can be accessed by GoodData. For more information about the required privileges, see GoodData-Redshift Integration Details.

    Steps:

    1. Create a separate Redshift database (a SQL sketch is shown after step 3).
    2. Execute the following SQL queries:

      DROP SCHEMA IF EXISTS GDTRIAL CASCADE;
      
      CREATE SCHEMA GDTRIAL;
      
      CREATE TABLE gdtrial.out_csv_order_lines (
        a__order_line_id VARCHAR(255),
        a__order_id VARCHAR(255),
        d__date DATE,
        a__order_status VARCHAR(255),
        a__customer_id VARCHAR(255),
        a__customer_name VARCHAR(255),
        a__state VARCHAR(255),
        a__product_id VARCHAR(255),
        a__product_name VARCHAR(255),
        a__category VARCHAR(255),
        f__price DECIMAL(12,2),
        f__quantity DECIMAL(15,6)
      );
    3. To copy the CSV file into your database, download and execute this SQL script, which runs the necessary INSERT commands.
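
    As an optional reference, the following sketch covers the database creation from step 1 and a quick sanity check after step 3. It assumes your user is allowed to create databases; adjust or skip it if that is not the case.

      -- Step 1 (sketch): create the dedicated database, assuming your user may create databases
      CREATE DATABASE gdtrial;

      -- Optional check after step 3: confirm that the sample rows were loaded
      SELECT COUNT(*) FROM gdtrial.out_csv_order_lines;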
  • Snowflake

    Log in to your Snowflake instance with the account that you plan to use with GoodData. Ensure that the user you configured in the data source has all necessary privileges and that your Snowflake instance can be accessed by GoodData. For more information about the required privileges, see GoodData-Snowflake Integration Details.

    You will now execute an SQL script that creates the following Snowflake objects:

    • database
    • schema
    • table

    Execute the following SQL script:

    -- Create "gdtrial" database
    CREATE OR REPLACE DATABASE gdtrial;
    
    -- Create "gdtrial" schema
    CREATE OR REPLACE SCHEMA gdtrial;
    
    -- Create "out_csv_order_lines" table
    CREATE OR REPLACE TABLE gdtrial.out_csv_order_lines (
      a__order_line_id VARCHAR(255),
      a__order_id VARCHAR(255),
      d__date DATE,
      a__order_status VARCHAR(255),
      a__customer_id VARCHAR(255),
      a__customer_name VARCHAR(255),
      a__state VARCHAR(255),
      a__product_id VARCHAR(255),
      a__product_name VARCHAR(255),
      a__category VARCHAR(255),
      f__price DECIMAL(12,2),
      f__quantity DECIMAL(15,6)
    );
    
    -- Create file format describing format of order_lines.csv file
    CREATE OR REPLACE FILE FORMAT "GDTRIAL"."GDTRIAL".gdtrialfileformat
      TYPE = 'CSV'
      COMPRESSION = 'NONE'
      FIELD_DELIMITER = ','
      RECORD_DELIMITER = '\n'
      SKIP_HEADER = 1
      FIELD_OPTIONALLY_ENCLOSED_BY = 'NONE'
      TRIM_SPACE = FALSE
      ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
      ESCAPE = 'NONE'
      ESCAPE_UNENCLOSED_FIELD = '\134'
      DATE_FORMAT = 'AUTO'
      TIMESTAMP_FORMAT = 'AUTO'
      NULL_IF = ('\\N');
    
    -- Create S3 stage with sample data
    CREATE OR REPLACE STAGE gdstage
      file_format = gdtrialfileformat
      url = 's3://gdc-prod-gdtrial/data';
    
    -- Copy sample CSV from S3 into table
    COPY INTO out_csv_order_lines
      FROM @gdstage/step1/order_lines.csv
      file_format = (format_name = gdtrialfileformat);
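
    Optionally, you can verify the load with a quick row count. This check is only a suggestion and is not required by the tutorial:

    -- Optional: confirm that the sample rows were copied into the table
    SELECT COUNT(*) FROM gdtrial.out_csv_order_lines;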
  • BigQuery

    Log in to your BigQuery project with the account that you plan to use with GoodData. Ensure that the user you configured in the data source has all necessary privileges and that your BigQuery project can be accessed by GoodData. For more information about the required privileges, see GoodData-BigQuery Integration Details.

    The BigQuery and GoodData integration requires the following roles: bigquery.dataViewer and bigquery.jobUser.

    Do the following:

    • create a BigQuery Service Account
    • create a new dataset named gdtrial
    • in the dataset, create a new table based on the sample CSV (a SQL sketch of these steps is shown after the list)
      • name the destination table out_order_lines
      • load the following schema

        a__order_line_id:STRING,
        a__order_id:STRING,
        d__date:DATE,
        a__order_status:STRING,
        a__customer_id:STRING,
        a__customer_name:STRING,
        a__state:STRING,
        a__product_id:STRING,
        a__product_name:STRING,
        a__category:STRING,
        f__price:FLOAT,
        f__quantity:FLOAT
      • set the Header rows to skip option to 1
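
    If you prefer SQL over the BigQuery console, the sketch below shows roughly equivalent DDL in BigQuery Standard SQL. It is only an illustration: your-project is a placeholder for your project ID, and the sample CSV itself is still uploaded through the console or another load tool of your choice.

      -- Sketch only: create the "gdtrial" dataset (replace your-project with your project ID)
      CREATE SCHEMA IF NOT EXISTS `your-project.gdtrial`;

      -- Sketch only: create the destination table with the schema listed above
      CREATE TABLE IF NOT EXISTS `your-project.gdtrial.out_order_lines` (
        a__order_line_id STRING,
        a__order_id STRING,
        d__date DATE,
        a__order_status STRING,
        a__customer_id STRING,
        a__customer_name STRING,
        a__state STRING,
        a__product_id STRING,
        a__product_name STRING,
        a__category STRING,
        f__price FLOAT64,
        f__quantity FLOAT64
      );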

Create a Data Source

A data source is a place in your GoodData workspace that stores the information about the connection to your data warehouse.

Let us walk you through setting up the data source for your specific data warehouse.

Prerequisites

Select your warehouse to learn what details you need to establish a connection between your GoodData workspace and your data warehouse.

  • Redshift

    Log in to your Redshift cluster with the account that you plan to use with GoodData. Ensure that the user that you configured in the data source has all necessary privileges and that your Redshift cluster can be accessed by GoodData. For more information about the required privileges, see GoodData-Redshift Integration Details.

    Ensure that you have the following information ready:

    • Redshift username and password
    • Redshift database and schema
    • Workspace ID, also known as project ID (see Find the Project ID)
  • Snowflake

    Log in to your Snowflake instance with the account that you plan to use with GoodData. Ensure that the user that you configured in the data source has all necessary privileges and that your Snowflake instance can be accessed by GoodData. For more information about the required privileges, see GoodData-Snowflake Integration Details.

    Ensure that you have the following information ready:

    • Snowflake username and password
    • Snowflake database, warehouse, and schema
    • Workspace ID, also known as project ID (see Find the Project ID)

  • BigQuery

    Log in to your BigQuery project with the account that you plan to use with GoodData. Ensure that the user that you configured in the data source has all necessary privileges and that your BigQuery project can be accessed by GoodData. For more information about the required privileges, see GoodData-BigQuery Integration Details.

    The BigQuery and GoodData integration requires the following roles: bigquery.dataViewer and bigquery.jobUser.

    Ensure that you have the following information ready:

    • BigQuery project and dataset
    • BigQuery Service Account key in JSON format so you can extract
      • client e-mail
      • private key
    • Workspace ID, also known as project ID (see Find the Project ID)

To connect your warehouse and your GoodData workspace, follow these steps:

  1. Click your name in the top right corner, select Data Integration Console, then click the Data sources tab.
  2. Select your data warehouse. Alternatively, click Create data source in the bottom left corner.
    The connection parameter screen appears.
  3. Fill in the required fields.
    Enter gdtrial as your schema (Redshift, Snowflake) or dataset (BigQuery).
  4. Click Test connection. If the connection succeeds, a green confirmation message appears.
  5. Click Save.
    The screen with your connection details appears.

Create a Logical Data Model

A logical data model is an abstract representation of your data and determines what you can do with it - what metrics you can construct, and what and how you can analyze. In this sample scenario, you will create a simple data model directly from the warehouse data.

Once you have verified that the connection between the data warehouse and your workspace works, follow these steps to create a logical data model.

  1. On the Data Source Summary page, click Connect.
  2. In the pop-up that opens, select the workspace that you want to use, then click Select.

    The Logical Data Modeler opens.
  3. On the left tool pane, select your workspace, then click Connect.

    After a while, the Logical Data Modeler connects to your data warehouse and your dataset is displayed.
  4. Drag and drop the table view onto the Logical Data Modeler canvas.
    The Import dataset dialogue opens.
  5. For the purpose of this tutorial, click Import to import your data structure into your logical data model.
    Note: When you import your own data, use this table to set which columns are attributes, facts, and dates, and to set the primary keys.

    Your logical data model is displayed.
  6. On the screen with your logical data model, click Publish.

    Publishing, in effect, saves your logical data model in your workspace. You can modify and re-publish your model any time in the future.
  7. In the pop-up that opens, keep the Preserve data option and click Publish.


  8. On the confirmation strip that appears, click Visit data load page.

In the next section, you will create a data load process.

Create Data Loading Process

In this step, you will create a data load process that takes care of moving data from your data warehouse into your GoodData workspace. This process is called Automated Data Distribution (ADD), and it can be deployed to multiple GoodData workspaces.

Follow these steps:

  1. Click Deploy Process.

    The Deploy process to a project screen appears.
  2. Select the data source you want to use and name your process. The Automated Data Distribution option is preselected.
    Note: If you only have a single data source, it is preselected.
  3. Click Deploy.
    When the deployment finishes, a confirmation screen appears.


You are now ready to create a data loading schedule.


Create and Run a New Schedule

To ensure your GoodData analytics always uses the most up-to-date data, you can create a schedule to automate data loads between your data warehouse and your GoodData workspace. For the purpose of this Getting Started tutorial, you will create a manual schedule.

  1. Click the blue Schedule Process button or click Create new schedule in the process's section.

    The New schedule screen appears.
  2. In the New schedule screen, fill in the required details.

    1. Select the process for which you want to create a schedule.
    2. In the Runs dropdown, set the frequency of execution to manually.
    3. Name your schedule.
    4. Leave everything else intact.
  3. Click Schedule.
    The schedule is saved and opens for preview.
    You are now going to manually run the scheduled process.
  4. Click Run.
  5. Confirm Run.
     
    The schedule is queued for execution and is run as platform resources are available.
    The process may take some time to complete.
  6. Close the dialogue window when the process finishes with the status OK. This means that the ADD process has loaded the data into your workspace.

Click Analyze Data to see your data loaded in your workspace.

To see and explore how the LDM determines your options when creating insights, continue with the following section.

Create Your First GoodData Insights

Now that you have loaded the sample data into your GoodData workspace, you will learn how to create simple insights using Analytical Designer, available on the Analyze tab.

To go to Analytical Designer from your Data Integration Console, click your account name in the upper right corner, then click Analyze data.

The goal of this tutorial is to break down raw sales figures by order category and status, and examine the pricing structure of your sales.

As you can see, the columns of the original CSV sample file appear in the catalog panel on the left. This is possible thanks to GoodData's ability to work directly with human-readable data modeling metadata (facts, attributes, column names).

Exercise 1 - Order Category and Order Status

To create your first insight:

  1. Drag and drop Order ID onto the Measures panel. This automatically creates a Count of unique Order IDs.
    Analytical Designer applies Count because the Order ID column is annotated as an attribute rather than a numerical fact.
  2. In the Compare helper, select Product Category from the drop-down menu and click Apply.
    The number of orders is now split by the product category.
  3. Drag and drop Order Status to the Stack By panel to look into the data in more detail.
    The columns are now further split by the status of the orders.
  4. Click the Save button in the top right corner of the screen to save the insight and name it Orders by Status and Category.
    You have just created your first insight! 

Exercise 2 - Sales Pricing Structure

In the following example, your insights will analyze the pricing structure of your sales - the highest priced items and the price range.

Follow these steps:

  1. Click the Clear button in the toolbar to clear the insight.
  2. Drag and drop Price onto the Measures panel.
    This displays the Sum of all prices on all order lines, but it does not take into account how many units were sold at each price.
    Note: You can apply different mathematical functions to this particular column because the Price column is annotated as a numerical fact.
  3. In the Measures panel, click the arrow to the left of the Sum of Price item, and from the drop-down menu, select Average to display the average product price.
  4. Drag Category to the View By panel.

You see that the Outdoor category contains the highest-priced items. But what is the range of prices?

  1. In the Measures panel, click the arrow to the left of Avg of Price and change Average to Minimum.
  2. Drag and drop Price to the Measures panel again.
    A new Sum of Price item appears.
  3. Click the arrow to the left of Sum of Price, and from the drop-down menu, change Sum to Maximum. You can now see the range of prices for each category.

You can easily handle many analytical queries without needing to write SQL for individual variations.
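
For comparison only, the insight from Exercise 1 corresponds roughly to a query like the sketch below, written against the Redshift/Snowflake sample table used earlier (adjust the table name for BigQuery); it is not part of the tutorial and nothing in GoodData requires it.

    -- Rough warehouse-side equivalent of the "Orders by Status and Category" insight (for comparison only)
    SELECT a__category,
           a__order_status,
           COUNT(DISTINCT a__order_id) AS order_count
    FROM gdtrial.out_csv_order_lines
    GROUP BY a__category, a__order_status
    ORDER BY a__category, a__order_status;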
