Integrate a Data Source for a Data Warehouse

This article explains how to connect a Data Source to your GoodData workspace when your source data is stored in a cloud data warehouse.

You can directly connect Data Sources for the following data warehouses:

  • Amazon Redshift
  • Google BigQuery
  • Microsoft Azure SQL Database
  • Microsoft Azure Synapse Analytics
  • Microsoft SQL Server
  • PostgreSQL
  • Snowflake

Each data warehouse has different integration requirements. Before you connect a data warehouse to your workspace, ensure that GoodData can communicate with the Data Source created for your data warehouse.

Prerequisites

Before you start, make sure that you have the following in place:

  • An active GoodData account that you are logged in to
  • A GoodData workspace
    If you do not know how to create one, see Create a Workspace.
  • Access to a supported data warehouse with source data

    To better understand how GoodData processes rows and columns, we recommend that you use our sample data for your first integration. For more information, see Import Sample Data to Your Data Source.

Tasks You Will Perform

In this article, you will complete the following tasks:

  • Create a Data Source.
  • Connect the Data Source with your GoodData workspace.
  • Create a logical data model (LDM).

Create a Data Source

A Data Source is an entity in your GoodData workspace that stores the connection details for your data warehouse.

Select your source data warehouse and learn what details you need to establish a connection between your GoodData workspace and your data warehouse.

Azure SQL Database

Log in to your Azure SQL Database with the account that you plan to use with GoodData. Ensure that the user that you want to use in the Data Source has all necessary privileges and that your Azure SQL Database can be accessed by GoodData. For more information about the required privileges, see GoodData-Azure SQL Database Integration Details.

Ensure that you have the following information ready:

  • Azure SQL Database username and password
  • Azure SQL Database database and schema
  • Your GoodData workspace ID (see Find the Workspace ID)
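
Optionally, you can verify these credentials locally before you create the Data Source. The following is a minimal Python sketch, assuming the pyodbc package and Microsoft's ODBC Driver 18 for SQL Server are installed; the angle-bracketed values are placeholders for your own details, and the check itself is independent of GoodData:

  import pyodbc

  # Placeholder connection details for an Azure SQL Database.
  conn = pyodbc.connect(
      "DRIVER={ODBC Driver 18 for SQL Server};"
      "SERVER=<server>.database.windows.net;"
      "DATABASE=<database>;"
      "UID=<user>;PWD=<password>;"
      "Encrypt=yes;"
  )
  cursor = conn.cursor()
  cursor.execute("SELECT 1")  # succeeds only if the login and database are valid
  print(cursor.fetchone())    # -> (1, )
  conn.close()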

Azure Synapse Analytics

Log in to your Azure Synapse Analytics with the account that you plan to use with GoodData. Ensure that the user that you want to use in the Data Source has all necessary privileges and that your Azure Synapse Analytics database can be accessed by GoodData. For more information about the required privileges, see GoodData-Azure Synapse Analytics Integration Details.

Ensure that you have the following information ready:

  • Azure Synapse Analytics username and password
  • Azure Synapse Analytics database and schema
  • Your GoodData workspace ID (see Find the Workspace ID)
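
To verify these credentials locally, the same pyodbc check shown in the Azure SQL Database section above applies; point the SERVER value at the SQL endpoint of your Synapse workspace (typically <workspace>.sql.azuresynapse.net for a dedicated SQL pool; check your own endpoint in the Azure portal).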

BigQuery

The easiest way to integrate your BigQuery data warehouse with your GoodData workspace is to upload the Google service account key file while establishing the connection. You can also fill in all the details manually.

Google's documentation about creating a service account key file guides you through the process and provides essential information about service account key files.

In short, when you create a service account key file, you will have to:

  • Define the account name
  • Add user roles (bigquery.dataViewer and bigquery.jobUser)
  • Select JSON as the key type

Ensure that you log in to your BigQuery project with the account that you plan to use with GoodData. Ensure that the user that you want to use in the Data Source has all necessary privileges and that your BigQuery project can be accessed by GoodData. For more information about the required privileges, see GoodData-BigQuery Integration Details.

Before you proceed with establishing the connection between BigQuery and GoodData, ensure that you have the following ready:

  • Google service account key file in the JSON format. GoodData will extract the following information:
    • Client email
    • Private key
    • Google project ID
  • BigQuery dataset name
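
If you want to confirm locally that the key file and roles are set up correctly before establishing the connection, the following is a minimal Python sketch, assuming the google-cloud-bigquery package is installed; "key.json" and "my_dataset" are placeholders for your own key file and dataset, and the check is independent of GoodData:

  from google.oauth2 import service_account
  from google.cloud import bigquery

  # Load the service account key file downloaded from Google Cloud.
  creds = service_account.Credentials.from_service_account_file("key.json")
  client = bigquery.Client(credentials=creds, project=creds.project_id)

  # Fails with a permission or not-found error if the roles or dataset are wrong.
  dataset = client.get_dataset("my_dataset")
  print(dataset.full_dataset_id)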

Microsoft SQL Server

Log in to your Microsoft SQL Server with the account that you plan to use with GoodData. Ensure that the user that you want to use in the Data Source has all necessary privileges and that your Microsoft SQL Server can be accessed by GoodData. For more information about the required privileges, see GoodData-Microsoft SQL Server Integration Details.

Ensure that you have the following information ready:

  • Microsoft SQL Server username and password
  • Microsoft SQL Server database and schema
  • Your GoodData workspace ID (see Find the Workspace ID)
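
To verify these credentials locally, the pyodbc check shown in the Azure SQL Database section above applies here as well; set SERVER to the host name (and port, if non-default) of your own SQL Server instance.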

PostgreSQL

Log in to your PostgreSQL database with the account that you plan to use with GoodData. Ensure that the user that you want to use in the Data Source has all necessary privileges and that your PostgreSQL database can be accessed by GoodData. For more information about the required privileges, see GoodData-PostgreSQL Integration Details.

Ensure that you have the following information ready:

  • PostgreSQL username and password
  • PostgreSQL database and schema
  • Your GoodData workspace ID (see Find the Workspace ID)
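
Optionally, you can verify these credentials locally before you create the Data Source. The following is a minimal Python sketch, assuming the psycopg2 package is installed; all values are placeholders for your own details, and the check is independent of GoodData:

  import psycopg2

  # Placeholder connection details for a PostgreSQL database.
  conn = psycopg2.connect(
      host="<host>",
      port=5432,
      dbname="<database>",
      user="<user>",
      password="<password>",
  )
  with conn.cursor() as cur:
      cur.execute("SELECT current_schema()")  # confirms login and default schema
      print(cur.fetchone())
  conn.close()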

Redshift

Log in to your Redshift cluster with the account that you plan to use with GoodData. Ensure that the user that you want to use in the Data Source has all necessary privileges and that your Redshift cluster can be accessed by GoodData. For more information about the required privileges, see GoodData-Redshift Integration Details.

Ensure that you have the following information ready:

  • Redshift username and password
  • Redshift database and schema
  • Your GoodData workspace ID (see Find the Workspace ID)
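
Because Redshift is wire-compatible with PostgreSQL, the psycopg2 check shown in the PostgreSQL section above also works here; use your cluster endpoint as the host and port 5439 (the Redshift default).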

Snowflake

Log in to your Snowflake instance with the account that you plan to use with GoodData. Ensure that the user that you want to use in the Data Source has all necessary privileges and that your Snowflake instance can be accessed by GoodData. For more information about the required privileges, see GoodData-Snowflake Integration Details.

Ensure that you have the following information ready:

  • Snowflake username and password
  • Snowflake database, warehouse, and schema
  • Your GoodData workspace ID (see Find the Workspace ID)
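
Optionally, you can verify these details locally before you create the Data Source. The following is a minimal Python sketch, assuming the snowflake-connector-python package is installed; all values are placeholders for your own account details, and the check is independent of GoodData:

  import snowflake.connector

  # Placeholder connection details for a Snowflake instance.
  conn = snowflake.connector.connect(
      account="<account_identifier>",
      user="<user>",
      password="<password>",
      warehouse="<warehouse>",
      database="<database>",
      schema="<schema>",
  )
  cur = conn.cursor()
  # Returns the warehouse, database, and schema the session resolved to,
  # which confirms the user's role can access all three objects.
  cur.execute("SELECT CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_SCHEMA()")
  print(cur.fetchone())
  conn.close()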

Steps:

The following steps use the Snowflake Data Source as an example, but they are the same for each Data Source.

  1. On the top navigation bar, select Data.

    The LDM Modeler opens.
  2. Click Sources.

    The Data Source page opens.
  3. Select the Data Source to connect to your workspace.
  4. Provide the required information.

    BigQuery only

    • If you have a service account key file, click Browse, navigate to the file, and upload it.
    • If you want to enter the details manually, click connect manually and enter the details of your connection to the BigQuery project.

  5. Click Test connection.
    The GoodData platform verifies whether it can connect to the data warehouse using the provided information.
    If the connection succeeds, a confirmation message appears.
    No data is loaded from the data warehouse at this point.

  6. (Optional) Provide a prefix for the tables in the Output Stage.

    Use the Output Stage if you cannot or do not want to download the data directly from the production tables in your data warehouse. For more information, see Direct Data Distribution from Data Warehouses and Object Storage Services.

  7. Click Save.
    The Data Source is created. The screen with the connection details opens.

For detailed information about creating Data Sources, see Create a Data Source.

Connect the Data Source with Your GoodData Workspace

Once you have verified that the Data Source can connect to your data warehouse, connect it to the LDM Modeler in your workspace.

Steps:

  1. On the Data Source details page, click Connect.

    The LDM Modeler opens. The Data Source is connected to the LDM Modeler, and the tables and views from the data warehouse are listed in the left panel.
  2. To add a dataset, drag a table/view from the left panel and drop it in the blank canvas area.

    A preview of the data opens. The examples in the following steps assume the sample data.

  3. Notice what the columns are set to:
    • The order_line_id, order_id, and order_status columns are correctly detected as attributes.
    • The date column is correctly detected as a date in the yyyy-MM-dd format and will be converted to a separate Date dataset.
    • The price and quantity columns are correctly detected as measures (or facts).
  4. If the preview shows a column whose numerical values cannot be used as numerical constants in an equation (for example, product identifiers) but that is detected as a measure, set this column to be an attribute.
    In the example above, Product ID is autodetected as a measure, but it should be changed to an attribute.

    For more information about the data types and how to set them correctly, see Create a Logical Data Model from CSV Files.

  5. Verify that the information in the preview is correct, and click Import.
    A dataset is added. The LDM Modeler displays the structure of the dataset.

    If the table/view that you have imported contains dates, those dates will be imported as a separate Date dataset. These two datasets will be automatically connected with a relationship.

  6. To create each additional dataset, repeat Steps 2 through 5.
  7. (Optional) Create a relationship between the datasets (see Create a Relationship between Datasets).
  8. Publish the LDM (see Publish your Logical Data Model).

When the publishing is done, you can continue by scheduling a task for loading the data to your workspace (see Load Data from a Data Source to a GoodData Workspace).

For detailed information about creating an LDM based on a data warehouse, see Create a Logical Data Model from Your Cloud Data Warehouse.
