Load Your Own BigQuery Data into GoodData Workspace

In this tutorial, you will learn how to load your own data from a BigQuery workspace into your GoodData project. This tutorial expands on the Load Sample Data from BigQuery into GoodData guide.

To load data into your workspace, you will perform the following tasks:

  1. Open a new workspace
  2. Create a data load
  3. Set up your data source
  4. Create an output stage
  5. Create a logical data model (LDM)
  6. Start the load

To complete the process, you will execute several API calls using our gray pages, our UI for APIs. You will be asked to provide details such as your IDs for both BigQuery and GoodData.

Ensure that your BigQuery workspace can be accessed by GoodData. For more information about the required privileges, see GoodData-BigQuery Integration Details.

BigQuery and GoodData integration requires the following BigQuery roles: bigquery.dataViewer and bigquery.jobUser.

Open a New GoodData Workspace

Ensure that you are logged into your GoodData account.

This tutorial presumes that your GoodData domain is secure.gooddata.com.

  • GoodData Free users
    use the link you received in your GoodData Free confirmation email, for example: https://free123ab.na.gooddata.com.
  • GoodData Growth users
    use the link you received in your GoodData confirmation email, for example: https://yourcompanyname.na.gooddata.com.
  • White-label customers
    use your own white-label domain.

For the purpose of this tutorial, you will work with a new workspace (also known as a project).

To create a workspace:

GoodData Free users

Your GoodData account comes with five data-ready workspaces. For this tutorial, select any empty workspace.

Once you select a workspace to work with, you can easily rename it in the Manage section:

GoodData Growth users

Your GoodData Growth account allows you to create ten or more workspaces (projects). To create a workspace, you must have the authorization token. For more details, see Find the Project Authorization Token.

Steps:

  1. Click the Add workspace button on your welcome screen.

  2. Enter the name of your workspace and your authorization token.

  3. Click Create.

    Your workspace opens.

To return to your welcome screen and create another workspace, click your name in the upper right corner, click Account, then click Active Project.

You can easily rename the workspace (project) in the Manage section:

GoodData Enterprise users

To create a new project, also called workspace, you must have the authorization token. For more details, see Find the Project Authorization Token.

If you do not have an authorization token, contact GoodData Support.

This procedure assumes that your domain is https://secure.gooddata.com.

If you are a white-labeled customer, replace secure.gooddata.com with your white-labeled domain in the procedure steps when needed. GoodData Free and Growth users, use the domain that you received in your introduction email, such as https://free123ab.na.gooddata.com.

Steps:

  1. Go to https://secure.gooddata.com/gdc/projects.
    The page for creating a project opens.
  2. In the Title field, enter the name for the new project.

  3. In the Authorization Token field, enter your authorization token.

    Do not enter any information for the summary.

    Leave the other project settings at their defaults.

  4. Click Submit.
    The project/workspace is created and the page with the project's URL opens.
    The project/workspace is immediately available on the GoodData Portal.

Create Data Loading Process

In this step, you will create a data load that takes care of moving data from your BigQuery workspace into your GoodData workspace. This process is called Automated Data Distribution (ADD).

Ensure that you have the following information ready:

  • GoodData workspace's Project ID (Find the Project ID)
  • BigQuery project and dataset
  • BigQuery Service Account key in JSON format, from which you can extract:

    • client e-mail

    • private key
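If you prefer to pull these two values out of the key file programmatically, here is a minimal sketch. The helper name and the dummy key below are ours for illustration; the "client_email" and "private_key" fields are the standard fields of a Google Cloud service account JSON key.

```python
import json

# Minimal sketch: extract the two values GoodData asks for from a
# BigQuery service account key file. "client_email" and "private_key"
# are standard fields of a Google Cloud service account JSON key.
def extract_credentials(key_json: str) -> dict:
    key = json.loads(key_json)
    return {"client_email": key["client_email"],
            "private_key": key["private_key"]}

# Dummy, non-functional key for illustration only:
sample_key = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "client_email": "loader@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n",
})

creds = extract_credentials(sample_key)
print(creds["client_email"])  # → loader@example-project.iam.gserviceaccount.com
```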

Steps:

  1. Go to the Data Integration Console and click the Projects tab.
  2. Click the workspace that you want to use.
    The workspace's page opens.

  3. Click Deploy process.
    The Deploy process to a project screen appears.
  4. Click Create data source.
    The New Data Source screen appears.
  5. Fill in the following fields.
    • Data Warehouse Type: select Google BigQuery
    • Client e-mail: enter your client e-mail
      Example:

      bigquerytest@acme-project....
    • Your BigQuery project and dataset
  6. Click Validate Connection.
     
  7. When your connection is validated, fill in the remaining two fields:
    • Output Stage Prefix: enter any prefix you like
    • Data Source Name: enter any name you like
  8. Click Create.
    The Deploy process to project screen reappears.
  9. Enter your process name.
  10. Click Deploy.
    When the process ends, the following screen appears:
  11. Click the Metadata tab and make a note of the Data Source ID. You will use it to create a logical data model.

Create an Output Stage

The Output Stage (OS) is a set of views created in your BigQuery dataset that serve as a source for loading data into your GoodData workspace.

To help you with creating an OS, we use an API that inspects all tables within the dataset defined in your data source and suggests a SQL DDL script to create the views. You then execute those SQL DDLs on top of your dataset.

Ensure that you have the following information ready:

  • Data Source ID

     If you did not note down the Data Source ID in the previous section...
    Follow these steps:
    1. Click your name in the top right corner and select Data Integration Console.
    2. Click the Projects tab.
    3. Click the relevant Project Name.
      The list of your processes appears.
    4. Click the process you are using and select the Metadata tab.

      Find the Data Source ID at the bottom of the window.

To create the output stage:

  1. Go to the page for generating the Output Stage:

    https://secure.gooddata.com/gdc/dataload/dataSources/{Data_Source_id}/generateOutputStage

    where {Data_Source_id} is the identifier of the Data Source that you created in the previous step (Create Data Loading Process).

  2. Click Generate output stage.

    A link for polling the status of the Output Stage generation appears.
  3. Poll (click the link repeatedly) for the status until the page returns SQL DDLs to create GoodData-compatible views.
    Here is an example of the generated SQL DDL (in BigQuery standard SQL):

    --------------------------------------------------
    -- out_orders --
    --------------------------------------------------
    CREATE OR REPLACE VIEW `acme-project.analytics.out_orders` AS SELECT

            id AS cp__id,
            customerid AS r__customers,
            description AS a__description,
            price AS f__price,
            modified_at AS d__modified_at

            , x__timestamp AS x__timestamp /* Remove this line if it's not incremental load */
            , x__client_id AS x__client_id /* Remove this line if loading into all client workspaces */
            , x__deleted AS x__deleted /* Remove this line if inserting only */

    FROM `acme-project.analytics.orders`;
  4. Copy the returned SQL DDLs and paste them into your BigQuery SQL client.
  5. Review the suggested SQL DDLs and modify them, if needed.

    Review the table at the end of this section to ensure the columns in your output stage comply with GoodData conventions.

  6. Execute the SQL DDLs.
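The generated script flags three optional system columns (x__timestamp, x__client_id, x__deleted) with comments telling you when to remove them. If you have many views, the clean-up from step 5 can be scripted; the sketch below is ours, not part of GoodData tooling.

```python
# Sketch: remove the optional x__ system columns you do not need from a
# generated output stage DDL, e.g. x__timestamp for a full
# (non-incremental) load or x__deleted for insert-only loads.
def strip_optional_columns(ddl: str, drop: set) -> str:
    kept = [line for line in ddl.splitlines()
            if not any(col in line for col in drop)]
    return "\n".join(kept)

# Shortened stand-in for a generated DDL:
ddl = """CREATE OR REPLACE VIEW out_orders AS SELECT
    id AS cp__id,
    price AS f__price
    , x__timestamp AS x__timestamp
    , x__deleted AS x__deleted
FROM orders;"""

cleaned = strip_optional_columns(ddl, {"x__timestamp", "x__deleted"})
print(cleaned)
```

Because the generator emits the optional columns with leading commas on their own lines, dropping whole lines leaves the remaining statement valid.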

The views must comply with the following naming conventions of the GoodData Output Stage:

Element type                     Column prefix   Column name format           Example        Notes
Attribute                        a__             a__{attribute_name}          a__name
Fact                             f__             f__{fact_name}               f__price
Primary key (connection point)   cp__            cp__{attribute_name}         cp__id
Foreign key (reference)          r__             r__{referenced_table_name}   r__orders
Date                             d__             d__{date_dimension_name}     d__orderdate   To use a shared date dimension for more tables/views, use the same {date_dimension_name}.
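These prefixes can be checked mechanically before you run the DDLs. A sketch that classifies a column name by its prefix; the mapping mirrors the conventions table above, while the function itself is ours.

```python
# Prefix-to-element mapping, taken from the naming conventions table.
PREFIXES = {
    "cp__": "primary key (connection point)",
    "r__": "foreign key (reference)",
    "a__": "attribute",
    "f__": "fact",
    "d__": "date",
}

def classify_column(name: str) -> str:
    # No prefix here is a prefix of another, so a simple
    # first-match scan is enough.
    for prefix, element in PREFIXES.items():
        if name.startswith(prefix):
            return element
    return "unrecognized"

print(classify_column("cp__id"))        # → primary key (connection point)
print(classify_column("d__orderdate"))  # → date
```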

Create a Logical Data Model from the Output Stage

Before you load data into your workspace, you need a logical data model (LDM) to determine how the data are handled and displayed.

The LDM enables a layer of abstraction between the information that a GoodData user accesses and the method that is used to store data. We make the process of creating a workspace LDM easy by reading the output stage (we recognize the type of LDM element based on Output Stage view and column names) and generating the model in a JSON format.

Then, you load the model and apply it to your workspace(s). In this guide, the process of creating an LDM is split into several steps, which brings better flexibility and greater opportunities to customize the model at any step.

Ensure that you have the following information ready:

  • Data Source ID

Steps:

  1. Go to the page for generating the logical data model:

    https://secure.gooddata.com/gdc/dataload/dataSources/{Data_Source_id}/generateModel

    where {Data_Source_id} is the identifier of the Data Source.
    A link for polling the status of the LDM generation appears.

  2. Poll (click the link repeatedly) for the status until the gray page returns the JSON structure describing the logical data model.
    Note:
    Review warnings if they appear and modify your output stage SQL.
    Here is an example of the JSON structure:

    {
      "projectModelView": {
        "model":
          {
            "projectModel": {
              "datasets": [ {
                "dataset": {
                  "identifier": "dataset.orders",
                  "title": "orders",
                  "anchor": {
                    "attribute": {
                      "identifier": "attr.orders.id",
                      "title": "id",
                      "folder": "orders",
                      "labels": [ {
                        "label": {
                          "identifier": "label.orders.id",
                          "title": "id",
                          "type": "GDC.text",
                          "dataType": "VARCHAR (128)"
                        }
                      } ]
                    }
                  },
                  "attributes": [ {
                    "attribute": {
                      "identifier": "attr.orders.description",
                      "title": "description",
                      "folder": "orders",
                      "labels": [ {
                        "label": {
                          "identifier": "label.orders.description",
                          "title": "description",
                          "type": "GDC.text",
                          "dataType": "VARCHAR (128)"
                        }
                      } ]
                    }
                  } ],
                  "facts": [ {
                    "fact": {
                      "identifier" : "fact.orders.price",
                      "title": "price",
                      "folder": "orders",
                      "dataType": "DECIMAL (12, 2)"
                    }
                  } ],
                  "production": true
                }
              }
            ]
          }
        }
      }
    }
  3. Scroll down to the bottom of the page with the JSON until the following form appears:

  4. Enter the project ID of your workspace and select the Preserve data option.

  5. Click Publish.
    The polling screen appears.

  6. When the process finishes successfully, the JSON structure appears. The logical data model is created within your workspace.
    If an error screen appears, review the errors and modify the output stage SQL. Then run the process again.
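Before you publish the model, it can help to sanity-check the generated JSON. The sketch below summarizes a projectModelView response like the example above; the traversal keys match that response, while the helper and variable names are ours.

```python
# Sketch: list the anchor, attributes, and facts of each dataset in a
# generated projectModelView structure.
def summarize_model(model_view: dict) -> dict:
    summary = {}
    datasets = model_view["projectModelView"]["model"]["projectModel"]["datasets"]
    for entry in datasets:
        ds = entry["dataset"]
        summary[ds["title"]] = {
            "anchor": ds["anchor"]["attribute"]["title"],
            "attributes": [a["attribute"]["title"] for a in ds.get("attributes", [])],
            "facts": [f["fact"]["title"] for f in ds.get("facts", [])],
        }
    return summary

# Trimmed version of the example response above:
example = {"projectModelView": {"model": {"projectModel": {"datasets": [
    {"dataset": {
        "identifier": "dataset.orders",
        "title": "orders",
        "anchor": {"attribute": {"identifier": "attr.orders.id", "title": "id"}},
        "attributes": [{"attribute": {"identifier": "attr.orders.description",
                                      "title": "description"}}],
        "facts": [{"fact": {"identifier": "fact.orders.price", "title": "price"}}],
    }}
]}}}}

summary = summarize_model(example)
print(summary["orders"]["facts"])  # → ['price']
```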

You can review the logical data model in the Manage section.

Steps:

  1. Go to your workspace.
  2. Click Manage in the top navigation tab.
  3. On the Data tab, click Model.
    The LDM outline appears.
    Example:

    Note: Depending on the complexity and makeup of your data, the actual LDM diagram will differ.

Start the Load

Steps:

  1. Go to the Data Integration Console and click the Projects tab.
  2. Select the workspace that you used in the previous step.
  3. Click Create new schedule.
    The New schedule screen appears.
  4. Select the process name.
  5. In the Runs dropdown, set the frequency of execution to manually.
  6. Leave everything else intact.
  7. Click Schedule.
    The schedule is saved and opens for preview.
    You are now going to manually run the scheduled process.
  8. Click Run.
  9. Confirm Run.

    The schedule is queued for execution and is run as platform resources are available.
    The process may take some time to complete.
  10. If the schedule fails with errors, fix the errors, and run the schedule again. Repeat until the process finishes with a status of OK, which means that the ADD process has loaded the data to your workspace.

  11. (Optional) In the Runs dropdown, set the frequency of execution to whatever schedule fits your business needs. Click Save.
    The schedule is saved.

Next Steps

In this tutorial, you successfully:

  • Set up the connection between your BigQuery workspace and your GoodData workspace
  • Created and scheduled a data load (albeit in manual mode)
  • Created a logical data model

Now that your data is in your GoodData workspace, open Analytical Designer and start analyzing.

For a comprehensive advanced guide on data warehouse integration options, review the Integrate Data Warehouses Directly to GoodData based on Your DW Schema section.