Load Sample Data from BigQuery into GoodData

In this tutorial, you will learn how to integrate GoodData and your existing BigQuery project, and load the sample data from BigQuery into your GoodData workspace.

You will use the data from a sample CSV file that you will import into your BigQuery project.

This integration does not use your BigQuery resources for computation. The actual data processing takes place on the GoodData platform.

BigQuery and GooData integration requires the following role levels: bigquery.dataViewer and bigquery.jobUser.


Contents:

Import Sample Data to BigQuery

Download and review the sample CSV file. We suggest that you examine the data first to help you understand how GoodData processes the rows and columns.

Ensure that your BigQuery project can be accessed by GoodData and do the following:

  • create a BigQuery Service Account
  • create a new dataset named gdtrial
  • in the dataset, create a new table based on the sample CSV
    • name the destination table out_order_lines
    • load the following schema

      a__order_line_id:STRING,
      a__order_id:STRING,
      d__date:DATE,
      a__order_status:STRING,
      a__customer_id:STRING,
      a__customer_name:STRING,
      a__state:STRING,
      a__product_id:STRING,
      a__product_name:STRING,
      a__category:STRING,
      f__price:FLOAT,
      f__quantity:FLOAT
    • set the header rows option to skip 1

Prerequisites - Before You Load the Data

Ensure that you are logged into your GoodData account.

Your GoodData Domain

This tutorial presumes that your GoodData domain is secure.gooddata.com.

  • GoodData Free users
    use the link you received in your GoodData Free confirmation email, for example: https://free123ab.na.gooddata.com .
  • GoodData Growth Users
    use the link you received in your GoodData confirmation email, for example: https://yourcompanyname.na.gooddata.com .
  • White-label customers
    use your own white-label domain

Your GoodData Workspace

Your GoodData workspace (also known as a project) is the place where you load your data, create metrics, share data visualizations and dashboards, create ad hoc analyses, and much more. Each workspace has its own project ID.

To learn how to create a workspace:

 GoodData Free users - click to expand

Your GoodData account comes with five data-ready workspaces. For this tutorial, select any empty workspace.

Once you select a workspace to work with, you can easily rename it in the Manage section:

 GoodData Growth users - click to expand

Your GoodData Growth account allows you to create ten or more workspaces (projects). To create a workspace, you must have the authorization token. For more details, see Find the Project Authorization Token.

Steps:

  1. Click the Add workspace button on your welcome screen.

  2. Enter the name of your workspace and your authorization token.

  3. Click Create.

    Your workspace opens.

To return to your welcome screen and create another workspace, click your name in the upper right corner, click Account, then click Active Project.

You can easily rename the workspace (project) it in the Manage section:

 GoodData Enterprise users - click to expand

To create a new project, also called workspace, you must have the authorization token. For more details, see Find the Project Authorization Token.

If you do not have an authorization token, contact GoodData Support.

This procedure assumes that your domain is https://secure.gooddata.com.

If you are a white-labeled customer, replace secure.gooddata.com with your white-labeled domain in the procedure steps when needed. GoodData Free and Growth users, use the domain that you received in your introduction email, such as https://free123ab.na.gooddata.com.

Steps:

  1. Go to https://secure.gooddata.com/gdc/projects.
    The page for creating a project opens.
  2. In the Title field, enter the name for the new project.

  3. In the Authorization Token field, enter your authorization token.

    Do not enter any information for the summary.

    Leave the other project settings at their defaults.

  4. Click Submit.
    The project/workspace is created and the page with the project's URL opens.
    The project/workspace is immediately available on the GoodData Portal.

To access your workspaces and find their project IDs:

  1. Click your name in the top right corner. 
  2. Click Data Integration Console.
  3. Click Projects.
  4. Click the workspace (project) that you want to use.
    The Project ID is displayed

Tip: Project ID is also a part of your project's URL, for example, https://secure.gooddata.com/admin/disc/#/projects/example12345678987654321 .

You will need your domain URL and the project ID of a workspace to complete multiple steps in the following procedures. We suggest that you note them down.

Gray Pages

You will perform several steps of this tutorial in gray pages - our simple graphical user interface for executing API calls.

Create a Data Load Process

In this step, you create a data load process that takes care of moving data from your BigQuery project into your GoodData workspace. This process is called Automated Data Distribution (ADD).

Ensure that you have the following information ready:

  • BigQuery project and dataset
  • BigQuery Service Account key in JSON format so you can extract

    • client e-mail

    • private key

Steps:

  1. Go to the Data Integration Console and click the Projects tab.
  2. Click on the workspace that you want to use.
    The workspace's page opens.

  3. Click Deploy process.
    The Deploy process to a project screen appears.
  4. Click Create data source.
    The New Data Source screen appears.
  5. Fill in the following fields.
    • Data Warehouse Type: select Google BigQuery
    • Client e-mail: enter your client e-mail
      Example:

      bigquerytest@acme-project....
    • Your BigQuery project
    • Your BigQuery dataset: enter gdtrial
  6. Click Validate Connection.
     
  7. When your connection is validated, fill in the remaining two fields:
    • Output Stage Prefix: for the purpose of this tutorial, enter out_
    • Data Source Name: enter any name you like
  8. Click Create.
    The Deploy process to project screen reappears.
  9. Enter your process name.
  10. Click Deploy.
    When the process ends, the following screen appears:
  11. Click the Metadata tab and make a note of the Data Source ID. You will use it to create a logical data model.

Create a Logical Data Model (LDM)

Before you load data into your workspace, you need a logical data model (LDM) to determine how the data is handled and displayed.

For the purpose of this tutorial, we have already devised an LDM.

To load the sample LDM into your workspace, you execute a DDL script which creates the LDM within the workspace.

Ensure, that you have the following information ready:

  • Data Source ID

     If you did not note down the Data Source ID in the previous section...
    Follow these steps:
    1. Click your name in the top right corner and select Data Integration Console.
    2. Click the Projects tab.
    3. Click the relevant Project Name.
      The list of your processes appears.
    4. Click the process you are using and select the Metadata tab.
       
      Find the Data Source ID at the bottom of the window.

To create the logical data model:

  1. Go to the page for generating the logical data model:

    https://secure.gooddata.com/gdc/dataload/dataSources/{Data_Source_id}/generateModel

    where {Data_Source_id} is the identifier of the Data Source.
    The link for polling for status of the LDM generation appears.

  2. Poll (click the link repeatedly) for the status until the gray page returns the JSON structure describing the logical data model.
    Here is an example of the JSON structure:

    {
      "projectModelView": {
        "model":
          {
            "projectModel": {
              "datasets": [ {
                "dataset": {
                  "identifier": "dataset.orders",
                  "title": "orders",
                  "anchor": {
                    "attribute": {
                      "identifier": "attr.orders.id",
                      "title": "id",
                      "folder": "orders",
                      "labels": [ {
                        "label": {
                          "identifier": "label.orders.id",
                          "title": "id",
                          "type": "GDC.text",
                          "dataType": "VARCHAR (128)"
                        }
                      } ]
                    }
                  },
                  "attributes": [ {
                    "attribute": {
                      "identifier": "attr.orders.description",
                      "title": "description",
                      "folder": "orders",
                      "labels": [ {
                        "label": {
                          "identifier": "label.orders.description",
                          "title": "description",
                          "type": "GDC.text",
                          "dataType": "VARCHAR (128)"
                        }
                      } ]
                    }
                  } ],
                  "facts": [ {
                    "fact": {
                      "identifier" : "fact.orders.price",
                      "title": "price",
                      "folder": "orders",
                      "dataType": "DECIMAL (12, 2)"
                    }
                  } ],
                  "production": true
                }
              }
            ]
          }
        }
      }
    }
  3. Scroll down to the bottom of the JSON page until you see the following form:

  4. Enter the project ID of your workspace and select the Preserve data option.

  5. Click Publish.
    The polling screen appears.

  6. When the process finishes successfully, the JSON structure appears. The logical data model is created within your workspace.

You can review the logical data model in the Manage section:

  1. Go to your workspace.
  2. Click Manage in the top navigation tab.
  3. On the Data tab, click Model.
    The LDM outline appears:

Create and Run a New Schedule

To ensure your GoodData analytics is always using the most up-to-date data, you can create a schedule to automate data loads from your BigQuery dataset to your GoodData workspace. For the purpose of this Getting Started tutorial, you create a schedule that you run manually.

Steps:

  1. Go to the Data Integration Console and click the Projects tab.
  2. Select the workspace that you used in the previous step.
  3. Click Create new schedule.
    The New schedule screen appears.
  4. Select the process name.
  5. In the Runs dropdown, set the frequency of execution to manually.
  6. Leave everything else intact.
  7. Click Schedule.
    The schedule is saved and opens for your preview.
    You are now going to manually run the scheduled process.
  8. Click Run.
  9. Confirm Run.

    The schedule is queued for execution and is run as platform resources are available.
    The process may take some time to complete.
  10. When the process is completed, you will see a green tick-off confirmation icon:

Create Your First GoodData Insights

Now that you loaded the sample data into your GoodData workspace, you will learn how to create simple insights using Analytical Designer available in the Analyze tab.

To go to Analytical Designer from your Data Integration Console, click your account name in the upper right corner, then click Analyze data.

The goal of this tutorial is to break down raw sales figures by order category and status, and examine the pricing structure of your sales.

As you can see, the columns of the original csv sample file appear in the catalog panel on the left. This is possible thanks to the GoodData’s ability to work directly with human readable data modeling metadata (facts, attributes, column names).

Exercise 1 - Order Category and Order Status

To create your first insight:

  1. Drag and drop Order ID onto the Measures panel. This automatically creates a Count of unique Order IDs.
    Analytical Designer applied Count because the Order ID column was annotated as an attribute instead of as a numerical fact.
  2. In the Compare helper, select Product Category from the drop-down menu and click Apply.
    The number of orders is now split by the product category.
  3. Drag and drop Order Status to the Stack By panel to look into the data in more detail..
    The columns are now further split by the status of the orders.
  4. Click the Save button in the top right corner of the screen to save the insight and name the insight it Orders by Status and Category.
    You have just created your first insight! 

Exercise 2 - Sales Pricing Structure

In the following example, your insights will analyze the pricing structure of your sales - the highest priced items and the price range.

Follow these steps:

  1. Click the Clear button in the toolbar to clear the insight.
  2. Drag and drop Price onto the Measures panel.
    This displays the Sum of all prices on all order lines but it does not consider how many times the products were sold at their price.
    Note: You can apply different mathematical functions to this particular column, because the Price column was annotated as a numerical fact.
  3. In the Measures panel, click the arrow to the left of Sum of Price item, and from the drop-down menu select Average to display the average product price.
  4. Drag Category to the View By panel.

You see that the Outdoor category contains the highest priced items. But what is the range of prices?

  1. In the Measures panel, click the arrow to the left of Avg of Price and change Average to Minimum.
  2. Drag and drop Price to the Measures panel again.
    A new Sum of Price item appears.
  3. Click the arrow to the left of Sum of Price, and from the drop-down menu, change Sum to Maximum. You can now see the range of prices for each category.

You can easily handle many analytical queries without needing to write SQL for individual variations.

Next Steps

Now that you created your first insights using our sample data, you can either: