In this tutorial, you will learn how to load your own data from a BigQuery workspace into your GoodData project. This tutorial expands on the Load Sample Data from BigQuery into GoodData guide.
To load data into your workspace, you will perform the following tasks:
- Open a new GoodData workspace
- Set up your data source
- Create an output stage
- Create a logical data model (LDM)
- Create a data load process and start the load
Log in to your BigQuery workspace with the account that you plan to use with GoodData. Ensure that the account you will configure in the data source has all the necessary privileges and that GoodData can access your BigQuery workspace. For more information about the required privileges, see GoodData-BigQuery Integration Details.
The BigQuery and GoodData integration requires the following roles: bigquery.dataViewer and bigquery.jobUser.
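If you manage access with the gcloud CLI, you can check the roles against the project's IAM policy, for example the JSON printed by `gcloud projects get-iam-policy PROJECT_ID --format=json`. The sketch below is a minimal illustration, not a GoodData tool. It assumes the policy's standard `bindings` list (each entry with a `role` and its `members`) and the `roles/` prefix that IAM uses for predefined roles. The function name `missing_roles` is hypothetical.

```python
# Roles listed in this tutorial, written with the "roles/" prefix that
# IAM policy documents use for predefined roles.
REQUIRED_ROLES = {"roles/bigquery.dataViewer", "roles/bigquery.jobUser"}


def missing_roles(policy: dict, member: str) -> set:
    """Return the required roles that `member` does not hold in `policy`.

    `policy` is a parsed IAM policy document; `member` uses IAM member
    syntax, e.g. "serviceAccount:NAME@PROJECT.iam.gserviceaccount.com".
    """
    # Collect every role whose binding lists this member directly.
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return REQUIRED_ROLES - granted
```

An empty result means both roles are bound at the project level. The check is deliberately simple: it does not see roles granted on the dataset itself, inherited from a folder or organization, or provided through custom roles, so a non-empty result is a prompt to look further rather than proof of missing access.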
Ensure that you have the following information ready:
- BigQuery project and dataset
- BigQuery Service Account key in JSON format, so that you can extract the private key
- GoodData workspace's Project ID (Find the Project ID)
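A service account key file is a JSON document whose fields include `type`, `project_id`, `client_email`, and `private_key`. The minimal sketch below reads those fields from the file's text. Which of them the data source form asks for is not stated here, so treat the returned selection as an assumption; the function name `parse_service_account_key` is hypothetical.

```python
import json


def parse_service_account_key(text: str) -> dict:
    """Extract commonly needed fields from a service account key (JSON text)."""
    key = json.loads(text)
    if key.get("type") != "service_account":
        raise ValueError("this JSON is not a service account key")
    # In the file, the PEM private key stores its line breaks as "\n" escapes;
    # json.loads turns them back into real newlines.
    return {
        "project_id": key["project_id"],
        "client_email": key["client_email"],
        "private_key": key["private_key"],
    }
```

For example, `parse_service_account_key(Path("key.json").read_text())["private_key"]` (with `from pathlib import Path`) returns the full PEM block, including the `-----BEGIN PRIVATE KEY-----` and `-----END PRIVATE KEY-----` lines. Keep the key file out of version control.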
Open a New GoodData Workspace
Ensure that you are logged into your GoodData account. This tutorial presumes that your GoodData domain is
- GoodData Free users: use the link you received in your GoodData Free confirmation email, for example:
- GoodData Growth users: use the link you received in your GoodData confirmation email, for example:
- White-label customers: use your own white-label domain
For the purpose of this tutorial, you will work with a new workspace (also known as a project). See Create a Workspace (Project).
Create a Data Source
To connect your BigQuery workspace and your GoodData workspace, follow these steps:
- Click your name in the top right corner, select Data Integration Console, then click the Data sources tab.
- Click BigQuery as your data warehouse. Alternatively, click Create data source in the bottom left corner.
The connection parameter screen appears.
- Fill in the required fields.
- Click Test connection. If the connection succeeds, a green confirmation message appears.
The screen with your connection details appears.
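If Test connection fails, it can help to check the same credentials outside GoodData. The sketch below assumes the `google-cloud-bigquery` Python client (`pip install google-cloud-bigquery`). It runs two cheap checks: reading the dataset's metadata and running a trivial query job. This is only a local diagnostic, not what GoodData runs internally; the function names `connect` and `check_bigquery_access` are hypothetical.

```python
def connect(key_path: str):
    """Create a BigQuery client authenticated with a service account key file."""
    from google.cloud import bigquery  # imported lazily so the check below stays testable

    return bigquery.Client.from_service_account_json(key_path)


def check_bigquery_access(client, dataset_id: str) -> list:
    """Return a list of problems found (empty if both checks pass).

    `client` is a google.cloud.bigquery.Client (or any object with the same
    get_dataset/query methods); `dataset_id` is "project.dataset".
    """
    problems = []
    try:
        # Reading dataset metadata requires read access to the dataset.
        client.get_dataset(dataset_id)
    except Exception as exc:  # e.g. Forbidden or NotFound from google.api_core
        problems.append(f"cannot read dataset {dataset_id}: {exc}")
    try:
        # Running any query creates a job, which requires job permissions.
        client.query("SELECT 1").result()
    except Exception as exc:
        problems.append(f"cannot run a query job: {exc}")
    return problems
```

Usage: `check_bigquery_access(connect("key.json"), "my-project.my_dataset")`. A dataset error usually points to the dataViewer role or a wrong project/dataset name; a query error usually points to the jobUser role.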
Create an Output Stage
The Output Stage is a set of views created in your BigQuery dataset that serve as a source for loading data into your GoodData workspace.
To create the output stage:
- On your Connection screen, click Create in the Create output stage gray tab.
The Create output stage window appears.
- GoodData analyzes your data structure and generates suggested SQL queries for you to execute in your data warehouse.
Note the Output Stage naming conventions option.
- Click Copy to clipboard and paste the queries into your BigQuery SQL client.
- In your BigQuery client, review the suggested SQL DDL statements and, if needed, modify them to suit your data and comply with GoodData naming conventions.
- Execute the DDL statements.
- Close the Create output stage window to return to your Connection screen.
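To show what an output stage view can look like, the sketch below generates a BigQuery `CREATE OR REPLACE VIEW` statement that renames source columns with type prefixes. The prefixes (`out_` for views; `cp__`, `a__`, `f__`, `d__` for connection points, attributes, facts, and dates) are an assumption based on GoodData's Output Stage naming conventions; verify them against that option before relying on them. The function `output_stage_view` and the example table are hypothetical. In practice, start from the DDL that GoodData suggests.

```python
# Assumed Output Stage column prefixes; verify against the naming conventions.
PREFIXES = {
    "connection_point": "cp__",
    "attribute": "a__",
    "fact": "f__",
    "date": "d__",
}


def output_stage_view(project: str, dataset: str, table: str, columns: dict) -> str:
    """Build a BigQuery DDL statement for one output stage view.

    `columns` maps each source column name to its intended role,
    e.g. {"order_id": "connection_point", "amount": "fact"}.
    """
    # Alias every source column with the prefix for its role.
    select_list = ",\n  ".join(
        f"{name} AS {PREFIXES[role]}{name}" for name, role in columns.items()
    )
    return (
        f"CREATE OR REPLACE VIEW `{project}.{dataset}.out_{table}` AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM `{project}.{dataset}.{table}`;"
    )
```

For example, `output_stage_view("my-project", "sales", "orders", {"order_id": "connection_point", "status": "attribute", "amount": "fact", "order_date": "date"})` produces a view named `out_orders` with the columns `cp__order_id`, `a__status`, `f__amount`, and `d__order_date`.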
Create a Logical Data Model from the Output Stage
Before you load data into your workspace, you need a logical data model (LDM) to determine how the data are handled and displayed.
The LDM adds a layer of abstraction between the information that GoodData users access and the way the data are stored. In this step, you use the Output Stage (its view and column names) to create a logical data model.
Then you publish the model to your workspace(s).
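The idea that view and column names drive the model can be illustrated with a simplified sketch: strip the `out_` view prefix to get a dataset name, then group columns by their type prefix. This is an assumption-based illustration of the naming convention, not GoodData's actual model-generation logic; the prefix table and the function `classify_columns` are hypothetical.

```python
# Assumed prefix-to-LDM-object mapping, simplified for illustration.
LDM_TYPES = {
    "cp__": "connection point",
    "a__": "attribute",
    "f__": "fact",
    "d__": "date",
    "r__": "reference",
}


def classify_columns(view: str, columns: list) -> dict:
    """Group output stage columns by the LDM object type their prefix suggests."""
    if not view.startswith("out_"):
        raise ValueError(f"{view} does not use the out_ view prefix")
    groups = {}
    for column in columns:
        for prefix, kind in LDM_TYPES.items():
            if column.startswith(prefix):
                groups.setdefault(kind, []).append(column[len(prefix):])
                break
        else:
            # No known prefix: such a column would need fixing in the view.
            groups.setdefault("unmapped", []).append(column)
    return {"dataset": view[len("out_"):], "objects": groups}
```

Columns reported as `unmapped` are the ones to revisit in your output stage DDL before publishing.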
- On your Connection screen, click Publish into workspace.
Enter or select the workspace into which you want to publish your logical data model.
- Click Select.
- On the screen that appears, select the Preserve data option.
- Click Publish.
If your logical data model is published successfully, the following message appears:
Note: If publishing LDM fails, you will see an error message prompting you to make necessary corrections.
- Click the Visit data load page link.
The Data Load Process screen opens within the Data Integration Console page. Proceed to Create a Data Load Process to load data from the warehouse into your GoodData workspace.
Review and Update Your LDM
While in the Data Integration Console, click Model data in the top navigation bar to open the LDM Modeler, where you can review and update your LDM (see Update a Logical Data Model).
Note: Depending on the complexity and makeup of your data, your LDM diagram will differ.
Create a Data Load Process
In this step, you create a data load process that moves data from your BigQuery workspace into your GoodData workspace. This process is called Automated Data Distribution (ADD), and it can be deployed to multiple GoodData workspaces.
Note: The following steps assume that you have published your logical data model and are continuing directly to create a data load process. You can also create a data load process at any time by clicking Create data load process on the Data process tab.
To create the process right after publishing the logical data model, follow these steps:
- Click Deploy Process.
The Deploy process to a project screen appears. ADD and the data source you created are preselected.
- On the next screen, enter a name for the process.
- Click Deploy.
When the process ends, the following screen appears:
Create and Run a New Schedule
To ensure that your GoodData analytics always use the most up-to-date data, you can create a schedule to automate data loads between your BigQuery workspace and your GoodData workspace. For the purpose of this Getting Started tutorial, you create a manual schedule.
- Go to the Data Integration Console and click the Projects tab.
- Select the project that you used in the previous step.
- Click Create new schedule.
The new schedule screen appears.
- Select the process name.
- In the Runs dropdown, set the frequency of execution to manually.
- Leave everything else intact.
- Click Schedule.
The schedule is saved and opens in preview.
Now run the scheduled process manually.
- Click Run.
- Confirm Run.
The schedule is queued for execution and runs as soon as platform resources become available.
The process may take some time to complete.
- If the schedule fails with errors, fix the errors and run the schedule again. Repeat until the process finishes with a status of OK, which means that the ADD process has loaded the data into your workspace.
- (Optional) In the Runs dropdown, set the frequency of execution to whatever schedule fits your business needs. Click Save.
The schedule is saved.
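Because a run is queued and may take a while, any script that waits for it has to poll until the run reaches a final status. The sketch below shows that pattern with a timeout. `get_status` stands for a hypothetical callable that returns the current run status; how you obtain it (console, API, or otherwise) is not covered here, and the status labels other than OK are assumptions.

```python
import time
from typing import Callable

# OK comes from this tutorial; the other final labels are assumed.
FINAL_STATUSES = {"OK", "ERROR", "CANCELED"}


def wait_for_run(get_status: Callable[[], str], poll_seconds: float = 30.0,
                 timeout_seconds: float = 3600.0, sleep=time.sleep) -> str:
    """Poll `get_status` until the run reaches a final status, then return it.

    Raises TimeoutError if the run is still in progress after `timeout_seconds`.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in FINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"run still {status} after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

A result other than `OK` corresponds to the fix-and-rerun loop described above. The `sleep` parameter is injectable so the loop can be exercised without actually waiting.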
Summary and Next Steps
In this tutorial, you successfully:
- Set up the connection between your BigQuery workspace and your GoodData workspace
- Created a logical data model
- Created and scheduled a data load (in manual mode)
Now that your data are in your GoodData workspace, you can:
- open Analytical Designer to start analyzing your data
- review and update your LDM (see Update a Logical Data Model)