In this tutorial, you will learn how to integrate GoodData and your existing Snowflake instance, and load the sample data from Snowflake into your GoodData workspace.
You will use the data from a sample CSV file that you will import into your Snowflake instance.
This integration does not use your Snowflake resources for computation. The actual data processing takes place on the GoodData platform.
Import Sample Data to Snowflake
Download and review the sample CSV file. We suggest that you examine the data first to help you understand how GoodData processes the rows and columns.
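If you prefer to inspect the file programmatically, a short sketch like the following works. The rows below are hypothetical placeholders (the real sample file's values will differ); only the column names are taken from the table definition used later in this tutorial.

```python
import csv
import io

# Hypothetical rows in the layout of the sample order_lines.csv file.
# The values are made up for illustration; the column names match the
# out_csv_order_lines table created later in this tutorial.
sample = """\
a__order_line_id,a__order_id,d__date,a__order_status,a__customer_id,a__customer_name,a__state,a__product_id,a__product_name,a__category,f__price,f__quantity
1,O-1,2020-01-15,delivered,C-1,Acme Corp,CA,P-1,Compass,Outdoor,29.99,2
2,O-1,2020-01-15,delivered,C-1,Acme Corp,CA,P-2,Tent,Outdoor,199.00,1
"""

# To inspect the real file instead, replace io.StringIO(sample)
# with open("order_lines.csv", newline="").
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
print("columns:", reader.fieldnames)
print("rows:", len(rows))
```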
Log in to your Snowflake instance with the account that you plan to use with GoodData. Ensure that the user you configured in the data source has all necessary privileges and that your Snowflake instance can be accessed by GoodData. For more information about the required privileges, see GoodData-Snowflake Integration Details.
You will now execute an SQL script that creates a separate Snowflake database, schema, and table, and loads the sample data into it:
Execute the following SQL script without any modifications:
```sql
-- Create "gdtrial" database
CREATE OR REPLACE DATABASE gdtrial;

-- Create "gdtrial" schema
CREATE OR REPLACE SCHEMA gdtrial;

-- Create "out_csv_order_lines" table
CREATE OR REPLACE TABLE gdtrial.out_csv_order_lines (
  a__order_line_id VARCHAR(255),
  a__order_id VARCHAR(255),
  d__date date,
  a__order_status VARCHAR(255),
  a__customer_id VARCHAR(255),
  a__customer_name VARCHAR(255),
  a__state VARCHAR(255),
  a__product_id VARCHAR(255),
  a__product_name VARCHAR(255),
  a__category VARCHAR(255),
  f__price DECIMAL(12,2),
  f__quantity DECIMAL(15,6)
);

-- Create file format describing format of order_lines.csv file
CREATE OR REPLACE FILE FORMAT "GDTRIAL"."GDTRIAL".gdtrialfileformat
  TYPE = 'CSV'
  COMPRESSION = 'NONE'
  FIELD_DELIMITER = ','
  RECORD_DELIMITER = '\n'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = 'NONE'
  TRIM_SPACE = FALSE
  ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
  ESCAPE = 'NONE'
  ESCAPE_UNENCLOSED_FIELD = '\134'
  DATE_FORMAT = 'AUTO'
  TIMESTAMP_FORMAT = 'AUTO'
  NULL_IF = ('\\N');

-- Create S3 stage with sample data
CREATE OR REPLACE STAGE gdstage
  file_format = gdtrialfileformat
  url = 's3://gdc-prod-gdtrial/data';

-- Copy sample CSV from S3 into table
COPY INTO out_csv_order_lines
  FROM @gdstage/step1/order_lines.csv
  file_format = (format_name = gdtrialfileformat);
```
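Notice the column-naming convention in the table above: names prefixed with `a__` hold attributes, `d__` holds dates, and `f__` holds numeric facts. A small sketch of that mapping (the prefix-to-role table below is inferred from the column names in `out_csv_order_lines`):

```python
# Prefix convention observed in the out_csv_order_lines table:
# "a__" marks attributes, "d__" marks dates, "f__" marks numeric facts.
PREFIX_TO_LDM_TYPE = {"a__": "attribute", "d__": "date", "f__": "fact"}

def ldm_type(column: str) -> str:
    """Return the logical-data-model role implied by a column's prefix."""
    for prefix, role in PREFIX_TO_LDM_TYPE.items():
        if column.startswith(prefix):
            return role
    return "unknown"

columns = [
    "a__order_line_id", "a__order_id", "d__date", "a__order_status",
    "a__customer_id", "a__customer_name", "a__state", "a__product_id",
    "a__product_name", "a__category", "f__price", "f__quantity",
]
for col in columns:
    print(f"{col}: {ldm_type(col)}")
```

This convention is what later lets the platform derive facts and attributes for the logical data model without extra configuration.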
Prerequisites - Before You Load the Data
Ensure that you are logged into your GoodData account.
Your GoodData Domain
This tutorial presumes that your GoodData domain is one of the following:
- GoodData Free users
use the link you received in your GoodData Free confirmation email, for example:
- GoodData Growth Users
use the link you received in your GoodData confirmation email, for example:
- White-label customers
use your own white-label domain
Your GoodData Workspace
Before you load any data to the GoodData platform, you must create at least one workspace (also known as a project). The method depends on your GoodData pricing plan.
For details, see Create a Workspace (Project).
Your GoodData workspace is the place where you load your data, create metrics, share data visualizations and dashboards, create ad hoc analyses, and much more. Each workspace has its own project ID.
Create a Data Source
To connect your Snowflake instance and your GoodData workspace, follow these steps:
- Click your name in the top right corner, select Data Integration Console, then click the Data sources tab.
- Select Snowflake as your data warehouse. Alternatively, click Create data source in the bottom left corner.
The connection parameter screen appears.
- Fill in the required fields.
For your Snowflake schema, enter gdtrial (the schema created by the SQL script above).
- Click Test connection. If the connection succeeds, a green confirmation message appears.
The screen with your connection details appears.
In the next section, you will publish the logical data model into your workspace.
Create a Logical Data Model
In this particular scenario, you will skip the Create output stage procedure. This is because you already prepared your Snowflake instance by executing our SQL script that prepares the sample data for the Logical Data Model stage. To learn how to work with your own data, see Load Your Own Snowflake Data into GoodData Workspace.
Before you load data into your workspace, you need a logical data model (LDM) to determine how the data are handled and displayed.
The LDM enables a layer of abstraction between the information that a GoodData user accesses and the method that is used to store data.
- On your Connection screen, click Publish into workspace.
Enter or select the workspace into which you want to publish your logical data model.
- Click Select.
- On the screen that appears, select the Preserve data option.
- Click Publish.
If your logical data model is published successfully, the following message appears:
- Click the Visit data load page link.
The Data Load Process screen opens within the Data Integration Console page. Proceed to the next section to load data from the warehouse into your GoodData workspace.
Review Your Logical Data Model
While in the Data Integration Console, click Model data in the top navigation console to open the LDM Modeler interface and review your logical data model:
Create Data Loading Process
In this step, you will create a data load process that moves data from your Snowflake instance into your GoodData workspace. This process is called Automated Data Distribution (ADD), and it can be deployed to multiple GoodData workspaces.
Note: The following guide presumes that you have successfully published your logical data model and continue to create a data load process. You can start creating a data load process at any time by clicking Create data load process in the Data process tab.
To continue after creating the logical data model, follow these steps:
- Click Deploy Process.
The Deploy process to a project screen appears.
- Select the data source you want to use. The Automated Data Distribution option is preselected.
Note: If you only have a single data source, it is preselected.
- On the next screen, enter your Process Name of choice.
- Click Deploy.
When the process ends, the following screen appears:
Create and Run a New Schedule
To ensure your GoodData analytics always uses the most up-to-date data, you can create a schedule to automate data loads between your Snowflake instance and your GoodData workspace. For the purposes of this Getting Started tutorial, you will create a manual schedule.
- Go to the Data Integration Console and click the Projects tab.
- Select the project that you used in the previous step.
- Click Create new schedule.
The new schedule screen appears.
- Select the process name.
- In the Runs dropdown, set the frequency of execution to manually.
- Leave everything else intact.
- Click Schedule.
The schedule is saved and opens for preview.
You are now going to manually run the scheduled process.
- Click Run.
- Confirm Run.
The schedule is queued for execution and is run as platform resources are available.
The process may take some time to complete.
- When the process finishes with a status of OK, it means that the ADD process has loaded the data to your workspace.
- (Optional) In the Runs dropdown, set the frequency of execution to whatever schedule fits your business needs. Click Save.
The schedule is saved.
Create Your First GoodData Insights
To go to Analytical Designer from your Data Integration Console, click your account name in the upper right corner, then click Analyze data.
The goal of this tutorial is to break down raw sales figures by order category and status, and examine the pricing structure of your sales.
As you can see, the columns of the original sample CSV file appear in the catalog panel on the left. This is possible thanks to GoodData's ability to work directly with human-readable data modeling metadata (facts, attributes, column names).
Exercise 1 - Order Category and Order Status
To create your first insight:
- Drag and drop Order ID onto the Measures panel. This automatically creates a Count of unique Order IDs.
Analytical Designer applied Count because the Order ID column was annotated as an attribute instead of as a numerical fact.
- In the Compare helper, select Product Category from the drop-down menu and click Apply.
The number of orders is now split by the product category.
- Drag and drop Order Status to the Stack By panel to look into the data in more detail.
The columns are now further split by the status of the orders.
- Click the Save button in the top right corner of the screen to save the insight and name it Orders by Status and Category.
You have just created your first insight!
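Under the hood, the insight you just built is a count of unique Order IDs grouped by category and status. The following sketch mirrors that computation in plain Python, using a few hypothetical order lines (the real sample data will differ):

```python
from collections import defaultdict

# Hypothetical order lines for illustration; the real sample data differs.
order_lines = [
    {"order_id": "O-1", "category": "Outdoor",  "status": "delivered"},
    {"order_id": "O-1", "category": "Outdoor",  "status": "delivered"},
    {"order_id": "O-2", "category": "Clothing", "status": "returned"},
    {"order_id": "O-3", "category": "Outdoor",  "status": "delivered"},
]

# Count of unique Order IDs per (category, status), matching the insight:
# the two lines of order O-1 count as a single order.
unique_orders = defaultdict(set)
for line in order_lines:
    unique_orders[(line["category"], line["status"])].add(line["order_id"])

counts = {key: len(ids) for key, ids in unique_orders.items()}
print(counts)
```

Analytical Designer produces the same aggregation for you from a drag-and-drop gesture, with no code required.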
Exercise 2 - Sales Pricing Structure
In the following example, your insights will analyze the pricing structure of your sales - the highest priced items and the price range.
Follow these steps:
- Click the Clear button in the toolbar to clear the insight.
- Drag and drop Price onto the Measures panel.
This displays the Sum of all prices on all order lines, but it does not take into account how many times each product was sold at its price.
Note: You can apply different mathematical functions to this particular column, because the Price column was annotated as a numerical fact.
- In the Measures panel, click the arrow to the left of the Sum of Price item, and from the drop-down menu, select Average to display the average product price.
- Drag Category to the View By panel.
You see that the Outdoor category contains the highest priced items. But what is the range of prices?
- In the Measures panel, click the arrow to the left of Avg of Price and change Average to Minimum.
- Drag and drop Price to the Measures panel again.
A new Sum of Price item appears.
- Click the arrow to the left of Sum of Price, and from the drop-down menu, change Sum to Maximum. You can now see the range of prices for each category.
You can easily handle many analytical queries without needing to write SQL for individual variations.
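The Minimum/Average/Maximum measures you just applied correspond to simple per-category aggregations. A sketch of the same computation, using hypothetical prices (the real sample data will differ):

```python
# Hypothetical prices per category for illustration only.
prices = {
    "Outdoor":  [199.00, 29.99, 350.00],
    "Clothing": [19.99, 49.99],
}

# Minimum, average, and maximum price per category, mirroring the insight.
stats = {
    category: {
        "min": min(values),
        "avg": sum(values) / len(values),
        "max": max(values),
    }
    for category, values in prices.items()
}

for category, s in stats.items():
    print(f"{category}: min={s['min']:.2f} avg={s['avg']:.2f} max={s['max']:.2f}")
```

In Analytical Designer, each of these aggregations is just a menu selection on the same Price fact.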
Now that you have created your first insights using our sample data, you can:
- Turn the insights into KPI Dashboards
- Load Your Own Snowflake Data into GoodData Workspace
- Update a Logical Data Model