GoodData offers a comprehensive data load pipeline that covers extracting data from data sources, transforming the data according to predefined rules, and distributing it to your workspaces, where you can immediately start consuming it to create reports, insights, and dashboards.
The GoodData data preparation and distribution model is based on the data hub concept (see https://en.wikipedia.org/wiki/Data_hub).
Data pipeline components
The GoodData data pipeline consists of the components described in the sections below.
All data pipeline components are an integral part of the GoodData platform and are hosted and run on GoodData's side.
Service workspace
The service workspace is a workspace where you prepare data for all client workspaces. To prepare your data, you deploy bricks such as downloaders, integrators, and executors.
Data Integration Console
Data Integration Console (DISC) allows you to automate, orchestrate, and monitor your GoodData Integration. All data pipeline components are consolidated into a workflow that consists of logically interconnected components. For more information, see Data Integration Console.
Bricks
A GoodData brick is a component that performs a specific task based on the brick's type.
For more information about the bricks, see Brick Reference.
Agile Data Warehousing Service
Agile Data Warehousing Service, or ADS (also known as 'Data Warehouse'), is a fully managed, columnar data warehousing service for the GoodData platform. For more information about ADS functions and configurations, see Data Warehouse Reference.
You can use SQL Executor along with ADS to orchestrate the flow of SQL transformations to pre-aggregate, de-normalize, or process time-series data, build snapshots, consolidate multiple data sources, and so on.
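As a rough illustration of what a pre-aggregation step looks like, the following sketch rolls raw order rows up into one row per day. It is only an analogy: it runs against an in-memory SQLite database from the Python standard library rather than ADS, it does not use SQL Executor, and the table names (`raw_orders`, `daily_totals`), columns, and the `build_daily_totals` helper are all hypothetical.

```python
import sqlite3


def build_daily_totals(conn):
    """Pre-aggregate raw order rows into one row per day.

    Table and column names are hypothetical and used for illustration only.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS daily_totals;
        CREATE TABLE daily_totals AS
        SELECT order_date,
               COUNT(*)    AS orders,
               SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY order_date;
        """
    )
    conn.commit()


# SQLite stands in for the warehouse here; this is not an ADS connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 20.0), ("2024-01-02", 5.0)],
)

build_daily_totals(conn)
rows = conn.execute(
    "SELECT order_date, orders, revenue FROM daily_totals ORDER BY order_date"
).fetchall()
```

Dropping and rebuilding the aggregate table on each run keeps the step idempotent, which is convenient when a scheduled transformation may be re-executed after a failure.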
Big Data Storage
Big Data Storage (BDS) is a part of the staging area that sits between the data sources and ADS. BDS uses Amazon S3 object storage and acts as permanent storage for data extracts.
Automated Data Distribution
Automated Data Distribution (ADD) is a GoodData component that enables you to quickly upload data from ADS and distribute it to one or multiple workspaces. For more information about ADD, see Automated Data Distribution Reference.
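Conceptually, distributing one prepared table to several workspaces amounts to splitting rows by the workspace they belong to. The sketch below shows that idea in plain Python. It is not ADD's implementation or API: the `distribute` function, the `client_id` discriminator column, and the sample rows are assumptions made up for this example, and the actual mechanism is described in Automated Data Distribution Reference.

```python
from collections import defaultdict


def distribute(rows, client_key="client_id"):
    """Group rows by a client identifier, one bucket per workspace.

    `client_key` is a hypothetical discriminator column; it is removed
    from each row because it only serves to route the data.
    """
    per_workspace = defaultdict(list)
    for row in rows:
        payload = {k: v for k, v in row.items() if k != client_key}
        per_workspace[row[client_key]].append(payload)
    return dict(per_workspace)


# Hypothetical rows as they might sit in a single prepared table.
prepared_rows = [
    {"client_id": "acme", "order_id": 1, "amount": 10.0},
    {"client_id": "globex", "order_id": 2, "amount": 7.5},
    {"client_id": "acme", "order_id": 3, "amount": 2.5},
]

buckets = distribute(prepared_rows)
```

Each key of `buckets` corresponds to one target workspace, and each value holds only the rows intended for it.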
Client workspace
A client workspace (also known as 'project' or 'datamart') is a set of interrelated data sets together with the metrics, reports, and dashboards built on top of those data sets. You can have one or multiple workspaces. For more information about workspaces, see Workspace Hierarchy.
How the data pipeline components interact
The following high-level process outlines how the data pipeline works and how its components interact:
Read next: Before You Start