Platform Architecture

The GoodData Platform has been built from the ground up using modern distributed computing principles. Platform components are grouped into several internal services with independent release cycles. This section describes the key subset of available services.

The Cloud Infrastructure layer provides a unified infrastructure (including operating system, monitoring infrastructure, logging, and deployment) for the upper stack services.

The individual services interact with each other using well-defined REST APIs, as well as GoodData Compute Fabric (GCF) tasks. The GoodData Compute Fabric provides a scalable, highly available mechanism for asynchronous task distribution and orchestration. It controls the lifecycle of service-specific workers, which are remote processes that handle the tasks. Individual workers can be added and removed dynamically, enabling elasticity and horizontal scaling of the platform. Shared Storage (DFS) is used for passing state between tasks and services.
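The task distribution idea can be illustrated with a minimal, single-process sketch. It is not the GCF implementation or its API: the WorkerPool class, its method names, and the in-memory results dictionary (standing in for shared storage) are all assumptions made for illustration, and threads stand in for remote worker processes.

```python
import queue
import threading


class WorkerPool:
    """Toy stand-in for a task fabric: one shared queue, resizable workers.

    Hypothetical illustration only; real workers would be remote processes.
    """

    _STOP = object()  # sentinel that tells exactly one worker to exit

    def __init__(self):
        self.tasks = queue.Queue()
        self.results = {}            # plays the role of shared storage
        self.lock = threading.Lock()
        self.worker_count = 0

    def add_worker(self):
        """Scale out: start one more worker that pulls from the queue."""
        threading.Thread(target=self._run, daemon=True).start()
        self.worker_count += 1

    def remove_worker(self):
        """Scale in: queue a sentinel; whichever worker takes it exits."""
        if self.worker_count == 0:
            raise RuntimeError("no workers to remove")
        self.worker_count -= 1
        self.tasks.put(self._STOP)

    def submit(self, task_id, func, *args):
        """Enqueue a task; the caller does not wait for it to finish."""
        self.tasks.put((task_id, func, args))

    def wait(self):
        """Block until every queued item has been processed."""
        self.tasks.join()

    def _run(self):
        while True:
            item = self.tasks.get()
            try:
                if item is self._STOP:
                    return
                task_id, func, args = item
                value = func(*args)
                with self.lock:
                    self.results[task_id] = value
            finally:
                self.tasks.task_done()


pool = WorkerPool()
pool.add_worker()
pool.add_worker()
for n in range(5):
    pool.submit(f"square-{n}", pow, n, 2)
pool.wait()
pool.remove_worker()  # the remaining worker keeps serving new tasks
```

Because callers only talk to the queue, workers can be added or removed without the submitting side noticing, which is the property that makes horizontal scaling possible.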

The Cloud Control Center (C3) is a service maintaining information about projects, users, services and their associations.

Higher-level services that support various analytical use cases are built on top of these core services. These services are exposed through a public REST API.

The Extensible Analytics Engine (XAE) translates multidimensional queries on top of a logical data model into a highly optimized tree of database queries. These queries are executed on the physical model of the underlying analytical database (workspace). The engine enables multi-level caching, so that each node of the query tree can be persisted for later reuse.
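Per-node caching of a query tree can be sketched as follows. This is a simplified illustration, not XAE itself: the QueryNode and CachingExecutor names are hypothetical, nodes compute plain integers instead of running database queries, and the cache is keyed by node name where a real engine would more likely use a fingerprint of the query.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class QueryNode:
    """One node of a query tree (hypothetical structure)."""
    name: str                             # used as the cache key here
    combine: Callable[[List[int]], int]   # computes this node from child results
    children: Tuple["QueryNode", ...] = ()


class CachingExecutor:
    """Evaluates a tree bottom-up and persists every node's result."""

    def __init__(self) -> None:
        self.cache: Dict[str, int] = {}
        self.executions = 0  # counts nodes that were actually computed

    def run(self, node: QueryNode) -> int:
        if node.name in self.cache:
            return self.cache[node.name]  # reuse a previously stored node
        inputs = [self.run(child) for child in node.children]
        self.executions += 1
        result = node.combine(inputs)
        self.cache[node.name] = result
        return result


# Two reports that share the same leaf queries.
eu = QueryNode("sales_eu", lambda _: 120)
us = QueryNode("sales_us", lambda _: 200)
total = QueryNode("sales_total", sum, (eu, us))
best = QueryNode("sales_best_region", max, (eu, us))

executor = CachingExecutor()
executor.run(total)  # computes both leaves and the total
executor.run(best)   # reuses both cached leaves, computes only the new root
```

Caching at every node, rather than only at the root, is what lets a second report benefit from work done for the first one whenever the two share a subtree.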

A Workspace (also known as a Project) provides storage for analytical data, metadata, and materialized query caches. Workspaces enable real-time queries and visualizations and can be scaled horizontally on a per-project basis.

Data Integration Services provide the ETL engine that enables the development and automated execution of ETL processes. The engine extracts data from a large number of external systems through provided connectors for common third-party services such as Salesforce, Facebook, Zendesk, or Google Analytics, as well as through generic connectors such as REST, SOAP, and JDBC. The ETL service then transforms the data and loads it into the GoodData platform for real-time querying. As part of the Data Integration Services, we provide a runtime for running Clover transformation graphs as well as custom Ruby code.
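The extract, transform, and load stages can be pictured as a chain of small steps. The sketch below is a generic illustration and does not use any GoodData connector or Clover API: the function names, the record fields (customer, amount), and the in-memory list acting as the load target are all assumptions.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def extract(rows: Iterable[Record]) -> Iterator[Record]:
    """Stand-in for a connector: yields raw records from a source."""
    yield from rows


def transform(records: Iterable[Record]) -> Iterator[Record]:
    """Drops incomplete records and normalizes the remaining ones."""
    for record in records:
        if record.get("amount") is None:
            continue  # skip rows without a value to load
        yield {
            "customer": str(record["customer"]).strip().lower(),
            "amount": float(record["amount"]),
        }


def load(records: Iterable[Record], target: List[Record]) -> int:
    """Appends records to the target and returns how many were loaded."""
    count = 0
    for record in records:
        target.append(record)
        count += 1
    return count


def run_pipeline(source: Iterable[Record], target: List[Record]) -> int:
    """Chains the three stages; records stream through one at a time."""
    return load(transform(extract(source)), target)


raw_rows = [
    {"customer": " Acme ", "amount": "10.5"},
    {"customer": "Beta", "amount": None},
]
warehouse: List[Record] = []
loaded = run_pipeline(raw_rows, warehouse)
```

Keeping each stage as a separate generator means a different connector or transformation can be swapped in without touching the other stages.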

Agile Data Warehousing Service (ADS) provides scalable storage for all customer data and enables data cleansing, pre-aggregations, snapshotting, and other transformations. These are executed in the warehouse itself before the data is loaded into the datamarts. Data in ADS can be accessed directly using JDBC.

End-users can access the GoodData Platform through the GoodData Client, a pure JavaScript (HTML5) application running in a web browser. The Client adheres to the Model/View/Controller architecture and interacts with the platform via the public REST API. It uses a combination of synchronous HTTP requests and asynchronous polling for long-running operations.
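Asynchronous polling for a long-running operation typically follows the pattern sketched below. The sketch makes no network calls and does not reflect the actual GoodData REST API: fetch_status is a hypothetical callable standing in for an HTTP GET on a status resource, and the state values "RUNNING" and "OK" are assumed for illustration.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, str]],
    interval: float = 0.5,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Calls fetch_status repeatedly until the operation leaves RUNNING.

    Raises TimeoutError if the operation is still running once the
    timeout has elapsed.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["state"] != "RUNNING":  # assumed state name
            return status
        if clock() >= deadline:
            raise TimeoutError("operation still running after timeout")
        sleep(interval)


# Simulated server: reports RUNNING twice, then a final state.
responses = iter([{"state": "RUNNING"}, {"state": "RUNNING"}, {"state": "OK"}])
final = poll_until_done(lambda: next(responses), interval=0.01)
```

Passing sleep and clock as parameters keeps the loop easy to test without real delays; a production client would usually also add backoff between polls.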

To create and deploy ETL graphs and logical models, users can use a desktop application called CloudConnect Designer. The application is written in Java and is based on the Eclipse Rich Client Platform.

Simplified Deployment Schema