Data modeling in GoodData revolves around the concept of a logical data model (LDM). You create and modify a logical data model for a particular workspace using our fully integrated LDM Modeler.
A logical data model (LDM) represents the relationship between data objects in a workspace (also known as a project). The logical data model provides a layer of abstraction so that you do not have to interact with the physical data model.
In the LDM Modeler, you assemble datasets and other objects, customize them, and build the relations between these objects to define data relationships within the workspace.
The GoodData platform is designed to let you quickly create and modify logical data models for your projects. However, be careful in how you assemble the components of your data model. When possible, assign experienced modelers to your projects, particularly if they involve multiple interacting datasets or specialized use cases for your data.
If you are new to BI data modeling, invest time and effort to study data modeling practices prior to implementing a production GoodData solution.
Logical Data Model and the GoodData Platform
In GoodData, the entire data model is broken down into two components:
- logical data model
The logical data model describes the fields in use in your project, how they are organized into datasets, and the connections between datasets. Logical data models are created through an intuitive graphical interface, described later.
- physical data model
The physical data model is the technical definition of how the data elements in the logical data model are written to the data warehouse. When the logical data model is created and published, the platform automatically creates or updates the physical data model, so that the tables in the data warehouse reflect the logical data model.
On the GoodData platform, data modelers never need to create or directly interact with the physical data model. This level of abstraction enables greater focus on the data relationships and simpler management of the architecture of the project.
You can think of the logical data model as the contract between the data loading process and the datamart, and between the datamart and the analytical queries. Your LDM maps the incoming data to the physical data model, which is used to store the content in the data warehouse.
The logical data model enables a layer of abstraction between the information the GoodData user is accessing and the method by which the data is stored. This layer of abstraction allows continuous improvement of the physical data model and the tools used to access and maintain it without interfering with the user’s definition of the data architecture.
LDM Roles in the Platform
In the GoodData platform, the logical data model is used to perform the following basic functions:
- data loading
When an ETL graph is executed, the GoodData platform references the logical data model to determine how the incoming data is written into the designated dataset.
- data querying
Any request for data submitted from a GoodData interface is passed through the logical data model to retrieve the data and to return it to the querying client.
LDM and SQL data modeling
- In terms of SQL, datasets are set up with left joins from table to table.
- Any dataset can perform calculations on data in its own table, as well as data in any parent tables higher in the model as long as it is connected to it.
The key difference from SQL is that in GoodData, it is not always a good idea to keep tables in normalized form. For example, a Sales table has a Location ID field that is connected to a Location table, which consists of an ID and a Name. Instead of creating a second dataset consisting of just the locations, you may find advantages in transforming the data before loading and replacing the Location ID with the Location Name in the Sales table.
GoodData performs all calculations on demand. Minimizing the number of connections improves performance.
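The denormalization described above can be sketched in Python as a transformation applied before loading. This is only an illustration, not GoodData tooling: the field names (`sale_id`, `location_id`, `location_name`, `amount`) and the sample rows are hypothetical.

```python
# Hypothetical source rows; all field names are assumptions for illustration.
locations = [
    {"id": 1, "name": "Boston"},
    {"id": 2, "name": "Prague"},
]
sales = [
    {"sale_id": 100, "location_id": 1, "amount": 250.0},
    {"sale_id": 101, "location_id": 2, "amount": 120.5},
    {"sale_id": 102, "location_id": 3, "amount": 75.0},  # no matching location
]


def denormalize_sales(sales_rows, location_rows):
    """Replace location_id with location_name so that no separate
    Location dataset (and no extra connection) is needed."""
    name_by_id = {loc["id"]: loc["name"] for loc in location_rows}
    result = []
    for row in sales_rows:
        flat = {k: v for k, v in row.items() if k != "location_id"}
        # Behaves like a left join: unmatched sales are kept, with no name.
        flat["location_name"] = name_by_id.get(row["location_id"])
        result.append(flat)
    return result


flat_sales = denormalize_sales(sales, locations)
for row in flat_sales:
    print(row)
```

The resulting rows carry the location name directly, so a single Sales dataset can be sliced by location without a connection to a second dataset.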
Logical Data Model and Your GoodData Workspace
The logical data model describes the relationships between abstracted data elements, sets of data that are organized by logical connection rather than associated by how and where they are stored. In GoodData, these logical sets are called datasets. You can think of datasets as virtual tables.
Each workspace (also known as a project) requires a logical data model. When properly constructed, the LDM defines the datasets and the connections between them, and delivers the power to calculate predefined metrics and reports without forcing the platform to do complicated joins or lookups.
The following image represents a simple logical data model visualization in the LDM Modeler. The model is based on the datasets that are used in the GoodData Demo Workspace.
Logical Data Model Components Overview
A logical data model is built from the following components:
- fact
A fact is a numerical piece of data. Values can be arbitrary. Facts may be stored in integer or decimal format. Facts are the data sources for aggregation, which is performed by metric functions.
- attribute
An attribute is a field containing a discrete set of alphanumeric or numeric data. For example, you could create an attribute called “Eye Color” containing the values Blue, Brown, Green, and Other. Or, for a restaurant database, you could create an attribute called “Table Size,” which may contain only the values 2, 4, 6, 8, and 10. Attributes are used primarily for slicing metrics in reporting.
- dimension
A dimension is a set of related attributes.
- dataset
A dataset is a basic organizational unit of a logical data model and represents a set of related facts and/or attributes.
- connection point
Datasets are associated with each other through a connection point. A connection point functions like a database primary key; it identifies the field whose values uniquely identify each record in the dataset. At the other end of a connection point is a reference, which is the foreign key pointing to the primary identifier of a dataset. Using connection points and references, the logical data model forces left joins to define calculation paths.
Connection points create a relation between two datasets.
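The way facts, attributes, connection points, and references work together can be sketched in Python. This is a simplified illustration under stated assumptions, not how the platform is implemented: the dataset contents, the field names (`location_id`, `location_ref`, `region`, `amount`), and the choice to group unmatched references under `None` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical "Location" dataset: "location_id" plays the connection point role,
# "region" is an attribute.
location_rows = [
    {"location_id": "L1", "region": "East"},
    {"location_id": "L2", "region": "West"},
]
# Hypothetical "Sales" dataset: "location_ref" is the reference, "amount" is a fact.
sales_rows = [
    {"location_ref": "L1", "amount": 100},
    {"location_ref": "L1", "amount": 50},
    {"location_ref": "L2", "amount": 70},
    {"location_ref": "L9", "amount": 30},  # no matching connection point value
]


def build_index(rows, connection_point):
    """Index a dataset by its connection point, enforcing uniqueness
    the way a primary key would."""
    index = {}
    for row in rows:
        key = row[connection_point]
        if key in index:
            raise ValueError(f"duplicate connection point value: {key!r}")
        index[key] = row
    return index


def sum_fact_by_attribute(child_rows, reference, parent_index, attribute, fact):
    """Sum a fact from the child dataset, sliced by an attribute of the parent.
    Follows the reference like a left join: unmatched rows fall under None
    (an assumption of this sketch)."""
    totals = defaultdict(int)
    for row in child_rows:
        parent = parent_index.get(row[reference])
        slice_value = parent[attribute] if parent else None
        totals[slice_value] += row[fact]
    return dict(totals)


location_index = build_index(location_rows, "location_id")
totals = sum_fact_by_attribute(
    sales_rows, "location_ref", location_index, "region", "amount"
)
print(totals)
```

Here the fact `amount` is aggregated by a sum, and the attribute `region` from the connected dataset slices the result. Every sales row is kept, which is the left-join behavior described above.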
For a comprehensive description of logical data model components, see the Logical Data Model Components in GoodData section.
Mapping Your Source Data to LDM Components
See how source data from your enterprise systems is mapped to basic types in the logical data model.
- Source data type: text
LDM type: attribute in a dataset
All text values are stored as attributes.
- Source data type: numbers
LDM type: fact in a dataset OR attribute in a dataset
Numbers that you want to aggregate by sum, count, minimum, maximum, or average must be stored as facts.
Numbers that you want to use to slice your data (for example, Table Size) are stored as attributes.
The same input data can be used in both ways.
- Source data type: dates
LDM type: Date dataset
A Date dataset stores date-related information. This specific dataset includes a pre-built hierarchy of attributes and dimensions for aggregating by date.
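The mapping rules above can be expressed as a small Python helper. It is a sketch for reasoning about source columns, not part of any GoodData tool: the function name `suggest_ldm_type` and its `aggregate` parameter are hypothetical.

```python
from datetime import date


def suggest_ldm_type(values, aggregate=False):
    """Suggest an LDM component for a source column.

    values:    non-empty list of sample values from the column
    aggregate: True if numeric values are meant to be summed, averaged, etc.
    """
    if not values:
        raise ValueError("at least one sample value is required")
    if all(isinstance(v, date) for v in values):
        return "date dataset"
    # bool is a subclass of int in Python, so exclude it from "numbers".
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        # Numbers to aggregate become facts; numbers used for slicing
        # (for example, Table Size) become attributes.
        return "fact" if aggregate else "attribute"
    # Text, and anything else, is stored as an attribute.
    return "attribute"


print(suggest_ldm_type(["Blue", "Brown", "Green"]))
print(suggest_ldm_type([19.99, 5.0], aggregate=True))
print(suggest_ldm_type([2, 4, 6, 8, 10]))
print(suggest_ldm_type([date(2020, 1, 1)]))
```

Because the same numeric column can serve both purposes, the caller decides through `aggregate` whether it should be modeled as a fact, as an attribute, or loaded as both.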
Data Modeling Tools in GoodData
GoodData LDM Modeler is the primary tool for creating, testing, and deploying logical data models in workspaces on the GoodData platform. You can also use APIs and the form-based interface called Gray Pages.
GoodData LDM Modeler
GoodData LDM Modeler is designed for workspace administrators and is a part of the Data Integration Console (DISC).
To access the modeler:
- Click your name in the top right corner.
- Click Data Integration Console.
- Select the workspace/project that you want to create the model for.
- Click Model data.
The LDM Modeler interface opens.
Through the LDM Modeler's drag-and-drop interface, you can build datasets, populate them with fields, and create the connections that enable flexible reporting within your workspace.
In GoodData, data modelers do not need to create the physical tables and relationships within the datastore.
GoodData REST APIs
The GoodData REST APIs enable developers to programmatically interact with all aspects of their GoodData domain. User, project, and ETL provisioning can be managed through these structured APIs. Additionally, you can modify individual objects within your project, such as metrics, reports, dashboards, Data Permissions filters, and the logical data model.
To get started with GoodData REST APIs, see API Reference.
Gray Pages
To make small changes to the technical definition of your deployed projects, developers can use the gray pages, a form-based interface. Select links displayed in the gray pages to navigate the internals of your project definition and use the available form fields to submit changes at the endpoints.
Do not use gray pages as a primary way to interact with your projects, as effective navigation requires an understanding of the technical layout of a project definition and management of the internal identifiers of your workspace. GoodData recommends using other interfaces where possible.
If you are using Google Chrome, the GoodData Extension Tool provides links into the gray pages to simplify navigation from the current project in the GoodData Portal.
To get started with the gray pages, see Access the Gray Pages for a Project.
CloudConnect Data Modeler
The CloudConnect data modeler is a part of the legacy CloudConnect application.
For more information about its usage, see Data Modeling Using the CloudConnect Tool.
Review Your Logical Data Model
Because access to the LDM Modeler is typically restricted to workspace administrators, other users can review the existing logical data model in the workspace's Manage section.
The following image shows both the Workspace (left) and the LDM Modeler view of a single model.