This article provides a basic overview of how data modeling is executed in GoodData.
All BI platforms require the creation and deployment of a data model.
- In most general terms, the data model defines how data is stored in the datastore.
- At the conceptual level, it defines the fields in the datamart and their relationships.
- At the technical level, it defines how the fields are stored in the tables in the database and how those database tables are connected to each other.
While the GoodData platform has been designed to enable rapid creation of data models for your projects, care must be taken in how you assemble the components of your data model. Wherever possible, deploy experienced modelers to your projects, particularly if they involve multiple interacting datasets or specialized use cases for your data. Depending on your experience, further education might be required.
Most data models fall into one of two categories of design:
- star schema
- snowflake schema
Each schema has comparative strengths and weaknesses, and preferred use cases.
Your choice of the schema can have a significant impact on the flexibility and overall capabilities of your entire project. Before you begin, verify that you established the success criteria for your project and evaluated the two schemas in detail. For more information on schema topics, see Recommended Reading.
A star schema design is built around the concept of flatness, in which several parent tables are connected to several child tables. Star schemas are generally used in situations where multiple datasets are split by the same set of parent tables.
For example, a star schema can be used when a business needs to connect datasets like appointments, sales and invoices to location, representative, and product. In this model, the sales transactions dataset would be connected to all three.
In GoodData, connections between datasets are created through a connection point. For more information, see LDM Components.
A snowflake scheme design establishes a more hierarchical architecture. In snowflake schemas, chains of tables are all connected through this hierarchy.
For example, a snowflake schema is useful in chaining datasets for departments, budgets, invoices, and transactions. All of these datasets are connected using left joins to each other in a row.
Logical Data Model (LDM) vs. Physical Data Model
In GoodData, the entire data model is broken down into two components:
- logical data model
The logical data model describes the fields in use in your project, how they are organized into datasets, and their connections between datasets. Logical data models are created through an intuitive graphical interface, described later.
physical data model
The physical data model is the technical definition of how the data elements in the logical data model are written to the data warehouse. When the logical data model is created and published, the platform automatically creates or updates the physical data model, so that the tables in the data warehouse reflect the logical data model.
When building projects in the GoodData platform, data modelers never need to create or directly interact with the physical data model. This level of abstraction enables greater focus on the data relationships and simpler management of the architecture of the project.
The logical data model is the contract between the data loading process and the datamart and between the datamart and the analytical queries. It maps the incoming data to the physical data model, which is used to store the content in the data warehouse. The logical data model provides a layer of abstraction so that users do not need to interact with the physical data model.
The logical data model enables a layer of abstraction between the information the GoodData user is accessing and the method by which the data is stored. This layer of abstraction allows continuous improvement of the physical data model and the tools used to access and maintain it without interfering with the user’s definition of the data architecture.
Logical Data Model
The logical data model describes the relationships between abstracted data elements, sets of data that are organized by logical connection rather than associated by how and where they are stored. In GoodData, these logical sets are called datasets.
Each project requires a logical data model. When properly constructed, the LDM defines the datasets and the connections between them, and delivers the power to calculate predefined metrics and reports without forcing the platform to do complicated joins or lookups.
In the following image, the three boxes define datasets of different types. More information on these types is provided later.
|yellow||A dataset that contains at least one fact, which is numerical data stored for purposes of creating metrics in the project.|
|green||A dataset that contains attributes, which are numerical or text-based fields used to slice data in the project.|
|blue||GoodData also supports a special data model object for managing time-based data. The Date dataset can be added to your projects to manage attribute information and to enable aggregation at the day, week, month, quarter, and year level.|
LDM Roles in the Platform
In the GoodData platform, the logical data model is used to perform the following basic functions:
- data loading
When an ETL graph is executed, the GoodData platform references the logical data model to determine how the incoming data is written into the designated dataset.
- data querying
Any request for data submitted from a GoodData interface is passed through the logical data model to retrieve the data and to return it to the querying client.
LDM and SQL data modeling
- In terms of SQL, datasets are set up with left joins from table to table.
- Any dataset can perform calculations on data in its own table, as well as data in any parent tables higher in the model as long as it is connected to it.
The key difference from SQL is that in GoodData, it is not always a good idea to keep tables in the normalized form. For example, a Sales table has a Location ID field that is connected to the location table, which consists of an ID and Name. Instead of creating a 2nd dataset consisting of just the locations, you may find advantages in transforming the data before loading and replacing the Location ID with the Location Name in the Sales table.
GoodData performs all calculations on demand. Minimizing the number of connections improves performance.
The following are the key components of the logical data model:
|fact||A fact is a numerical piece of data. Values can be arbitrary. Facts may be stored in integer or decimal format. Facts are the data sources for aggregation, which is accomplished by metric function.|
|attribute||An attribute is a field containing a discrete set of alphanumeric or numeric data. For example, you could create an attribute called “Eye Color” containing values Blue, Brown, Green, and Other. Or, for a restaurant database, you could create an attribute called, “Table Size,” which may contain only the values 2, 4, 6, 8, and 10. Attributes are used primarily for slicing metrics in reporting.|
|dimension||In GoodData, a set of related attributes is called a dimension.|
|connection point||Datasets are associated with each other through a connection point. A connection point functions like a database primary key; it identifies the field in the data that contains information to uniquely identify the data in other fields in the dataset. |
At the other end of a connection point is a reference, which is the foreign key pointing into the primary identifier for a dataset.
Using connection points and references, the logical data model forces Left Joins to define calculation paths.
Mapping from Source Data to LDM Object Type
In the table below, you can see how source data from your enterprise systems is mapped to basic types in the logical data model. These types are described in more detail in later sections.
|Source Data Type||LDM Object Type||Description|
|text value||attribute in a dataset||All text values that are passed into GoodData are stored as attributes.|
|numeric value||fact in a dataset||Numbers that you want to aggregate by sum, count, minimum, maximum, or average must be stored as facts.|
|numeric value||attribute in a dataset||Numbers that you want to use to slice your data (e.g Table Size) are stored as attributes.|
The same input data can be used in both ways.
|Date||date dataset||Date-related information is stored in a special object, the Date dataset. This dataset includes a pre-built hierarchy of attributes and dimensions for aggregating by date.|
The following object types pertain to structures within the data modeler; all of these object types use attribute (string) data as input.
|LDM Object Type||Description|
|relation||In the LDM Modeler, you create relations between datasets to describe the logical relationships between them. Relations and the connection points that anchor them are the essential tools for defining relationships in your model.|
|label||The label object is used to apply additional descriptors to an attribute. For example, the Person attribute may have the following labels: FirstName, LastName, SocialSecurity#, and others. When you define an attribute in LDM Modeler, a default label is automatically created for you.|
|hyperlink||A hyperlink is a label that describes an attribute using an URL. Hyperlinks are discussed later.|
The primary tool for creating, testing, and deploying logical data models is the CloudConnect LDM Modeler.
CloudConnect LDM Modeler
Download CloudConnect LDM Modeler from the Downloads page.
Integrated into CloudConnect Designer, the LDM Modeler tool is used to rapidly assemble the logical data models for your project. In GoodData, a logical data model represents the fields and datasets of your GoodData project and their relationships. Through a drag-and-drop interface, you can build datasets, populate them with fields, and create the connections that enable flexible reporting within your projects. Your LDMs can be published to one or more projects directly through the interface.
In GoodData, data modelers do not need to create the physical tables and relationships within the datastore. This physical data model is automatically created by the GoodData platform based on the LDM that you publish to the project.
To get started with CloudConnect LDM Modeler:
- Download: If you have already downloaded and installed CloudConnect Designer, you are ready to use LDM Modeler.
- Documentation: Data Modeling and Logical Data Model
GoodData REST APIs
The GoodData REST APIs enable developers to programmatically interact with all aspects of their GoodData domain. User, project, and ETL provisioning can be managed through these structured APIs. Additionally, you can modify individual objects within your project, such as metrics, reports, dashboards, Data Permissions filters, and the logical data model.
To get started with GoodData REST APIs, see API Reference.
To make small changes to the technical definition of your deployed projects, developers can utilize the gray pages, a form-based interface. Select links displayed in the gray pages to navigate the internals of your project definition and use the available form fields to submit changes at the endpoints.
The gray pages should not be used for most interactions with your projects, as effective navigation requires an understanding of the technical layout of a project definition and management of the internal identifiers of your project. GoodData recommends using other interfaces where possible.
If you are using Google Chrome, the GoodData Extension Tool provides links into the gray pages to simplify navigation from the current project in the GoodData Portal.
To get started with the gray pages, see Access the Gray Pages for a Project.
For an overview of data modeling, see Data Modeling Overview.
Adamson, Christopher; Star Schema: The Complete Reference; McGraw Hill, 2010.
Kimball, Ralph et al.; The Data Warehouse Lifecycle Toolkit, Second Edition; Wiley, 2008.