There are two layers to the data model that is used for storing your project's data. If you have a background in data modeling, you are familiar with database schema. The database schema is used for accessing and storing data in the database. GoodData software components automatically generate the database schema on the platform. Users do not need to create or manage database schemas. In fact, most users do not interact with them at all.
The database schema corresponds to the physical data model.
Through CloudConnect LDM Modeler, you create a simpler, more intuitive logical data model. The logical data model describes the relationships between abstracted data elements, sets of data that are organized by logical connection, instead of associated based on how and where they are stored. These logical sets are called datasets.
A dataset contains one or more related fields of data.
A fact is a numerical piece of data, which in a business environment is used to measure a business process. Values can be arbitrary. Facts may be stored in integer or decimal format.
There are three types of facts: additive, non-additive, and semi-additive.
- Additive facts can be used in computations, such as summing them together.
- Non-additive facts cannot be added. For example, Unit Prices cannot be added together to produce meaningful information.
Semi-additive facts can be added but only within a certain context. For example, Inventory for a month is additive only within a month time period.
Data modelers are interested in additive facts in most cases. Additive facts are the data sources for aggregation, which is accomplished by metric function. Semi-additive facts require special context around them and must be managed carefully. Non-additive facts can often be derived from other additive facts. For example, Unit Price can be derived from dividing Extended Price by Quantity.
A collection of facts that measure the same business process are stored in a single data unit called a fact table. A fact table contains individual fact values and pointers to associated attributes used as context for the fact data. Fact tables have relatively few columns and many rows.
All facts in a fact table are recorded at a specific level of detail. This level of detail is called the grain of the fact. The grain of a fact table is largely determined by the attribute references in it. For example, if an inventory fact table contains the Quantity fact and the Location and Product attributes, the grain of table is described as "Quantity BY Location by Product."
- A general rule of thumb is to include the lowest level of source data in your facts. These facts can always be rolled up. For example, if you measure a business process by individual city data, you can always roll up that information to the state level. However, if data is stored at the state level, there is no way to decompose that data to individual city level.
- Where possible, be consistent in your units of measure before the data is inserted into the system. For example, do not store Fact 1 in inches and Fact 2 in centimeters. Even if you never intend to use these facts together, another user building a report may decide to use them in a report.
An attribute is a field containing a discrete set of alphanumeric or numeric values. An attribute is used to describe a fact in some way. For example, you could create an attribute called "Channel," which could contain values "Twitter", "Facebook", "Web Site", "Direct", and "Other". These attributes could be used to describe the numerical facts used to describe the sales funnel. In reports, they are used to group, slice, filter, and reorder facts.
An attribute can contain numerical values. For example, for a restaurant management reporting solution, you might create an attribute called, "Table Size," which numerical values 1,2,4, and 8, which could be used to slice the numerical facts in the solution.
Numerical data can be both facts and attributes. For example, you may wish to track Age as a fact and as an attribute to enable segmentation.
A set of related attributes is called a dimension. For example, Address, City, State, and Zip Code may be related in a dimension called, "Location." Each attribute in the dimension is a discrete entity, yet they are all related to each other.
A dimension is stored in a dimension table. A dimension table is shallow and wide.
Your dimensions should always have consistent definitions and contents. Dimensions that share identical structures are called conformed dimensions. Conformed dimensions are easier to create insightful reporting because of consistency between the data.
For example, the State attribute should not use two-letter abbreviations (CA) along with full state names (California). Queries using this malformed attribute will not be able to match the two versions of the state name.
Whenever possible, share dimensions between fact tables to ensure consistency. Shared dimensions are always conformed.
A dataset is a related set of facts, attributes or both, which are stored together in the datamart.
Datasets are associated with each other through connections. A relation is a data model object used to describe a relationship between one dataset and another. A relation joins the two datasets through a single connection point. A connection point functions like a database primary key; it should identify the field in the originating dataset that contains information to uniquely identify the data in other fields in the dataset.
When a relation is made between an attribute and a fact dataset, a foreign key field is inserted into the target dataset. This foreign key is populated by references to the primary key values in the dataset at the other end of the relation.
GoodData also supports a special data model object for managing time-based data. The Date dataset can be added to your projects to manage attribute information and to enable aggregation at the day, week, month, quarter, and year level.
For more information on using these objects in your designs, see Recommended Data Modeling Practices.
Also in this section:
- Basics of Data Modeling
- Logical Model versus Physical Data Model
- Logical Data Modeling in the Platform
- Recommended Data Modeling Practices