Facts in Logical Data Models
A fact is a numerical piece of data, which in a business environment is used to measure a business process. Values can be arbitrary.
Facts in a dataset are identified by the following icon:
The following image shows an LDM Modeler fact dataset comprising two facts.
Facts can be stored with the following data types:
Data Type | Notes |
---|---|
Decimal(M,D) | Decimal values can be configured as follows:
|
Integer (INT) | Integer values in the range of min(-2147483648) to max(2147483647) |
Large Integer (BIGINT) | Integer values in the range of min(-1e+15) to max(1e+15) |
Double | Double values are discouraged, because they are approximations of input values. It is possible to specify them through a MAQL DDL statement. |
There are three types of facts:
- Additive facts Additive facts can be used in computations, such as summing them together.
- Non-additive facts Non-additive facts cannot be added. For example,
Unit Prices
cannot be added together to produce meaningful information. - Semi-additive facts Semi-additive facts can be added but only within a certain context. For example,
Inventory for a month
is additive only within a month.
Data modelers are mostly interested in additive facts.
Additive facts are the data sources for aggregation, which is accomplished by metric function. Semi-additive facts require special context around them and must be managed carefully. Non-additive facts can often be derived from other additive facts. For example, Unit Price
can be derived from dividing Extended Price by Quantity
.
Fact Datasets
A collection of facts that measure the same business process are stored in a single data unit called a fact dataset (also known as a fact table). A fact dataset contains individual fact values and pointers to associated attributes used as context for the fact data. Fact datasets have relatively few columns and many rows.
All facts in a fact dataset are recorded at a specific level of detail. This level of detail is called the grain of the fact. The grain of a fact dataset is largely determined by the attribute references in it. For example, if an inventory fact dataset contains the Quantity
fact and the Location
and Product
attributes, the grain of the dataset is described as Quantity BY Location BY Product
.
Designing Fact Datasets
Depending on the type of underlying business process, fact datasets can be designed in one of the following major ways:
- Transactional fact dataset for discrete events. One row corresponds to a single transaction, usually connected to one date dimension. Rows are added, not updated.
- Accumulating fact dataset for evolving events. This method models a process passing through several predefined stages, usually connected to multiple date dimensions. One row corresponds to an execution of the process, with individual stages of the process tracked in separate columns. Rows must be updated throughout the stages.
- Snapshotted fact dataset for recurring events. Multiple rows are required to capture the status of the business process at different moments, each of which is timestamped. The fact table is connected to a Snapshot date dimension, which is used for filtering, not for aggregations. Rows are added, not updated.
Recommended Practices
- Include the lowest level of source data in your facts. These facts can always be rolled up. Example: If you measure a business process by individual city data, you can always roll up that information to the state level. However, if data is stored at the state level, there is no way to decompose that data to individual city level.
- Where possible, be consistent in your units of measure before the data is inserted into the system. Example: Do not store Fact 1 in inches and Fact 2 in centimeters. Even if you never intend to use these facts together, another user building a report may decide to use them in an insight.
- Facts placed in the same fact table must be at the same level of detail (grain) and from the same business process. If two events occur at different times, they belong in different fact tables.
- Do not connect fact tables directly to other fact tables. When combining facts from multiple fact tables in the GoodData Portal, you should connect them through a common dimension. Example: Suppose your source file contains data that should become attributes and facts on a set of payment transactions. In addition to the payment amount, a payment transaction might also have a check number, payee name and payment description, all of which become attributes in your model. Ideally, you should create a dimension for the payment information containing the attributes and a fact table containing only the facts of the transaction. For example, ordered units and shipped units most likely belong in separate fact tables because the order is created before the shipment is made.
- Avoid placing attributes in fact datasets. Fact datasets should contain facts and foreign keys to attributes stored in other dimensions.
- Verify that you understand the level of detail (the grain) in each fact dataset. In general, the dimensions indicate the level of detail.
- Where possible, avoid including non-additive facts, which cannot be used in metric computations.
- Instead of storing a ratio or percentage, store the facts that can be used to calculate the percentage or ratio. The calculation become a metric.
- Instead of storing unit prices, store the extended price (
units * unit price
). Put the calculation of the unit price or average price into a metric.