Explicit lifting is a feature of GoodData’s Extensible Analytics Engine that grants you greater flexibility in the way you aggregate metrics in reports.
Prior to release of this feature, a metric could only be sliced by those attributes of larger granularity than the fact used in the metric (found to the left of the fact), and only by those attributes connected to the fact by one or more arrows.
In the LDM diagram below, a metric aggregated from the Value fact, such as
SELECT SUM(Value) can be aggregated by any of the attributes in red. Notice that these are the attributes with arrows that point to Value.
Attributes (in red) that are related to the Value fact
But now, with explicit lifting you can traverse against the grain of aggregation using the BY keyword to lock the aggregation level at an attribute that would otherwise not be able to slice a particular metric.
In the following example we refer to the same project LDM to see how explicit lifting allows us to slice a Number of Residents metric by Shop_ID, even though Shop_ID does not appear to be related to the Number of Residents fact.
Explicit lifting to Shop_ID
This aggregation is described in greater detail in Example 1 below.
Example 1 - Lifting one aggregation level
Consider the following metric representing total number of residents in a town.
Number of Residents:
From the project’s logical data model, displayed below, you can see that due to the relationship between the Number of Residents fact and its attribute ancestors, Town_ID and County, our Number of Residents metric can be sliced by town (Town_ID) or by county.
The Number of Residents fact and its related attributes
The resulting report would look like the following:
Number of Residents sliced by Town and County
With explicit lifting, you can slice metrics by certain attributes besides those that are related to the metric’s foundational fact. For example, with a slight modification to its definition, we could slice our Number of Residents metric by Shop_ID.
Greater flexibility afforded by explicit lifting
In order to carry out such an aggregation previously, we’d have to change our metric to specifically aggregate it at the Shop_ID level.
Number of Residents:
Now, we can slice Number of Residents by Shop_ID. This means our report will show the number of residents in the town where each shop is located. Our metric is effectively aggregated at the Town_ID level.
The number of residents belonging to the town where each shop is located
The real value of this report comes into play when we compare our number of residents metric side by side with a metric that represents the gross revenue of each shop: the total value of all transactions aggregated at the Shop_ID level. We’ll start by defining a flexible metric, that we can then break down by SHOP_ID once added to a report.
As it is formulated, this metric can be broken down by any related attribute of larger granularity (see red attributes in first graphic, above) by adding those attributes at the report level.
In this case, we’ve added County, Town_ID, and Shop_ID attributes to our report, allowing us to look for correlations between number of residents in the town in which a shop is located, and the gross revenue of each shop. Do shops tend to perform better when opened in more populous towns?
Do shops located in towns with more residents perform better?
For a closer look at the correlation between our two metrics, we could remove the County and Town_ID attributes from our report and visualize our modified report as a scatter chart.
Example 2 - Lifting through multiple aggregation levels
Using this same project as an example, we might wish to review the managers running the most successful shops. This analysis is available by calculating the total transaction value associated with each shop manager.
At first glance, it might not seem like this use case is possible to carry out, given that Manager is not related to the Value fact in our LDM. But with explicit lifting, we can take Value aggregated at the Shop_ID level and specify that these values should be aggregated at the Records of Employee level. This will allow us to slice by an attribute related to Records of Employee, Manager.
Explicit lifting allows us to associate data aggregated at the Shop level with the managers at those shops
Our first step is to write a metric that aggregates the Value of individual transactions at the level of each Shop. In essence this would calculate the total value of all transactions ever carried out at each shop. Our metric is defined as follows:
Total Value by Shop:
Next, we’ll specify that these values should in fact be aggregated at the Records of Employee level.
Total Value by Records of Employee:
Finally, we’ll allow our metric to roll up to the Manager level by introducing a MAX aggregation function.
Total Value by Records of Employee, Aggregated:
In the example above, the nesting of SELECTs around the SUM prevents double-counting (assuming a 1:1 relationship between Managers and Shops); aggregating with SUM, rather than MAX, would multiply each value by the number of employees assigned to the manager and shop.
However, if one manager manages multiple shops, this nested metric results in double-counting of values, which is a common issue with M:N relations.
Our resulting report displays an aggregation of individual transaction values – the total revenue from transactions – sliced by shop manager.
Total value (gross revenue) associated with various managers’ shops
While each of the Total Values displayed are associated with a Manager, they are effectively aggregated at the shop level. Note that matching values for two managers does not mean that they work in the same shop; Alexander and Michael are managers in different shops.
Total value by manager, visualized
With our new report, we can see who manages the top performing shops and the gross revenue earned by each manager’s shop.
Example 3 - Advanced explicit lifting
Another useful metric could be the average amount of time that elapses between customers’ first two transactions. Are repeat customers likely to allow more if less time to elapse before their second purchase?
We can use explicit lifting to answer this question by locking Transaction Date at the individual transaction level with our BY keyword. We can then calculate the difference between the transaction dates for each shopper. Our goal is to display this length of time for each shopper, as well as to return an average for the time elapsed between transactions across all shoppers.
Let’s revisit the data model and how these aggregation levels are related:
Explicit lifting allows us to carry out date arithmetic at the records of transaction level
First, let’s return the dates of the transactions. No aggregation is necessary here; we’re simply selecting the date values of the various transactions as they would appear in a fact table.
Transaction date lifted:
A good way to remember whether an aggregation function is necessary in any given expression is to visualize the process you’re trying to carry out using the logical data model diagram available in the project. The following guidelines are helpful for conceptualizing the aggregation:
|Traversal direction||Required aggregation|
|Moving along a left-to-right arrow||This is explicit lifting. It requires no aggregation.|
|Moving along a right-to-left arrow||This is aggregation. It requires an aggregation function.|
Now that we’ve selected the transaction date of each record, we’ll create one metric that filters for first purchase dates and a second metric that filters for second purchase dates. Each of our two new metrics will reference the original Transaction date lifted metric:
First purchase date:
Second purchase date:
In the above metric definitions, we’ve aggregated Transaction date lifted with a MAX function to filter it by purchase number. Since the retrieved values are limited to those that correspond toequaling a specific value, in most cases, the metric returns a single value, regardless of the aggregation function. However, MAX was chosen for the following reasons:
- SUM works as long as there is only one value in the data that corresponds to having a Purchase Number = 2. If there are multiple values where Purchase Number = 2, then the metric double-counts values.
- AVG is a safer calculation, as it most closely resembles the intention of the metric. However, AVG takes longer to calculate. For efficiency, you should use MIN or MAX.
- If the reasoning behind the aggregation is unclear, refer to the table above on remembering when aggregation is required.
By creating a final metric that computes the difference between our purchase date metrics, we’ll be ready to break our final result down by Shopper, which will be our report-level attribute.
Duration Between First and Second Visits:
The resulting report shows duration between first and second visits broken down by the Shopper attribute.
Each shopper’s duration between first two transactions, sorted from smallest to largest
By adding a Total Value metric to our report, defined SELECT SUM(Value), we can look for correlations in duration between initial transactions and total amount spent.
Duration between first two transactions with a new metric displaying total amount spent by each shopper
To find the average duration between the first two transactions across all shoppers, we can modify our metric and remove all report attributes:
Average duration between transactions:
Average duration across all shoppers
As you can see, explicit lifting delivers more ways to slice and dice your data and discover valuable new insights.