Working with a dimensional model is incredibly easy, especially with the GoodData report building user interface. You have your data in a table (e.g. an Excel spreadsheet, a SQL result set, Salesforce entities, etc.), an analytical engine turns your columns into facts and attributes, and voilà: facts can be summed and averaged, attributes can be counted, and all these measures can be sliced or filtered by the attribute columns.
If our table contained support ticket data, it could also be represented as a star schema with a few fact columns (“Replies”, “Time to 1st Reply” and “Time to Resolve”), the “Ticket” connection point, and a few attributes (“User”, “Employee” and “Status”).
GoodData uses a more general term, “data set”, for a group of attributes and facts that were created from a single table, regardless of whether they formally form a dimension or a fact table. This article will stick with this terminology, though it may sometimes use the traditional “fact table” or “dimension” terms to indicate a data set’s characteristics.
You can connect one data set to another one if the underlying data forms a natural hierarchy. For example, we can replace the “Employee” attribute with a new dimension with an “Employee” connection point and a “Team” attribute and connect it to “Ticket”. Likewise, we can add a new dimension with a “User” connection point and an “Account” attribute and connect it to “Ticket” instead of the original “User” attribute. Finally, adding a date dimension is another example of adding a data set to a hierarchy. In this case we’ve added two date dimensions, which include attributes like “Week”, “Month”, “Year” and so on.
Many to Many Relations
Sooner or later, we encounter a situation when we need to analyze data that do not form a hierarchy.
For example, tickets may be organized using tags. Since a ticket can be tagged with multiple tags and multiple tickets can be tagged with a single tag, the resulting logical data model will look like the following:
At a glance, it looks like a hierarchy. However, as the facts are not referenced by tags directly, we do not have the benefit of automated slicing and filtering by the Tag attribute. In other words, with this model one would not be able to readily determine the time to resolve of just those tickets tagged urgent.
The good news is that there’s a workaround. The ticket facts can be sliced by the Tag attribute using a specially constructed metric that will compute the aggregation for every row in the Ticket Tag Association table. In other words, it will treat the association table as a fact table with the ticket specific metric value (e.g. average resolution time) duplicated for each ticket in that table. This solution is described in the Modeling Many-to-Many relationships in GoodData article.
But how can we filter the aggregations of ticket facts by tags?
Filtering by a tag can be rephrased as including only tickets for which there is at least one record in the association table that points to the selected tag.
This condition can be expressed by the following MAQL expression:
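A sketch of that condition, assuming the object names from the model above (the exact identifiers depend on your project):

```
(SELECT COUNT(Ticket Tag Association) BY Ticket, ALL OTHER) > 0
```

The inner COUNT is locked to the Ticket level by the BY Ticket, ALL OTHER clause, so for each ticket it counts the association records matching the current Tag selection; the > 0 comparison keeps only tickets with at least one matching record.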
For example, if we want our report to show an average time to first reply and to reflect a dashboard filter based on the Tag attribute, we can write the average time to first reply metric like this:
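One possible shape for such a metric, assuming the fact and data set names shown earlier:

```
SELECT AVG(Time to 1st Reply)
WHERE (SELECT COUNT(Ticket Tag Association) BY Ticket, ALL OTHER) > 0
```

When a tag is selected in the dashboard filter, the nested COUNT reflects that selection, so the average is computed only over tickets carrying the selected tag.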
Alternatively, we can implement this filtering rule by creating a separate metric such as

SELECT COUNT(Ticket Tag Association) BY Ticket, ALL OTHER

and using that metric in a report-level numeric range filter. This solution allows an analyst to run ad hoc reports using the report building page without being limited to the pre-built dashboard reports.
It’s important to note that to make this technique fully functional, we must ensure that every ticket is given at least one tag; otherwise, untagged tickets would be excluded from all computations even when no tag was selected. This is because the COUNT sub-metric always returns zero for untagged tickets.
The best way to do this is to add a special empty tag value before loading data into GoodData (i.e., during ETL) and associate untagged tickets with this extra tag. First, this ensures untagged tickets are included in computations; second, it brings real value to end users, as they can filter for untagged tickets simply by selecting the empty item from the Tag filter drop-down.
Independent Fact Tables
The most typical example of a multi-hierarchical model is a situation where the reporting requirements naturally lead to multiple independent fact tables. When analyzing survey results, our model will usually look more or less like this:
For a hotel customer satisfaction survey, the Choice Responses data set may include responses to questions such as “What was the purpose of stay?” with options “Business”, “Leisure” and possibly others. Meanwhile, Rating Responses may include a response to Questions like “What was your overall satisfaction?”
How would you create a report that would compare the overall satisfaction between “Business” and “Leisure” customers? You may notice the similarities of this scenario to the previous example. Just as tagged tickets were included in the Ticket Tag Association data set, here the Survey Responses are “tagged” with options and questions. This kind of tagging is captured in the Choice Responses data set.
To determine the overall satisfaction of “business” and “leisure” guests, we can write a metric as follows:
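A sketch of such a metric, written against the illustrative object names used above (Rating, Question, Option, Survey Response, Choice Response):

```
SELECT AVG(Rating)
WHERE Question = 'What was your overall satisfaction?'
  AND (SELECT COUNT(Choice Response) BY Survey Response, ALL OTHER
       WHERE Option = 'Business') > 0
```

The nested COUNT plays the same role as in the ticket-tag example: it keeps only survey responses with at least one “Business” record in the Choice Responses data set.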
For better readability, you may want to split the metric into a number of separate metrics, such as Overall Satisfaction and # Purpose of Stay Responses, so the final metric may look like:
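For instance, the building blocks could look like this (again, a sketch using the illustrative names above). The Overall Satisfaction sub-metric:

```
SELECT AVG(Rating) WHERE Question = 'What was your overall satisfaction?'
```

the # Purpose of Stay Responses sub-metric:

```
SELECT COUNT(Choice Response) BY Survey Response, ALL OTHER
WHERE Question = 'What was the purpose of stay?'
```

and the final metric combining them:

```
SELECT Overall Satisfaction
WHERE # Purpose of Stay Responses > 0 AND Option = 'Business'
```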
Again, similarly to the tickets example, we can exclude the Option = ‘Business’ expression and leave the “Purpose of Stay” filter up to a dashboard filter.
Obviously, filtering by a sub-metric is not specific to COUNT sub-metrics. The same technique can be used for a report that computes the number of unsatisfied business travelers. For example, you could count the survey responses from those whose purpose of stay was business and whose overall satisfaction rating was below 7 out of 10 in the following way:
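A sketch of such a metric, using the same illustrative names and assuming satisfaction is stored as the Rating fact on a 1–10 scale:

```
SELECT COUNT(Survey Response)
WHERE (SELECT COUNT(Choice Response) BY Survey Response, ALL OTHER
       WHERE Option = 'Business') > 0
  AND (SELECT AVG(Rating) BY Survey Response, ALL OTHER
       WHERE Question = 'What was your overall satisfaction?') < 7
```

Here one sub-metric tags each survey response as business-related while the other locks its satisfaction rating to the response level, so the outer COUNT picks up only responses satisfying both conditions.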