Many-to-Many Reporting

To use the Many-to-Many feature described in this section, you must ensure your project uses XAE version 3. Although it is possible to deploy Many-to-Many with a lower XAE version, this feature will not work as intended. For instructions on how to upgrade, see XAE Version 3 Prerequisites.

A many-to-many relationship is similar to a one-to-many relationship in that tables are joined together but the relationship is more complex. For example, a marketing automation system allows you to assign one or more predefined ‘tags’ to each marketing campaign. With these tags, you can organize campaigns in any way we want.

CampaignChannelTagsTotal CostMQL
CIO Retail SummitTradeshowEvent, DigOut2018, Salesteam,172,350.36148
Game Time SavingsDigitalLinkedIn, Spring1617,439.8233
Quarterly Trends CampaignEmailThoughtLeadership, Spring16, Developers,

5,991.37

16
Apparel Nurture CampaignEmailNurture, Spring16,4,522.9127
Why E-Commerce DataContentDevelopers,9.023.1117

Every campaign can have multiple tags, and every defined tag can be attached to multiple campaigns. This is an example of a many-to-many relationship, a very useful structure that allows us to ask questions such as:

  • What is the overall spend per channel for ‘Nurture’ campaigns?
  • What has been the total cost of our ‘Spring16’ marketing program?
  • For each program, how many Marketing Qualified Leads (MQL) did we generate, and what was the cost per MQL?

The following article explains how to model many-to-many relationships on the GoodData platform.

To learn more about many-to-many relationships, see Analytics for the Modern World.

Contents:

Supported operations

Modeling Many-to-Many Relations
You can create a model with a multi valued dimension/attribute that (indirectly) connects to a fact table via a bridge table. The XAE engine understands this is an intentional many-to-many relation in the data model.

Use of Multi-Valued Attributes for Creation of Insights and Reports
Use a fact (measure) and multi-valued attribute (or attribute from multi-valued dimension) in Analytical Designer to create an insight so that multi-valued attribute slices data (in view by) and/or filter data (in filters).

Visibility of Many-to-Many Relation in the model
View the data model via the manage page to see multi-valued attribute is usable via many-to-many relation. The relation is clearly differentiated from other M:N relations between fact tables that are not directly usable for slicing and filtering.

Filtering over M:N
Filter measures over M:N relation without impacting the result by double counting so that you can use filters (e.g. in Mandatory User Filters (MUFs) or as dashboard filters) without worry about the impact on the accuracy of existing reports.

Modeling via CloudConnect

In order to support modeling of M:N edge you must use the latest version of CloudConnect found at Downloads.

  1. Current 1:N model contains standard relation between Ticket and TicketTag (bridge dataset), where Tag is the multivalued dataset.
    1. There is no need for a dummy fact nor a surrogate key in the bridge table. For a primary key it is recommended to use fact table grain.
  2. The goal of the designer is to allow slicing and filtering of Resolution Time by Tag ID.
  3. Click on TicketTag dataset and drag the arrow in opposite direction than the standard relation.
  4. M:N edge represented by a dotted arrow and operations (filtering, slicing) by the Tag is enabled (after the model is published to the server).

Relation modeling behaves as usual in GoodData for 1:N

  • aggregation of the fact is possible in the opposite direction of the arrows from the fact
  • filtering of the fact is possible in the direction of the arrow from the filter to fact. To filter with a tag, you can aggregate the fact to that dimension level.


Restrictions to M:N

The following restrictions are part of prerequisites for upgrade to XAE v3 and using M:N.

M:N edge can be created only on top of existing 1:N edge (implicitly guaranteed by MAQL DDL syntax).

M:N edge cannot be contained in alternative paths (the same check as for computed edges; error message Multivalue edge(s) detected in alternative LDM path(s) ).

1) M:N edge cannot be added if there is already an alternative physical path in the model (internally referred “diamond”).

Error: Multivalue edge(s) detected in alternative LDM path(s) (type a)
You cannot create a multivalue edge C-D because there is an alternative path: You can go from A to D via C OR B.

2) M:N edges cannot constitute alternative paths - two alternative paths that use different M:N edges cannot exist.


Error: Multivalue edge(s) detected in alternative LDM path(s)) (type b)
You cannot create multivalue edge A-C because there is alternative path: You can go from B to A via C OR D (using different multivalue edges: A-C OR A-D).

3) The connection point of the bridge dataset can’t be used as AC (bridge dataset must be “center of star”; error message Path(s) from multivalue edge(s) detected in LDM )

Error: Path(s) from multivalue edge(s) detected in LDM
You cannot create multivalue edge A-C because there is “path from multivalue edge”: the edge C-D.
Because “bridge” attribute C cannot be referenced from other attributes.

Bridge Dataset Limitations

  • Native filtering by bridge dataset is not supported.
  • A workaround is possible using the BY statement to filter multivalue attribute by bridge dataset and filter the fact by the result.
  • Using a fact in a bridge table to calculate a weight of a relation is not supported. Example: Different commissions for Sales reps and various products.

Negative filters results when used in M:N model

The syntax WHERE Attribute NOT IN (values) returns records, where the result include at least one different value from the filtered value.
The syntax WHERE NOT Attribute IN (values) returns records, where the result include at least one different value from the filtered value.
Analytical designer generates NOT ( [attribute] IN ( [element1, element2, ...] )) for negative filters.

Records:
Ticket ID: 123, Tags: (xxx, abc, orange)
Ticket ID: 222, Tags: (xxx)
Ticket ID: 666, Tags: (abc)
Ticket ID: 999, Tags: NULL

Queries and Results: HOW TICKETS
WHERE Tag IN (“xxx”)
>>> 123, 222

WHERE TAG = (“xxx”)
>>> 123, 222

WHERE Tag NOT IN (“xxx)
>>> 123, 666

WHERE TAG != (“xxx”)
>>> 123, 666

WHERE NOT Tag IN (“xxx”)
>>> 123, 666

WHERE NOT Tag = (“xxx”)
> 123, 666

WHERE Tag NOT IN ( xxx, abc, orange )
>>> No result

WHERE NOT Tag IN ( xxx, abc, orange )
>>> No result

Similar behaviour for:
Tag NOT LIKE vs. NOT Tag LIKE
Date NOT BETWEEN vs. NOT Date BETWEEN

Performance recommendations

Computing an aggregated metric over datasets in a M:N relation may bring performance problems in case the LDM is not designed properly, as the bridge table scales according to size of the connected datasets.

Behavior of AND filters in M:N model

Filtering by multiple attributes returns a result where all conditions that are logically related to each other are applied together on the result. (Instead of application of particular conditions for each dimension before consolidating the final result).
Consider the following model and data:

TICKET DATASET

"Ticket_id","status","MN_Date","Resolution_time"  
"support01","new",2018-04-13,20  
"support02","backlog",2017-09-09,50  
"support03","hacking",2016-02-14,3 0

BRIDGE TICKETTAG DATASET

"MN_tickettag_id","stage","Ticket_id","Tag_ID","date_2"  
"tickettag01","user created","support01","tag01",2018-04-12  
"tickettag02","system created","support03","tag03",2017-11-15"  
"tickettag03","system created","support02","tag05",2018-10-22"  
"tickettag07","system created","support01","tag02",2018-04-12"  
"tickettag08","system created","support02","tag02",2018-06-01"  
"tickettag10","system created","support01","tag04",2018-05-22"  
"tickettag12","system created","support01","tag07",2018-04-13"

TAG DATASET

"Tag_ID","Tag Name" "Risk_level","Internal costs"  
"tag01","Reporting","High",200  
"tag02","Tier 1", "High",300  
"tag03","Uploads","Medium",200  
"tag04","Performance","High",300  
"tag05","Exporters","Medium",200  
"tag07","CloudConnect","Low",100

EMPLOYEE/TICKET ASSIGNMENT DATASET

"Assignment_ID","MN_Ticket_ID","MN_Employee_ID"  
"AS1",support01","EMP1"  
"AS2","support02","EMP2"  
"AS3","support03","EMP3"  
"AS10","support01","EMP2"  
"AS11","support02","EMP3"  
"AS12","support03","EMP4"

EMPLOYEE DATASET

"MN_Employee_ID", "MN_Employee_Name", "MN_Employee_City", "MN_Employee_Salary"  
"EMP1", "Gerry Weber", "London", 1150  
"EMP2", "Laura Sanchez", "Málaga", 950  
"EMP3", "Conchita Martinez", "Marbella", 700  
"EMP4", "Jon Olsson", "Puerto Banus", 2000


Results:
Filtering by Employees = All and MN_Stage = system created and MN_Tag_ID = tag01 will lead to zero result as there is no exact match on the resolution Time of tickets, which would have Tag01 and were created by the system and assigned to any employee.

Double counting in M:N model

Double counting occurs in report computation with a M:N model. It happens anytime when user slices by multivalued attribute as the values are calculated for each multivalued attribute (and the fact may be related to multiple multivalued attributes).