A connection point is an attribute in your data model that serves two purposes:
- It is the primary key of the dataset, which enables the system to distinguish individual records.
- It enables you to connect this dataset to another one, using its values to form a relation. See Building Relations between Objects.
To connect two datasets together, define a connection point (primary key) in the first dataset, and a reference (foreign key) in the second dataset. Together, they form a relation.
Connection points do not appear in the GoodData Portal. However, they are important identifiers of uniqueness within a dataset. For example, you must define a connection point in each dataset that is loaded using incremental data loads.
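The following Python sketch suggests why an incremental load needs a unique identifier. It assumes that an incremental load behaves like an upsert keyed on the connection point, which this section does not spell out. The function name `incremental_load`, the field names, and the sample rows are hypothetical and are not part of any GoodData API.

```python
def incremental_load(existing, batch, key):
    """Merge a batch into existing rows, matching rows on the key attribute.

    Assumption: a row whose key already exists replaces the stored row,
    and a row with a new key is added.
    """
    merged = {row[key]: row for row in existing}
    for row in batch:
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical sample data, not taken from the documentation.
existing = [
    {"transaction_id": "T1", "amount": 100},
    {"transaction_id": "T2", "amount": 250},
]
batch = [
    {"transaction_id": "T2", "amount": 275},  # updated row for an existing key
    {"transaction_id": "T3", "amount": 40},   # row with a new key
]

result = incremental_load(existing, batch, "transaction_id")
for row in result:
    print(row)
```

Without a unique key, the merge step has no way to decide whether an incoming row updates a stored row or adds a new one.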
Follow these rules for connection points:
- Each value of the connection point must be unique. If the data contains multiple rows with identical connection point values, only one row is loaded into the project, and all other rows are dropped.
- Each value of the connection point must have a corresponding value in the reference attribute of the other dataset.
- The values for the reference attribute are not necessarily unique.
- You can have only one connection point per dataset, but there can be many references to that connection point in other datasets.
- Through its connection point, a dataset can be connected to many other datasets that contain references to that connection point. However, any single reference in a dataset can point back to only one connection point.
- When connected, two datasets form a hierarchical relationship, with the referenced dataset being on a higher level in the hierarchy.
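Some of the rules above can be checked before loading data. The sketch below is one possible pre-load check in plain Python, not a GoodData feature. The helper names `duplicate_keys` and `unmatched_values`, the field names, and the sample rows are all hypothetical. The second list returned by `unmatched_values` (reference values with no matching connection point value) is an additional check assumed here and is not listed among the rules.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return connection point values that occur in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def unmatched_values(key_rows, key, ref_rows, reference):
    """Compare connection point values with reference values.

    Returns two sorted lists:
    - connection point values that no reference uses
    - reference values with no connection point value (assumed extra check)
    """
    keys = {row[key] for row in key_rows}
    refs = {row[reference] for row in ref_rows}
    return sorted(keys - refs), sorted(refs - keys)


# Hypothetical sample data, not taken from the documentation.
customers = [
    {"customer_id": "C1"},
    {"customer_id": "C2"},
    {"customer_id": "C2"},  # duplicate connection point value
    {"customer_id": "C3"},  # never referenced
]
transactions = [
    {"transaction_id": "T1", "customer_id": "C1"},
    {"transaction_id": "T2", "customer_id": "C1"},  # repeated reference is allowed
    {"transaction_id": "T3", "customer_id": "C2"},
    {"transaction_id": "T4", "customer_id": "C9"},  # no such customer
]

duplicates = duplicate_keys(customers, "customer_id")
unreferenced, unknown = unmatched_values(
    customers, "customer_id", transactions, "customer_id"
)
print(duplicates, unreferenced, unknown)
```

Reference values are deliberately not checked for duplicates, because they do not have to be unique.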
When building reports, you can build your reporting relationships from the transaction data, which contains the references to the unique identifiers stored in the other dataset.
Connection point example
To better understand the concept, let's review the following example:
This table illustrates payment transactions made by individual customers. Each transaction has a Transaction ID to serve as a unique identifier. This field is an ideal candidate for a connection point.
A customer may make more than one purchase, so the same customer name may show up multiple times in the record of payments, while each Transaction ID appears only once.
In CloudConnect, the data model looks like the following:
When the data model is published, here is what it looks like in the Portal:
Now, let's normalize the transaction data into two tables: one for transactions and the other for customers. Here are the customers with unique Customer ID values:
Note the new Customer ID column, which contains a unique value for each customer. Each row in the table represents a unique customer.
Now, referencing the Customer ID values, the transaction data looks like the following:
Customer names are no longer referenced in the transactional data. Instead, their IDs are used to refer to the other dataset, where unique values are maintained. These are the references to the connection point data.
In this example, an entire field has been removed from the transaction data.
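The normalization step described above can be sketched in Python as follows. This is only an illustration: the function `normalize`, the field names, the generated ID format, and the sample rows are hypothetical. It also assumes that a first name and last name pair identifies exactly one customer, which real data often does not guarantee.

```python
def normalize(rows):
    """Split flat payment rows into a customers table and a transactions table."""
    customers = {}      # (first_name, last_name) -> customer row
    transactions = []
    for row in rows:
        name = (row["first_name"], row["last_name"])
        if name not in customers:
            # Assign a new Customer ID the first time a name is seen.
            customers[name] = {
                "customer_id": f"C{len(customers) + 1}",
                "first_name": row["first_name"],
                "last_name": row["last_name"],
            }
        transactions.append({
            "transaction_id": row["transaction_id"],
            "customer_id": customers[name]["customer_id"],  # reference
            "amount": row["amount"],
        })
    return list(customers.values()), transactions


# Hypothetical sample data, not taken from the documentation.
flat = [
    {"transaction_id": "T1", "first_name": "Ann", "last_name": "Lee", "amount": 100},
    {"transaction_id": "T2", "first_name": "Bob", "last_name": "Ray", "amount": 250},
    {"transaction_id": "T3", "first_name": "Ann", "last_name": "Lee", "amount": 40},
]

customers, transactions = normalize(flat)
for row in customers + transactions:
    print(row)
```

After the split, Customer ID is the connection point of the customers table, and the `customer_id` field in each transaction row is the reference to it.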
In CloudConnect, the data model now looks like the following:
This is how this model appears in the Portal:
In the example above, each transaction (Amount) is identified by a Transaction ID and references a Customer ID. From the Customer ID, you can derive the First Name and Last Name values in the attribute table.
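To show how a reference resolves to customer attributes, the sketch below follows each transaction's Customer ID to the customers table and totals the amounts per customer. The field names and sample rows are hypothetical. In the Portal, this kind of lookup is handled by the relation in the data model rather than by code you write.

```python
# Hypothetical sample data, not taken from the documentation.
customers = [
    {"customer_id": "C1", "first_name": "Ann", "last_name": "Lee"},
    {"customer_id": "C2", "first_name": "Bob", "last_name": "Ray"},
]
transactions = [
    {"transaction_id": "T1", "customer_id": "C1", "amount": 100},
    {"transaction_id": "T2", "customer_id": "C2", "amount": 250},
    {"transaction_id": "T3", "customer_id": "C1", "amount": 40},
]

# Index customers by their connection point so each reference resolves to one row.
by_id = {c["customer_id"]: c for c in customers}

totals = {}
for t in transactions:
    customer = by_id[t["customer_id"]]  # follow the reference
    name = f'{customer["first_name"]} {customer["last_name"]}'
    totals[name] = totals.get(name, 0) + t["amount"]

print(totals)
```

The index works only because connection point values are unique. With duplicate Customer ID values, a reference would not resolve to a single customer.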