In this article we are going to show you data modeling within the GoodData platform using our REST API. We are going to explain the relationship between the Logical Data Model (LDM), the Physical Data Model (PDM), and the MAQL DDL language. MAQL DDL is GoodData's proprietary data definition language, and we will outline its basic concepts and usage in this article.
The LDM (Logical Data Model) is necessary for creating your reports. It consists of attributes and facts that GoodData users add to their reports.
The PDM (Physical Data Model) is used for the data storage and query. It is de-facto a DBMS schema (tables, columns, primary/foreign keys etc.).
The DLI (Data Loading Interface) and the SLI (Single Loading Interface) are used for loading data into GoodData projects.
The LDM describes the logical structure of an organization's data in terms of datasets, attributes, and facts. Most analytical tools and platforms force you to develop both an LDM and a PDM and keep them perfectly aligned. Unlike these tools, GoodData interfaces with the user only via the LDM; the corresponding PDM and DLI (or SLI) are generated from the LDM automatically. The logical data model can be created and modified easily with MAQL DDL statements.
Creating a Simple Model
Suppose we want to model a simple dataset of employees with their salaries and departments. Every employee belongs to exactly one department, but a department may have many employees. To model this situation we put the two attributes into a single dataset hierarchy (see the picture). To generate this piece of the model we use the following MAQL DDL:
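A script along these lines would create the model. The identifiers (dataset.hr, attr.employee, hr_empl, and so on) are illustrative; they follow the table and column names used in the rest of this article, and the exact syntax may differ slightly in your MAQL DDL version:

```
CREATE DATASET {dataset.hr} VISUAL(TITLE "HR Demo");

CREATE FACT {fact.salary} VISUAL(TITLE "Salary") AS {hr_empl.f_salary};

CREATE ATTRIBUTE {attr.employee} VISUAL(TITLE "Employee")
  AS KEYS {hr_empl.id_empl} FULLSET;
ALTER ATTRIBUTE {attr.employee}
  ADD LABELS {label.employee.ssn} VISUAL(TITLE "SSN") AS {hr_empl.nm_ssn};
ALTER ATTRIBUTE {attr.employee}
  ADD LABELS {label.employee.fullname} VISUAL(TITLE "Full Name") AS {hr_empl.nm_fullname};

CREATE ATTRIBUTE {attr.department} VISUAL(TITLE "Department")
  AS KEYS {d_hr_department.id_dept} FULLSET, {hr_empl.id_dept};
ALTER ATTRIBUTE {attr.department}
  ADD LABELS {label.department.name} VISUAL(TITLE "Department Name") AS {hr_empl.nm_deptname};

ALTER DATASET {dataset.hr} ADD {attr.employee}, {attr.department}, {fact.salary};
SYNCHRONIZE {dataset.hr};
```

We will walk through these statements one by one in the sections below.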
The physical representation of the previous example is very simple. The names of tables and columns are derived from the identifiers mentioned in the MAQL DDL commands. Every attribute has its own table, with columns containing the attribute's primary key, foreign keys to related attributes, textual labels, and facts. The keys in MAQL DDL define the relationships among attributes. The figure below outlines the PDM that has been automatically generated from the LDM above.
The last piece of the puzzle is the DLI that is necessary for data loading. Alternatively, you may use the SLI, which provides a special feature: you don't have to load the primary keys of the attributes, since these are generated automatically by the GoodData platform. Moreover, these identifiers are persistent: a given string will always get the same ID. Both the DLI and SLI APIs provide templates of self-describing packages that can be loaded into the platform. The data package is a ZIP archive that contains the data (a delimited file) and a manifest that describes how the data maps to the project's LDM and PDM. The figure below shows the data file only. As you can see, it is a de-normalized version of the PDM (the d_employee_department and f_employee PDM tables flattened).
Now, we are going to inspect the MAQL DDL statements that created the LDM above. We have created the “HR Demo” dataset with the Department and Employee attributes and the Salary fact.
Creating the Employee Dataset
First, we have to declare the dataset itself. Here is the corresponding MAQL DDL statement:
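A sketch of the declaration, using the illustrative dataset.hr identifier:

```
CREATE DATASET {dataset.hr} VISUAL(TITLE "HR Demo");
```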
Creating the Salary fact
Then, we may create the Salary fact and place it in the PDM table hr_empl, column f_salary. Remember that a fact is represented as a database column.
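For example (the fact.salary identifier is illustrative; the table and column names follow the text):

```
CREATE FACT {fact.salary} VISUAL(TITLE "Salary") AS {hr_empl.f_salary};
```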
Creating the Employee attribute
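A sketch of the attribute definition, using illustrative identifiers (the label column names nm_ssn and nm_fullname are assumptions; the keyword marking the primary key may vary by MAQL DDL version):

```
CREATE ATTRIBUTE {attr.employee} VISUAL(TITLE "Employee")
  AS KEYS {hr_empl.id_empl} FULLSET;
ALTER ATTRIBUTE {attr.employee}
  ADD LABELS {label.employee.ssn} VISUAL(TITLE "SSN") AS {hr_empl.nm_ssn};
ALTER ATTRIBUTE {attr.employee}
  ADD LABELS {label.employee.fullname} VISUAL(TITLE "Full Name") AS {hr_empl.nm_fullname};
```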
The first command line creates the LDM attribute named Employee. The attribute maps to the PDM table hr_empl with the primary key id_empl. We could also add a foreign key, but we are not going to do so right now; we will show it later. As mentioned above, an attribute is defined as a PDM table.
The second and third rows add labels (also called display forms) to the Employee attribute. As you've guessed already, labels are textual representations of the attribute. For example, the Employee attribute has two: SSN and Full Name. Later you'll be able to choose which label is more suitable for your report. Both the labels and the ID map to columns in the same PDM table. You'll see these names later in the DLI and SLI, respectively.
Creating the Department attribute
Now we will repeat the exercise and create the Department attribute. The only difference is that this time we'll add a foreign key to the hr_empl table. This will enable us to aggregate employees by department, since the relationship is many-to-one.
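A sketch of these statements; the d_hr_department table name and the label placement are assumptions that follow the column names mentioned in the text:

```
CREATE ATTRIBUTE {attr.department} VISUAL(TITLE "Department")
  AS KEYS {d_hr_department.id_dept} FULLSET, {hr_empl.id_dept};
ALTER ATTRIBUTE {attr.department}
  ADD LABELS {label.department.name} VISUAL(TITLE "Department Name") AS {hr_empl.nm_deptname};
```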
In the second line of the script we again added a label, this time to the Department attribute. The label is defined as the nm_deptname column in the hr_empl table.
Generating / Synchronizing PDM and DLI
Finally, we add all the attributes and facts to the HR Demo dataset and call the SYNCHRONIZE MAQL DDL command, which automatically generates all the PDM tables, columns, keys, etc. For this reason it is usually the last command of a MAQL DDL script.
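A sketch of these final statements, with illustrative identifiers:

```
ALTER DATASET {dataset.hr} ADD {attr.employee}, {attr.department}, {fact.salary};
SYNCHRONIZE {dataset.hr};
```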
This is the magic statement that generates the PDM and the DLI or SLI.
Remember that the SYNCHRONIZE command will erase all data, and you will have to load it back. If you want to preserve the data in the tables, add the PRESERVE DATA option.
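For example (again with the illustrative dataset identifier):

```
SYNCHRONIZE {dataset.hr} PRESERVE DATA;
```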
The figure below describes the relationship between the MAQL DDL, LDM, PDM and DLI.
We have called the SYNCHRONIZE command and ended up with an empty GoodData project. The last step is to populate it with some useful data. We can use the DLI or SLI REST API to do the job. The DLI API is a combination of a private WebDav storage, where we upload the self-describing data package, and a simple call (md/etl/pull2) that starts the data loading process. You can find more in the data upload documentation.
The self-describing data package is a ZIP archive that contains the data file with predefined columns and a simple manifest that describes the mapping between the data file and the project's PDM. The data package template resides at this URI:
You can simply download the template, unzip it, populate the data file with your data (you’ll need to preserve the CSV header row and the column sequence), re-pack it, upload it to the DLI API’s WebDav location, and call the md/etl/pull2 API to start the data loading process. Here we discuss both the loading methods we provide.
In our case go to
and download and uncompress the archive. It contains several files: upload_info.json and one or more CSV files. We don't need to edit the JSON file; it is already prepared exactly for the dataset. We just fill the CSVs with our data. Each CSV file corresponds to one PDM table, and you can define your own primary key values here. You have absolute control over the data in the PDM.
To upload data via DLI we may use the following commands (Linux, MAC OS):
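A hypothetical session might look like this; the hostname, credentials, archive name, and CSV file names are all illustrative, so substitute the WebDav location and login of your own GoodData account:

```
# Pack the manifest and the populated CSV files into a data package.
zip hr-data.zip upload_info.json d_employee_department.csv f_employee.csv

# Create the staging directory on the WebDav server and upload the package.
curl -u user@example.com:password -X MKCOL \
  https://secure-di.gooddata.com/uploads/my-first-upload
curl -u user@example.com:password -T hr-data.zip \
  https://secure-di.gooddata.com/uploads/my-first-upload/hr-data.zip
```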
Then go to
and start the loading process with parameter User staging directory = my-first-upload.
The SLI template can be found at
and it's again a ZIP archive containing upload_info.json and a CSV file. Since the SLI mechanism controls the table keys, we cannot specify our own identifiers, and hence the CSV file has a simplified structure. Also, there is only one CSV file for both attributes. The SLI method is more convenient, but it can be slower for large datasets.
To upload data via SLI we may use the following commands (Linux, MAC OS):
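As before, a hypothetical session (hostname, credentials, and file names are illustrative):

```
# Pack the manifest and the single de-normalized CSV file.
zip hr-data.zip upload_info.json hr_demo.csv

# Create the staging directory and upload the package via WebDav.
curl -u user@example.com:password -X MKCOL \
  https://secure-di.gooddata.com/uploads/my-second-upload
curl -u user@example.com:password -T hr-data.zip \
  https://secure-di.gooddata.com/uploads/my-second-upload/hr-data.zip
```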
Then go to
and start the loading process with parameter User staging directory = my-second-upload.
That’s all about the MAQL DDL, PDM, DLI, SLI and LDM for today. See you next time!