Add Custom Fields to the LDMs in Client Workspaces within the Same Segment
Creating custom fields in the logical data model (LDM) as a feature is supported only for workspaces where data is loaded from the GoodData Data Warehouse (ADS) using Automated Data Distribution (ADD; see Data Preparation and Distribution Pipeline). You cannot use custom fields in workspaces that load data from any other data warehouse or object storage service.
In a standard multi-tenant setup, all client workspaces within a segment share exactly the same logical data model (LDM) inherited and distributed from the segment’s master workspace (see Managing Workspaces via Life Cycle Management (LCM)). In some cases, however, you may need to customize the LDM of some client workspaces within the segment. For example, some entities in the LDM may require workspace-specific (custom) fields in addition to the standard fields common for all the client workspaces. Typically, these workspace-specific fields are custom fields created on a per-client basis directly in the source application.
To implement a solution for such cases and be able to manage the custom fields in an automated way, you can use a set of the GoodData platform features and tools that, when use together, allow you to have multiple workspaces managed in a centralized manner and customize their LDMs as you need. These features and tools are:
- GoodData Data Warehouse (ADS; see Data Warehouse Reference)
- Custom Field Creator
- Automated Data Distribution (ADD; see Data Preparation and Distribution Pipeline) initiated by Schedule Executor
- SQL Executor
- Rollout Brick
Limitations
- The solution documented in this article is supported only for workspaces where data is loaded from ADS using ADD.
- You can add custom fields only to datasets that already exist in a client workspace’s LDM.
- You can add only attributes, facts, and dates as custom fields.
- Attributes added as custom fields can have only one label and can be used only in a one-to-many relationship (many-to-many relationships are not supported for those attributes).
Implementation Steps
To implement the solution for adding and managing custom fields in the client workspaces' LDMs, complete the following steps:
- Define the custom fields in ADS.
- Create the custom fields in the LDM.
- Prepare the data to load to the custom fields.
- Configure the ADD data loading process.
- Load the data to the custom fields.
- Preserve the custom fields.
Review the best practices to make sure that you manage the custom field in the most efficient way.
Define the Custom Fields in ADS
Define the custom fields that you want to add to the client workspaces' LDMs (the field type, the title, the client workspace and the dataset it should be added to, and so on) and the data to load to those custom fields.
Steps:
Create a table in ADS with the following columns, and name it
__cf_custom_fields_definition
:Column name Type Mandatory? Default Description source_identifier string yes n/a The client ID of the workspace where the custom fields to create belong to dataset string yes n/a The identifier of the dataset where the custom fields should be created
Example:
dataset.customers
cf_identifier string yes n/a The identifier of the custom field (the same as in the LDM)
Example:
attr.customers.cf_category
NOTE: We recommend that you use the
cf_
prefix in the last section of the identifier. For example, useattr.customers.cf_category
instead ofattr.customers.category
.cf_type string yes n/a The type of the custom field
Possible values:
attribute
fact
date
cf_title string yes n/a The title of the custom field as it will be visible to the end user
Example:
Category
cf_alias string no n/a The alias of the custom field
The alias must be based on the identifier of the custom field (see
cf_identifier
earlier in this table) and follow the naming conventions for the Output Stage objects in ADD (see Naming Convention for Output Stage Objects).Example:
a__cf_category
cf_select string no n/a The SQL expression used to extract the data from the Output Stage table and to load to the custom field
For more information about storing and formatting the data, see Prepare the data to load to the custom fields.
Example:
MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'cf_1')
cf_datatype string no VARCHAR(128)
for attributesDECIMAL(12,2)
for factsThe datatype of the custom field
NOTE: Do not specify the datatype for date custom fields.
cf_folder_title string no n/a The title of the folder where the custom field should be created
If the folder does not exist, it will be created.
You can see the folders on the Manage page, under the Data tab, for attributes and facts correspondingly. By default, if the folder is not specified, the custom field is created outside of any folder.
NOTE: Do not specify the folder for date custom fields.
x__timestamp timestamp no n/a (For incremental processing only) The timestamp when the last successful incremental load completed
When Custom Field Creator runs successfully, it stores to the workspace's metadata the greatest value in the
x__timestamp
column of the query table. Next time Custom Field Creator retrieves the last greatest value from the workspace's metadata, and only those records whose value of thex__timestamp
column is greater than the timestamp stored in the workspace's metadata are taken from the query table and are processed to create the custom fields.The incremental processing works similarly to the incremental load mode with the
x__timestamp
column. For more information, see Naming Convention for Output Stage Objects.You can use the following SQL script to create this table:
CREATE TABLE __cf_custom_fields_definition ( source_identifier varchar(255), dataset varchar(255), cf_identifier varchar(255), cf_type varchar(255), cf_title varchar(255), cf_alias varchar(255), cf_select varchar(10000), cf_datatype varchar(255), cf_folder_title varchar(255) ) ORDER BY source_identifier UNSEGMENTED ALL NODES;
Fill in the
__cf_custom_fields_definition
table with information about the custom fields.- Add one row for each single custom field within each client workspace.
- Do not add rows for the standard (not custom) fields that are part of the segment’s master LDM.
- Do not add rows for the client workspaces that do not have any custom fields.
For example, you have the Customers
dataset with some standard (not custom) fields that are part of the segment’s master workspace LDM. You want to add custom fields to this dataset in different client workspaces:
Category
andSize
to the client workspace with the client ID ofm4801
Manager
andRevenue
to the client workspace with the client ID ofa9800
In this case, the __cf_custom_fields_definition
table may look like this:
source_identifier | dataset | cf_identifier | cf_type | cf_title | cf_alias | cf_select |
---|---|---|---|---|---|---|
m4801 | dataset.customers | attr.customers.cf_category | attribute | Category | a__cf_category | MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'cf_category') |
m4801 | dataset.customers | attr.customers.cf_size | attribute | Size | a__cf_size | MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'size') |
a9800 | dataset.customers | attr.customers.cf_mng | attribute | Manager | a__cf_mng | MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'manager') |
a9800 | dataset.customers | fact.customers.cf_revenue | fact | Revenue | f__cf_revenue | MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'revenue') |
You can integrate processing of this table into your data pipeline and generate it, for example, based on exports from the source system.
Now, you are going to create the custom fields in the client workspaces based on the __cf_custom_fields_definition
table.
Create the Custom Fields in the LDM
To create the custom fields in the client workspaces' LDMs, you are going to use Custom Field Creator.
Custom Field Creator only adds custom fields to the LDM. It does not modify or delete any fields from the LDM.
Steps:
Deploy Custom Field Creator in the service workspace (see Deploy a Data Loading Process for a Data Pipeline Brick).
Schedule Custom Field Creator in the service workspace (see Schedule a Data Loading Process). When scheduling Custom Field Creator, add general parameters to its schedule (see “Schedule Parameters” in Custom Field Creator).
Run the Custom Field Creator process (see Run a Scheduled Data Loading Process on Demand). Custom Field Creator reads the data from the
__cf_custom_fields_definition
table, scans the LDMs in the client workspaces, and adds any custom fields that are present in the table and are not present in the LDMs.
Now, you are going to prepare the data to load to the custom fields.
Prepare the Data to Load to the Custom Fields
The data to load to custom fields in a dataset must be stored in the same Output Stage table where the data for standard (not custom) fields of this dataset is stored. When adding the data for the custom fields to the Output Stage table, format the data as flexible structures such as JSON.
If the data comes in a different format, you can transform it using SQL Executor.
For example, you have the Customers
dataset with some standard (not custom) fields that are part of the segment’s master workspace LDM: ID
, Name
, and Country
. You added the custom fields to this dataset based on the __cf_custom_fields_definition
table:
Category
andSize
to the client workspace with the client ID ofm4801
Manager
andRevenue
to the client workspace with the client ID ofa9800
To load data to these custom fields, add the custom_fields
column to the out_customers
table, which is the Output Stage table for the Customers
dataset, and fill it with the data to load:
cp__id | a__name | a__country | custom_fields | x__client_id |
---|---|---|---|---|
1 | Customer A | Czechia | {"category":"A","size":"Small"} | m4801 |
2 | Customer B | USA | {"category":"B","size":"Small"} | m4801 |
10 | Customer X | Vietnam | {"manager":"John D.","revenue":35000} | a9800 |
21 | Customer Y | USA | {"manager":"John D."} | a9800 |
Notice the following:
- The data to load is prepared for the
Category
andSize
custom fields in the client workspace with the client ID ofm4801
. - The data to load is prepared for the
Manager
andRevenue
custom fields in the client workspace with the client ID ofa9800
. In the last row, there is no data for theRevenue
custom field.
Now, you are going to configure the ADD data loading process for loading data to the client workspaces.
Configure the ADD Data Loading Process
Once you have modified a client workspace’s LDM to contain a custom field for which the corresponding Output Stage table does not physically contain a column with data, you no longer can load data to this client workspace by simply executing the associated ADD data loading process. The client workspaces now require a different SQL query to extract the data for each custom field. Therefore, you need to run the ADD data loading process in UNRESTRICTED mode that allows you to dynamically affect the SQL query and data load.
To run the ADD data load process in UNRESTRICTED
mode, you need to use Schedule Executor configured with a special parameter, dataload_parameters_query
. This parameter ensures that the data will be loaded to each client workspace and this data will include the data for their workspace-specific custom fields. For each client workspace, Schedule Executor runs an SQL query that returns the following information for this workspace:
- The list of columns with the data to load to the custom fields in the workspace
- The SQL expressions to extract the data from these columns
This information will be passed as parameters to the API running behind the ADD data loading process and will be applied to the corresponding client workspaces.
To retrieve this information, Schedule Executor queries a special table called __cf_dataload_parameters
. This table aggregates the data from the other tables (including the existing __cf_custom_fields_definition
table) in the format that the ADD data loading task accepts.
The __cf_dataload_parameters
table does not exist in ADS yet. You are now going to create this table and fill it in with information.
Steps:
Create a table in ADS that lists all datasets (by their identifiers) in all client workspaces (by their client IDs) where the data should be loaded to regardless of whether the datasets have any custom fields. Name the table
__cf_workspace_datasets
. You can use the following SQL script to create this table:CREATE TABLE __cf_workspace_datasets ( client_id varchar(255) NOT NULL, dataset varchar(255) NOT NULL, upload_mode varchar(255), load_delete_side_table varchar(255), include_lslts_condition varchar(255), update_lslts varchar(255) ) ORDER BY client_id, dataset UNSEGMENTED ALL NODES;
Optionally, you can add columns that represent specific ADD parameters (such as load mode, deletion mode, and so on; see Automated Data Distribution).
Create a table in ADS, and name it
__cf_dataload_parameters
. Use the following SQL script to create this table:CREATE TABLE __cf_dataload_parameters ( client_id varchar(255), GDC_DATALOAD_SINGLE_RUN_LOAD_MODE varchar(255), GDC_DATALOAD_SKIP_VALIDATE_MAPPING varchar(255), GDC_DATALOAD_DATASETS varchar(65000) ) ORDER BY client_id UNSEGMENTED ALL NODES;
Deploy SQL Executor in the service workspace (see Deploy a Data Loading Process for a Data Pipeline Brick).
Schedule SQL Executor in the service workspace (see Schedule a Data Loading Process). When scheduling SQL Executor, add general parameters to its schedule (see “Schedule Parameters” in SQL Executor).
Download the pre-defined transformation files for SQL Executor (the files are archived into
unrestricted_custom_fields.zip
):templates/unrestricted_parameters.liquid
is a Liquid template file that is used to generate the transformation SQL query.param_definitions/custom_field_parameters.json
is a parameter file. If you use different names of the tables or columns than those that are used in this article, you can change them in this file.99_generate_dataload_parameters.sql
is an SQL file that invokes the Liquid template together with the parameter file.
Unzip the archive, and place the files to the folder that is specified by the
folder
parameter in the schedule (see “Schedule Parameters” in SQL Executor).You can combine these transformation files with your standard SQL transformations.Run the SQL Executor process (see Run a Scheduled Data Loading Process on Demand). SQL Executor executes the Liquid template and fills in the
__cf_dataload_parameters
table with the information that will be passed as parameters to the API running behind the ADD data loading process and will be applied to the corresponding client workspaces.
Now, you are going to load the data to the custom fields.
Load the Data to the Custom Fields
To load the data to the custom fields, you need to run the ADD data load process in UNRESTRICTED
mode by using Schedule Executor configured with the dataload_parameters_query
parameter.
Steps:
In each client workspace with the custom fields where the data has to be loaded, schedule the ADD data loading process (see Schedule a Data Loading Process). When scheduling the ADD data loading process, add the
mode
parameter to its schedule and set it to the value that will later be used by Schedule Executor as an indicator to run this schedule (for example,custom_fields
).Deploy Schedule Executor in the service workspace (see Deploy a Data Loading Process for a Data Pipeline Brick).
Create a schedule for Schedule Executor in the service workspace (see Schedule a Data Loading Process). When scheduling Schedule Executor, add general and environment-specific parameters to its schedule (see “Schedule Parameters” in Schedule Executor). In particular, make sure that you:
Add the
dataload_parameters_query
parameter and set it to the following:SELECT client_id, GDC_DATALOAD_SINGLE_RUN_LOAD_MODE, GDC_DATALOAD_SKIP_VALIDATE_MAPPING, GDC_DATALOAD_DATASETS FROM __cf_dataload_parameters
Add the
list_of_modes
parameter and set it to the value of themode
parameter in the ADD data loading processes that you specified at Step 1 of this procedure.
Run the Schedule Executor process (see Run a Scheduled Data Loading Process on Demand). Schedule Executor runs the ADD data loading processes in
UNRESTRICTED
mode. The data for both standard (not custom) fields and the custom fields is loaded to the client workspaces.
Now, you are going to learn how to preserve the custom fields in the client workspaces.
Preserve the Custom Fields
Configure the Rollout Brick
Now that you have successfully modified the LDMs of the client workspaces and loaded the data to the custom fields, make sure these custom fields and any dashboards, insights, and metrics that might be using them are not deleted when the rollout brick (see Rollout Brick) runs and synchronizes all the client workspaces including the LDMs with the latest version of the segment’s master workspace.
To preserve the custom fields and the loaded data, add the synchronize_ldm
parameter to the scheduled rollout brick process and set it to diff_against_master
(see “Advanced Settings” in Rollout Brick). This prevents the rollout brick from comparing the LDM of the master workspace against the LDMs of the client workspaces.
In contrast to the default diff_against_master_with_fallback
value of the synchronize_ldm
parameter, the diff_against_master
value does not result in a workspace-level fallback in case of a failure during LDM synchronization. In an unlikely event of inconsistencies occurring between the standard (not custom) portions of the LDM in the client workspaces (for example, after failed rollout), you may need to fix the inconsistencies manually.
Maintain the Tables
Do not delete the custom fields from the
__cf_custom_fields_definition
table after the custom fields have been added to the client workspaces' LDMs. If you delete the custom fields from this table, the subsequent data load will fail.Removing the custom fields from the table will not remove them from the client workspace. Automated removal of custom fields is not supported.Update the
__cf_workspace_datasets
table every time you have created a new client workspace or added a new dataset to a client workspace’s LDM.You can define__cf_workspace_datasets
as a database view based on your workspace provisioning table to keep them synchronized.
Best Practices
- If you have a lot of client workspaces, consider using the
x__timestamp
column (see Define the Custom Fields in ADS), and run Custom Field Creator in incremental processing mode to reduce its run time during repeated executions. - Run Custom Field Creator in the following cases:
- After you have defined new custom fields in the
__cf_custom_fields_definition
table to make sure that the new custom fields will be added to the existing client workspaces. - After you have created new client workspaces (see Provisioning Brick) to make sure that the custom fields are added to those newly created workspaces.
- After you have defined new custom fields in the