Add Custom Fields to the LDMs in Client Workspaces within the Same Segment
Creating custom fields in the logical data model (LDM) as a feature is supported only for workspaces where data is loaded from the GoodData Data Warehouse (ADS) using Automated Data Distribution (ADD; see Data Preparation and Distribution Pipeline). You cannot use custom fields in workspaces that load data from any other data warehouse or object storage service.
In a standard multi-tenant setup, all client workspaces within a segment share exactly the same logical data model (LDM) inherited and distributed from the segment’s master workspace (see Managing Workspaces via Life Cycle Management (LCM)). In some cases, however, you may need to customize the LDM of some client workspaces within the segment. For example, some entities in the LDM may require workspace-specific (custom) fields in addition to the standard fields common for all the client workspaces. Typically, these workspace-specific fields are custom fields created on a per-client basis directly in the source application.
To implement a solution for such cases and be able to manage the custom fields in an automated way, you can use a set of the GoodData platform features and tools that, when use together, allow you to have multiple workspaces managed in a centralized manner and customize their LDMs as you need. These features and tools are:
- GoodData Data Warehouse (ADS; see Data Warehouse Reference)
- Custom Field Creator
- Automated Data Distribution (ADD; see Data Preparation and Distribution Pipeline) initiated by Schedule Executor
- SQL Executor
- Rollout Brick
Limitations
- The solution documented in this article is supported only for workspaces where data is loaded from ADS using ADD.
- You can add custom fields only to datasets that already exist in a client workspace’s LDM.
- You can add only attributes, facts, and dates as custom fields.
- Attributes added as custom fields can have only one label and can be used only in a one-to-many relationship (many-to-many relationships are not supported for those attributes).
Implementation Steps
To implement the solution for adding and managing custom fields in the client workspaces' LDMs, complete the following steps:
- Define the custom fields in ADS.
- Create the custom fields in the LDM.
- Prepare the data to load to the custom fields.
- Configure the ADD data loading process.
- Load the data to the custom fields.
- Preserve the custom fields.
Review the best practices to make sure that you manage the custom field in the most efficient way.
Define the Custom Fields in ADS
Define the custom fields that you want to add to the client workspaces' LDMs (the field type, the title, the client workspace and the dataset it should be added to, and so on) and the data to load to those custom fields.
Steps:
- Create a table in ADS with the following columns, and name it - __cf_custom_fields_definition:- Column name - Type - Mandatory? - Default - Description - source_identifier - string - yes - n/a - The client ID of the workspace where the custom fields to create belong to - dataset - string - yes - n/a - The identifier of the dataset where the custom fields should be created - Example: - dataset.customers- cf_identifier - string - yes - n/a - The identifier of the custom field (the same as in the LDM) - Example: - attr.customers.cf_category- NOTE: We recommend that you use the - cf_prefix in the last section of the identifier. For example, use- attr.customers.cf_categoryinstead of- attr.customers.category.- cf_type - string - yes - n/a - The type of the custom field - Possible values: - attribute
- fact
- date
 - cf_title - string - yes - n/a - The title of the custom field as it will be visible to the end user - Example: - Category- cf_alias - string - no - n/a - The alias of the custom field - The alias must be based on the identifier of the custom field (see - cf_identifierearlier in this table) and follow the naming conventions for the Output Stage objects in ADD (see Naming Convention for Output Stage Objects).- Example: - a__cf_category- cf_select - string - no - n/a - The SQL expression used to extract the data from the Output Stage table and to load to the custom field - For more information about storing and formatting the data, see Prepare the data to load to the custom fields. - Example: - MAPLOOKUP(MAPJSONEXTRACTOR(custom_field), ‘cf_1')- cf_datatype - string - no - VARCHAR(128)for attributes- DECIMAL(12,2)for facts- The datatype of the custom field - NOTE: Do not specify the datatype for date custom fields. - cf_folder_title - string - no - n/a - The title of the folder where the custom field should be created - If the folder does not exist, it will be created. - You can see the folders on the Manage page, under the Data tab, for attributes and facts correspondingly. By default, if the folder is not specified, the custom field is created outside of any folder. - NOTE: Do not specify the folder for date custom fields. - x__timestamp - timestamp - no - n/a - (For incremental processing only) The timestamp when the last successful incremental load completed - When Custom Field Creator runs successfully, it stores to the workspace's metadata the greatest value in the - x__timestampcolumn of the query table. Next time Custom Field Creator retrieves the last greatest value from the workspace's metadata, and only those records whose value of the- x__timestampcolumn is greater than the timestamp stored in the workspace's metadata are taken from the query table and are processed to create the custom fields.- The incremental processing works similarly to the incremental load mode with the - x__timestampcolumn. For more information, see Naming Convention for Output Stage Objects.- You can use the following SQL script to create this table: - CREATE TABLE __cf_custom_fields_definition ( source_identifier varchar(255), dataset varchar(255), cf_identifier varchar(255), cf_type varchar(255), cf_title varchar(255), cf_alias varchar(255), cf_select varchar(10000), cf_datatype varchar(255), cf_folder_title varchar(255) ) ORDER BY source_identifier UNSEGMENTED ALL NODES;
- Fill in the - __cf_custom_fields_definitiontable with information about the custom fields.- Add one row for each single custom field within each client workspace.
- Do not add rows for the standard (not custom) fields that are part of the segment’s master LDM.
- Do not add rows for the client workspaces that do not have any custom fields.
 
For example, you have the Customers dataset with some standard (not custom) fields that are part of the segment’s master workspace LDM. You want to add custom fields to this dataset in different client workspaces:
- Categoryand- Sizeto the client workspace with the client ID of- m4801
- Managerand- Revenueto the client workspace with the client ID of- a9800
In this case, the __cf_custom_fields_definition table may look like this:
| source_identifier | dataset | cf_identifier | cf_type | cf_title | cf_alias | cf_select | 
|---|---|---|---|---|---|---|
| m4801 | dataset.customers | attr.customers.cf_category | attribute | Category | a__cf_category | MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'cf_category') | 
| m4801 | dataset.customers | attr.customers.cf_size | attribute | Size | a__cf_size | MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'size') | 
| a9800 | dataset.customers | attr.customers.cf_mng | attribute | Manager | a__cf_mng | MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'manager') | 
| a9800 | dataset.customers | fact.customers.cf_revenue | fact | Revenue | f__cf_revenue | MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'revenue') | 
You can integrate processing of this table into your data pipeline and generate it, for example, based on exports from the source system.
Now, you are going to create the custom fields in the client workspaces based on the __cf_custom_fields_definition table.
Create the Custom Fields in the LDM
To create the custom fields in the client workspaces' LDMs, you are going to use Custom Field Creator.
Custom Field Creator only adds custom fields to the LDM. It does not modify or delete any fields from the LDM.
Steps:
- Deploy Custom Field Creator in the service workspace (see Deploy a Data Loading Process for a Data Pipeline Brick). 
- Schedule Custom Field Creator in the service workspace (see Schedule a Data Load). When scheduling Custom Field Creator, add general parameters to its schedule (see “Schedule Parameters” in Custom Field Creator). 
- Run the Custom Field Creator process (see Run a Scheduled Data Loading Process on Demand). Custom Field Creator reads the data from the - __cf_custom_fields_definitiontable, scans the LDMs in the client workspaces, and adds any custom fields that are present in the table and are not present in the LDMs.
Now, you are going to prepare the data to load to the custom fields.
Prepare the Data to Load to the Custom Fields
The data to load to custom fields in a dataset must be stored in the same Output Stage table where the data for standard (not custom) fields of this dataset is stored. When adding the data for the custom fields to the Output Stage table, format the data as flexible structures such as JSON.
If the data comes in a different format, you can transform it using SQL Executor.
For example, you have the Customers dataset with some standard (not custom) fields that are part of the segment’s master workspace LDM: ID, Name, and Country. You added the custom fields to this dataset based on the __cf_custom_fields_definition table:
- Categoryand- Sizeto the client workspace with the client ID of- m4801
- Managerand- Revenueto the client workspace with the client ID of- a9800
To load data to these custom fields, add the custom_fields column to the out_customers table, which is the Output Stage table for the Customers dataset, and fill it with the data to load:
| cp__id | a__name | a__country | custom_fields | x__client_id | 
|---|---|---|---|---|
| 1 | Customer A | Czechia | {"category":"A","size":"Small"} | m4801 | 
| 2 | Customer B | USA | {"category":"B","size":"Small"} | m4801 | 
| 10 | Customer X | Vietnam | {"manager":"John D.","revenue":35000} | a9800 | 
| 21 | Customer Y | USA | {"manager":"John D."} | a9800 | 
Notice the following:
- The data to load is prepared for the CategoryandSizecustom fields in the client workspace with the client ID ofm4801.
- The data to load is prepared for the ManagerandRevenuecustom fields in the client workspace with the client ID ofa9800. In the last row, there is no data for theRevenuecustom field.
Now, you are going to configure the ADD data loading process for loading data to the client workspaces.
Configure the ADD Data Loading Process
Once you have modified a client workspace’s LDM to contain a custom field for which the corresponding Output Stage table does not physically contain a column with data, you no longer can load data to this client workspace by simply executing the associated ADD data loading process. The client workspaces now require a different SQL query to extract the data for each custom field. Therefore, you need to run the ADD data loading process in UNRESTRICTED mode that allows you to dynamically affect the SQL query and data load.
To run the ADD data load process in UNRESTRICTED mode, you need to use Schedule Executor configured with a special parameter, dataload_parameters_query. This parameter ensures that the data will be loaded to each client workspace and this data will include the data for their workspace-specific custom fields. For each client workspace, Schedule Executor runs an SQL query that returns the following information for this workspace:
- The list of columns with the data to load to the custom fields in the workspace
- The SQL expressions to extract the data from these columns
This information will be passed as parameters to the API running behind the ADD data loading process and will be applied to the corresponding client workspaces.
To retrieve this information, Schedule Executor queries a special table called __cf_dataload_parameters. This table aggregates the data from the other tables (including the existing __cf_custom_fields_definition table) in the format that the ADD data loading task accepts.
The __cf_dataload_parameters table does not exist in ADS yet. You are now going to create this table and fill it in with information.
Steps:
- Create a table in ADS that lists all datasets (by their identifiers) in all client workspaces (by their client IDs) where the data should be loaded to regardless of whether the datasets have any custom fields. Name the table - __cf_workspace_datasets. You can use the following SQL script to create this table:- CREATE TABLE __cf_workspace_datasets ( client_id varchar(255) NOT NULL, dataset varchar(255) NOT NULL, upload_mode varchar(255), load_delete_side_table varchar(255), include_lslts_condition varchar(255), update_lslts varchar(255) ) ORDER BY client_id, dataset UNSEGMENTED ALL NODES;- Optionally, you can add columns that represent specific ADD parameters (such as load mode, deletion mode, and so on; see Automated Data Distribution). 
- Create a table in ADS, and name it - __cf_dataload_parameters. Use the following SQL script to create this table:- CREATE TABLE __cf_dataload_parameters ( client_id varchar(255), GDC_DATALOAD_SINGLE_RUN_LOAD_MODE varchar(255), GDC_DATALOAD_SKIP_VALIDATE_MAPPING varchar(255), GDC_DATALOAD_DATASETS varchar(65000) ) ORDER BY client_id UNSEGMENTED ALL NODES;
- Deploy SQL Executor in the service workspace (see Deploy a Data Loading Process for a Data Pipeline Brick). 
- Schedule SQL Executor in the service workspace (see Schedule a Data Load). When scheduling SQL Executor, add general parameters to its schedule (see “Schedule Parameters” in SQL Executor). 
- Download the pre-defined transformation files for SQL Executor (the files are archived into - unrestricted_custom_fields.zip):- templates/unrestricted_parameters.liquidis a Liquid template file that is used to generate the transformation SQL query.
- param_definitions/custom_field_parameters.jsonis a parameter file. If you use different names of the tables or columns than those that are used in this article, you can change them in this file.
- 99_generate_dataload_parameters.sqlis an SQL file that invokes the Liquid template together with the parameter file.
 
- Unzip the archive, and place the files to the folder that is specified by the - folderparameter in the schedule (see “Schedule Parameters” in SQL Executor).You can combine these transformation files with your standard SQL transformations.
- Run the SQL Executor process (see Run a Scheduled Data Loading Process on Demand). SQL Executor executes the Liquid template and fills in the - __cf_dataload_parameterstable with the information that will be passed as parameters to the API running behind the ADD data loading process and will be applied to the corresponding client workspaces.
Now, you are going to load the data to the custom fields.
Load the Data to the Custom Fields
To load the data to the custom fields, you need to run the ADD data load process in UNRESTRICTED mode by using Schedule Executor configured with the dataload_parameters_query parameter.
Steps:
- In each client workspace with the custom fields where the data has to be loaded, schedule the ADD data loading process (see Schedule a Data Load). When scheduling the ADD data loading process, add the - modeparameter to its schedule and set it to the value that will later be used by Schedule Executor as an indicator to run this schedule (for example,- custom_fields).
- Deploy Schedule Executor in the service workspace (see Deploy a Data Loading Process for a Data Pipeline Brick). 
- Create a schedule for Schedule Executor in the service workspace (see Schedule a Data Load). When scheduling Schedule Executor, add general and environment-specific parameters to its schedule (see “Schedule Parameters” in Schedule Executor). In particular, make sure that you: - Add the - dataload_parameters_queryparameter and set it to the following:- SELECT client_id, GDC_DATALOAD_SINGLE_RUN_LOAD_MODE, GDC_DATALOAD_SKIP_VALIDATE_MAPPING, GDC_DATALOAD_DATASETS FROM __cf_dataload_parameters
- Add the - list_of_modesparameter and set it to the value of the- modeparameter in the ADD data loading processes that you specified at Step 1 of this procedure.
 
- Run the Schedule Executor process (see Run a Scheduled Data Loading Process on Demand). Schedule Executor runs the ADD data loading processes in - UNRESTRICTEDmode. The data for both standard (not custom) fields and the custom fields is loaded to the client workspaces.
Now, you are going to learn how to preserve the custom fields in the client workspaces.
Preserve the Custom Fields
Configure the Rollout Brick
Now that you have successfully modified the LDMs of the client workspaces and loaded the data to the custom fields, make sure these custom fields and any dashboards, insights, and metrics that might be using them are not deleted when the rollout brick (see Rollout Brick) runs and synchronizes all the client workspaces including the LDMs with the latest version of the segment’s master workspace.
To preserve the custom fields and the loaded data, add the synchronize_ldm parameter to the scheduled rollout brick process and set it to diff_against_master (see “Advanced Settings” in Rollout Brick). This prevents the rollout brick from comparing the LDM of the master workspace against the LDMs of the client workspaces.
In contrast to the default diff_against_master_with_fallback value of the synchronize_ldm parameter, the diff_against_master value does not result in a workspace-level fallback in case of a failure during LDM synchronization. In an unlikely event of inconsistencies occurring between the standard (not custom) portions of the LDM in the client workspaces (for example, after failed rollout), you may need to fix the inconsistencies manually.
Maintain the Tables
- Do not delete the custom fields from the - __cf_custom_fields_definitiontable after the custom fields have been added to the client workspaces' LDMs. If you delete the custom fields from this table, the subsequent data load will fail.Removing the custom fields from the table will not remove them from the client workspace. Automated removal of custom fields is not supported.
- Update the - __cf_workspace_datasetstable every time you have created a new client workspace or added a new dataset to a client workspace’s LDM.You can define- __cf_workspace_datasetsas a database view based on your workspace provisioning table to keep them synchronized.
Best Practices
- If you have a lot of client workspaces, consider using the x__timestampcolumn (see Define the Custom Fields in ADS), and run Custom Field Creator in incremental processing mode to reduce its run time during repeated executions.
- Run Custom Field Creator in the following cases: - After you have defined new custom fields in the __cf_custom_fields_definitiontable to make sure that the new custom fields will be added to the existing client workspaces.
- After you have created new client workspaces (see Provisioning Brick) to make sure that the custom fields are added to those newly created workspaces.
 
- After you have defined new custom fields in the