Add Custom Fields to the LDMs in Client Workspaces within the Same Segment

Creating custom fields in the logical data model (LDM) as a feature is supported only for workspaces where data is loaded from the GoodData Data Warehouse (ADS) using Automated Data Distribution (ADD; see Data Preparation and Distribution Pipeline). You cannot use custom fields in workspaces that load data from any other data warehouse or object storage service.

In a standard multi-tenant setup, all client workspaces within a segment share exactly the same logical data model (LDM) inherited and distributed from the segment's master workspace (see Managing Workspaces via Life Cycle Management (LCM)). In some cases, however, you may need to customize the LDM of some client workspaces within the segment. For example, some entities in the LDM may require workspace-specific (custom) fields in addition to the standard fields common for all the client workspaces. Typically, these workspace-specific fields are custom fields created on a per-client basis directly in the source application.

To implement a solution for such cases and be able to manage the custom fields in an automated way, you can use a set of the GoodData platform features and tools that, when use together, allow you to have multiple workspaces managed in a centralized manner and customize their LDMs as you need. These features and tools are:

Contents:

Limitations

  • The solution documented in this article is supported only for workspaces where data is loaded from ADS using ADD.
  • You can add custom fields only to datasets that already exist in a client workspace's LDM.
  • You can add only attributes, facts, and dates as custom fields.
  • Attributes added as custom fields can have only one label and can be used only in a one-to-many relationship (many-to-many relationships are not supported for those attributes).

Implementation Steps

To implement the solution for adding and managing custom fields in the client workspaces' LDMs, complete the following steps:

  1. Define the custom fields in ADS.
  2. Create the custom fields in the LDM.
  3. Prepare the data to load to the custom fields.
  4. Configure the ADD data loading process.
  5. Load the data to the custom fields.
  6. Preserve the custom fields.

Review the best practices to make sure that you manage the custom field in the most efficient way.

Define the Custom Fields in ADS

Define the custom fields that you want to add to the client workspaces' LDMs (the field type, the title, the client workspace and the dataset it should be added to, and so on) and the data to load to those custom fields.

Steps:

  1. Create a table in ADS with the following columns, and name it __cf_custom_fields_definition:

    Column nameTypeMandatory?DefaultDescription
    source_identifierstringyesn/aThe client ID of the workspace where the custom fields to create belong to
    datasetstringyesn/a

    The identifier of the dataset where the custom fields should be created

    Example: dataset.customers

    cf_identifierstringyesn/a

    The identifier of the custom field

    Example: attr.customers.cf_category

    NOTE: We recommend that you use the cf_ prefix in the last section of the identifier. For example, use attr.customers.cf_category instead of attr.customers.category.

    cf_typestringyesn/a

    The type of the custom field

    Possible values:

    • attribute
    • fact
    • date
    cf_titlestringyesn/a

    The title of the custom field as it will be visible to the end user

    Example: Category

    cf_aliasstringnon/a

    The alias of the custom field

    The alias must be based on the identifier of the custom field (see cf_identifier earlier in this table) and follow the naming conventions for the Output Stage objects in ADD (see Naming Convention for Output Stage Objects).

    Example: a__cf_category

    cf_selectstringnon/a

    The SQL expression used to extract the data from the Output Stage table and to load to the custom field

    For more information about storing and formatting the data, see Prepare the data to load to the custom fields.

    Example: MAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'category')

    cf_datatypestringno

    VARCHAR(128) for attributes

    DECIMAL(12,2) for facts

    The datatype of the custom field

    NOTE: Do not specify the datatype for date custom fields.

    cf_folder_titlestringnon/a

    The title of the folder where the custom field should be created

    If the folder does not exist, it will be created.

    You can see the folders on the Manage page, under the Data tab, for attributes and facts correspondingly. By default, if the folder is not specified, the custom field is created outside of any folder.

    NOTE: Do not specify the folder for date custom fields.

    x__timestamptimestampnon/a

    (For incremental processing only) The timestamp when the last successful incremental load completed

    When Custom Field Creator runs successfully, it stores to the workspace's metadata the greatest value in the x__timestamp column of the query table. Next time Custom Field Creator retrieves the last greatest value from the workspace's metadata, and only those records whose value of the x__timestamp column is greater than the timestamp stored in the workspace's metadata are taken from the query table and are processed to create the custom fields.

    The incremental processing works similarly to the incremental load mode with the x__timestamp column. For more information, see Naming Convention for Output Stage Objects.

    You can use the following SQL script to create this table:

    CREATE TABLE __cf_custom_fields_definition
    (
       source_identifier varchar(255),
       dataset varchar(255),
       cf_identifier varchar(255),
       cf_type varchar(255),
       cf_title varchar(255),
       cf_alias varchar(255),
       cf_select varchar(10000),
       cf_datatype varchar(255),
       cf_folder_title varchar(255)
    )
       ORDER BY source_identifier
       UNSEGMENTED ALL NODES;
  2. Fill in the __cf_custom_fields_definition table with information about the custom fields.
    • Add one row for each single custom field within each client workspace.
    • Do not add rows for the standard (not custom) fields that are part of the segment's master LDM.
    • Do not add rows for the client workspaces that do not have any custom fields.

For example, you have the Customers dataset with some standard (not custom) fields that are part of the segment's master workspace LDM. You want to add custom fields to this dataset in different client workspaces:

  • Category and Size to the client workspace with the client ID of m4801
  • Manager and Revenue to the client workspace with the client ID of a9800

In this case, the __cf_custom_fields_definition table may look like this:

source_identifierdatasetcf_identifiercf_typecf_titlecf_aliascf_select
m4801dataset.customersattr.customers.cf_categoryattributeCategorya__cf_categoryMAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'category')
m4801dataset.customersattr.customers.cf_sizeattributeSizea__cf_sizeMAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'size')
a9800dataset.customersattr.customers.cf_mngattributeManagera__cf_mngMAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'manager')
a9800dataset.customersfact.customers.cf_revenuefactRevenuef__cf_revenueMAPLOOKUP(MAPJSONEXTRACTOR(custom_fields), 'revenue')

You can integrate processing of this table into your data pipeline and generate it, for example, based on exports from the source system.

Now, you are going to create the custom fields in the client workspaces based on the __cf_custom_fields_definition table.

Create the Custom Fields in the LDM

To create the custom fields in the client workspaces' LDMs, you are going to use Custom Field Creator.

Custom Field Creator only adds custom fields to the LDM. It does not modify or delete any fields from the LDM.

Steps:

  1. Deploy Custom Field Creator in the service workspace (see Deploy a Data Loading Process for a Data Pipeline Brick).
  2. Schedule Custom Field Creator in the service workspace (see Schedule a Data Load).
    When scheduling Custom Field Creator, add general parameters to its schedule (see "Schedule Parameters" in Custom Field Creator).
  3. Run the Custom Field Creator process (see Run a Scheduled Data Loading Process on Demand).
    Custom Field Creator reads the data from the __cf_custom_fields_definition table, scans the LDMs in the client workspaces, and adds any custom fields that are present in the table and are not present in the LDMs.

Now, you are going to prepare the data to load to the custom fields.

Prepare the Data to Load to the Custom Fields

The data to load to custom fields in a dataset must be stored in the same Output Stage table where the data for standard (not custom) fields of this dataset is stored. When adding the data for the custom fields to the Output Stage table, format the data as flexible structures such as JSON.

If the data comes in a different format, you can transform it using SQL Executor.

For example, you have the Customers dataset with some standard (not custom) fields that are part of the segment's master workspace LDM: ID, Name, and Country. You added the custom fields to this dataset based on the __cf_custom_fields_definition table:

  • Category and Size to the client workspace with the client ID of m4801
  • Manager and Revenue to the client workspace with the client ID of a9800

To load data to these custom fields, add the custom_fields column to the out_customers table, which is the Output Stage table for the Customers dataset, and fill it with the data to load:

cp__ida__namea__countrycustom_fieldsx__client_id
1Customer ACzechia{"category":"A","size":"Small"}m4801
2Customer BUSA{"category":"B","size":"Small"}m4801
10Customer XVietnam{"manager":"John D.","revenue":35000}a9800
21Customer YUSA{"manager":"John D."}a9800

Notice the following:

  • The data to load is prepared for the Category and Size custom fields in the client workspace with the client ID of m4801.
  • The data to load is prepared for the Manager and Revenue custom fields in the client workspace with the client ID of a9800. In the last row, there is no data for the Revenue custom field.

Now, you are going to configure the ADD data loading process for loading data to the client workspaces.

Configure the ADD Data Loading Process

Once you have modified a client workspace's LDM to contain a custom field for which the corresponding Output Stage table does not physically contain a column with data, you no longer can load data to this client workspace by simply executing the associated ADD data loading process. The client workspaces now require a different SQL query to extract the data for each custom field. Therefore, you need to run the ADD data loading process in UNRESTRICTED mode that allows you to dynamically affect the SQL query and data load.

To run the ADD data load process in UNRESTRICTED mode, you need to use Schedule Executor configured with a special parameter, dataload_parameters_query. This parameter ensures that the data will be loaded to each client workspace and this data will include the data for their workspace-specific custom fields. For each client workspace, Schedule Executor runs an SQL query that returns the following information for this workspace:

  • The list of columns with the data to load to the custom fields in the workspace
  • The SQL expressions to extract the data from these columns

This information will be passed as parameters to the API running behind the ADD data loading process and will be applied to the corresponding client workspaces.

To retrieve this information, Schedule Executor queries a special table called __cf_dataload_parameters. This table aggregates the data from the other tables (including the existing __cf_custom_fields_definition table) in the format that the ADD data loading task accepts.

The __cf_dataload_parameters table does not exist in ADS yet. You are now going to create this table and fill it in with information.

Steps:

  1. Create a table in ADS that lists all datasets (by their identifiers) in all client workspaces (by their client IDs) where the data should be loaded to regardless of whether the datasets have any custom fields. Name the table __cf_workspace_datasets.
    You can use the following SQL script to create this table:

    CREATE TABLE __cf_workspace_datasets
    (
       client_id varchar(255) NOT NULL,
       dataset varchar(255) NOT NULL,
       upload_mode varchar(255),
       load_delete_side_table varchar(255),
       include_lslts_condition varchar(255),
       update_lslts varchar(255)
    )
       ORDER BY client_id,
                dataset
       UNSEGMENTED ALL NODES;

    Optionally, you can add columns that represent specific ADD parameters (such as load mode, deletion mode, and so on; see Automated Data Distribution).

  2. Create a table in ADS, and name it __cf_dataload_parameters.
    Use the following SQL script to create this table:

    CREATE TABLE __cf_dataload_parameters
    (
       client_id varchar(255),
       GDC_DATALOAD_SINGLE_RUN_LOAD_MODE varchar(255),
       GDC_DATALOAD_SKIP_VALIDATE_MAPPING varchar(255),
       GDC_DATALOAD_DATASETS varchar(65000)
    )
       ORDER BY client_id
       UNSEGMENTED ALL NODES;
  3. Deploy SQL Executor in the service workspace (see Deploy a Data Loading Process for a Data Pipeline Brick).
  4. Schedule SQL Executor in the service workspace (see Schedule a Data Load).
    When scheduling SQL Executor, add general parameters to its schedule (see "Schedule Parameters" in SQL Executor).
  5. Download the pre-defined transformation files for SQL Executor (the files are archived into unrestricted_custom_fields.zip):

    • templates/unrestricted_parameters.liquid is a Liquid template file that is used to generate the transformation SQL query.
    • param_definitions/custom_field_parameters.json is a parameter file. If you use different names of the tables or columns than those that are used in this article, you can change them in this file.
    • 99_generate_dataload_parameters.sql is an SQL file that invokes the Liquid template together with the parameter file.
  6. Unzip the archive, and place the files to the folder that is specified by the folder parameter in the schedule (see "Schedule Parameters" in SQL Executor).

    You can combine these transformation files with your standard SQL transformations.

  7. Run the SQL Executor process (see Run a Scheduled Data Loading Process on Demand).
    SQL Executor executes the Liquid template and fills in the __cf_dataload_parameters table with the information that will be passed as parameters to the API running behind the ADD data loading process and will be applied to the corresponding client workspaces.

Now, you are going to load the data to the custom fields.

Load the Data to the Custom Fields

To load the data to the custom fields, you need to run the ADD data load process in UNRESTRICTED mode by using Schedule Executor configured with the dataload_parameters_query parameter.

Steps:

  1. In each client workspace with the custom fields where the data has to be loaded, schedule the ADD data loading process (see Schedule a Data Load).
    When scheduling the ADD data loading process, add the mode parameter to its schedule and set it to the value that will later be used by Schedule Executor as an indicator to run this schedule (for example, custom_fields).
  2. Deploy Schedule Executor in the service workspace (see Deploy a Data Loading Process for a Data Pipeline Brick).
  3. Create a schedule for Schedule Executor in the service workspace (see Schedule a Data Load).
    When scheduling Schedule Executor, add general and environment-specific parameters to its schedule (see "Schedule Parameters" in Schedule Executor). In particular, make sure that you:

    • Add the dataload_parameters_query parameter and set it to the following:

      SELECT client_id, GDC_DATALOAD_SINGLE_RUN_LOAD_MODE, GDC_DATALOAD_SKIP_VALIDATE_MAPPING, GDC_DATALOAD_DATASETS FROM __cf_dataload_parameters
    • Add the list_of_modes parameter and set it to the value of the mode parameter in the ADD data loading processes that you specified at Step 1 of this procedure.
  4. Run the Schedule Executor process (see Run a Scheduled Data Loading Process on Demand).
    Schedule Executor runs the ADD data loading processes in UNRESTRICTED mode. The data for both standard (not custom) fields and the custom fields is loaded to the client workspaces.

Now, you are going to learn how to preserve the custom fields in the client workspaces.

Preserve the Custom Fields

Configure the Rollout Brick

Now that you have successfully modified the LDMs of the client workspaces and loaded the data to the custom fields, make sure these custom fields and any dashboards, insights, and metrics that might be using them are not deleted when the rollout brick (see Rollout Brick) runs and synchronizes all the client workspaces including the LDMs with the latest version of the segment's master workspace.

To preserve the custom fields and the loaded data, add the synchronize_ldm parameter to the scheduled rollout brick process and set it to diff_against_master (see "Advanced Settings" in Rollout Brick). This prevents the rollout brick from comparing the LDM of the master workspace against the LDMs of the client workspaces.

In contrast to the default diff_against_master_with_fallback value of the synchronize_ldm parameter, the diff_against_master value does not result in a workspace-level fallback in case of a failure during LDM synchronization. In an unlikely event of inconsistencies occurring between the standard (not custom) portions of the LDM in the client workspaces (for example, after failed rollout), you may need to fix the inconsistencies manually.

Maintain the Tables

  • Do not delete the custom fields from the __cf_custom_fields_definition table after the custom fields have been added to the client workspaces' LDMs. If you delete the custom fields from this table, the subsequent data load will fail.

    Removing the custom fields from the table will not remove them from the client workspace. Automated removal of custom fields is not supported.

  • Update the __cf_workspace_datasets table every time you have created a new client workspace or added a new dataset to a client workspace's LDM.

    You can define __cf_workspace_datasets as a database view based on your workspace provisioning table to keep them synchronized.

Best Practices

  • If you have a lot of client workspaces, consider using the x__timestamp column (see Define the Custom Fields in ADS), and run Custom Field Creator in incremental processing mode to reduce its run time during repeated executions.
  • Run Custom Field Creator in the following cases:
    • After you have defined new custom fields in the __cf_custom_fields_definition table to make sure that the new custom fields will be added to the existing client workspaces.
    • After you have created new client workspaces (see Provisioning Brick) to make sure that the custom fields are added to those newly created workspaces.
Powered by Atlassian Confluence and Scroll Viewport.