You are viewing our older product's guide. Click here for the documentation of GoodData Cloud, our latest and most advanced product.

Custom Field Creator

Creating custom fields in the logical data model (LDM) as a feature and Custom Field Creator in particular are supported only for workspaces where data is loaded from the GoodData Data Warehouse (ADS) using Automated Data Distribution (ADD; see Data Preparation and Distribution Pipeline). You cannot use custom fields in workspaces that load data from any other data warehouse or object storage service.

Custom Field Creator is a utility that supports the data preparation and distribution pipeline (see Data Preparation and Distribution Pipeline). Custom Field Creator creates additional fields (facts, attributes, or dates) in the logical data model (LDM) of one or more workspaces based on the criteria that you have defined (what fields to create and in what workspaces to do so).

This is a reference article for Custom Field Creator. For the article with a complete process of adding custom fields, see Add Custom Fields to the LDMs in Client Workspaces within the Same Segment.

How Custom Field Creator Works

When Custom Field Creator runs, it creates custom fields in the workspaces according to the criteria that you have defined in queries:

The types of the custom fields to create (fact, attribute, or date)
The properties of the custom fields (name, datatype, and so on)
The workspace where to create the fields in
The dataset in the LDM where each field should be created

Custom Field Creator does not modify or delete any fields from the LDM.

If you use Schedule Executor to run schedules, use it with the dataload_parameters_query parameter to run schedules for the workspaces where the LDM contains datasets with custom fields. For more information, see Schedule Executor and Add Custom Fields to the LDMs in Client Workspaces within the Same Segment.

Configuration File

Custom Field Creator does not require any parameters in the configuration file.

Schedule Parameters

When scheduling Custom Field Creator (see Phases of Building the Data Pipeline -> Production Implementation), provide the parameters from this section in the schedule.

General Parameters

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

Name	Type	Mandatory?	Secure?	Default	Description
create_mode	string	yes	no	n/a	The identifier of the workspaces where custom fields to create belong to Possible values: `client_id`: The workspaces are identified by the client IDs (see Use Automated Data Distribution). In this case, you must also enter the `domain_name` parameter (see further in this table). `project_id`: The workspaces are identified by the workspace IDs (see Find the Workspace ID). The value of this parameter defines what you provide in the `source_identifier` column of your queries.
domain_name	string	see "Description"	no	n/a	The name of the domain where the workspaces belong to This parameter is mandatory only when `create_mode` is set to `client_id`. Otherwise, do not use it.
gd_encoded_params	JSON	yes	no	n/a	The parameters coding the queries
dry_run	Boolean	no	no	`false`	Specifies whether Custom Field Creator should only generate a log with the custom fields that will be created instead of actual creating of the custom fields. If not set or set to `false`, the custom fields are created. If set to `true`, the custom fields are not created; instead, a log with the custom fields that will be created is generated.
dont_fail_on_error	Boolean	no	no	`false`	Specifies whether Custom Field Creator fails when an error occurs. If not set or set to `false`, Custom Field Creator fails when an error occurs. If set to `true`, Custom Field Creator keeps running if an error occurs. Regardless of what this parameter is set to, all errors are recorded to the log where you can review it later.
number_of_threads	integer	no	no	4	The number of threads that will be used for creating the custom fields

Queries

The queries define the criteria for processing the custom fields: what fields to create and in what workspaces to do so.

Query Structure

You can specify the queries in a GoodData Data Warehouse (ADS) table with the following columns:

Name	Type	Mandatory?	Default	Description
source_identifier	string	yes	n/a	The identifier of the workspace where custom fields to create belong to What you provide in this column depends on the value of the `create_mode` parameter (see General Parameters). If `create_mode` is set to `project_id`, provide the workspace ID. If `create_mode` is set to `client_id`, provide the client ID.
dataset	string	yes	n/a	The identifier of the dataset where the custom fields should be created Example: `dataset.customers`
cf_identifier	string	yes	n/a	The identifier of the custom field Example: `attr.customers.cf_category` NOTE: We recommend that you use the `cf_` prefix in the last section of the identifier. For example, use `attr.customers.cf_category` instead of `attr.customers.category`.
cf_type	string	yes	n/a	The type of the custom field Possible values: `attribute` `fact` `date`
cf_title	string	yes	n/a	The title of the custom field as it will be visible to the end user Example: `Category`
cf_datatype	string	no	`VARCHAR(128)` for attributes `DECIMAL(12,2)` for facts	The datatype of the custom field NOTE: Do not specify the datatype for date custom fields.
cf_folder_title	string	no	n/a	The title of the folder where the custom field should be created If the folder does not exist, it will be created. You can see the folders on the Manage page, under the Data tab, for attributes and facts correspondingly. By default, if the folder is not specified, the custom field is created outside of any folder. NOTE: Do not specify the folder for date custom fields.
data_product	string	no	`default`	(Only when you use client IDs in `source_identifier`) The data product that each workspace belongs to
x__timestamp	timestamp	no	n/a	(For incremental processing only) The timestamp when the last successful incremental load completed When Custom Field Creator runs successfully, it stores to the workspace's metadata the greatest value in the `x__timestamp` column of the query table. Next time Custom Field Creator retrieves the last greatest value from the workspace's metadata, and only those records whose value of the `x__timestamp` column is greater than the timestamp stored in the workspace's metadata are taken from the query table and are processed to create the custom fields. The incremental processing works similarly to the incremental load mode with the `x__timestamp` column (see Naming Convention for Output Stage Objects). TIP: Consider using the `x__timestamp` column when you have a lot of client workspaces to reduce the run time during repeated executions.

Here is an example of the query table where workspace IDs are used to identify the workspaces:

source_identifier	dataset	cf_identifier	cf_type	cf_datatype	cf_title	cf_folder_title	x__timestamp
e863ii0azrnng2zt4fuu81ifgqtyeoj21	dataset.employee	attr.employee.cf_address	attribute	VARCHAR(128)	Address	Employee	2021-01-22 10:22:34
e863ii0azrnng2zt4fuu81ifgqtyeoj21	dataset.employee	fact.sale.cf_sum	fact	DECIMAL(12,2)	Sales Sum	Sales	2021-01-22 3:15:40
fuu81ifgqtyeoj21e863ii0azrnng2zt4	dataset.employee	employeebirthday.dataset.dt	date		Employee Birthday		2021-01-22 6:26:41

Here is an example of the query file where client IDs are used to identify the workspaces and the data product is specified for each workspace:

source_identifier	dataset	cf_identifier	cf_type	cf_datatype	cf_title	cf_folder_title	data_product	x__timestamp
p3489	dataset.employee	attr.employee.cf_address	attribute	VARCHAR(128)	Address	Employee	default	2021-01-22 10:22:34
a9800	dataset.employee	fact.sale.cf_sum	fact	DECIMAL(12,2)	Sales Sum	Sales	testing	2021-01-22 3:15:40
m4801	dataset.employee	employeebirthday.dataset.dt	date		Employee Birthday		default	2021-01-22 6:26:41

Query Source

Custom Field Creator reads the queries from ADS. Specify the query as an SQL query to ADS.

{
  "ads_client": {
    "username": "john.doe@example.com",
    "password": "secret",
    "ads_id": "rtmmgjsqc4zmf64egtu6l6xv2xhxempi"
  },
  "input_source": {
    "type": "ads",
    "query": "SELECT source_identifier, dataset, cf_identifier, cf_type, cf_datatype, cf_title, cf_folder_title, x__timestamp FROM custom_fields_input"
  }
}

Queries as Schedule Parameters

When providing the queries as schedule parameters, you have to code them using the gd_encoded_params parameter (see General Parameters).

To code the queries, follow the instructions in Specifying Complex Parameters. Once done, you should have the gd_encoded_params parameter coding the queries and reference parameters encoding sensitive information that you will be adding as secure parameters.

Example: The queries obtained from ADS (sensitive information present: ADS password) to be coded by the gd_encoded_params and a reference parameter encoding the ADS password

The following will become the value of the gd_encoded_params parameter:

{
  "ads_client": {
    "username": "john.doe@example.com",
    "password": "${ads_client_password}",
    "ads_id": "rtmmgjsqc4zmf64egtu6l6xv2xhxempi"
  },
  "input_source": {
    "type": "ads",
    "query": "SELECT source_identifier, dataset, cf_identifier, cf_type, cf_datatype, cf_title, cf_folder_title, x__timestamp FROM custom_fields_input"
  }
}

In addition, you have to add the reference parameter ads_client_password as a secure parameter.

Schedule Example

The following is an example of how you can specify schedule parameters:

Logs and Error Handling

Custom Field Creator generates a log with all the changes made and errors occurred.

If processing for one workspace fails, Custom Field Creator logs an error and continues the execution as usual.
An execution with some workspaces processed and some failed finishes with a state of “Warning”.
An execution with all workspaces failed finishes with a state of “Failed”.

By default, Custom Field Creator fails when an error occurs. You can change this behavior by setting the dont_fail_on_error parameter to true (see General Parameters).

Best Practices

If you have a lot of client workspaces, consider using the x__timestamp column (see Query Structure), and run Custom Field Creator in incremental processing mode to reduce its run time during repeated executions.
Run Custom Field Creator in the following cases:
- After you have defined new custom fields to make sure that they will be added to the existing client workspaces.
- After you have created new client workspaces (see Provisioning Brick) to make sure that the custom fields are added to those newly created workspaces.

Object Renaming Utility

Date Dimension Loader