Custom Field Creator

Custom Field Creator is a utility that supports the data preparation and distribution pipeline (see Data Preparation and Distribution Pipeline). Custom Field Creator creates additional fields (facts, attributes, or dates) in the logical data model (LDM) of one or more workspaces based on the criteria that you have defined (what fields to create and in what workspaces to do so).

How Custom Field Creator Works

When Custom Field Creator runs, it creates custom fields in the workspaces according to the criteria that you have defined in queries:

  • The types of the custom fields to create (fact, attribute, or date)
  • The properties of the custom fields (name, datatype, and so on)
  • The workspace where to create the fields in
  • The dataset in the LDM where each field should be created

Custom Field Creator does not modify or delete any fields from the LDM.

Configuration File

Custom Field Creator does not require any parameters in the configuration file.

Schedule Parameters

When scheduling Custom Field Creator (see Phases of Building the Data Pipeline -> Production Implementation), provide the parameters from this section in the schedule.

General Parameters

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

NameTypeMandatory?Secure?DefaultDescription
create_modestringyesnon/a

The identifier of the workspaces where custom fields to create belong to

Possible values:

  • client_id: The workspaces are identified by the client IDs (see Use Automated Data Distribution). In this case, you must also enter the domain_name parameter (see further in this table).
  • project_id: The workspaces are identified by the workspace IDs (see Find the Workspace ID).

The value of this parameter defines what you provide in the source_identifier column of your queries.

domain_namestringsee "Description"non/a

The name of the domain where the workspaces belong to

This parameter is mandatory only when create_mode is set to client_id. Otherwise, do not use it.

gd_encoded_paramsJSONyesnon/aThe parameters coding the queries
dry_runBooleannonofalse

Specifies whether Custom Field Creator should only generate a log with the custom fields that will be created instead of actual creating of the custom fields.

  • If not set or set to false, the custom fields are created.
  • If set to true, the custom fields are not created; instead, a log with the custom fields that will be created is generated.
dont_fail_on_errorBooleannonofalse

Specifies whether Custom Field Creator fails when an error occurs.

  • If not set or set to false, Custom Field Creator fails when an error occurs.
  • If set to true, Custom Field Creator keeps running if an error occurs.

Regardless of what this parameter is set to, all errors are recorded to the log where you can review it later.

number_of_threadsintegernono4

The number of threads that will be used for creating the custom fields

Queries

The queries define the criteria for processing the custom fields: what fields to create and in what workspaces to do so.

Query Structure

You can specify the queries in a GoodData Data Warehouse (ADS) table with the following columns:

NameTypeMandatory?DefaultDescription
source_identifierstringyesn/a

The identifier of the workspace where custom fields to create belong to

What you provide in this column depends on the value of the create_mode parameter (see General Parameters).

  • If create_mode is set to project_id, provide the workspace ID.
  • If create_mode is set to client_id, provide the client ID.
datasetstringyesn/a

The identifier of the dataset where the custom fields should be created

Example: dataset.customers

cf_identifierstringyesn/a

The identifier of the custom field

Example: attr.customers.cf_category

NOTE: We recommend that you use the cf_ prefix in the last section of the identifier. For example, use attr.customers.cf_category instead of attr.customers.category.

cf_typestringyesn/a

The type of the custom field

Possible values:

  • attribute
  • fact
  • date
cf_titlestringyesn/a

The title of the custom field as it will be visible to the end user

Example: Category

cf_datatypestringno

VARCHAR(128) for attributes

DECIMAL(12,2) for facts

The datatype of the custom field

NOTE: Do not specify the datatype for date custom fields.

cf_folder_titlestringnon/a

The title of the folder where the custom field should be created

If the folder does not exist, it will be created.

You can see the folders on the Manage page, under the Data tab, for attributes and facts correspondingly. By default, if the folder is not specified, the custom field is created outside of any folder.

NOTE: Do not specify the folder for date custom fields.

data_productstringnodefault(Only when you use client IDs in source_identifier) The data product that each workspace belongs to
x__timestamptimestampnon/a

(For incremental processing only) The timestamp when the last successful incremental load completed

When Custom Field Creator runs successfully, it stores to the workspace's metadata the greatest value in the x__timestamp column of the query table. Next time Custom Field Creator retrieves the last greatest value from the workspace's metadata, and only those records whose value of the x__timestamp column is greater than the timestamp stored in the workspace's metadata are taken from the query table and are processed to create the custom fields.

The incremental processing works similarly to the incremental load mode with the x__timestamp column (see Naming Convention for Output Stage Objects).

TIP: Consider using the x__timestamp column when you have a lot of client workspaces to reduce the run time during repeated executions.

 

Here is an example of the query table where workspace IDs are used to identify the workspaces:

source_identifierdatasetcf_identifiercf_typecf_datatypecf_titlecf_folder_titlex__timestamp
e863ii0azrnng2zt4fuu81ifgqtyeoj21dataset.employeeattr.employee.cf_addressattributeVARCHAR(128)AddressEmployee2021-01-22 10:22:34
e863ii0azrnng2zt4fuu81ifgqtyeoj21dataset.employeefact.sale.cf_sumfactDECIMAL(12,2)Sales SumSales2021-01-22 3:15:40
fuu81ifgqtyeoj21e863ii0azrnng2zt4dataset.employeeemployeebirthday.dataset.dtdate Employee Birthday 2021-01-22 6:26:41

 

Here is an example of the query file where client IDs are used to identify the workspaces and the data product is specified for each workspace:

source_identifierdatasetcf_identifiercf_typecf_datatypecf_titlecf_folder_titledata_productx__timestamp
p3489dataset.employeeattr.employee.cf_addressattributeVARCHAR(128)AddressEmployeedefault2021-01-22 10:22:34
a9800dataset.employeefact.sale.cf_sumfactDECIMAL(12,2)Sales SumSalestesting2021-01-22 3:15:40
m4801dataset.employeeemployeebirthday.dataset.dtdate Employee Birthday default2021-01-22 6:26:41

Query Source

Custom Field Creator reads the queries from ADS. Specify the query as an SQL query to ADS.

{
  "ads_client": {
    "username": "john.doe@example.com",
    "password": "secret",
    "ads_id": "rtmmgjsqc4zmf64egtu6l6xv2xhxempi"
  },
  "input_source": {
    "type": "ads",
    "query": "SELECT source_identifier, dataset, cf_identifier, cf_type, cf_datatype, cf_title, cf_folder_title, x__timestamp FROM custom_fields_input"
  }
}

Queries as Schedule Parameters

When providing the queries as schedule parameters, you have to code them using the gd_encoded_params parameter (see General Parameters).

To code the queries, follow the instructions in Specifying Complex Parameters. Once done, you should have the gd_encoded_params parameter coding the queries and reference parameters encoding sensitive information that you will be adding as secure parameters.

Example: The queries obtained from ADS (sensitive information present: ADS password) to be coded by the gd_encoded_params and a reference parameter encoding the ADS password

The following will become the value of the gd_encoded_params parameter:

{
  "ads_client": {
    "username": "john.doe@example.com",
    "password": "${ads_client_password}",
    "ads_id": "rtmmgjsqc4zmf64egtu6l6xv2xhxempi"
  },
  "input_source": {
    "type": "ads",
    "query": "SELECT source_identifier, dataset, cf_identifier, cf_type, cf_datatype, cf_title, cf_folder_title, x__timestamp FROM custom_fields_input"
  }
}

In addition, you have to add the reference parameter ads_client_password as a secure parameter.

Schedule Example

The following is an example of how you can specify schedule parameters:

Logs and Error Handling

Custom Field Creator generates a log with all the changes made and errors occurred.

  • If processing for one workspace fails, Custom Field Creator logs an error and continues the execution as usual.
  • An execution with some workspaces processed and some failed finishes with a state of “Warning”.
  • An execution with all workspaces failed finishes with a state of “Failed”.

By default, Custom Field Creator fails when an error occurs. You can change this behavior by setting the dont_fail_on_error parameter to true (see General Parameters).

Best Practices

  • If you have a lot of client workspaces, consider using the x__timestamp column (see Query Structure), and run Custom Field Creator in incremental processing mode to reduce its run time during repeated executions.
  • Run Custom Field Creator in the following cases: 
    • After you have defined new custom fields to make sure that they will be added to the existing client workspaces.
    • After you have created new client workspaces (see Provisioning Brick) to make sure that the custom fields are added to those newly created workspaces.