
Schedule Executor

Schedule Executor is a utility that supports the data preparation and distribution pipeline (see Data Preparation and Distribution Pipeline). Schedule Executor runs schedules in one or more workspaces based on the criteria that you have defined (what workspaces to look into and what schedules to run).

How Schedule Executor Works

When Schedule Executor runs, it searches through the workspaces that you have defined (either all the workspaces that Schedule Executor can access or only the workspaces in a specific environment), collects all the schedules from those workspaces, and looks for the parameter called mode in the collected schedules (see Configure Schedule Parameters). If the parameter exists in a schedule and is set to one of the values of the list_of_modes parameter in Schedule Executor itself, the schedule is run.

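For example, suppose that your workspaces contain three schedules: Schedule A has the mode parameter set to red, Schedule B has the mode parameter set to a value other than red or blue, and Schedule C does not have the mode parameter at all.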

If you execute Schedule Executor with the list_of_modes parameter set to red|blue, the following will happen:

Schedule Executor will run Schedule A because the mode parameter of Schedule A is set to red, and red is one of the values of the list_of_modes parameter in Schedule Executor.
Schedule Executor will not run Schedule B because the value of Schedule B’s mode parameter is not listed among the values of the list_of_modes parameter in Schedule Executor.
Schedule C will not even be considered for running because it does not have the mode parameter at all.
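
The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the actual implementation of Schedule Executor. The function name `should_run`, the dictionary layout of the schedules, and the value `green` for Schedule B are assumptions made for this example, as is the exact, case-sensitive string comparison.

```python
def should_run(schedule_params, list_of_modes, control_parameter="mode"):
    """Return True if the schedule's control parameter matches one of the modes."""
    # list_of_modes uses a vertical bar as the separator, for example "red|blue".
    allowed = set(list_of_modes.split("|"))
    value = schedule_params.get(control_parameter)
    # A schedule without the control parameter is never selected.
    return value is not None and value in allowed


# Hypothetical schedules mirroring the example; "green" is an assumed value.
schedules = {
    "Schedule A": {"mode": "red"},
    "Schedule B": {"mode": "green"},
    "Schedule C": {},
}

selected = [name for name, params in schedules.items()
            if should_run(params, "red|blue")]
print(selected)  # ['Schedule A']
```

The optional `control_parameter` argument mirrors the control_parameter schedule parameter: passing a different name makes the check look at that parameter instead of mode.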

Schedule Executor only starts the schedules; it does not wait for them to finish. As a result, Schedule Executor often completes before the schedules that it started.

Configuration File

Schedule Executor does not require any parameters in the configuration file.

Schedule Parameters

When scheduling Schedule Executor (see Phases of Building the Data Pipeline -> Production Implementation), provide the parameters from this section in the schedule.

General Parameters

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

Name	Type	Mandatory?	Secure?	Default	Description
list_of_modes	string	yes	no	n/a	See list_of_modes.
number_of_schedules_in_batch	integer	no	no	1000	The number of schedules to be started in one run
delay_between_batches	integer	no	no	1	The number of seconds between batch executions
thread_count	integer	no	no	5	The number of threads that will be used for running schedules
set_retry	Boolean	no	no	`false`	Specifies whether a failed schedule should be restarted. If not set or set to `false`, a failed schedule is not restarted. If set to `true`, a failed schedule is automatically restarted.
dry_run	Boolean	no	no	`false`	Specifies whether Schedule Executor should only generate a log of the schedules that would be run instead of actually running them. If not set or set to `false`, the schedules are run. If set to `true`, the schedules are not run; instead, a log of the schedules that would be run is generated.
control_parameter	string	no	no	`mode`	The parameter in a schedule that you want to use instead of the default `mode` parameter for specifying the value that Schedule Executor should compare to the values of its list_of_modes parameter. Use this parameter when your schedules do not have the `mode` parameter and you cannot add it. You can then pick any other parameter from an existing schedule's parameters for Schedule Executor to check instead of the `mode` parameter.
work_done_identificator	string	no	no	`ignore`	Specifies whether Schedule Executor should automatically access all the workspaces that you have defined. If not set or set to `ignore`, Schedule Executor automatically accesses all the workspaces that you have defined. If set to the name of a metadata key in the workspace's metadata storage, Schedule Executor relies on the value of the specified metadata key to decide whether to access the workspaces. NOTE: The default value, `ignore`, covers most use cases of Schedule Executor, and we recommend that you keep it. For complicated data loading scenarios, see Run Schedule Executor Only After ADS Integrator Has Processed Some Data for information about how to set up the metadata key.
dataload_parameters_query	string	no	no	n/a	The SQL query for getting additional parameters that will be added to the schedules at execution. NOTE: If this parameter is specified, the `execution_params` parameter is ignored. Example: You have a workspace with custom fields (such as attributes or facts) in the logical data model's datasets. To load data into the custom fields, you need to run the data loading processes for this workspace in `UNRESTRICTED` mode and specify the columns in the Output Stage table from which the data should be loaded. Use this parameter to instruct Schedule Executor to run the data load processes in `UNRESTRICTED` mode: `"dataload_parameters_query": "SELECT client_id, GDC_DATALOAD_SINGLE_RUN_LOAD_MODE, GDC_DATALOAD_SKIP_VALIDATE_MAPPING, GDC_DATALOAD_DATASETS FROM __cf_dataload_parameters"` For the complete process of adding custom fields and using this parameter, see Add Custom Fields to the LDMs in Client Workspaces within the Same Segment.
execution_params	JSON	no	no	n/a	Additional parameters that will be added to the schedules at execution. Format: `"execution_params": { "{parameter_1_name}": "{parameter_1_value}", "{parameter_2_name}": "{parameter_2_value}" }` You must encode this parameter using the `gd_encoded_params` parameter (see Specifying Complex Parameters). NOTE: This parameter is ignored if the `dataload_parameters_query` parameter is specified.

list_of_modes

The list_of_modes parameter contains the values that Schedule Executor will compare to the value of the mode parameter in schedules.

If Schedule Executor finds the mode parameter in a schedule and the parameter’s value is one of the values listed in list_of_modes, the schedule will be run.
If Schedule Executor finds the mode parameter in the schedule but the parameter’s value is not one of the values listed in list_of_modes, the schedule will not be run.
If Schedule Executor does not find the mode parameter in the schedule, the schedule will not be run.

If you provide multiple values, separate them by a vertical bar (|).

"list_of_modes": "red|green|blue"

In this example, Schedule Executor will run the schedules whose mode parameter is set to red, green, or blue, and will ignore all the other schedules.

Environment-Specific Parameters

By default, Schedule Executor runs the schedules in all the workspaces that the user under whom Schedule Executor runs can access.

You can narrow down the list of the workspaces that Schedule Executor should access and collect schedules from by specifying an environment that the workspaces are related to:

Life Cycle Management (LCM): Schedule Executor accesses only the workspaces that are client workspaces within a specific segment in the specified data product and the domain. For more information about LCM, see Managing Workspaces via Life Cycle Management.
Data Warehouse (ADS): Schedule Executor accesses only the workspaces that are returned by the SQL query that you constructed and executed against a specific Agile Data Warehousing Service (ADS) instance.

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

For LCM, you must schedule Schedule Executor as a domain admin.

Environment	Parameter Name	Type	Mandatory?	Secure?	Default	Description
LCM	domain	string	yes	no	n/a	The name of the domain that the workspaces to access belong to
	segment_list	string	yes	no	n/a	The names of the segments in the specified domain that the workspaces to access belong to. If you provide multiple segments, separate them by a vertical bar (`|`). Example: `"segment_list": "basic|premium|gold"`
	data_product	string	no	no	`default`	The data product that contains the segments that the workspaces to access belong to
ADS	ads_instance	string	yes	no	n/a	The ID of the ADS instance to access
	ads_username	string	yes	no	n/a	The access username to the ADS instance
	ads_password	string	yes	yes	n/a	The password for the user that you specified in the `ads_username` parameter
	query	string	yes	no	n/a	The SQL query for getting the workspaces that Schedule Executor should access and collect schedules from. To identify the workspaces, the SQL query must refer to either workspace IDs (see Find the Workspace ID) or client IDs (see Use Automated Data Distribution). The expected name for the column storing the workspace IDs is `project_id` (project_id is a legacy term for workspace_ID). The expected name for the column storing client IDs is `client_id`. If the query refers to the client IDs, you must also enter the `ads_domain` and `data_product` parameters (see further in this table). The SQL query can also refer to a priority, which is a number that determines the order in which the workspaces are processed (the smaller the number, the higher the priority). The expected name for the column storing the priorities is `priority`. Example: `"query": "SELECT client_id, priority FROM clients"`
	ads_domain	string	see "Description"	no	n/a	The name of the domain that the workspaces to access belong to. This parameter is mandatory only when you identify the workspaces by their client IDs (that is, the SQL query in the `query` parameter refers to the client IDs). Otherwise, do not use it.
	data_product	string	see "Description"	no	n/a	The data product within the specified domain that the workspaces to access belong to. This parameter is mandatory only when you identify the workspaces by their client IDs (that is, the SQL query in the `query` parameter refers to the client IDs). Otherwise, do not use it.
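
The mandatory-parameter rules from the table above can be checked before scheduling with a small helper such as the following. This is a sketch only: the function `missing_parameters`, the `environment` argument, and the simple text search for `client_id` in the query are assumptions made for illustration and are not part of Schedule Executor.

```python
LCM_REQUIRED = ("domain", "segment_list")
ADS_REQUIRED = ("ads_instance", "ads_username", "ads_password", "query")
CLIENT_ID_REQUIRED = ("ads_domain", "data_product")


def missing_parameters(params, environment=None):
    """List the mandatory parameters that are absent for the given environment."""
    required = ["list_of_modes"]
    if environment == "LCM":
        required += LCM_REQUIRED
    elif environment == "ADS":
        required += ADS_REQUIRED
        # Assumption: any mention of client_id in the query means that the
        # workspaces are identified by client IDs. This is a crude text check.
        if "client_id" in params.get("query", "").lower():
            required += CLIENT_ID_REQUIRED
    return [name for name in required if name not in params]


# Hypothetical ADS setup that identifies workspaces by client IDs.
ads_params = {
    "list_of_modes": "red|blue",
    "ads_instance": "my_ads_instance_id",
    "ads_username": "user@example.com",
    "ads_password": "secret",
    "query": "SELECT client_id, priority FROM clients",
}
print(missing_parameters(ads_params, environment="ADS"))
# ['ads_domain', 'data_product']
```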

Schedule Examples

Example 1: Run schedules in all accessible workspaces based on the values of the schedules' mode parameter.

Example 2: Run schedules in all the client workspaces in the specified LCM segments based on the values of the schedules' mode parameter.

Example 3: Run schedules in all accessible workspaces based on the values of the schedules' mode parameter but only after ADS Integrator has uploaded some data to ADS and set the ads_integrator_ran metadata key to true. For more information, see Run Schedule Executor Only After ADS Integrator Has Processed Some Data.

Example 4: Run schedules based on the values of the schedules' mode parameter in the workspaces returned by the SQL query that you constructed and executed against the specified ADS instance.
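
As a rough illustration, the four examples above might correspond to parameter sets like the ones below, written here as Python dictionaries and printed as JSON. Only the parameter names come from this article. All values (modes, domain, segments, ADS instance ID, credentials) are hypothetical placeholders, and the structure is an assumption rather than the exact form used when configuring a schedule.

```python
import json

# Example 1: all accessible workspaces, selected by mode only.
example_1 = {"list_of_modes": "red|blue"}

# Example 2: client workspaces in specific LCM segments (placeholder values).
example_2 = {
    "list_of_modes": "red|blue",
    "domain": "my_domain",
    "segment_list": "basic|premium",
    "data_product": "default",
}

# Example 3: run only after ADS Integrator has set the metadata key to true.
example_3 = {
    "list_of_modes": "red|blue",
    "work_done_identificator": "ads_integrator_ran",
}

# Example 4: workspaces returned by an SQL query against an ADS instance.
# ads_password must be entered as a secure parameter.
example_4 = {
    "list_of_modes": "red|blue",
    "ads_instance": "my_ads_instance_id",
    "ads_username": "user@example.com",
    "ads_password": "secret",
    "query": "SELECT client_id, priority FROM clients",
    "ads_domain": "my_domain",
    "data_product": "default",
}

examples = [example_1, example_2, example_3, example_4]
for number, params in enumerate(examples, start=1):
    print(f"Example {number}:")
    print(json.dumps(params, indent=2))
```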

Resource Limitation

The schedules are run in batches. The default number of schedules in a batch is 1000 (see the number_of_schedules_in_batch parameter), which is relatively large. In certain cases, however, the data loading processes that are executed through the schedules use the same resource (for example, a table in the database). You may therefore need to check for potential conflicts and configure how many schedules Schedule Executor runs in one batch, how much time Schedule Executor waits before it starts another batch, or both.

To do so, use the number_of_schedules_in_batch parameter and the delay_between_batches parameter (see General Parameters). For example, if you have 100 schedules to run and want Schedule Executor to run those schedules in five batches (20 schedules per batch) and to wait five minutes between batches, set the number_of_schedules_in_batch parameter to 20 and the delay_between_batches parameter to 300 (the number of seconds in five minutes).
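
The arithmetic in this example can be sketched as follows. The helper `plan_batches` is hypothetical and only models the batching described above. It assumes that the delay applies between consecutive batches and not after the last one.

```python
def plan_batches(schedule_ids, number_of_schedules_in_batch=1000,
                 delay_between_batches=1):
    """Split schedules into batches and compute the total waiting time in seconds."""
    size = number_of_schedules_in_batch
    batches = [schedule_ids[i:i + size]
               for i in range(0, len(schedule_ids), size)]
    # Assumption: one delay between each pair of consecutive batches.
    total_delay = delay_between_batches * max(len(batches) - 1, 0)
    return batches, total_delay


# 100 hypothetical schedules, 20 per batch, five minutes between batches.
schedule_ids = [f"schedule_{i}" for i in range(100)]
batches, total_delay = plan_batches(schedule_ids, 20, 300)
print(len(batches), len(batches[0]), total_delay)  # 5 20 1200
```

Under these assumptions, the five batches are separated by four waits of 300 seconds each, so Schedule Executor spends about 20 minutes waiting in total.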

Advanced Settings

Run Schedule Executor Under a Different User

If you want to execute Schedule Executor under a different user than the default one, provide the following four parameters in the schedule. Otherwise, do not use any of these parameters and skip this section.

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

Name	Type	Secure?	Default	Description
CLIENT_GDC_HOSTNAME	string	no	`secure.gooddata.com`	The white-labeled domain name in the format of `your.domain.com` (for example, `analytics.mycompany.com`). The parameter name is case-sensitive and must be written in uppercase.
CLIENT_GDC_PROTOCOL	string	no	`https`	The protocol used to transfer data. Explicitly set this parameter to `https`. The parameter name is case-sensitive and must be written in uppercase.
GDC_USERNAME	string	no	n/a	The user under whom you want to execute Schedule Executor. The parameter name is case-sensitive and must be written in uppercase.
GDC_PASSWORD	string	yes	n/a	The password for the user that you specified in the `GDC_USERNAME` parameter. The parameter name is case-sensitive and must be written in uppercase.

Run Schedule Executor Only After ADS Integrator Has Processed Some Data

If new data is loaded from your data sources frequently (for example, every hour) or without a predictable schedule, you may want to run data loading tasks only after ADS Integrator (see ADS Integrator) has uploaded some data to ADS.

Every time ADS Integrator has processed data and integrated it into ADS, it takes the value of the notification_metadata parameter from the metadata storage of the workspace where it is deployed (typically, it is the service workspace; see Data Preparation and Distribution Pipeline), creates a metadata key with the same name in the metadata storage, and sets this key to true.

You can use this functionality to instruct Schedule Executor to run the schedules only after ADS Integrator has switched the metadata key to true.

Steps:

1. Set the notification_metadata parameter to some value that you will later be using as the name of the special key for Schedule Executor. To do so, use the API for creating a key-value pair in the workspace metadata storage. For example, you can set it to ads_integrator_ran:
```
"notification_metadata": "ads_integrator_ran"
```
ADS Integrator will be creating and setting the ads_integrator_ran metadata key to true every time it has processed some data and integrated it into ADS.
2. Use the value from Step 1 as the value of the work_done_identificator parameter in Schedule Executor:
```
"work_done_identificator": "ads_integrator_ran"
```
Schedule Executor will look for this key in the workspace’s metadata. If this key is set to true (which means that ADS Integrator has processed data and integrated it into ADS), Schedule Executor will run the schedules.

After Schedule Executor has run the schedules, it will reset the metadata key to false. Schedule Executor will run the schedules again only after ADS Integrator sets the key back to true.

If you cannot or do not want to use the notification_metadata parameter, you can set up your own special key in the workspace’s metadata storage and use this key name as a value of the work_done_identificator parameter in Schedule Executor. Schedule Executor will run the schedules only when you set your special key to true.

After Schedule Executor has run the schedules, it will reset the key to false. Whenever you want Schedule Executor to run the schedules again, you must set the key back to true.
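
The check-and-reset cycle described in this section can be modeled with the following sketch. It is a simplified illustration, not the actual Schedule Executor code: a plain dictionary stands in for the workspace metadata storage, the values are assumed to be stored as the strings `true` and `false`, and the names `run_if_work_done` and `start_schedules` are hypothetical.

```python
def run_if_work_done(metadata, work_done_identificator, start_schedules):
    """Start the schedules only when the metadata key is set to true."""
    if work_done_identificator == "ignore":
        # Default behavior: no metadata key is consulted.
        start_schedules()
        return True
    value = str(metadata.get(work_done_identificator, "false")).lower()
    if value != "true":
        return False
    start_schedules()
    # Reset the key so that the next run waits for new data.
    metadata[work_done_identificator] = "false"
    return True


# A dictionary standing in for the workspace metadata storage.
metadata = {"ads_integrator_ran": "true"}
started = []

first = run_if_work_done(metadata, "ads_integrator_ran",
                         lambda: started.append("run"))
second = run_if_work_done(metadata, "ads_integrator_ran",
                          lambda: started.append("run"))
print(first, second, metadata)
# True False {'ads_integrator_ran': 'false'}
```

The second call does nothing because the key was reset to false after the first run. The schedules start again only after the key is set back to true, either by ADS Integrator or manually.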
