Schedule Executor

Schedule Executor is a utility that supports the data preparation and distribution pipeline (see Data Preparation and Distribution Pipeline). Schedule Executor runs schedules in one or more workspaces based on the criteria that you have defined (what workspaces to look into and what schedules to run).

How Schedule Executor Works

When Schedule Executor runs, it searches through the workspaces that you have defined (either all the workspaces that Schedule Executor can access or only the workspaces in a specific environment), collects all the schedules from those workspaces, and looks for the parameter called mode in the collected schedules (see Configure Schedule Parameters). If the parameter exists in a schedule and is set to one of the values of the list_of_modes parameter in Schedule Executor itself, the schedule is run.

For example, you have schedules in your workspaces set up like this:

If you execute Schedule Executor with the list_of_modes parameter set to red|blue, the following will happen:

  • Schedule Executor will run Schedule A because the mode parameter of Schedule A is set to red, and red is one of the values of the list_of_modes parameter in Schedule Executor.
  • Schedule Executor will not run Schedule B because the value of Schedule B’s mode parameter is not listed among the values of the list_of_modes parameter in Schedule Executor.
  • Schedule C will not be even considered for running because it does not have the mode parameter at all.

Configuration File

Schedule Executor does not require any parameters in the configuration file.

Schedule Parameters

When scheduling Schedule Executor (see Phases of Building the Data Pipeline -> Production Implementation), provide the parameters from this section in the schedule.

General Parameters

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

NameTypeMandatory?Secure?DefaultDescription

list_of_modes

string

yes

no

n/a

See list_of_modes.

number_of_schedules_in_batch

integernono1000The number of schedules to be started in one run

delay_between_batches

integernono1The number of seconds between batch executions

thread_count

integernono5The number of threads that will be used for running schedules

set_retry

Booleannonofalse

Specifies whether a failed schedule should be restarted.

  • If not set or set to false, a failed schedule is not restarted.
  • If set to true, a failed schedule is automatically restarted.

dry_run

Booleannonofalse

Specifies whether Schedule Executor should only generate a log with the schedules that will be run instead of actual running of the schedules.

  • If not set or set to false, the schedules are run.
  • If set to true, the schedules are not run; instead, a log with the schedules that will be run is generated.

control_parameter

stringnonomode

The parameter in a schedule that you want to use instead of the default mode parameter for specifying the value that Schedule Executor should compare to the values of its list_of_modes parameter

Use this parameter when your schedules do not have the mode parameter and you cannot add it. You can then pick any other parameter from an existing schedule's parameters that Schedule Executor should look into instead of the mode parameter.

work_done_identificator

stringnonoignore

Specifies whether Schedule Executor should automatically access all the workspaces that you have defined.

  • If not set or set to ignore, Schedule Executor automatically accesses all the workspaces that you have defined.
  • If set to the name of a metadata key in the workspace's metadata storage, Schedule Executor relies on the value of the specified metadata key to decide whether to access the workspaces.

NOTE: The default value, ignore, covers most use cases of using Schedule Executor, and we recommend that you keep it. For complicated data loading scenarios, see this section for information about how to set up the metadata key.

dataload_parameters_querystringnonon/a

The SQL query for getting additional parameters that will be added to the schedules at execution

NOTE: If this parameter is specified, the execution_params parameter is ignored.

Example: You have a workspace with custom fields (such as attributes or facts) in the logical data model's datasets. To do so, you need to run data loading processes for this workspace in UNRESTRICTED mode with specifying the columns in the Output Stage table from which the data should be loaded to the custom fields.

Use this parameter to instruct Schedule Executor to run the data load processes in UNRESTRICTED mode.

"dataload_parameters_query": "SELECT client_id, GDC_DATALOAD_SINGLE_RUN_LOAD_MODE, GDC_DATALOAD_SKIP_VALIDATE_MAPPING, GDC_DATALOAD_DATASETS FROM __cf_dataload_parameters"

For a complete process of adding custom fields and using this parameter, see Add Custom Fields to the LDMs in Client Workspaces within the Same Segment.

execution_params

JSONnonon/a

Additional parameters that will be added to the schedules at execution Format:

"execution_params": {
  "{parameter_1_name}": "{parameter_1_value}",
  "{parameter_2_name}": "{parameter_2_value}"
}

You must encode this parameter using the gd_encoded_params parameter (see Specifying Complex Parameters).

NOTE: This parameter is ignored if the dataload_parameters_query parameter is specified.

list_of_modes

The list_of_modes parameter contains the values that Schedule Executor will compare to the value of the mode parameter in schedules.

  • If Schedule Executor finds the mode parameter in a schedule and the parameter’s value is one of the values listed in list_of_modes, the schedule will be run.
  • If Schedule Executor finds the mode parameter in the schedule but the parameter’s value is not one of the values listed in list_of_modes, the schedule will not be run.
  • If Schedule Executor does not find the mode parameter in the schedule, the schedule will not be run.

If you provide multiple values, separate them by a vertical bar (|).

"list_of_modes": "red|green|blue"

In this example, Schedule Executor will run the schedules whose mode parameter is set to red, green, or blue, and will ignore all the other schedules.

Environment-Specific Parameters

By default, Schedule Executor runs the schedules in all the workspaces that the user under whom Schedule Executor runs can access.

You can narrow down the list of the workspaces that Schedule Executor should access and collect schedules from by specifying an environment that the workspaces are related to:

  • Life Cycle Management (LCM): Schedule Executor accesses only the workspaces that are client workspaces within a specific segment in the specified data product and the domain. For more information about LCM, see Managing Workspaces via Life Cycle Management.
  • Data Warehouse (ADS): Schedule Executor accesses only the workspaces that are returned by the SQL query that you constructed and executed against a specific Agile Data Warehousing Service (ADS) instance.

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

EnvironmentParameter NameTypeMandatory?Secure?DefaultDescription
LCM

domain

string

yes

no

n/a

The name of the domain where the workspaces to access belong to

segment_list

stringyesnon/a

The name of the segments in the specified domain where the workspaces to access belong to

If you provide multiple segments, separate them by a vertical bar (|).

Example:

"segment_list": "basic|premium|gold"
data_productstringnonodefaultThe data product that contains the segments where the workspaces to access belong to
ADS      

ads_instance

string

yes

no

n/a

The ID of the ADS instance to access

ads_usernamestringyesnon/aThe access username to the ADS instance
ads_passwordstringyesyesn/a

The password for the user that you specified in the ads_username parameter

querystringyesnon/a

The SQL query for getting the workspaces that Schedule Executor should access and collect schedules from

  • To identify the workspaces, the SQL query must refer to either workspace IDs (see Find the Workspace ID) or client IDs (see Use Automated Data Distribution). 
    • The expected name for the column storing the workspace IDs is project_id (project_id is a legacy term for workspace_ID).
    • The expected name for the column storing client IDs is client_id.
  • If the query refers to the client IDs, you must also enter the ads_domain and data_product parameters (see further in this table).
  • The SQL query can refer to a priority, which is the number that determines in what order the workspaces must be processed (the smaller the number is, the higher the priority is). The expected name for the column storing the priorities is priority.

Example:

"query": "SELECT client_id, priority FROM clients"
ads_domainstringsee "Description"non/a

The name of the domain where the workspaces to access belong to

This parameter is mandatory only when you identify the workspaces by their client IDs (that is, the SQL query in the query parameter refers to the client IDs). Otherwise, do not use it.

data_productstringsee "Description"non/a

The data product within the specified domain where the workspaces to access belong to

This parameter is mandatory only when you identify the workspaces by their client IDs (that is, the SQL query in the query parameter refers to the client IDs). Otherwise, do not use it.

Schedule Examples

Example 1: Run schedules in all accessible workspaces based on the values of the schedules' mode parameter.

 

Example 2: Run schedules in all the client workspaces in the specified LCM segments based on the values of the schedules' mode parameter.

 

Example 3: Run schedules in all accessible workspaces based on the values of the schedules' mode parameter but only after ADS Integrator has uploaded some data to ADS and set the ads_integrator_ran metadata key to true. For more information, see this section.

 

Example 4: Run schedules based on the values of the schedules' mode parameter in the workspaces returned by the SQL query that you constructed and executed against the specified ADS instance.

Resource Limitation

The schedules are run in batches. The default number of schedules in a batch is 1000 (see the number_of_schedules_in_batch parameter), which is relatively large. In certain cases, however, the data loading processes that are executed through the schedules use the same resource (for example, a table in the database), therefore you may need to check for potential conflicts and configure how many schedules Schedule Executor will be running in one batch and/or how much time Schedule Executor should wait before it starts another batch.

To do so, use the number_of_schedules_in_batch parameter and the delay_between_batches parameter (see General Parameters). For example, if you have 100 schedules to run and want Schedule Executor to run those schedules in five batches (20 schedules per batch) and to wait five minutes between batches, set the number_of_schedules_in_batch parameter to 20 and the delay_between_batches parameter to 300 (the number of seconds in five minutes).

Advanced Settings

Run Schedule Executor Under a Different User

If you want to execute Schedule Executor under a different user than the default one, provide the following four parameters in the schedule. Otherwise, do not use any of these parameters and skip this section.

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

NameTypeSecure?DefaultDescription
CLIENT_GDC_HOSTNAMEstringnosecure.gooddata.com

The white-labeled domain name in the format of your.domain.com (for example, analytics.mycompany.com)

The parameter name is case-sensitive and must be written in uppercase.

CLIENT_GDC_PROTOCOLstringnohttps

The protocol to transfer data over

Explicitly set this parameter to https.

The parameter name is case-sensitive and must be written in uppercase.

GDC_USERNAME

stringnon/a

The user under whom you want to execute Schedule Executor

The parameter name is case-sensitive and must be written in uppercase.

GDC_PASSWORDstringyesn/a

The password for the user that you specified in the GDC_USERNAME parameter

The parameter name is case-sensitive and must be written in uppercase.

Run Schedule Executor Only After ADS Integrator Has Processed Some Data

If new data is loaded from your data sources frequently (for example, every hour) or without a predictable schedule, you may want to run data loading tasks only after ADS Integrator (see ADS Integrator) has uploaded some data to ADS.

Every time ADS Integrator has processed data and integrated it into ADS, it takes the value of the notification_metadata parameter from the metadata storage of the workspace where it is deployed (typically, it is the service workspace; see Data Preparation and Distribution Pipeline), creates a metadata key with the same name in the metadata storage, and sets this key to true.

You can use this functionality to instruct Schedule Executor to run the schedules only after ADS Integrator has switched the metadata key to true.

Steps:

  1. Set the notification_metadata parameter to some value that you will later be using as the name of the special key for Schedule Executor. To do so, use the API for creating a key-value pair in the workspace metadata storage. For example, you can set it to ads_integrator_ran:

    "notification_metadata": "ads_integrator_ran"
    

    ADS Integrator will be creating and setting the ads_integrator_ran metadata key to true every time it has processed some data and integrated it into ADS.

  2. Use the value from Step 1 as the value of the work_done_identificator parameter in Schedule Executor:

    "work_done_identificator": "ads_integrator_ran"
    

    Schedule Executor will look for this key in the workspace’s metadata. If this key is set to true (which means that ADS Integrator has processed data and integrated it into ADS), Schedule Executor will run the schedules.

After Schedule Executor has run the schedules, it will reset the metadata key to false. Next time Schedule Executor will run the schedules again after ADS Integrator sets it back to true.