Recommended Practices on Managing Data Loads

When you use the Data Integration Console or another method to execute ETL processes, consider the following:

  • When deploying a new schedule, set up a notification that alerts you if the data load process fails. See Schedule a Process on the Data Integration Console.
  • If an ETL process loads data into a workspace, data starts loading into the system as soon as an execution begins. If you stop the execution, the data that has already been loaded remains in the system.
  • The GoodData Platform does not prevent you from uploading duplicate data. Unless your ETL process has been designed to prevent loading duplicates, use ad-hoc executions carefully, because they may create duplicate rows of data.
  • Scheduling ETL processes to execute during business hours may impact the performance of the workspaces into which data is being loaded. Where possible, schedule regular data loads during off-peak hours.
  • All schedule times are based on UTC, so convert your local off-peak hours to UTC when you set up a schedule. Schedules that you enter manually use the cron format.
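Because schedule times are in UTC, you have to convert an off-peak window defined in local time before you write the cron expression. The following is a minimal sketch that assumes the standard five-field cron syntax (minute, hour, day of month, month, day of week) and a fixed UTC offset that you supply. The function name daily_cron_utc is hypothetical. It builds a daily expression only. For a weekly schedule, a conversion that crosses midnight would also shift the day of week.

```python
from datetime import date, datetime, time, timedelta, timezone


def daily_cron_utc(local: time, utc_offset: timedelta) -> str:
    """Hypothetical helper: return a daily cron expression ("minute hour * * *")
    in UTC for a local wall-clock time at the given fixed UTC offset."""
    # Attach the local offset to an arbitrary date; only the time of day matters
    # for a daily schedule, even if the conversion crosses midnight.
    local_dt = datetime.combine(date(2000, 1, 1), local, tzinfo=timezone(utc_offset))
    utc_dt = local_dt.astimezone(timezone.utc)
    # Five-field cron: minute, hour, day of month, month, day of week.
    return f"{utc_dt.minute} {utc_dt.hour} * * *"


# 2:30 AM at UTC-5 runs at 07:30 UTC.
print(daily_cron_utc(time(2, 30), timedelta(hours=-5)))
```

If your time zone observes daylight saving time, its UTC offset changes during the year. A fixed UTC schedule then runs an hour earlier or later in local time unless you update it, so recompute the expression with the offset that applies to the period you care about.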
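Because the platform does not reject duplicate data, one way to make ad-hoc executions safer is to have the ETL process filter out rows whose key has already been loaded. The following is a minimal sketch under stated assumptions: each row carries a unique business key, and your ETL process can supply the set of keys already present in the workspace. How you obtain that set depends on your process and is not covered here. The helper name skip_already_loaded is hypothetical.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

Row = TypeVar("Row")


def skip_already_loaded(
    rows: Iterable[Row],
    key: Callable[[Row], Hashable],
    loaded_keys: Iterable[Hashable],
) -> Iterator[Row]:
    """Hypothetical helper: yield only rows whose key has not been loaded yet,
    also dropping repeats within the current batch."""
    seen = set(loaded_keys)  # keys assumed to already exist in the target
    for row in rows:
        k = key(row)
        if k in seen:
            continue  # already loaded earlier or earlier in this batch: skip
        seen.add(k)
        yield row


batch = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 3}]
print(list(skip_already_loaded(batch, key=lambda r: r["id"], loaded_keys={2})))
```

Filtering on a key makes a repeated execution of the same batch a no-op for rows that were already loaded. The catch is that the set of loaded keys must be accurate and must fit in memory.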