Delete Old Data while Loading New Data via CloudConnect
CloudConnect is a legacy tool and will be discontinued. To prepare your data, we recommend using the GoodData data pipeline as described in Data Preparation and Distribution. For data modeling, see Data Modeling in GoodData to learn how to work with Logical Data Modeler.
When setting up your CloudConnect graph to upload data to a dataset, you can configure it so that some old data is deleted from the same dataset or from a different one. To do so, add a GD Dataset Deleter component to your graph.
While you can upload new data by any label of an attribute, deleting the old data can be done only by the primary attribute label.
The GD Dataset Deleter component works only when both of the following conditions are met:
- CloudConnect uses batch mode (for more details about loading modes, see Data Loading Modes in CloudConnect):
GDC_USE_BATCH_SLI_UPLOAD=TRUE
- New data is uploaded to the dataset in incremental mode (see GD Dataset Writer).
If either of these conditions is not met, data loading fails.
You can also use GD Dataset Deleter on its own to delete some data from a dataset without uploading any new data.
GD Dataset Deleter
We assume that you have already learned what is described in:
Summary
GD Dataset Deleter deletes the old data in one transaction with a data upload that is performed using GD Dataset Writer. This helps keep the data in your project consistent and avoids situations where new data is already in the dataset but the old data has not yet been removed.
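The benefit of running the delete and the upload as one transaction can be pictured with a small in-memory model. This is a minimal sketch under stated assumptions, not how the GoodData platform implements it: the `delete_and_load` function, the list-of-dicts dataset, and the `id` key field are hypothetical names chosen only for illustration.

```python
from copy import deepcopy


def delete_and_load(dataset, keys_to_delete, new_rows, key="id"):
    """Return a new dataset with old rows removed and new rows appended.

    Hypothetical model: all work happens on a copy, so the caller's
    dataset is replaced only if every step succeeds (all-or-nothing).
    """
    working = deepcopy(dataset)
    # Step 1: drop the old rows identified by their key values.
    working = [row for row in working if row[key] not in keys_to_delete]
    # Step 2: append the new rows; a malformed row aborts the whole update.
    for row in new_rows:
        if key not in row:
            raise ValueError(f"new row is missing the key field {key!r}")
        working.append(dict(row))
    return working


current = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
try:
    # Replace the row with id 1 by a corrected version in a single step.
    current = delete_and_load(current, {1}, [{"id": 1, "amount": 15}])
except ValueError:
    pass  # on failure, `current` still holds the old, consistent state
print(current)
```

Because the result is assigned only after both steps succeed, a reader of `current` never sees the intermediate state in which the old row is gone but the new one has not arrived, or the reverse.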
Icon
Ports
Port type | Number | Required | Description | Metadata |
---|---|---|---|---|
Input | 0 | yes | For the data records to be deleted | Any |
This component has one input port and no output ports.
The following picture shows GD Dataset Deleter attributes.
When you select this component, you must specify a GoodData project and the dataset from which the data will be deleted. The component takes the current GoodData project by default (the project hash is stored in the GDC_PROJECT_ID parameter).
The following picture shows the dialog for choosing the target dataset.
The most important attribute of GD Dataset Deleter is Field mapping, which defines how the input metadata fields map to the GoodData dataset columns (attributes and facts).
When you define the mapping, you choose to delete data records either by their primary keys or fact table grain, or by their attributes or references. To set up the mapping, select an input field for each dataset field from the drop-downs. You can also set up mapping for referenced datasets and date dimensions: for a date field, specify the corresponding date dimension.
The following picture shows the mapping dialog when you choose to delete data records by their primary keys or fact table grain.
The following picture shows the mapping dialog when you choose to delete data records by their attributes or references.
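The practical difference between the two mapping choices can be sketched with plain Python data. This is an illustrative model only, and it rests on an assumption about matching: a dataset row is deleted when its values in the mapped fields equal those of some input record. The `delete_matching` function and the `order_id`, `region`, and `amount` fields are hypothetical.

```python
def delete_matching(rows, input_records, mapped_fields):
    """Return the rows that do NOT match any input record on the mapped fields."""
    to_delete = {tuple(rec[f] for f in mapped_fields) for rec in input_records}
    return [r for r in rows if tuple(r[f] for f in mapped_fields) not in to_delete]


rows = [
    {"order_id": 1, "region": "EU", "amount": 10},
    {"order_id": 2, "region": "EU", "amount": 20},
    {"order_id": 3, "region": "US", "amount": 30},
]

# Mapping on the primary key: each input record removes at most one row.
by_key = delete_matching(rows, [{"order_id": 2}], ["order_id"])

# Mapping on an attribute: one input record can remove many rows.
by_attribute = delete_matching(rows, [{"region": "EU"}], ["region"])

print([r["order_id"] for r in by_key])
print([r["order_id"] for r in by_attribute])
```

In this model, deleting by key is the narrower operation, while deleting by attribute suits cases such as clearing all records for one region or one date before reloading them.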
GD Dataset Deleter Attributes
Attribute | Req | Description | Possible values |
---|---|---|---|
BASIC | |||
GoodData project ID | yes | Specifies the GoodData project where the target dataset resides. | The current project (the project's hash in the GDC_PROJECT_ID parameter) is used by default. |
Data set | yes | The target dataset from which the data will be deleted. | |
Field mapping | yes | Mapping of the input fields to the dataset columns. | |
ADVANCED | |||
Empty input threshold | yes | ||
Max. retry attempts | yes | The maximum number of retries that will be attempted if the previous attempts failed. | The default is 5. |
Pause between retries | yes | The delay between individual retries (in seconds). | The default is 60. |
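The two retry attributes can be read as the parameters of a simple retry loop. The following is a minimal sketch, not the component's actual implementation: `run_with_retries` and `flaky_delete` are hypothetical names, and the sketch assumes that "Max. retry attempts" counts retries made after the first failed attempt.

```python
import time


def run_with_retries(operation, max_retries=5, pause_seconds=60, sleep=time.sleep):
    """Call `operation`; after a failure, pause and retry up to `max_retries` times."""
    retries_done = 0
    while True:
        try:
            return operation()
        except Exception:
            if retries_done >= max_retries:
                raise  # retries exhausted: surface the last error
            retries_done += 1
            sleep(pause_seconds)


# Usage with a stand-in operation that fails twice, then succeeds.
# The sleep function is replaced by a recorder so the example runs instantly.
calls = {"count": 0}
pauses = []


def flaky_delete():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "deleted"


result = run_with_retries(flaky_delete, sleep=pauses.append)
print(result, calls["count"], pauses)
```

With the defaults from the table, this model makes at most six attempts in total (one initial attempt plus five retries), with a 60-second pause before each retry.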