You are viewing the guide for our older product. For the documentation of GoodData Cloud, our latest and most advanced product, see the GoodData Cloud documentation.

Delete Old Data while Loading New Data via CloudConnect

CloudConnect is a legacy tool and will be discontinued. To prepare your data, we recommend using the GoodData data pipeline as described in Data Preparation and Distribution. For data modeling, see Data Modeling in GoodData to learn how to work with the Logical Data Modeler.

When you set up a CloudConnect graph to upload data to a dataset, you can configure it to also delete some old data, either from the same dataset or from a different one. To do so, add the GD Dataset Deleter component to your graph.

While you can upload new data using any label of an attribute, you can delete old data only by the attribute's primary label.

The GD Dataset Deleter component works only when all the following conditions are met:

CloudConnect uses batch mode: GDC_USE_BATCH_SLI_UPLOAD=TRUE (for more details about loading modes, see Data Loading Modes in CloudConnect).
New data is uploaded to the dataset in incremental mode (see GD Dataset Writer).

If either of these conditions is not met, data loading fails.
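Because a violated condition only surfaces as a failed load, it can help to check both conditions up front. The following is a minimal sketch, assuming the graph parameters are available as a plain dictionary and each GD Dataset Writer's load mode is known. `WriterConfig` and `unmet_deleter_conditions` are hypothetical names for illustration only; they are not part of CloudConnect.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WriterConfig:
    """Hypothetical stand-in for one GD Dataset Writer's settings (not a CloudConnect API)."""
    dataset: str
    incremental: bool  # True if the writer loads in incremental mode


def unmet_deleter_conditions(graph_params: Dict[str, str],
                             writers: List[WriterConfig]) -> List[str]:
    """Return the unmet conditions; an empty list means GD Dataset Deleter can run."""
    problems: List[str] = []
    # Condition 1: batch mode, written in this guide as GDC_USE_BATCH_SLI_UPLOAD=TRUE.
    # The sketch requires the exact value "TRUE" as shown in the guide.
    if graph_params.get("GDC_USE_BATCH_SLI_UPLOAD") != "TRUE":
        problems.append("set GDC_USE_BATCH_SLI_UPLOAD=TRUE (batch mode)")
    # Condition 2: every writer that uploads new data must load incrementally.
    for w in writers:
        if not w.incremental:
            problems.append(f"GD Dataset Writer for '{w.dataset}' must use incremental mode")
    return problems
```

For example, `unmet_deleter_conditions({"GDC_USE_BATCH_SLI_UPLOAD": "TRUE"}, [WriterConfig("orders", incremental=False)])` reports only the writer problem. With no writers in the list (the deleter used on its own), only the batch-mode parameter is checked.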

You can also use GD Dataset Deleter on its own to delete some data from a dataset without uploading any new data.

GD Dataset Deleter

We assume that you have already learned what is described in:

Summary

GD Dataset Deleter deletes the old data in the same transaction as the data upload performed by GD Dataset Writer. This keeps the data in your project consistent and avoids situations where the new data is already in the dataset but the old data has not yet been removed.
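To illustrate why a single transaction matters, here is a minimal in-memory model, assuming a dataset is a dictionary of rows keyed by primary key. It stages the deletions and the incremental upload on a copy and returns the new state only if both steps succeed, so a reader never sees new rows next to rows that should already be gone. The function name, the data layout, and the "delete first, then upload" order are assumptions of this sketch, not a description of how GoodData implements the transaction.

```python
from typing import Dict, Iterable, Mapping

Row = Dict[str, object]


def load_in_one_transaction(dataset: Mapping[str, Row],
                            keys_to_delete: Iterable[str],
                            new_rows: Mapping[str, Row]) -> Dict[str, Row]:
    """Hypothetical model of delete-plus-incremental-upload applied as one unit.

    All changes are staged on a copy. If any step fails, an exception is
    raised and the caller still holds the untouched original state.
    """
    staged: Dict[str, Row] = dict(dataset)  # work on a copy, never the original
    # Step 1: remove old records (keyed by primary key in this model).
    for key in keys_to_delete:
        staged.pop(key, None)
    # Step 2: incremental upload: insert new keys, overwrite existing ones.
    for key, row in new_rows.items():
        if not isinstance(row, dict):
            raise TypeError(f"row {key!r} is not a record")
        staged[key] = row
    # "Commit": hand back the complete new state in a single step.
    return staged
```

If the upload step raises, the original dictionary is unchanged, which mirrors the all-or-nothing behavior described above.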


Ports

Port type	Number	Required	Description	Metadata
Input	0	yes	For deleted data records	Any

This component has one input port and no output ports.

The following picture shows GD Dataset Deleter attributes.

When you select this component, you must specify a GoodData project and the dataset from which the data will be deleted. By default, the component uses the current GoodData project (the project hash is stored in the GDC_PROJECT_ID parameter).

The following picture shows the dialog for choosing the target dataset.

The most important attribute of GD Dataset Deleter is Field mapping, which defines how the input metadata fields map to the GoodData dataset columns (attributes and facts).

When you define the mapping, you choose whether to delete data records by their primary keys or fact table grain, or by their attributes or references. To set up the mapping, select an input field for each dataset field from the drop-down lists. You can also set up mapping for referenced datasets and date dimensions: for a date field, specify the corresponding date dimension.
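The two mapping choices can be pictured with the following sketch, which models rows as dictionaries of primary-label values (recall that deletion works only by the primary label). `delete_by_key` drops rows whose key (primary key or fact table grain) matches an input record. `delete_by_attributes` drops rows that match every mapped attribute value of some input record. Both function names and the matching rules are assumptions for illustration; they do not describe how GoodData evaluates deletions on the server.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]


def delete_by_key(rows: List[Record], key_fields: Sequence[str],
                  input_records: List[Record]) -> List[Record]:
    """Hypothetical: keep only rows whose key tuple is not listed in the input."""
    doomed = {tuple(rec[f] for f in key_fields) for rec in input_records}
    return [row for row in rows if tuple(row[f] for f in key_fields) not in doomed]


def delete_by_attributes(rows: List[Record],
                         input_records: List[Record]) -> List[Record]:
    """Hypothetical: drop rows matching all mapped attribute values of any input record."""
    def matches(row: Record, rec: Record) -> bool:
        # Only the fields present in the input record (the mapped ones) are compared.
        return all(row.get(field) == value for field, value in rec.items())

    return [row for row in rows if not any(matches(row, rec) for rec in input_records)]
```

For example, with rows for IDs 1 to 3 where IDs 1 and 3 have `region` "EU", `delete_by_key(rows, ["id"], [{"id": "2"}])` removes a single row, while `delete_by_attributes(rows, [{"region": "EU"}])` removes every EU row.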

The following picture shows the mapping dialog when you choose to delete data records by their primary keys or fact table grain.

The following picture shows the mapping dialog when you choose to delete data records by their attributes or references.

GD Dataset Deleter Attributes

Attribute	Required	Description	Possible values
BASIC
GoodData project ID	yes	Specifies the GoodData project where the target dataset resides.	The current project (the project's hash in the GDC_PROJECT_ID parameter) is used by default.
Data set	yes	The target dataset from which the data will be deleted.
Field mapping	yes	Mapping of the input fields to the dataset columns.
ADVANCED
Empty input threshold	yes
Max. retry attempts	yes	The maximum number of retries that will be attempted if the previous attempts failed.	The default is 5.
Pause between retries	yes	The delay between individual retries (in seconds).	The default is 60.
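The retry attributes can be read as a simple loop. This is a minimal sketch, assuming one initial attempt followed by up to "Max. retry attempts" retries, with "Pause between retries" seconds of waiting before each retry; the defaults mirror the table (5 and 60). `run_with_retries` is a hypothetical helper, and the exact way CloudConnect counts attempts is an assumption. The `sleep` parameter is injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(operation: Callable[[], T],
                     max_retries: int = 5,
                     pause_seconds: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical: run operation, retrying up to max_retries times after failures."""
    retries_done = 0
    while True:
        try:
            return operation()
        except Exception:
            if retries_done >= max_retries:
                raise  # retries exhausted: surface the last error
            retries_done += 1
            sleep(pause_seconds)  # wait before the next retry
```

With the defaults, an operation that keeps failing is called six times in total (one attempt plus five retries) and waits 60 seconds between calls, or about five minutes overall, before the error is raised.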
