Automated Data Distribution v2 for Object Storage Services
Automated Data Distribution (ADD) v2 for object storage services is a feature of the GoodData platform that enables you to quickly upload data from an object storage service (for example, Amazon S3 or Microsoft Azure Blob) to one or multiple workspaces.
If you upload data from a data warehouse, see Automated Data Distribution v2 for Data Warehouses.
ADD v2 assumes that you have already run all the needed transformations on the data in your object storage service, and that the data is ready to be distributed to your workspaces.
Distributing Data to Workspaces
Data distribution is the process of fetching data from your object storage service, loading it, and distributing it to your customers' workspaces. This process ensures data separation at the workspace level.
You can distribute data to:
- Single workspace: When you need to load data to a workspace with a specific logical data model (LDM), you can map all datasets from the LDM to the source files in your object storage service using the naming convention (see Naming Convention for Source Files in Automated Data Distribution v2 for Object Storage Services).
- Multiple workspaces: ADD v2 uses the client ID to determine which data is loaded to which workspace.
For more information about organizing your workspaces, see Set Up Automated Data Distribution v2 for Object Storage Services.
How Data Is Distributed Based on the Client ID
Use the client ID to tell ADD v2 which data to load to which workspaces. ADD v2 then distributes data to your workspaces based on their client IDs.
Each client workspace is assigned a client ID. The data in the source files in your object storage service contains the x__client_id column. For each data record, this column holds a client ID value that corresponds to one of the client IDs assigned to your client workspaces.
At data load, ADD v2 uses the client ID to determine which data is loaded to which workspace. When the value in the x__client_id column for a data record matches the client ID of a client workspace, the record is loaded to that workspace.
The x__client_id column must contain a client ID value for each data record. If the x__client_id column is present but a record has no client ID specified, ADD v2 fails when loading the data to the corresponding dataset.
In the following picture, the Opportunity source file in the object storage service holds records with information related to multiple workspaces. When the data is loaded to a workspace with a specific client ID, only the records whose value in the x__client_id column matches the workspace client ID are loaded to the workspace.
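The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not how ADD v2 is implemented: the function name records_for_workspace, the sample columns id and name, and the sample client IDs are all hypothetical, and the sketch assumes an exact, case-sensitive comparison of client IDs.

```python
import csv
import io

CLIENT_ID_COLUMN = "x__client_id"


def records_for_workspace(rows, workspace_client_id):
    """Return the rows whose x__client_id equals the workspace client ID.

    Raises ValueError when a record has no client ID value, mirroring the
    rule that the column must be filled in for every record.
    """
    selected = []
    for record_no, row in enumerate(rows, start=1):
        client_id = row.get(CLIENT_ID_COLUMN)
        if not client_id:
            raise ValueError(f"record {record_no}: empty {CLIENT_ID_COLUMN}")
        # Assumption: client IDs are compared exactly (case-sensitive).
        if client_id == workspace_client_id:
            selected.append(row)
    return selected


# Hypothetical content of an Opportunity source file.
SAMPLE = """\
id,name,x__client_id
1,Deal A,client_1
2,Deal B,client_2
3,Deal C,client_1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
client_1_rows = records_for_workspace(rows, "client_1")
print([row["name"] for row in client_1_rows])  # ['Deal A', 'Deal C']
```

In this sketch, the workspace with the client ID client_1 receives two of the three records, and a record with an empty x__client_id value stops the load with an error.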
Client IDs in Workspaces
You assign the client ID to a workspace when assigning this workspace to a segment (see Set Up Automated Data Distribution v2 for Object Storage Services). Each workspace can have only one client ID assigned. The client ID must be unique within a segment. The client ID can be up to 255 characters long and can contain only numbers, lowercase and uppercase ASCII letters (a-z, A-Z), and underscores (_).
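If you generate client IDs programmatically, you may want to check them against these rules before assigning workspaces to a segment. The following is a minimal sketch; the helper names are hypothetical, and it assumes that "numbers" means the ASCII digits 0-9 and that a client ID has at least one character (the rules above state only the upper limit).

```python
import re
from collections import Counter

# Assumption: ASCII digits only, and at least one character.
CLIENT_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,255}")


def is_valid_client_id(client_id: str) -> bool:
    """Check the allowed characters and the 255-character limit."""
    return CLIENT_ID_PATTERN.fullmatch(client_id) is not None


def duplicate_client_ids(client_ids):
    """Return the client IDs that occur more than once within one segment."""
    counts = Counter(client_ids)
    return sorted(cid for cid, count in counts.items() if count > 1)


print(is_valid_client_id("Client_01"))   # True
print(is_valid_client_id("client-01"))   # False (hyphen is not allowed)
print(duplicate_client_ids(["acme", "globex", "acme"]))  # ['acme']
```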
In a Life Cycle Management (LCM) setup (see Managing Workspaces via Life Cycle Management), LCM acts as the authority, and the client ID is defined within the LCM hierarchy. The client ID within the source files is ignored. If this behavior is unwanted, contact GoodData Support.
Distributing Data to All Client Workspaces
If you want to load data from a source file to all client workspaces, do not add the x__client_id column to this source file. When the x__client_id column is not present, ADD v2 does not do any filtering and loads the data to all workspaces.
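The difference between a source file with and without the x__client_id column can be illustrated with a short Python sketch. It models only the rule stated above and is not the ADD v2 implementation; the function name rows_to_load and the sample file contents are hypothetical.

```python
import csv
import io

CLIENT_ID_COLUMN = "x__client_id"


def rows_to_load(csv_text, workspace_client_id):
    """Return the records of one source file that a workspace would receive."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if CLIENT_ID_COLUMN not in (reader.fieldnames or []):
        # No client ID column: no filtering, every workspace gets all records.
        return rows
    return [row for row in rows if row[CLIENT_ID_COLUMN] == workspace_client_id]


# Hypothetical shared file without the client ID column.
SHARED = "code,label\nUS,United States\nDE,Germany\n"
# Hypothetical per-client file with the client ID column.
PER_CLIENT = "id,x__client_id\n1,client_1\n2,client_2\n"

print(len(rows_to_load(SHARED, "client_1")))      # 2
print(len(rows_to_load(SHARED, "client_2")))      # 2
print(len(rows_to_load(PER_CLIENT, "client_1")))  # 1
```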