Before You Start with Data Preparation and Distribution Pipeline
Before you start building your data pipeline, review the following requirements and make sure that you have everything in place.
Services and Infrastructure
- An Amazon S3 bucket that acts as Big Data Storage (BDS; see “Big Data Storage” in Data Preparation and Distribution Pipeline) and that stores the bricks' configuration file. You can use a single S3 bucket for both purposes. For more information about how to work with S3 buckets, see the Amazon documentation on S3 buckets.
- An Agile Data Warehousing Service (ADS) instance where the downloaded source data will be stored. For more information about how to work with ADS, see How to Work with Data Warehouse Instances.
Skills and Knowledge
- Experience with logical data modeling (see Data Modeling in GoodData)
- Familiarity with the GoodData data pipeline components (see Data Preparation and Distribution Pipeline)
- Knowledge of your analytical use cases
  Although you can technically build a data pipeline without understanding the use cases behind it, knowing them helps you build an efficient pipeline, save time, and avoid ending up with workspaces where some uploaded data is redundant while other data is missing. The following questions may help:
  - What is the finest level of data granularity that your insights require?
  - Do you want to upload all the data that you have or just a subset of it?
  - How often do you want to upload new data to workspaces? Do you want to use full loads or incremental loads? What is the expected data retention?
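
To make the load-mode and retention questions more concrete, the following Python sketch contrasts a full load, an incremental load, and a retention cutoff on a small in-memory dataset. It is only an illustration of the concepts: the row layout, the `updated_at` field, and the function names are assumptions made for this example and are not part of any GoodData brick or API.

```python
from datetime import datetime

# Hypothetical source rows; the field names are illustrative only.
SOURCE_ROWS = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "amount": 10},
    {"id": 2, "updated_at": datetime(2024, 1, 2), "amount": 20},
    {"id": 3, "updated_at": datetime(2024, 1, 3), "amount": 30},
]


def full_load(source_rows):
    """Return every source row; the target is rebuilt on each run."""
    return list(source_rows)


def incremental_load(source_rows, last_loaded_at):
    """Return only the rows changed after the previous load's timestamp."""
    return [row for row in source_rows if row["updated_at"] > last_loaded_at]


def apply_retention(rows, keep_from):
    """Keep only the rows at or after the retention cutoff."""
    return [row for row in rows if row["updated_at"] >= keep_from]


if __name__ == "__main__":
    everything = full_load(SOURCE_ROWS)
    changed = incremental_load(SOURCE_ROWS, datetime(2024, 1, 1))
    retained = apply_retention(SOURCE_ROWS, datetime(2024, 1, 2))
    print(len(everything), len(changed), len(retained))
```

A full load is simpler to reason about but moves all the data on every run. An incremental load moves less data but depends on a reliable change timestamp in the source. Which one fits depends on your data volume and on how often you plan to load.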