You are viewing our older product's guide. Click here for the documentation of GoodData Cloud, our latest and most advanced product.

Google Sheets Downloader

Google Sheets Downloader is a component of the data preparation and distribution pipeline (see Data Preparation and Distribution Pipeline). Google Sheets Downloader downloads data from a Google Sheets spreadsheet. You can either specify the source spreadsheet directly by its ID or point to the Google Drive folder where the spreadsheet is stored.

Source Spreadsheet Requirements

Make sure that your source Google Sheets spreadsheet meets the following requirements:

The source spreadsheet has one or more sheets with data.
In each sheet:
- The first row is a header.
- Each column with data has a name.
- The names of the columns contain only ASCII characters, preferably in lowercase and separated by underscores (not spaces).

Configuration File

Creating and setting up the brick’s configuration file is the third phase of building your data pipeline. For more information, see Build Your Data Pipeline.

For more information about configuring a brick, see Configure a Brick.

Minimum Layout of the Configuration File

The following JSON sample is the minimum layout of the configuration file that Google Sheets Downloader can use. If you are using more bricks, you can reuse some sections of this configuration file for other bricks. For example, the ‘downloaders’ section can list more downloaders, and the ‘integrators’ and ‘ads_storage’ sections can be reused by other downloaders and executors (see Configure a Brick).

This sample describes the use case of one instance of Google Sheets Downloader that downloads two sheets from a Google Sheets spreadsheet.

Copy the sample and replace the placeholders with your values.

You can later enrich the sample with more parameters depending on your business and technical requirements (see Customize the Configuration File).

{
  "entities": {
    "table_1_name": {
      "global": {
        "custom": {
          "hub": ["column_1_name","column_2_name","column_3_name"]
        }
      }
    },
    "table_2_name": {
      "global": {
        "custom": {
          "hub": ["column_1_name","column_2_name","column_3_name"]
        }
      }
    }
  },
  "downloaders": {
    "gsheets_downloader_id": {
      "type": "gs",
      "entities": ["table_1_name","table_2_name"]
    }
  },
  "gs": {
    "client_id": "google_client_id",
    "spreadsheet_id": "spreadsheet_id"
  },
  "integrators": {
    "ads_integrator_1": {
      "type": "ads_storage",
      "entities": ["table_1_name","table_2_name"]
    }
  },
  "ads_storage": {
    "instance_id": "data_warehouse_instance_id",
    "username": "dw_email@address.com",
    "options": {}
  }
}

The placeholders to replace with your values:

Name	Type	Mandatory?	Default	Description
entities -> *table_x_name*	string	yes	n/a	The database table into which Google Sheets Downloader loads the data One table corresponds to one sheet in the source spreadsheet. Specify as many tables for as many sheets with data that are in the source spreadsheet. For example, if the source spreadsheet contains five sheets, specify five tables under ‘entities’. The name of the table must be the same as the name of the sheet from which the data will be downloaded. For example, if you have a sheet named 'company', name the table 'company': `"entities": { "company": { "global": { ... } } }`
entities -> table_x_name -> global -> custom -> hub	array	yes	n/a	The table columns to generate a primary key The table column names must correspond to the column names in the sheet from which the data will be downloaded. Example: `["id"]`
downloaders -> gsheets_downloader_id	string	yes	n/a	The ID of the Google Sheets Downloader instance Example: `gsheets_downloader_1`
downloaders -> gsheets_downloader_id -> entities	array	yes	n/a	The database table into which Google Sheets Downloader loads the data Must be the same as the database table names in 'entities -> table_x_name'
gs -> client_id	string	yes	n/a	Your Google client ID Example: `9414861317621.apps.googleusercontent.com`
gs -> spreadsheet_id	string	yes	n/a	The ID of the Google Sheets spreadsheet to download data from You can find the spreadsheet ID in the URL of the spreadsheet: `https://docs.google.com/spreadsheets/d/{spreadsheet_id}`
integrators -> *ads_integrator_id*	string	yes	n/a	The ID of the ADS Integrator instance Example: `ads_integrator_1`
integrators -> ads_integrator_id -> entities	array	yes	n/a	The names of the entities that this ADS Integrator instance processes Must be the same as the database table names in 'entities -> table_x_name'
ads_storage -> instance_id	string	yes	n/a	The ID of the Agile Data Warehousing Service (ADS) instance into which the data is uploaded
ads_storage -> username	string	yes	n/a	The access username to the Agile Data Warehousing Service (ADS) instance into which the data is uploaded

Customize the Configuration File

You can change default values of the parameters and set up additional parameters.

Describe the structure of the source Google Sheets spreadsheet

You can set the parameters describing the structure of the source Google Sheets spreadsheet.

"gs": {
  "client_id": "google_client_id",
  "spreadsheet_id": "spreadsheet_id",
  "options": {
    "files_structure": {
      "escape_as": "escape_character", // optional
      "db_parser": "gdc|{no_value}", // optional
      "enclosed_by": "quote_character" // optional
    }
  }
}

Name Type Mandatory? Default Description

escape_as

character

n/a

Name	Type	Mandatory?	Default	Description
escape_as	character	no	n/a	An escape character You can define an escape character using any ASCII value in the range from E'\001' to E'\177' inclusive (any ASCII character except `NULL: E'\000'`) Example: `\"`
db_parser	string	no	gdc	The database parser If not set or set to 'gdc', the default GoodData parser (GdcCsvParser) is used. If set to an empty value (`"db_parser": ""`), the default Vertica parser is used.
enclosed_by	string	no	n/a	The quote character within which to enclose data. This allows delimiter characters to be embedded in string values. You can choose any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except `NULL: E'\000'`). Example: `\"`

An escape character You can define an escape character using any ASCII value in the range from E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000')

Example: \"

db_parser

string

gdc

The database parser

If not set or set to 'gdc', the default GoodData parser (GdcCsvParser) is used.
If set to an empty value ("db_parser": ""), the default Vertica parser is used.

enclosed_by

string

n/a

The quote character within which to enclose data. This allows delimiter characters to be embedded in string values. You can choose any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000').

Example: \"

Specify additional load options for the entities

For each entity, you can set up additional load options.

"entities": {
  "table_1_name": {
    "global": {
      "custom": {
        "hub": ["column_1_name","column_2_name","column_3_name"],
        "skip_rows": number_of_rows, // optional
        "db_parser": "gdc|{no_value}" // optional
      }
    }
  }
}

Name Type Mandatory? Default Description

skip_rows

integer

Name	Type	Mandatory?	Default	Description
skip_rows	integer	no	0	The number of rows from the top of a Google Sheets spreadsheet that Google Sheets Downloader will skip
db_parser	string	no	gdc	The database parser If not set or set to 'gdc', the default GoodData parser (GdcCsvParser) is used. If set to an empty value (`"db_parser": ""`), the default Vertica parser is used.

The number of rows from the top of a Google Sheets spreadsheet that Google Sheets Downloader will skip

db_parser

string

gdc

The database parser

If not set or set to 'gdc', the default GoodData parser (GdcCsvParser) is used.
If set to an empty value ("db_parser": ""), the default Vertica parser is used.

Add multiple instances of Google Sheets Downloader

When you use only one instance of Google Sheets Downloader, it reads its remote location parameters from the ‘gs’ section.

If you use multiple instances of Google Sheets Downloader and each instance has a separate location to download data from, add several ‘gs’ sections and give each section a unique name (for example, ‘gs_users’ or ‘gs_sales’).

"gs_users": {
  "client_id": "google_client_1_id"
},
"gs_sales": {
  "client_id": "google_client_2_id"
}

For each instance of Google Sheets Downloader in the ‘downloaders’ section, add the ‘settings’ parameter and set it to the name of the section that contains settings for this Google Sheets Downloader instance.

"downloaders": {
  "gsheets_downloader_1_id": {
    "type": "gs",
    "settings": "gs_users",
    "entities": ["table_1_name","table_2_name"]
  },
  "gsheets_downloader_2_id": {
    "type": "gs",
    "settings": "gs_sales",
    "entities": ["table_1_name","table_2_name"]
  }
}

Point to a folder containing the source Google Sheets spreadsheet

By default, Google Sheets Downloader looks for a specific source Google Sheets spreadsheet to download data from (see the ‘spreadsheet_id’ parameter in Minimum Layout of the Configuration File). If the source spreadsheet does not have a constant ID (for example, the spreadsheet is re-generated from time to time and its ID changes), point to a folder where the source spreadsheet is stored instead of specifying the spreadsheet ID. This way, you do not rely on the spreadsheet ID but rather on its location.

Make sure that your client ID has access to the Google Drive folder and that the API is enabled.

"gs": {
  "client_id": "google_client_id",
  "folder": "google_drive_folder_id"
},

Name Type Mandatory? Default Description

folder

string

n/a

Name	Type	Mandatory?	Default	Description
folder	string	no	n/a	The ID of the Google Drive folder containing the source Google Sheets spreadsheet to download data from You can find the folder ID in the URL of the folder: `https://drive.google.com/drive/folders/{folder_id}` NOTE: If you use both 'folder' and 'spreadsheet_id' parameters at the same time, the 'folder' parameter is used, and the 'spreadsheet_id' parameter is ignored.

The ID of the Google Drive folder containing the source Google Sheets spreadsheet to download data from

You can find the folder ID in the URL of the folder: https://drive.google.com/drive/folders/{folder_id}

NOTE: If you use both 'folder' and 'spreadsheet_id' parameters at the same time, the 'folder' parameter is used, and the 'spreadsheet_id' parameter is ignored.

A typical use case is downloading data from a single source spreadsheet stored in the specified folder. If you want to download data from multiple source spreadsheets stored in the folder, make sure that all those spreadsheets have the same structure: each source spreadsheet must contain all the sheets that are specified as tables under the ‘entities’ section of the configuration file (see Minimum Layout of the Configuration File). If any source spreadsheet is missing a sheet that is specified in the configuration file, Google Sheets Downloader fails.

When multiple source spreadsheets are processed, the data will be uploaded to Big Data Storage as separate files, one file per source spreadsheet.

Add optional parameters

Depending on your business and technical requirements, you can set one or more optional parameters affecting the data load process.

"gs": {
  "client_id": "google_client_id",
  "spreadsheet_id": "spreadsheet_id",
  "full": true|false, // optional
  "value_render_option": "format" // optional
}

Name	Type	Mandatory?	Default	Description
full	Boolean	no	false	Specifies load mode used for loading data to the database If not set or set to 'false', incremental load is used. If set to 'true', full load is used.
value_render_option	string	no	FORMATTED_VALUE	The format in which the API returns the cell values For available options, see https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption.

full

Boolean

false

Specifies load mode used for loading data to the database

If not set or set to 'false', incremental load is used.
If set to 'true', full load is used.

value_render_option

string

FORMATTED_VALUE

The format in which the API returns the cell values

For available options, see https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption.

Customize integrators

The ‘integrators’ and ‘ads_storage’ sections define the behavior of integrators.

"integrators": {
  "ads_integrator_1": {
    "type": "ads_storage",
    "entities": ["table_1_name","table_2_name"]
  }
},
"ads_storage": {
  "instance_id": "data_warehouse_instance_id",
  "username": "dw_email@address.com",
  "options": {}
}

If you want to add more parameters or change the defaults, see ADS Integrator.

Schedule Parameters

When scheduling Google Sheets Downloader (see Phases of Building the Data Pipeline -> Production Implementation), provide the parameters from this section in the schedule.

General Parameters

Some parameters must be entered as secure parameters (see Configure Schedule Parameters).

Some parameters contain the names of the nesting parameters, which specifies where they belong in the configuration hierarchy. For example, a parameter entered as gs|options|password is resolved as the following:

"gs": {
  "options": {
    "password": ""
  }
}

Enter such parameters as is (see Schedule Example).

Name	Typ	Mandatory?	Secure?	Default	Description
ID	string	yes	no	n/a	The ID of the Google Sheets Downloader instance being scheduled Must be the same as the value of the 'downloaders -> gsheets_downloader_id' parameter in the configuration file (see Minimum Layout of the Configuration File) Example: `gsheets_downloader_1`
ads_storage\|password	string	yes	yes	n/a	The password to access the ADS instance to upload the data to
gs\|client_secret	string	yes	yes	n/a	The Google client secret
gs\|refresh_token	string	yes	yes	n/a	The OAuth refresh token For information about using OAuth 2.0, see https://developers.google.com/identity/protocols/oauth2.

Schedule Example

The following is an example of how you can specify schedule parameters:

Google Analytics Downloader

Salesforce Downloader