Uploading your data to GoodData can be a bit tricky, so we’ve prepared this topic to help you overcome the most common causes of upload failure.
General Requirements and Tips
File identification in the ETL is based on the filename, so consistent naming is imperative. Any change to the naming scheme will disrupt the ETL.
Avoid putting dynamic file or directory names inside a zip file. The zip file itself may have a dynamic name, such as a filename based on the date of the upload.
In ETL, transformations rely on fields being provided in a consistent columnar order. Column order must be preserved, and all fields must be present.
- Each data entry across a row should be formatted consistently. Files may contain an optional header line that displays column names. If provided, the header must appear in the first line of the file and be identical in format to the other lines. For consistency, column headers should remain the same across uploads.
- Each record, including the header, must contain the same number of fields. Commas, semicolons, pipes or other standard delimiters must be used to separate all fields.
- There should be no delimiter after the last field in each record. The end of each record must be marked by a line break.
- Double quotes are required around any field containing line breaks, double quotes, or the delimiting character. All double quotes that exist within a field must be escaped by a preceding double quote.
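These field and quoting rules can be verified programmatically before upload. Below is a minimal sketch using Python’s standard csv module; the file path and delimiter are placeholder assumptions to adapt to your data:

```python
import csv

def check_field_counts(path, delimiter=","):
    """Verify every record (including the header) has the same number of fields.

    Returns (True, None) if consistent, or (False, line_number) for the
    first record whose field count differs from the header's.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        expected = None
        for line_no, record in enumerate(reader, start=1):
            if expected is None:
                expected = len(record)  # the first line sets the expected width
            elif len(record) != expected:
                return False, line_no
    return True, None
```

Because csv.reader honors the quoting rules above, a delimiter inside a properly quoted field is not miscounted as a field separator.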
The ETL depends on consistent character set formatting. All files should be UTF-8 compliant. If that is not possible, files must always be provided in the same alternative encoding (e.g., ISO-8859-1).
- Referential integrity depends on consistent key usage. Verify key structure.
- Monitor for null fields and if necessary, verify data set joins.
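As a quick way to monitor for null fields before checking your data set joins, you can scan a file for empty values. A sketch, where the file path and delimiter are assumptions:

```python
import csv

def find_empty_fields(path, delimiter=","):
    """Return (line, column) positions of empty fields that could break
    referential integrity or data set joins."""
    empties = []
    with open(path, newline="", encoding="utf-8") as f:
        for line_no, record in enumerate(csv.reader(f, delimiter=delimiter), start=1):
            for col_no, value in enumerate(record, start=1):
                if value.strip() == "":
                    empties.append((line_no, col_no))
    return empties
```

Run it against any file that contains key columns; a non-empty result pinpoints the records to fix before upload.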
The date formatting must be consistent throughout the data. GoodData supports a variety of date formats, which can be submitted in CSV format.
For the CSV Uploader, yyyy-MM-dd is currently the only accepted date format for the ETL. Each date must be converted to the yyyy-MM-dd format before being processed. Time entries may also be submitted to the CSV Uploader appended to a yyyy-MM-dd date.
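This conversion can be scripted before upload. The sketch below uses Python’s datetime module; the candidate source formats are assumptions to adapt to your own data:

```python
from datetime import datetime

# Example source formats only -- adjust this list to match your data.
CANDIDATE_FORMATS = ["%m/%d/%Y", "%d.%m.%Y", "%Y-%m-%d"]

def to_gooddata_date(value):
    """Convert a date string to the yyyy-MM-dd format expected by the CSV Uploader."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {value!r}")
```

For example, `to_gooddata_date("12/31/2019")` returns `"2019-12-31"`, while an unrecognized value raises an error instead of silently passing through.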
Tips for Uploading CSVs Using CloudConnect
If you are using the CSV Reader component to read a CSV file into your CloudConnect process, some additional configuration is required.
In CloudConnect, you must configure the fields and their format that are passed between components of the ETL graph. In this case, you must define the metadata between the CSV Reader component and the next component in the ETL graph.
To create metadata, create a connection from the output port of the CSV Reader to the input port of the next component in the graph. Then, right-click the connecting line and select Create metadata or Edit metadata.
In the image below, you can see the Metadata Editor configuration settings for the file’s end-of-line and end-of-file delimiters.
Wherever possible, use a tool or a library to generate your CSV files. Avoid generating them manually.
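For example, Python’s standard csv module quotes fields and escapes embedded quotation marks automatically, which avoids most hand-rolled formatting errors. The output filename and sample rows here are placeholders:

```python
import csv

# Placeholder rows: a header plus one record containing both an embedded
# quote and an embedded comma -- the cases that break hand-built CSVs.
rows = [
    ["id", "product", "note"],
    [1, '"Large" Fountain Drink', "contains, a comma"],
]

# QUOTE_ALL wraps every field in double quotes; embedded quotes are
# escaped by doubling ("") automatically.
with open("upload.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
```

The resulting file is UTF-8, fully quoted, and safe to pass through the ETL without manual escaping.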
All strings should be wrapped in quote marks ("), as in the following example:
"This is a string."
- Remove the thousand separator commas from all numbers (use the plain #######.## format)
- Don’t use the scientific number format (e.g. 12E3)
- Remove all currency or percent characters from your numbers
- For date values, verify that no date is out of range (before 1900 or after 2050)
- Remove all empty trailing rows from your CSV file
- Verify that there are no End of File (EOF) delimiters in your CSV. Open the file and check that there are no delimiters at the end of the last line, including tabs or newlines.
- Bad quotes - a non-matching set of quotes in the file can cause the upload to fail
- Save your CSVs as “Windows CSVs” on Mac (Macs may use an unsupported end of line character)
- Don’t use the Finder’s “Compress File” on Mac to create a ZIP file for upload as it adds some hidden files to the archive. Use command line utilities on a Mac or avoid compression
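Several of the checks above (trailing delimiters, empty trailing rows, unmatched quotes) can be automated with a short pre-upload script. A sketch, assuming a comma delimiter:

```python
def preflight(path, delimiter=","):
    """Flag common CSV problems before upload: trailing delimiters,
    an empty trailing row, and an odd number of double quotes."""
    problems = []
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    if lines and lines[-1].strip() == "":
        problems.append("empty trailing row")
    for no, line in enumerate(lines, start=1):
        if line.endswith(delimiter):
            problems.append(f"line {no}: trailing delimiter")
    # An odd total of double quotes means at least one quote is unmatched.
    if sum(line.count('"') for line in lines) % 2 != 0:
        problems.append("unmatched double quote")
    return problems
```

An empty result means the file passed these checks; anything returned should be fixed before upload. Note the quote count is a heuristic: it catches a stray quote but cannot tell you which field it is in.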
Using a doubled quote ("") to escape a quotation mark (") inside a given field
This situation occurs when a value inside a field contains quotation marks that are meant to be entered into the application, such as "Large" Fountain Drink.
Entering the quotation marks as-is ("Large" Fountain Drink) causes the parser to fail. To properly insert this value with the quotes, double each quotation mark instead (""Large"" Fountain Drink).
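Python’s csv module applies exactly this doubling convention, which is an easy way to confirm how an escaped field will be parsed:

```python
import csv
import io

# A record whose middle field contains literal quotation marks,
# escaped by doubling and wrapped in an outer pair of quotes.
raw = '1,"""Large"" Fountain Drink",2.50\n'

# Parse the record; the doubled quotes collapse back to single ones.
record = next(csv.reader(io.StringIO(raw)))
```

After parsing, `record[1]` holds the intended value, "Large" Fountain Drink, with its quotation marks intact.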
Redundant spaces in a CSV file
Make sure to remove extra spaces between values in your CSV file. Commas used to separate values shouldn’t be followed by an additional space. For example:
Doesn’t work (note the extra space after each comma):
"John", "Smith", "New York"
Works:
"John","Smith","New York"
Thanks to your feedback, we have identified these trouble spots, and we are working hard to remove them from the application. Thanks for your patience and support.