Chapter 16. REST: Invoking Complex REST APIs

This example demonstrates how to invoke a complex REST API. The example graph invokes the Twitter REST API and downloads all pages of the JSON search results. This chapter focuses only on the processing related to the REST API invocation.

The example graph loads the time dimension in the first phase. Time dimension processing is explained in Chapter 13, Forex Example: Using the Time Dimension.

The most interesting part of the graph starts with the DataGenerator component, which sends a single record with one request field to the HTTP Connector component.

Figure 16.1. The Twitter REST API CloudConnect Graph


The HTTP Connector specifies the Twitter search API REST endpoint in the Request URL attribute. The REST invocation is further configured with the GET method and a single HTTP request header, Accept=application/json.
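For readers who want to reproduce the same request outside CloudConnect, the following Python sketch builds, but does not send, an equivalent GET request that carries the Accept header. The endpoint URL is a hypothetical placeholder, since the exact Request URL is not reproduced in the text, and build_search_request is an illustrative helper, not part of CloudConnect.

```python
from urllib.request import Request

# Hypothetical endpoint standing in for the Twitter search API Request URL.
SEARCH_URL = "https://api.example.com/search.json?q=gooddata"


def build_search_request(url: str) -> Request:
    """Build (without sending) a GET request that asks for a JSON response."""
    return Request(url, method="GET", headers={"Accept": "application/json"})


request = build_search_request(SEARCH_URL)
print(request.get_method(), request.get_header("Accept"))  # GET application/json
```

Passing the request object to urllib.request.urlopen would actually send it; the sketch stops short of that so it runs without network access.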

Figure 16.2. The HTTP Connector Attributes


The Request URL contains the ${next_page} parameter, which serves as a placeholder for the HTTP query parameters that the Twitter REST API uses for paging. The values of these parameters are generated by the request-handling functions, which are implemented in the CTL2 language. See Chapter 13, Forex Example: Using the Time Dimension for more details about the CTL language.
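The placeholder mechanism can be sketched in Python, assuming that ${next_page} is expanded by plain string substitution from the map returned by generateRequestParameters. The endpoint URL and the generate_request_parameters and expand_url helpers are hypothetical; only the placeholder name and the initial value ?q=gooddata come from the CTL2 code in this chapter.

```python
from string import Template
from typing import Dict, Optional

# Hypothetical Request URL; only the ${next_page} placeholder comes from the chapter.
REQUEST_URL = "https://api.example.com/search.json${next_page}"
INITIAL_QUERY = "?q=gooddata"  # initial value used by generateRequestParameters


def generate_request_parameters(input_record: Dict[str, str],
                                next_page: Optional[str]) -> Dict[str, str]:
    """Copy the input record and set next_page, falling back to the initial query."""
    params = dict(input_record)
    params["next_page"] = next_page if next_page else INITIAL_QUERY
    return params


def expand_url(template: str, params: Dict[str, str]) -> str:
    """Replace each ${name} placeholder with the matching parameter value."""
    return Template(template).substitute(params)


# First request: there is no previous response, so the initial query is used.
first_url = expand_url(REQUEST_URL, generate_request_parameters({}, None))
# Later request: next_page taken from the previous response (example value).
second_url = expand_url(REQUEST_URL, generate_request_parameters({}, "?page=2&q=gooddata"))
print(first_url)   # https://api.example.com/search.json?q=gooddata
print(second_url)  # https://api.example.com/search.json?page=2&q=gooddata
```

Because the next_page value already starts with "?", the placeholder sits directly after the path in the URL template rather than after a "?" of its own.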

					/**
					 * Custom helper function that extracts the next_page HTTP query parameter value from the
					 * previous Twitter REST API response
					 * lastResponseBody - body of the previous response
					 *
					 * returns the next_page HTTP query parameter value from the previous Twitter REST API
					 * response, or null if there is none
					**/
					function string extractNextPage(string lastResponseBody) {
						if (lastResponseBody != null && lastResponseBody.length() > 0) {
							// Convert the JSON payload to XML and find the <next_page> element
							string[] next_pages =
								lastResponseBody.json2xml().find("<next_page>.*?</next_page>");
							if (next_pages.length() == 1) {
								string next_page = next_pages[0];
								// Strip the enclosing tags, keeping only the element text
								return next_page.find(">.*?<")[0].replace("<", "").replace(">", "");
							}
							else {
								printLog(warn, "No next_page in the response.");
								return null;
							}
						}
						else {
							return null;
						}
					}

					/**
					 * Generates request parameters (usually page numbers, offsets, timestamps,
					 * signature hashes, etc.)
					 * Called before each request.
					 *
					 * The last response is only defined if the iteration number is greater than one.
					 * Therefore, for the very first request,
					 * lastResponseStatus is 200, and lastResponseHeaders and lastResponseBody are empty.
					 *
					 * inputEdgeRecord - contains fields of the input edge record
					 * iterationNumber - starts at 1
					 * lastResponseStatus - HTTP status of the previous response
					 * lastResponseHeaders - HTTP headers of the previous response
					 * lastResponseBody - body of the previous response
					 *
					 * returns a map of params that can be used in the request URL
					**/
					function map[string, string] generateRequestParameters(map[string, string] inputEdgeRecord,
						integer iterationNumber, integer lastResponseStatus,
						map[string, string] lastResponseHeaders,
						string lastResponseBody) {

						// Copy all input parameters into the request parameters map.
						map[string, string] requestParams = inputEdgeRecord;
						// Use the next_page value of the previous response, if any;
						// otherwise start with the initial search query.
						string next_page = extractNextPage(lastResponseBody);
						if (next_page == null || next_page.length() <= 0) {
							next_page = "?q=gooddata";
						}
						requestParams["next_page"] = next_page;
						return requestParams;
					}

					/**
					 * Determines the outcome of the response.
					 * Used for controlling the paging workflow and detecting errors.
					 * Called after each request response.
					 *
					 * responseStatus - response HTTP status
					 * responseHeaders - response HTTP headers
					 * responseBody - response body
					 *
					 * returns
					 *  CONTINUE - continue to next iteration (e.g., next page)
					 *  DONE_NO_OUTPUT - last iteration finished, no data will be sent to the output port
					 *  for the last iteration (no data received from the last iteration)
					 *  DONE_WITH_OUTPUT - last iteration finished, data will be sent to the output port for
					 *  the last iteration (data received from the last iteration)
					 *  RETRY - retry the last failed request
					 *  FATAL_ERROR - fatal error, aborts the HTTP connector run
					**/
					function string checkResponse(integer responseStatus, map[string, string] responseHeaders,
						string responseBody) {

						if (responseStatus >= 200 && responseStatus < 300) {
							// Success: keep paging while the response points to a next page.
							// Checking the status first keeps error responses, which carry
							// no next_page, from being mistaken for the last page.
							string next_page = extractNextPage(responseBody);
							if (next_page == null) {
								return "DONE_WITH_OUTPUT";
							}
							return "CONTINUE";
						}
						else if (responseStatus >= 400 && responseStatus < 500) {
							// HTTP status "404 - NOT FOUND" could mean there are no more pages
							return "DONE_NO_OUTPUT";
						}
						else if (responseStatus >= 500) {
							// Internal server errors could be temporary
							// (this sends the last response to the error output port)
							return "RETRY";
						}
						else {
							// Otherwise abort the HTTP connector run
							// (this sends the last response to the error output port)
							return "FATAL_ERROR";
						}
					}

					/**
					 * Updates the request params before each retry attempt of a previously failed request.
					 * Useful for resetting authorization parameters (signatures, tokens, etc.),
					 * updating timestamps, etc.
					 *
					 * Optional. When not defined, the request stays the same.
					 *
					 * failedRequestParams - original parameters of the request that failed and should be retried
					 * retryNumber - number of the current retry, 1 for the first retry
					 * responseStatus - HTTP status of the failed request
					 * responseHeaders - HTTP headers of the failed request
					 * responseBody - body of the failed request
					 *
					 * returns a map of the modified params for the retry request
					**/
					function map[string, string] modifyRequestParamsBeforeRetryAttempt(
						map[string, string] failedRequestParams, integer retryNumber,
						integer responseStatus, map[string, string] responseHeaders,
						string responseBody) {

						// Copy all the previous parameters into the retry request parameters map.
						map[string, string] modifiedRequestParams = failedRequestParams;

						/*** Modify the params of the request ***/

						// Example of timestamp modification
						// modifiedRequestParams["TIMESTAMP"] = toString(date2long(today()));

						return modifiedRequestParams;
					}
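To make the paging workflow easier to follow, the following Python sketch models how extractNextPage, generateRequestParameters, and checkResponse cooperate across iterations. It is a simulation under stated assumptions, not the HTTP Connector's implementation: the loop assumes every CONTINUE page is emitted and that RETRY resends the unchanged request (as when modifyRequestParamsBeforeRetryAttempt leaves the parameters as they are), it reads next_page with Python's json module instead of json2xml, and all names in it are hypothetical.

```python
import json
from typing import Callable, List, Optional, Tuple

# A response is modeled as (HTTP status, body); hypothetical, for illustration only.
Response = Tuple[int, str]
INITIAL_QUERY = "?q=gooddata"


def extract_next_page(body: str) -> Optional[str]:
    """Return the next_page value of a JSON body, or None if it has none."""
    if not body:
        return None
    try:
        value = json.loads(body).get("next_page")
    except (ValueError, AttributeError):  # not JSON, or not a JSON object
        return None
    return value or None


def check_response(status: int, body: str) -> str:
    """Classify a response like the checkResponse function, status first."""
    if 200 <= status < 300:
        return "CONTINUE" if extract_next_page(body) else "DONE_WITH_OUTPUT"
    if 400 <= status < 500:
        return "DONE_NO_OUTPUT"
    if status >= 500:
        return "RETRY"
    return "FATAL_ERROR"


def fetch_all_pages(send: Callable[[str], Response], max_retries: int = 3) -> List[str]:
    """Request pages until check_response reports DONE; return the page bodies.

    Assumed connector behavior: each CONTINUE page is emitted, and a RETRY
    resends the same request without modifying its parameters.
    """
    pages: List[str] = []
    next_page = INITIAL_QUERY
    retries = 0
    while True:
        status, body = send(next_page)
        outcome = check_response(status, body)
        if outcome == "RETRY":
            retries += 1
            if retries > max_retries:
                raise RuntimeError(f"giving up after {max_retries} retries")
            continue
        retries = 0
        if outcome == "CONTINUE":
            pages.append(body)
            next_page = extract_next_page(body)
        elif outcome == "DONE_WITH_OUTPUT":
            pages.append(body)
            return pages
        elif outcome == "DONE_NO_OUTPUT":
            return pages
        else:
            raise RuntimeError(f"fatal HTTP status {status}")


# Usage: a fake two-page API whose first call fails with a temporary 503.
fake_api = {
    "?q=gooddata": [(503, ""), (200, '{"results": [1], "next_page": "?page=2&q=gooddata"}')],
    "?page=2&q=gooddata": [(200, '{"results": [2]}')],
}
pages = fetch_all_pages(lambda page: fake_api[page].pop(0))
print(len(pages))  # 2
```

The fake_api dictionary stands in for the Twitter API: the first request is retried once, the second page has no next_page, and the loop stops with both pages collected.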
				

Note the json2xml function, which converts the JSON payload of the Twitter search API response to XML in the Reformat component. The resulting XML file is then processed by the XMLExtract component.
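The idea behind this JSON-to-XML step can be approximated in Python as shown below. The mapping rules (objects become elements named after their keys, arrays become repeated elements with the same tag, scalars become element text) are an assumption for illustration and are not the documented behavior of the CTL2 json2xml function; json_to_xml and its root tag are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(text: str, root_tag: str = "root") -> ET.Element:
    """Convert a JSON document into an XML element tree (assumed mapping)."""
    root = ET.Element(root_tag)
    _append(root, json.loads(text))
    return root


def _append(parent: ET.Element, value) -> None:
    """Attach a JSON value to `parent` under the assumed mapping rules."""
    if isinstance(value, dict):
        for key, item in value.items():
            # An array under a key becomes several elements with the same tag.
            items = item if isinstance(item, list) else [item]
            for element_value in items:
                _append(ET.SubElement(parent, key), element_value)
    elif isinstance(value, list):
        # Nested arrays without a key get a generic <item> tag.
        for element_value in value:
            _append(ET.SubElement(parent, "item"), element_value)
    elif isinstance(value, bool):
        parent.text = "true" if value else "false"
    elif value is not None:
        # Strings and numbers become element text; null leaves the element empty.
        parent.text = str(value)


payload = ('{"next_page": "?page=2&q=gooddata", '
           '"results": [{"text": "first"}, {"text": "second"}]}')
root = json_to_xml(payload)
next_page = root.findtext("next_page")
texts = [result.findtext("text") for result in root.findall("results")]
print(next_page)  # ?page=2&q=gooddata
print(texts)      # ['first', 'second']
```

Once the payload is XML, each repeated results element can be mapped to an output record, which is the kind of extraction the XMLExtract component performs in the graph.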