Using a Custom Time Dimension in a Logical Data Model
CloudConnect is a legacy tool and will be discontinued. We recommend that to prepare your data you use the GoodData data pipeline as described in Data Preparation and Distribution. For data modeling, see Data Modeling in GoodData to learn how to work with Logical Data Modeler.
In some situations, you need to slice your data by timestamps. For example:
- To predict the number of call center personnel required at different times of a day, you may want to see data from a call center sliced by hour and minute or by hour and quarter hour (15-minute intervals).
- To plan the movie schedule at a cinema, you may need to know how many tickets are sold for particular movies at different times of day across a selected range of dates.
To handle such situations, we recommend that you split the date and time into separate attributes. To do so:
- Create a date dimension for the date attribute using either a standard (default) calendar, or a custom fiscal calendar.
- Add a custom time dimension to handle the time of day.
You can use the custom time dimension together with a GoodData date dimension for slicing the data, or you can use it independently if you need to see trends by time-of-day across dates.
We recommend that you not create a timestamp dimension because of the large amount of data required. For example, a day consists of 1,440 minutes or 86,400 seconds. A timestamp to the minute for the years 1900 through 2050 would contain 79.4 million rows. A timestamp to the second would contain 4.8 billion rows. Besides, you cannot use date filters and attributes for day, week, quarter, and year with a timestamp dimension.
A custom time dimension, on the other hand, covers a 24-hour period with one row per desired time interval, such as:
- Hour and minute
- Hour, minute, and second
When you are creating a custom time dimension, consider the following:
- Adding attributes for any custom intervals (for example, quarter hour)
- Adding labels for any custom display formats (for example, 12-hour clock vs. 24-hour clock)
- Adding labels for sorting so that hours, minutes, and seconds are displayed without leading zeros, and so that AM times precede PM times when using a 12-hour clock
Custom Time Dimensions in a Logical Data Model
The following picture shows a sample logical data model (LDM) for collecting data from a call center. The LDM has two custom time dimension datasets: Started Time
and Ended Time
.
The following picture shows the attributes and labels in the Started Time
time dimension dataset:
Load the Data into Your Data Warehouse
Follow these steps to create and load Data Warehouse tables for your custom time dimension:
Create a table (
wh_time_of_day
) in your Data Warehouse instance using theTIME
data type.DROP TABLE IF EXISTS wh_time_of_day; CREATE TABLE wh_time_of_day (time_of_day TIME) SEGMENTED BY HASH(0) ALL NODES ;
Generate 24 hours of data:
Create a temporary table (
tmp_hour
) for hours with 24 rows containing a single column that holds values of ‘00’ through ‘23’.CREATE LOCAL TEMPORARY TABLE tmp_hour (hour_of_day CHAR(2) ) ON COMMIT PRESERVE ROWS; INSERT INTO tmp_hour VALUES('00'); ... INSERT INTO tmp_hour VALUES('23');
Create a temporary table (
tmp_min_sec
) for minutes and seconds with 60 rows containing a single column that holds values of ‘00’ through ‘59’.CREATE LOCAL TEMPORARY TABLE tmp_min_sec (min_sec_of_day CHAR(2) ) ON COMMIT PRESERVE ROWS; INSERT INTO tmp_min_sec VALUES('00'); ... INSERT INTO tmp_min_sec VALUES('59');
Load the data to the Data Warehouse table (
wh_time_of_day
). In this example, you are going to truncate and load thewh_time_of_day
table usingINSERT
with aSELECT
of all rows fromtmp_hour
with a Cartesian join (CROSS JOIN
) totmp_min_sec
for minutes. The Cartesian join will join each hour (00-23) with minutes (00-59). If the time dimension needs to be down to the second, add another Cartesian join that will join each hour and minute to each second (00-59).TRUNCATE TABLE wh_time_of_day; INSERT /*+direct*/ INTO wh_time_of_day SELECT TO_TIMESTAMP(a.hour_of_day || ':' || b.min_sec_of_day || ':' || c.min_sec_of_day, 'HH:MI:SS') FROM tmp_hour a CROSS JOIN tmp_min_sec b CROSS JOIN tmp_min_sec c ORDER BY a.hour_of_day, b.min_sec_of_day, c.min_sec_of_day;
Load the output stage table (or GoodData dataset) from the Data Warehouse table (
wh_time_of_day
), using SQL for formatting the desired attributes and labels. The following are some examples of formatting the various attributes and labels:-- SELECT from wh_time_of_day... -- 24 hour clock time formats TO_CHAR(time_of_day, 'HH:MI:SS') AS time_hhmmss, TO_CHAR(time_of_day, 'HH:MI') AS time_hhmm, TO_CHAR(time_of_day, 'HH') AS time_hh, -- 12 hour clock time in hours CASE WHEN LEFT(TO_CHAR(time_of_day, 'HH12'), 1) = '0' THEN RIGHT(TO_CHAR(time_of_day, 'HH12 AM'), 4) ELSE TO_CHAR(time_of_day, 'HH12 AM') END AS time_hour, EXTRACT(HOUR FROM time_of_day) AS time_hour_sort, -- integer for hour sort
-- 12 hour clock down to minutes CASE WHEN LEFT(TO_CHAR(time_of_day, 'HH12'), 1) = '0' THEN RIGHT(TO_CHAR(time_of_day, 'HH12:MI AM'), 7) ELSE TO_CHAR(time_of_day, 'HH12:MI AM') END AS time_minute, (EXTRACT(HOUR FROM time_of_day) * 100) + EXTRACT(MINUTE FROM time_of_day) AS time_minute_sort, -- integer HHMM for sorting
-- 12 hour clock down to seconds CASE WHEN LEFT(TO_CHAR(time_of_day, 'HH12'), 1) = '0' THEN RIGHT(TO_CHAR(time_of_day, 'HH12:MI:SS AM'), 10) ELSE TO_CHAR(time_of_day, 'HH12:MI:SS AM') END AS time_second, (EXTRACT(HOUR FROM time_of_day) * 10000) + (EXTRACT(MINUTE FROM time_of_day) * 100) + CAST(TO_CHAR(time_of_day, 'SS') AS INTEGER) AS time_second_sort, -- integer HHMMSS for sorting
-- half hour intervals (similar for quarter hour intervals) CASE WHEN EXTRACT(MINUTE FROM time_of_day) < 30 AND LEFT(TO_CHAR(time_of_day, 'HH12'), 1) = '0' THEN RIGHT(TO_CHAR(time_of_day, 'HH12'), 1) || ':00 ' || TO_CHAR(time_of_day, 'AM') WHEN EXTRACT(MINUTE FROM time_of_day) < 30 THEN TO_CHAR(time_of_day, 'HH12') || ':00 ' || TO_CHAR(time_of_day, 'AM') WHEN EXTRACT(MINUTE FROM time_of_day) < 60 AND LEFT(TO_CHAR(time_of_day, 'HH12'), 1) = '0' THEN RIGHT(TO_CHAR(time_of_day, 'HH12'), 1) || ':30 ' || TO_CHAR(time_of_day, 'AM') ELSE TO_CHAR(time_of_day, 'HH12') || ':30 ' || TO_CHAR(time_of_day, 'AM') END AS time_halfhour, -- half hour interval sort (integer formatted as HH00 or HH30) CASE WHEN EXTRACT(MINUTE FROM time_of_day) < 30 THEN (EXTRACT(HOUR FROM time_of_day) * 100) ELSE (EXTRACT(HOUR FROM time_of_day) * 100) + 30 END AS time_halfhour_sort