Choose Projection Sorting Criteria
When developing your database projections, start with the default projection, which is a superprojection containing all fields in the table. Soon, however, you may discover that you need to optimize projections for performance or to create custom projections to address specific query use cases.
In a production implementation, it is not necessary to optimize your physical model for each query that you intend to run. Instead, start slowly and add or modify projections to address specific performance issues or requirements of your database.
The sorting criteria that you use in your projections depend on the uses for those projections. Specify your sorting criteria based on requirements for speedy retrieval, memory footprint, and join use cases for the projection.
Example
Suppose you have the following three queries on two tables, table and table2:
SELECT a, b, c FROM table WHERE c = 'xxxx' ORDER BY b
SELECT b, c, SUM(a) FROM table GROUP BY b, c
SELECT t.a, t.b, t2.x FROM table t JOIN table2 t2 ON t.a = t2.y
To build a physical model that is fully optimized for these three queries, create the projections as outlined in the following picture:
- projection1 is sorted by c and then by b. This sorting enables quick location of the filtered value for c and returns the result sorted by b.
- projection2 is sorted by columns used in the GROUP BY clause. This sorting helps to minimize the memory footprint of the GROUP BY query, as the database can retain only group-specific data in memory, instead of maintaining a hash table containing all groups.
- If table and table2 are large, projection3 and projection4 enable a merge join of presorted columns, instead of loading the smaller table into memory and performing a hash join. Note that the columns C and Z are not present in the projections, as they are not used by the query.
When building your Data Warehouse database, you are not required to create optimized query-specific projections from scratch. Start with one superprojection for each table and consider adding additional projections from Vertica to improve performance of slow queries.