Data Warehouse Logical and Physical Model
Schema design elements such as tables and views are considered a database’s logical database model. These objects provide information about available data elements.
However, they do not define how the data is actually stored on the disk or how they are distributed across the nodes within an Data Warehouse cluster. Those structures are part of the physical data model.
The structures that define how table columns are organized on the disk and how the data are distributed in the cluster are called projections:
The following parameters of a physical data representation can be configured using a projection:
- Columns to be included and column encoding (run-length encoding, delta encoding, and so on)
- Sequence of the table columns
- Segmentation: The rows to keep on the same node and the projections to be replicated instead of split
A single table can have multiple projections to support different query types. A projection does not have to necessarily include all table columns. For example, in a large transactional table, you may want to have a projection sorted by time stamp to support queries over the most recent data. You may also need a projection sorted by customer to support a customer segmentation query, which could be expensive. The table would have two additional projections to support these use cases.
A projection is similar to a materialized view in traditional database. Like a materialized view, a projection stores result sets on disk, instead of re-computing them with each query. As data is added or updated, these results are automatically refreshed. For more information on projections, see Physical Schema.
Data Warehouse users create SQL queries against the logical model. The underlying engine automatically selects the appropriate projections. As a feature of Vertica, Data Warehouse databases lack indexes. In place of indexes, you use optimized projections to optimize queries.
Each logical table requires a physical model. This model can be described by a projection for the table that includes all table columns. A projection with all table columns is called a superprojection.
Every table containing data must have at least one superprojection.
A custom superprojection can be manually created before inserting data.
If a superprojection is not defined for the table, it is created automatically when the first row is inserted into the table.
When building your Data Warehouse database, you can start with one superprojection for each table. You can consider adding additional projections from Vertica to improve performance of slow queries.
- Additional projections can be defined using the CREATE PROJECTION command.
- In addition to specifying the columns in the projection, you can specify the compression, sort order, and the distribution of data across the nodes of the cluster (segmentation) for the projection.
For more information about designing and optimizing the data model:
- See Database Schema Design.
- See Performance Tips.
- For additional details, see the Physical Schema section of the Vertica documentation.