Segmentation

In a typical Data Warehouse instance, data can be distributed across three or more nodes of a shared or dedicated cluster. By default, Data Warehouse and the underlying Vertica database can manage automatically this distribution. However, in a high-performance database, developers may require better control over how data is distributed.

In a Vertica database, segmentation controls how data from a table can be distributed across nodes of a cluster. When a table or projection is created, you can specify segmentation parameters to define how data is distributed.

If segmentation is not specified explicitly, data is segmented by a hash of the projection columns; columns with fewer than 8 bytes are listed first, followed by larger columns up to the first 32 columns of the table.

You can specify a custom segmentation using SEGMENTED BY or UNSEGMENTED clauses of the CREATE PROJECTION or CREATE TABLE commands.

The following segmentation options are available:

  • Hash segmentation Using SEGMENTED BY expression ALL NODES clause after CREATE PROJECTION or CREATE TABLE command The expression is expected to return an integer x for each row in the range: 0 <= x < 2^63^. Compute this expression using the HASH function on one or more columns.

  • Replication Using the UNSEGMENTED ALL NODES clause after CREATE PROJECTION or CREATE TABLE command This option is recommended for small tables with no more than a few million rows that are joined with large ones

For more information about tuning the segmentation of your physical design, see Minimize Network Joins.