Statistics Collection

The query optimizer uses statistics about data in your projections to build the optimal query plan.

  • Keep the statistics up-to-date by running the SELECT ANALYZE_STATISTICS SQL command. The ANALYZE_STATISTICS function returns 0 when it completes successfully.
  • The ANALYZE_STATISTICS command works on a 10% sample of the specified disk data. You can override the sample size by using the ANALYZE_HISTOGRAM function instead.
  • Both ANALYZE_STATISTICS and ANALYZE_HISTOGRAM functions auto-commit the current transaction.

Examples:

-- collect statistics for a single table
SELECT ANALYZE_STATISTICS('table');
-- collect statistics for a specific column
SELECT ANALYZE_STATISTICS('table.column');
-- collect table statistics based on a 0.5% sample
SELECT ANALYZE_HISTOGRAM('table', 0.5);
-- collect ble statistics based on all data
SELECT ANALYZE_HISTOGRAM('table', 100);

Significant changes include the following:

  • First data load into a table.
  • Data introduces a significant deviation in the data distribution.
  • A new projection is created and refreshed.
  • The number of rows or minimum/maximum values in table’s columns change by 50%.
  • New primary key values are added to tables with referential integrity constraints (In this case, both parent and child tables should be re-analyzed).
  • The relative table size, compared to tables to which it is being joined, changes materially. For example, if a table becomes only five times larger than the other, when it was previously 50 times larger, statistics should be analyzed.