You are viewing our older product's guide. Click here for the documentation of GoodData Cloud, our latest and most advanced product.
Statistics Collection
CloudConnect is a legacy tool and will be discontinued. We recommend that to prepare your data you use the GoodData data pipeline as described in Data Preparation and Distribution. For data modeling, see Data Modeling in GoodData to learn how to work with Logical Data Modeler.
The query optimizer uses statistics about data in your projections to build the optimal query plan.
- Keep the statistics up-to-date by running the SELECT ANALYZE_STATISTICS SQL command. The ANALYZE_STATISTICS function returns 0 when it completes successfully.
- The ANALYZE_STATISTICS command works on a 10% sample of the specified disk data. You can override the sample size by using the ANALYZE_HISTOGRAM function instead.
- Both ANALYZE_STATISTICS and ANALYZE_HISTOGRAM functions auto-commit the current transaction.
Examples:
-- collect statistics for a single table
SELECT ANALYZE_STATISTICS('table');
-- collect statistics for a specific column
SELECT ANALYZE_STATISTICS('table.column');
-- collect table statistics based on a 0.5% sample
SELECT ANALYZE_HISTOGRAM('table', 0.5);
-- collect ble statistics based on all data
SELECT ANALYZE_HISTOGRAM('table', 100);
Collecting statistics is a CPU- and memory-intensive operation. Run ANALYZE_STATISTICS only after the data in your projections changes significantly.
Significant changes include the following:
- First data load into a table.
- Data introduces a significant deviation in the data distribution.
- A new projection is created and refreshed.
- The number of rows or minimum/maximum values in table’s columns change by 50%.
- New primary key values are added to tables with referential integrity constraints (In this case, both parent and child tables should be re-analyzed).
- The relative table size, compared to tables to which it is being joined, changes materially. For example, if a table becomes only five times larger than the other, when it was previously 50 times larger, statistics should be analyzed.