Data Preparation and Data Evaluation Functions

Data Warehouse supports preprocessing and postprocessing functions to simplify data transformations for Machine Learning (see Transform Data for Batch Predictions and Recommendations).

The following Vertica functions are available for use in Data Warehouse:

  • Data preparation functions

    • APPLY_NORMALIZE - Applies the normalization parameters saved in a model to a set of specified columns in the input table or view.
    • BALANCE - Returns a view with an equal distribution of the input data, based on the response column.
    • DETECT_OUTLIERS - Returns the outliers in a data set, based on an outlier threshold.
    • IMPUTE - Imputes missing values with either the mean or the mode, based on observed values for a variable.
    • NORMALIZE - Runs a normalization algorithm on an input table or view.
    • NORMALIZE_FIT - Computes normalization parameters for specific columns in an input table.
    • REVERSE_NORMALIZE - Reverses the normalization transformation.
  • Data evaluation functions

    • CONFUSION_MATRIX - Returns a confusion matrix based on both predicted and observed values.
    • ERROR_RATE - Returns a table that calculates the rate of incorrect classifications.
    • LIFT_TABLE - Returns a table that evaluates the predictive quality of a binary classifier model.
    • MSE - Returns a table that displays the mean squared error between the predicted and observed (response) values.
    • ROC - Returns a table that displays the points on a receiver operating characteristic curve.
    • RSQUARED - Returns a table with the R-squared value of the predictions in a linear regression model.
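
To make the arithmetic behind several of the functions listed above concrete, here is a minimal Python sketch. It is an illustration only, not Vertica's implementation or its SQL interface: the helper names (normalize_fit, apply_normalize, reverse_normalize, impute_mean, confusion_matrix, error_rate, mse, rsquared) are hypothetical, min-max scaling is assumed as the normalization method, and each helper works on plain Python lists rather than tables or views.

```python
from collections import Counter


def normalize_fit(values):
    """Compute min-max parameters for one column (the idea behind NORMALIZE_FIT)."""
    return {"min": min(values), "max": max(values)}


def apply_normalize(values, params):
    """Rescale values to [0, 1] using saved parameters (the idea behind APPLY_NORMALIZE)."""
    span = params["max"] - params["min"]
    # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
    return [(v - params["min"]) / span if span else 0.0 for v in values]


def reverse_normalize(scaled, params):
    """Map scaled values back to the original range (the idea behind REVERSE_NORMALIZE)."""
    span = params["max"] - params["min"]
    return [s * span + params["min"] for s in scaled]


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def confusion_matrix(observed, predicted):
    """Count each (observed, predicted) label pair."""
    return Counter(zip(observed, predicted))


def error_rate(observed, predicted):
    """Fraction of rows where the predicted label differs from the observed label."""
    wrong = sum(o != p for o, p in zip(observed, predicted))
    return wrong / len(observed)


def mse(observed, predicted):
    """Mean of the squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rsquared(observed, predicted):
    """1 minus the ratio of the residual sum of squares to the total sum of squares."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

For example, fitting on the column [10.0, 20.0, 30.0] and applying the saved parameters gives [0.0, 0.5, 1.0], and reverse_normalize restores the original values. In Data Warehouse itself, the same steps are carried out by the SQL functions listed above.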

For more information about these functions, see Vertica Machine Learning Functions.