Aspell Lookup Table

Commercial Lookup Table

This lookup table is commercial and can only be used with the commercial license of CloudConnect Designer.

All data records stored in this lookup table are kept in memory, so enough memory must be available to hold all of them. If data records are loaded into an Aspell lookup table from a data file, the available memory should be roughly 7 times the size of the data file or more. This multiplier is only approximate and varies with the types of data records stored in the data file.
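The sizing rule above can be sketched as a small calculation. This is only an illustration: the function name `estimated_memory_bytes` is hypothetical and not part of CloudConnect Designer, and the multiplier of 7 is the approximate figure from the text, which varies with the record types.

```python
def estimated_memory_bytes(data_file_size_bytes: int, multiplier: float = 7.0) -> int:
    """Rough estimate of the memory needed to load a data file.

    The default multiplier of 7 is the approximate figure quoted in the
    text; the real ratio depends on the types of the stored records.
    """
    return int(data_file_size_bytes * multiplier)


# Example: a 100 MB data file would call for roughly 700 MB of memory.
file_size = 100 * 1024 * 1024
print(estimated_memory_bytes(file_size) // (1024 * 1024), "MB")
```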

If you are working with data records that are similar but not fully identical, you should use this type of lookup table. For example, you can use an Aspell lookup table for addresses.

In the Aspell lookup table wizard, you set up the required properties. You must give the lookup table a Name, select the corresponding Metadata, and select the Lookup key field that should be used to look up data records in the table. The Lookup key field must be of the string data type.

You can also specify the Data file URL where the data records of the lookup table will be stored and the charset of the data file (Data file charset). The default charset is ISO-8859-1.

You can set the threshold that should be used by the lookup table (Spelling threshold). It must be higher than 0, and its default value is 230. The higher the threshold, the more tolerant the lookup table is of spelling errors. The threshold limits the edit_distance value from the query to the results: words whose edit_distance is higher than the specified limit are not included in the results.
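To make the role of the threshold concrete, here is a minimal sketch of threshold filtering over a weighted edit distance. It is not the lookup table's actual implementation: the function names, the per-operation costs of 100, and the plain insert/delete/replace model are all assumptions chosen for illustration. Only the default threshold of 230 comes from the text.

```python
def weighted_edit_distance(a, b, insert_cost=100, delete_cost=100, replace_cost=100):
    """Weighted Levenshtein distance from string a to string b.

    The costs are hypothetical illustration values, not the lookup
    table's real defaults.
    """
    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [j * insert_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * delete_cost]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else replace_cost)
            curr.append(min(prev[j] + delete_cost,      # drop ca
                            curr[j - 1] + insert_cost,  # add cb
                            substitute))                # keep or replace
        prev = curr
    return prev[-1]


def lookup(query, keys, threshold=230):
    """Return (distance, key) pairs whose distance does not exceed the threshold."""
    scored = [(weighted_edit_distance(query, key), key) for key in keys]
    return sorted(pair for pair in scored if pair[0] <= threshold)


keys = ["Main Street", "Main Stret", "Maine Street", "Oak Avenue"]
for distance, key in lookup("Main Street", keys):
    print(distance, key)
```

With these assumed costs, a threshold of 230 admits keys that are up to two single-character edits away from the query; raising the threshold admits more distant keys.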

You can also change the default costs of individual operations (Edit costs):

You also need to decide whether letters with diacritical marks are considered identical to those without them. This is controlled by the Remove diacritical marks attribute. If you set it to true, diacritical marks are removed before the edit_distance value is computed, so letters with diacritical marks are considered equal to their Latin equivalents. The default value is false, which means that letters with diacritical marks are considered different from those without.
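The effect of removing diacritical marks before comparison can be sketched with Python's standard `unicodedata` module. This is an assumption about the general technique (Unicode decomposition followed by dropping combining marks), not a description of how the lookup table does it internally; the helper name `remove_diacritics` is hypothetical.

```python
import unicodedata


def remove_diacritics(text):
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# With removal, the two spellings compare as equal; without it they differ.
print(remove_diacritics("Žluťoučký") == "Zlutoucky")
print("Žluťoučký" == "Zlutoucky")
```

Letters that have no canonical decomposition, such as "ø" or "ł", pass through this sketch unchanged.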

If you want best guesses to be included in the results, set Include best guesses to true. The default value is false. Best guesses are words whose edit_distance value is higher than the Spelling threshold, but for which there is no better counterpart.

At the end, you only need to click OK and then Finish.

Figure 35.15. Aspell Lookup Table Wizard


Important

If you want to know the edit distance between the lookup table values and the values received from the edge, you must add another field of a numeric type to the lookup table metadata. Set this field to Autofilling (default_value).

Select this field in the Edit distance field combo.

When you use an Aspell lookup table in LookupJoin, you can map this lookup table field to the corresponding field on output port 0.

This way, the values stored in the specified Edit distance field of the lookup table are sent to the specified field on the output.