ApproximativeJoin

We assume that you have already learned what is described in:

If you want to find the right Joiner for your purposes, see Joiners Comparison.

Short summary

ApproximativeJoin merges sorted data from two data sources on a common matching key. Afterwards, it distributes records to the output based on a user-specified Conformity limit.

Component | Same input metadata | Sorted inputs | Slave inputs | Outputs | Output for drivers without slave | Output for slaves without driver | Joining based on equality
ApproximativeJoin | no | yes | 1 | 2-4 | yes | yes | yes

Abstract

ApproximativeJoin is a fuzzy joiner that is usually used in quite special situations. It requires sorted input and is very fast because it processes data in memory. However, avoid it for large inputs, because its memory requirements may be proportional to the size of the input.

As in the other Joiners, the data attached to the first input port is called master. The data attached to the second input port is called slave.

Unlike other joiners, this component uses two keys for joining. First, records are matched in the standard way using Matching key. Each pair of matched records is then reviewed again, and the conformity (similarity) of the two records is computed using Join key and a user-defined algorithm. The conformity level is then compared to Conformity limit, and each pair is sent either to the first output port (greater conformity) or to the second output port (smaller conformity). The remaining, unmatched records are sent to the third output port (master records) and the fourth output port (slave records).
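The second stage described above (compute a conformity for a matched pair, then compare it to Conformity limit) can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical, and the similarity measure (a normalized Levenshtein distance) is an assumption standing in for whatever algorithm the component is configured with. The handling of a pair whose conformity equals the limit exactly is also an assumption.

```java
// Sketch only: the similarity measure below is an assumption,
// not the component's own conformity algorithm.
final class ConformityRoutingSketch {

    /** Levenshtein edit distance between two strings (two-row dynamic programming). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical conformity in [0, 1]: 1.0 for identical strings, lower for more edits. */
    static double conformity(String masterValue, String slaveValue) {
        int longest = Math.max(masterValue.length(), slaveValue.length());
        if (longest == 0) {
            return 1.0; // two empty values are treated as identical
        }
        return 1.0 - (double) editDistance(masterValue, slaveValue) / longest;
    }

    /** Returns the output port index: 0 for greater conformity, 1 for smaller conformity. */
    static int outputPort(double conformity, double conformityLimit) {
        // Assumption: a pair exactly at the limit goes to port 0; the text does not specify this case.
        return conformity >= conformityLimit ? 0 : 1;
    }
}
```

With the default Conformity limit of 0.75, a pair such as "Smith" and "Smyth" (one edit in five characters) would score 0.8 under this particular measure and go to the first output port, while "Smith" and "Jones" would go to the second.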

Ports

ApproximativeJoin receives data through two input ports, each of which may have a different metadata structure.

Conformity is computed for each pair of matched data records. Pairs with greater conformity are sent to the first output port, and pairs with smaller conformity are sent to the second output port. The third output port can optionally be used to capture unmatched master records, and the fourth output port can optionally be used to capture unmatched slave records.

Port type | Number | Required | Description | Metadata
Input | 0 | yes | Master input port | Any
Input | 1 | yes | Slave input port | Any
Output | 0 | yes | Output port for the joined data with greater conformity | Any, optionally including additional fields: _total_conformity_ and _keyName_conformity_. See Additional fields.
Output | 1 | yes | Output port for the joined data with smaller conformity | Any, optionally including additional fields: _total_conformity_ and _keyName_conformity_. See Additional fields.
Output | 2 | no | Optional output port for master data records without slave matches | Input 0
Output | 3 | no | Optional output port for slave data records without master matches | Input 1
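The first stage, matching two sorted inputs on Matching key and separating unmatched records for the optional ports, can be sketched as a classic sorted merge. This is a minimal illustration under stated assumptions, not the component's implementation: records are modeled as String arrays whose element 0 is the matching key, the class name is hypothetical, and pairing every master with every slave that shares a key is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: records are String arrays whose element 0 is the matching key.
final class SortedMatchSketch {

    final List<String[][]> matchedPairs = new ArrayList<>();   // candidates for output ports 0 and 1
    final List<String[]> unmatchedMasters = new ArrayList<>(); // output port 2
    final List<String[]> unmatchedSlaves = new ArrayList<>();  // output port 3

    /** Both lists must already be sorted in ascending order by the matching key. */
    static SortedMatchSketch match(List<String[]> masters, List<String[]> slaves) {
        SortedMatchSketch out = new SortedMatchSketch();
        int m = 0;
        int s = 0;
        while (m < masters.size() && s < slaves.size()) {
            int cmp = masters.get(m)[0].compareTo(slaves.get(s)[0]);
            if (cmp < 0) {
                out.unmatchedMasters.add(masters.get(m++)); // no slave has this key
            } else if (cmp > 0) {
                out.unmatchedSlaves.add(slaves.get(s++));   // no master has this key
            } else {
                // Find the end of the run of equal keys on each side.
                String key = masters.get(m)[0];
                int mEnd = m;
                while (mEnd < masters.size() && masters.get(mEnd)[0].equals(key)) {
                    mEnd++;
                }
                int sEnd = s;
                while (sEnd < slaves.size() && slaves.get(sEnd)[0].equals(key)) {
                    sEnd++;
                }
                // Assumption: every master in the run is paired with every slave in the run.
                for (int i = m; i < mEnd; i++) {
                    for (int j = s; j < sEnd; j++) {
                        out.matchedPairs.add(new String[][] {masters.get(i), slaves.get(j)});
                    }
                }
                m = mEnd;
                s = sEnd;
            }
        }
        // Whatever remains on either side has no counterpart.
        while (m < masters.size()) {
            out.unmatchedMasters.add(masters.get(m++));
        }
        while (s < slaves.size()) {
            out.unmatchedSlaves.add(slaves.get(s++));
        }
        return out;
    }
}
```

Because both inputs are sorted, each side is read once from start to end. Only the run of records sharing the current key has to be held together, which is one reason a sorted-input requirement keeps this kind of join fast.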

ApproximativeJoin Attributes

Attribute | Req | Description | Possible values
Basic
Join key | yes | Key by which incoming records with the same Matching key value are compared and distributed between the first and the second output port, depending on the specified Conformity limit. See Join key. | 
Matching key | yes | This key serves to match master and slave records. | 
Transform | 1) | Transformation in CTL or Java defined in the graph for records with greater conformity. | 
Transform URL | 1) | External file defining the transformation in CTL or Java for records with greater conformity. | 
Transform class | 1) | External transformation class for records with greater conformity. | 
Transform for suspicious | 2) | Transformation in CTL or Java defined in the graph for records with smaller conformity. | 
Transform URL for suspicious | 2) | External file defining the transformation in CTL or Java for records with smaller conformity. | 
Transform class for suspicious | 2) | External transformation class for records with smaller conformity. | 
Conformity limit (0,1) |  | Defines the conformity limit for pairs of records. The transformation is applied to pairs with conformity higher than this value; the transformation for suspicious is applied to pairs with conformity lower than this value. | 0.75 (default), or any value between 0 and 1
Advanced
Transform source charset |  | Encoding of the external file defining the transformation. | ISO-8859-1 (default)
Deprecated
Locale |  | Locale to be used when internationalization is used. | 
Case sensitive |  | If set to true, upper-case and lower-case characters are considered different. By default, they are treated as equal. | false (default), true
Error actions |  | Definition of the action that should be performed when the specified transformation returns an Error code. See Return Values of Transformations. | 
Error log |  | URL of the file to which error messages for the specified Error actions should be written. If not set, they are written to Console. | 
Slave override key |  | In older versions of CloudConnect, the slave part of Join key. Join key was defined as a sequence of individual expressions, each consisting of a master field name followed by parentheses containing the 6 parameters mentioned below. These expressions were separated by semicolons. The Slave override key was a sequence of the slave counterparts of the master Join key fields. Thus, in the case mentioned above, Slave override key would be fname;lname, whereas Join key would be first_name(3 0.8 true false false false);last_name(4 0.2 true false false false). | 
Slave override matching key |  | In older versions of CloudConnect, the slave part of Matching key. Matching key was defined as a master field name, and Slave override matching key was its slave counterpart. Thus, in the case mentioned above ($masterField=$slaveField), Slave override matching key would be slaveField only, and Matching key would be masterField. | 

Legend:

1) One of these transformation attributes must be set. Whichever is used must either use the common CTL template for Joiners or implement the RecordTransform interface.

2) One of these transformation attributes must be set. Whichever is used must either use the common CTL template for Joiners or implement the RecordTransform interface.

See CTL Scripting Specifics or Java Interfaces for more information.

See also Defining Transformations for detailed information about transformations.
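The legacy Join key syntax quoted in the Slave override key description (a master field name followed by six space-separated parameters in parentheses, with expressions separated by semicolons) can be taken apart as sketched below. The class and method names are hypothetical. The six parameters are kept as raw tokens because their meaning is not given in this section, and pairing master and slave fields by position is an assumption based on the fname;lname example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: parses the deprecated Join key text format; not part of the product API.
final class LegacyJoinKeySketch {

    /** Maps each master field name to its six raw parameter tokens, in declaration order. */
    static Map<String, List<String>> parse(String joinKey) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String expression : joinKey.split(";")) {
            String trimmed = expression.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing semicolon
            }
            int open = trimmed.indexOf('(');
            int close = trimmed.lastIndexOf(')');
            if (open <= 0 || close != trimmed.length() - 1) {
                throw new IllegalArgumentException("Malformed expression: " + trimmed);
            }
            String field = trimmed.substring(0, open).trim();
            List<String> params = Arrays.asList(trimmed.substring(open + 1, close).trim().split("\\s+"));
            if (params.size() != 6) {
                throw new IllegalArgumentException("Expected 6 parameters for " + field);
            }
            result.put(field, params);
        }
        return result;
    }

    /** Assumption: master and slave fields correspond by position in their respective lists. */
    static Map<String, String> pairWithSlaves(String joinKey, String slaveOverrideKey) {
        List<String> masterFields = new ArrayList<>(parse(joinKey).keySet());
        String[] slaveFields = slaveOverrideKey.split(";");
        if (masterFields.size() != slaveFields.length) {
            throw new IllegalArgumentException("Master and slave field counts differ");
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < slaveFields.length; i++) {
            pairs.put(masterFields.get(i), slaveFields[i].trim());
        }
        return pairs;
    }
}
```

For the example from the table, pairWithSlaves would associate first_name with fname and last_name with lname.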

Advanced Description

CTL Scripting Specifics

When you define your join attributes, you must specify a transformation that maps fields from the input data sources to the output. This can be done using the Transformations tab of the Transform Editor. However, you may find that you cannot specify more advanced transformations using this easiest approach. This is when you need to use CTL scripting.

CTL (CloudConnect Transformation Language) is a full-fledged, yet simple language that lets you specify custom field mappings and perform almost any imaginable transformation. For detailed information, see Part XI, CTL - CloudConnect Transformation Language.

All Joiners share the same transformation template, which can be found in CTL Templates for Joiners.

Java Interfaces

If you define your transformation in Java, it must implement the following interface, which is common to all Joiners:

Java Interfaces for Joiners