If you want to find the right Transformer for your purposes, see Transformers Comparison.
Dedup removes duplicate records.
Component | Same input metadata | Sorted inputs [ 1) ] | Inputs | Outputs | Java | CTL |
---|---|---|---|---|---|---|
Dedup | - | yes | 1 | 0-1 | - | - |
[ 1) ] Input records may be sorted only partially, i.e., the records with the same value of the Dedup key are grouped together but the groups are not ordered.
Dedup reads a data flow of records grouped by the same values of the Dedup key. The key is formed by field name(s) from input records. If no key is specified, the component behaves like the Unix head or tail command. The groups don't have to be ordered.
The component can select a specified number of the first or the last records from each group or from the whole input. Alternatively, it can select only those records that have no duplicates.
The deduplicated records are sent to output port 0. The duplicate records may be sent through output port 1.
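The selection rules above can be sketched in a few lines of Python. This is only an emulation of the described behavior on an in-memory list, not the component's implementation; the function name `dedup` and its parameters `key`, `keep` and `count` are hypothetical stand-ins for the Dedup key, Keep and Number of duplicates attributes. It also assumes that, in Unique mode, every record of a group containing duplicates goes to port 1.

```python
from itertools import groupby


def dedup(records, key=None, keep="first", count=1):
    """Emulate Dedup on a list; returns (port0, port1) as two lists."""
    if count < 1:
        raise ValueError("count must be at least 1")
    records = list(records)
    if key is None:
        # No Dedup key: the whole input is treated as a single group.
        groups = [records] if records else []
    else:
        # groupby merges only adjacent records, so records with equal keys
        # must already be grouped together; the groups need not be ordered.
        groups = [list(g) for _, g in groupby(records, key=key)]

    port0, port1 = [], []
    for group in groups:
        if keep == "unique":
            # Assumption: all records of a group with duplicates are rejected.
            selected, rejected = (group, []) if len(group) == 1 else ([], group)
        elif keep == "first":
            selected, rejected = group[:count], group[count:]
        elif keep == "last":
            selected, rejected = group[-count:], group[:-count]
        else:
            raise ValueError(f"unknown keep mode: {keep!r}")
        port0.extend(selected)
        port1.extend(rejected)
    return port0, port1


rows = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5), ("c", 6)]
kept, duplicates = dedup(rows, key=lambda r: r[0])
# kept holds the first record of each group: ("a", 1), ("b", 3), ("c", 4)
```

Calling `dedup(rows, keep="last", count=2)` without a key returns the last two records of the whole input, which mirrors the tail-like behavior described above.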
Port type | Number | Required | Description | Metadata |
---|---|---|---|---|
Input | 0 | | for input data records | any |
Output | 0 | | for deduplicated data records | equal input metadata [ 1)] |
Output | 1 | | for duplicate data records | |
[ 1)] Metadata can be propagated through this component. |
Attribute | Req | Description | Possible values |
---|---|---|---|
Basic | |||
Dedup key | | Key according to which the records are deduplicated. If the Dedup key is not set, the number of records specified in the Number of duplicates attribute is preserved from the beginning or the end of all input records, and the others are removed. If the Dedup key is set, only the specified number of records with the same values in the key fields is selected from each group. See Dedup key. | |
Keep | | Defines which records are preserved. If First, records from the beginning are kept; if Last, records from the end. Records are selected from each group or, if no key is set, from the whole input. If Unique, only records with no duplicates are selected. | First (default), Last, Unique |
Equal NULL | | By default, records with null values of key fields are considered to be equal. If false, they are considered to be different. | true (default), false |
Number of duplicates | | Maximum number of duplicate records to be selected from each group of adjacent records with equal key value or, if the key is not set, maximum number of records from the beginning or the end of all records. Ignored if the Unique option is selected. | 1 (default), 1-N |
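The effect of the Equal NULL attribute on grouping can be illustrated with a small Python sketch. It is an emulation under stated assumptions, not the component's code: records are modeled as dictionaries, a null is modeled as `None`, and the helper name `group_adjacent` and its parameters are hypothetical.

```python
from itertools import count, groupby


def group_adjacent(records, key_field, equal_null=True):
    """Split records into groups of adjacent records with equal key values."""
    fresh = count()  # source of tokens that never compare equal to each other

    def group_key(record):
        value = record.get(key_field)
        if value is None and not equal_null:
            # Equal NULL = false: each null key forms a group of its own.
            return ("null", next(fresh))
        # Equal NULL = true (default): all null keys compare equal.
        return ("value", value)

    return [list(group) for _, group in groupby(records, key=group_key)]


rows = [{"id": None}, {"id": None}, {"id": 1}, {"id": 1}]
default_groups = group_adjacent(rows, "id")                   # two groups of two
strict_groups = group_adjacent(rows, "id", equal_null=False)  # nulls kept apart
```

With the default setting the two records with a null key fall into one group, so only one of them survives when Number of duplicates is 1; with Equal NULL set to false each of them is its own group.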
The component can process fully sorted input data as well as partially sorted data. When setting the fields composing the Dedup key, choose the proper Order attribute:
Ascending - if the groups of input records with the same key field value(s) are sorted in ascending order
Descending - if the groups of input records with the same key field value(s) are sorted in descending order
Auto - the sorting order of the groups of input records is guessed from the first two records with different values in the key field, i.e., from the first records of the first two groups.
Ignore - if the groups of input records with the same key field value(s) are not sorted
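The Auto option can be pictured with the following Python sketch, which guesses the order from the first two distinct key values. It only mirrors the description above; how the component performs this check internally is not stated here, and the function name `guess_order` and its return values are hypothetical.

```python
def guess_order(key_values):
    """Guess the sort order from the first two distinct key values.

    Returns "ascending", "descending", or None when the input is empty
    or contains a single group, so no guess is possible.
    """
    values = iter(key_values)
    try:
        first = next(values)
    except StopIteration:
        return None
    for value in values:
        if value != first:
            # First record of the second group decides the direction.
            return "ascending" if value > first else "descending"
    return None


order = guess_order([10, 10, 20, 20, 30])
```

Because only the first two groups are inspected, a guess made this way says nothing about the rest of the input; partially sorted data whose groups are not ordered is the case the Ignore option is described for.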