We assume that you have already learned what is described in:
If you want to find the right Transformer for your purposes, see Transformers Comparison.
EmailFilter filters input records according to the specified condition.
Component | Same input metadata | Sorted inputs | Inputs | Outputs | Java | CTL |
---|---|---|---|---|---|---|
EmailFilter | - | no | 1 | 0-2 | - | - |
EmailFilter receives incoming records through its input port and checks the specified fields for valid e-mail addresses. Records accepted as valid are sent out through the optional first output port, if it is connected. Specified fields from rejected records can be sent out through the optional second output port, if it is connected to another component. Metadata on the second output port may also contain up to two additional fields with information about the error.
Port type | Number | Required | Description | Metadata |
---|---|---|---|---|
Input | 0 | yes | For input data records | Any |
Output | 0 | no | For valid data records | Input 0 1) |
Output | 1 | no | For rejected data records | Any 2) |
Legend:
1): Metadata cannot be propagated through this component.
2): Metadata on output port 1 can contain any of the input data fields plus up to two additional fields. Fields whose names match those in the input metadata are filled with the input values of those fields.
Table 55.2. Error Fields for EmailFilter
Field number | Field name | Data type | Description |
---|---|---|---|
FieldA | the Error field attribute value | string | Error field |
FieldB | the Status field attribute value | integer 1) | Status field |
Legend:
1): The following error codes are most common:
Code | Error | Description |
---|---|---|
0 | No error | E-mail address accepted. |
1 | Syntax error | Any string that does not conform to the e-mail address format specification is rejected with this error code. |
2 | Domain error | Verification of the domain failed for the address. Either the domain does not exist or the DNS system cannot determine a mail exchange server. |
3 | SMTP handshake error | At SMTP level, this error code indicates that a mail exchange server for the specified domain is either unreachable or the connection failed for another reason (e.g. server too busy). |
4 | SMTP verify error | At SMTP level, this error code means that the server rejected the address as invalid using the VRFY command. The address is officially invalid. |
5 | SMTP recipient error | At SMTP level, this error code means that the server rejected the address for delivery. |
6 | SMTP mail error | At MAIL level, this error code indicates that although the server accepted the test message for delivery, an error occurred during sending. |
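When reading records from the rejected port, it can be convenient to map the integer written to the status field back to a readable name. The following is a minimal sketch for downstream processing, not part of the component: `EmailStatus` and `describe_status` are hypothetical names, and only the numeric codes are taken from the legend above. Because the legend lists only the most common codes, unlisted values are passed through rather than treated as an error.

```python
from enum import IntEnum


class EmailStatus(IntEnum):
    """Status codes from the legend (hypothetical enum, not a component API)."""
    NO_ERROR = 0
    SYNTAX_ERROR = 1
    DOMAIN_ERROR = 2
    SMTP_HANDSHAKE_ERROR = 3
    SMTP_VERIFY_ERROR = 4
    SMTP_RECIPIENT_ERROR = 5
    SMTP_MAIL_ERROR = 6


def describe_status(code: int) -> str:
    """Return a readable name for a status code, tolerating unlisted codes."""
    try:
        return EmailStatus(code).name
    except ValueError:
        # The legend only covers the most common codes.
        return f"UNKNOWN({code})"
```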
Attribute | Req | Description | Possible values |
---|---|---|---|
Basic | | | |
Field list | yes | List of input field names whose values should be verified as valid or invalid e-mail addresses. Expressed as a sequence of field names separated by a colon, semicolon, or pipe. | |
Level of inspection | | Method used for the e-mail address verification. Each level includes and extends its predecessor(s) on the left. See Level of Inspection for more information. | SYNTAX \| DOMAIN (default) \| SMTP \| MAIL |
Accept empty | | By default, even an empty field is accepted as a valid address. This can be switched off by setting the attribute to false. See Accept Conditions for more information. | true (default) \| false |
Error field | | Name of the output field to which the error message can be written (for rejected records only). | |
Status field | | Name of the output field to which the error code can be written (for rejected records only). | |
Multi delimiter | | Regular expression used to split an individual field value into multiple e-mail addresses. If empty, each field is treated as a single e-mail address. | [,;] (default) \| other |
Accept condition | | By default, a record is accepted if at least one field value is verified as a valid e-mail address. If set to STRICT, a record is accepted only if all field values from all fields of the Field list are valid. See Accept Conditions for more information. | LENIENT (default) \| STRICT |
Advanced | | | |
E-mail buffer size | | Maximum number of records read into memory before they are bulk processed. See Buffer and Cache Size for more information. | 2000 (default) \| 1-N |
E-mail cache size | | Maximum number of cached e-mail address verification results. See Buffer and Cache Size for more information. | 2000 (default) \| 0 (caching is turned off) \| 1-N |
Domain cache size | | Maximum number of cached DNS query results. Ignored at the SYNTAX level. | 3000 (default) \| 0 (caching is turned off) \| 1-N |
Domain retry timeout (ms) | | Timeout in milliseconds for each DNS query attempt. The maximum time spent on resolving thus equals Domain retry timeout multiplied by Domain retry count. | 800 (default) \| 1-N |
Domain retry count | | Number of retries for failed DNS queries. | 2 (default) \| 1-N |
Domain query A records | | By default, in accordance with the SMTP standard, the A record is searched if no MX record can be found. If set to false, the DNS query is twice as fast, but the SMTP standard is violated. | true (default) \| false |
SMTP connect attempts (ms,...) | | Attempts for connection and HELO. Expressed as a sequence of numbers separated by commas. The numbers are delays between individual attempts to connect. | 1000,2000 (default) |
SMTP anti-greylisting attempts (s,...) | | Anti-greylisting feature. Attempts and delays between individual attempts, expressed as a sequence of numbers separated by commas. If empty, anti-greylisting is turned off. See SMTP Grey-Listing Attempts for more information. | 30,120,240 (default) |
SMTP retry timeout (s) | | TCP timeout in seconds after which an SMTP request fails. | 300 (default) \| 1-N |
SMTP concurrent limit | | Maximum number of parallel tasks when anti-greylisting is on. | 10 (default) \| 1-N |
Mail From | | The From field of the dummy message sent at the MAIL level. | CloudConnect <cloudconnect@cloudconnect.org> (default) \| other |
Mail Subject | | The Subject field of the dummy message sent at the MAIL level. | Hello, this is a test message (default) \| other |
Mail Body | | The body of the dummy message sent at the MAIL level. | Hello,\nThis is CloudConnect text message.\n\nPlease ignore and don't respond. Thank you, have a nice day! (default) \| other |
Increasing E-mail buffer size avoids unnecessary repeated queries to the DNS system and SMTP servers by processing more records at once. Increasing E-mail cache size may improve performance even further, since addresses stored in the cache can be verified instantly. However, both parameters require extra memory, so set them to the largest values your system can afford.
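The effect of a bounded result cache can be illustrated with a small sketch. This is not the component's implementation: the `VerificationCache` class, the injected `verify` callable, and the least-recently-used eviction policy are all assumptions made for illustration, since the eviction strategy is not documented here. A size of 0 turns caching off, mirroring the attribute's possible values.

```python
from collections import OrderedDict
from typing import Callable


class VerificationCache:
    """Bounded cache of verification results (hypothetical, LRU eviction assumed)."""

    def __init__(self, verify: Callable[[str], int], max_size: int = 2000):
        self._verify = verify          # expensive check returning a status code
        self._max_size = max_size      # 0 means caching is turned off
        self._results: "OrderedDict[str, int]" = OrderedDict()
        self.lookups = 0               # how often the expensive check actually ran

    def status(self, address: str) -> int:
        if address in self._results:
            # Cache hit: refresh recency and skip the expensive check.
            self._results.move_to_end(address)
            return self._results[address]
        self.lookups += 1
        result = self._verify(address)
        if self._max_size > 0:
            self._results[address] = result
            if len(self._results) > self._max_size:
                # Drop the least recently used entry.
                self._results.popitem(last=False)
        return result
```

With many repeated addresses in the input, a larger cache means fewer calls to the expensive check, at the cost of the memory held by the stored results.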
By default, even an empty field from the input data records specified in the Field list is considered to be a valid e-mail address, because the Accept empty attribute is set to true by default. If you want to be stricter, you can switch this attribute to false.

By default, Accept condition is set to LENIENT. This means that at least one valid e-mail address is sufficient for the record to be accepted. On the other hand, if Accept condition is set to STRICT, all e-mail addresses in the Field list must be valid (either including or excluding empty values, depending on the Accept empty attribute).

Thus, be careful when setting these two attributes: Accept empty and Accept condition. If there is an empty field among the fields specified in the Field list and all other non-empty values are verified as invalid addresses, the record is still accepted if Accept condition is set to LENIENT and Accept empty is set to true. In reality, however, such a record does not contain any useful, valid e-mail address; it contains only an empty string, which is enough for the record to be accepted.
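The interaction between the two attributes can be modeled in a few lines. The sketch below is a simplified model under stated assumptions, not the component's code: `record_accepted` and the injected `is_valid` check are hypothetical, addresses split out of one field by the multi delimiter are treated like separate values, and empty fragments produced by splitting are skipped.

```python
import re
from typing import Callable, Iterable, List


def record_accepted(
    field_values: Iterable[str],
    is_valid: Callable[[str], bool],
    accept_condition: str = "LENIENT",
    accept_empty: bool = True,
    multi_delimiter: str = "[,;]",
) -> bool:
    """Decide whether a record passes, given the values of the checked fields."""
    verdicts: List[bool] = []
    for value in field_values:
        if value is None or value.strip() == "":
            # An empty field is valid only if Accept empty is true.
            verdicts.append(accept_empty)
            continue
        parts = re.split(multi_delimiter, value) if multi_delimiter else [value]
        # Assumption: empty fragments left over from splitting are ignored.
        verdicts.extend(is_valid(p.strip()) for p in parts if p.strip())
    if accept_condition == "STRICT":
        return all(verdicts)   # every value must be valid
    return any(verdicts)       # LENIENT: one valid value is enough


def toy_check(address: str) -> bool:
    """Toy validity check used only for this example."""
    return "@" in address and "@@" not in address


# The pitfall described above: an empty field rescues an otherwise invalid record.
print(record_accepted(["", "bad@@"], toy_check))                      # True
print(record_accepted(["", "bad@@"], toy_check, accept_empty=False))  # False
```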
SYNTAX
At the first level of validation (SYNTAX), the syntax of e-mail expressions is checked. Non-strict conditions and international characters (except in the TLD) are both allowed.
DOMAIN
At the second level of validation (DOMAIN), which is the default, the DNS system is queried for domain validity and mail exchange server information. The following four attributes can be set to optimize the ratio of performance to false-negative responses: Domain cache size, Domain retry timeout, Domain retry count, and Domain query A records.

The number of queries sent to the DNS server is specified by the Domain retry count attribute. Its default value is 2. The time interval between individual queries is defined by Domain retry timeout in milliseconds. By default, it is 800 milliseconds. Thus, the total time during which the queries are being resolved equals Domain retry count x Domain retry timeout.

The results of queries can be cached. The number of cached results is defined by Domain cache size. By default, 3000 results are cached. If you set this attribute to 0, you turn caching off.

You can also decide whether A records should be searched if no MX record is found (Domain query A records). By default, this attribute is set to true, so the A record is searched if the MX record is not found. You can switch this off by setting the attribute to false. This makes the lookup twice as fast, although it breaks the SMTP standard.
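As a worked example of the formula above, the worst-case resolution time for a single domain can be computed directly. The function name is hypothetical; the default arguments are the attribute defaults quoted in the text.

```python
def max_dns_resolution_ms(retry_count: int = 2, retry_timeout_ms: int = 800) -> int:
    """Upper bound on the time spent resolving one domain, in milliseconds."""
    return retry_count * retry_timeout_ms


# With the defaults: 2 x 800 ms
print(max_dns_resolution_ms())  # 1600
```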
SMTP
At the third level of validation (SMTP), attempts are made to connect to the SMTP server. The number of attempts and the time intervals between them are defined using the SMTP connect attempts attribute. This attribute is a sequence of integers separated by commas. Each number is the time (in milliseconds) between two attempts to connect to the server. Thus, the first number is the interval between the first and second attempts, the second number is the interval between the second and third attempts, and so on. The default value defines three attempts, with an interval of 1000 milliseconds between the first and second attempts and 2000 milliseconds between the second and third.
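To see how a comma-separated value translates into an attempt schedule, the sketch below computes when each connection attempt would start relative to the first one. It treats the values as milliseconds, as the attribute name and the default suggest, and ignores the time each attempt itself takes. `connect_attempt_offsets` is a hypothetical helper, not a component API.

```python
from typing import List


def connect_attempt_offsets(attribute_value: str = "1000,2000") -> List[int]:
    """Return the start offset (ms) of each attempt; N delays mean N + 1 attempts."""
    delays = [int(item) for item in attribute_value.split(",") if item.strip()]
    offsets = [0]  # the first attempt starts immediately
    for delay in delays:
        offsets.append(offsets[-1] + delay)
    return offsets


# Default value: three attempts, at 0 ms, 1000 ms and 3000 ms.
print(connect_attempt_offsets())  # [0, 1000, 3000]
```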
Additionally, at the SMTP and MAIL levels, the EmailFilter component can increase accuracy and eliminate false negatives caused by servers that use greylisting. Greylisting is a very common anti-spam technique based on temporarily denying delivery from unknown hosts. A host becomes known, and its mail is accepted, when it retries the delivery after a specified period of time, usually ranging from 1 to 5 minutes. Most spammers do not retry the delivery after an initial failure, for the sake of high performance. EmailFilter has an anti-greylisting feature which retries each failed SMTP/MAIL test a specified number of times with specified delays. The address is considered invalid only after the last retry fails.
MAIL
At the fourth level (MAIL), if all previous checks have succeeded, a dummy message is sent to the specified e-mail address. The message has the following properties: Mail From, Mail Subject, and Mail Body. By default, the message is sent from CloudConnect <cloudconnect@cloudconnect.org> and its subject is Hello, this is a test message. Its default body is as follows: Hello,\nThis is CloudConnect test message.\n\nPlease ignore and don't respond. Thank you and have a nice day!
To configure the anti-greylisting feature, specify the SMTP anti-greylisting attempts attribute. Its default value is 30,120,240. These numbers mean that up to four attempts can be made, with intervals of 30 seconds (between the first and the second), 120 seconds (between the second and the third), and 240 seconds (between the third and the fourth). You can replace the default with any other comma-separated sequence of integers. If the attribute is empty, anti-greylisting is turned off. The maximum number of parallel tasks performed when anti-greylisting is turned on is specified by the SMTP concurrent limit attribute. Its default value is 10.
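The retry behavior described above can be sketched as a simple loop. This is an illustration under assumptions rather than the component's implementation: `verify_with_retries` and `parse_delays` are hypothetical names, the SMTP/MAIL test is passed in as a callable, and the concurrent limit is not modeled. The `sleep` parameter is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable, List


def parse_delays(attribute_value: str) -> List[int]:
    """Parse a value such as "30,120,240" into delays in seconds."""
    return [int(item) for item in attribute_value.split(",") if item.strip()]


def verify_with_retries(
    test_address: Callable[[], bool],
    attempts_attribute: str = "30,120,240",
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run the test once, then retry after each configured delay until it passes."""
    if test_address():
        return True
    # An empty attribute yields no delays, i.e. anti-greylisting is off.
    for delay_s in parse_delays(attempts_attribute):
        sleep(delay_s)
        if test_address():
            return True
    # Only after the last retry fails is the address considered invalid.
    return False
```

For example, a server that greylists the first two attempts and accepts the third would be reported as valid after waiting 30 and then 120 seconds.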