EmailReader

Commercial Component

We assume that you have already learned what is described in:

If you want to find the right Reader for your purposes, see Readers Comparison.

Short Summary

EmailReader reads a store of email messages, either locally from a delimited flat file or from an external server.

| Component | Same input metadata | Sorted inputs | Inputs | Outputs | Java | CTL |
|---|---|---|---|---|---|---|
| EmailReader | no | no | 1 | 2 | - | - |

Abstract

EmailReader is a component that enables reading of online or local email messages.

This component parses email messages and writes their attributes out to two attached output ports. The first port, the content port, outputs relevant information about the email and body. The second port, the attachment port, writes information relevant to any attachments that the email contains.

The content port will write one record per email message. The attachment port can write multiple records per email message; one record for each attachment it encounters.
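
The one-record-per-message versus one-record-per-attachment split can be illustrated outside the component with Python's standard `email` package. This is only a sketch of the idea, not the component's implementation; the function name `split_message` and the record keys (`subject`, `from`, `body`, `filename`, `content_type`) are assumptions made for this example.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def split_message(raw):
    """Return (content_record, attachment_records) for one raw email string."""
    msg = message_from_string(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    # Exactly one content record per message.
    content = {
        "subject": str(msg["Subject"]) if msg["Subject"] is not None else None,
        "from": str(msg["From"]) if msg["From"] is not None else None,
        "body": body_part.get_content() if body_part is not None else "",
    }
    # One attachment record per attachment; the list may be empty.
    attachments = [
        {"filename": part.get_filename(), "content_type": part.get_content_type()}
        for part in msg.iter_attachments()
    ]
    return content, attachments


# Usage: build a sample message with two attachments, then split it.
sample = EmailMessage()
sample["Subject"] = "Report"
sample["From"] = "alice@example.com"
sample.set_content("See attached.")
sample.add_attachment(b"\x00\x01", maintype="application",
                      subtype="octet-stream", filename="data.bin")
sample.add_attachment("a,b\n1,2\n", subtype="csv", filename="table.csv")

content, attachments = split_message(sample.as_string())
print(content["subject"], len(attachments))
```

A message without attachments yields one content record and an empty attachment list, which mirrors the behavior of the two ports described above.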

Icon

Ports

To understand the ports, first consider the two use cases. The component can read data from a local source or from an external server, and it decides which case applies based on whether an edge is connected to its single input port.

Case One: If an edge is attached to the input port, the component assumes that it will be reading data locally. In many cases, this edge will come from a CSVReader. Here, a file can contain multiple email message bodies separated by a chosen delimiter, and each message is passed one by one into EmailReader for parsing and processing.

Case Two: If an edge is not connected to the input port, the component assumes that messages will be read from an external server. In this case, the user must enter related attributes, such as the server host and protocol parameters, as well as any relevant username and/or password.
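
Case One can be pictured as splitting a flat file on a delimiter and parsing each chunk as a separate message. The sketch below uses Python's standard `email` package purely for illustration; in a real graph the upstream reader does the splitting. The delimiter `|||` and the function name `iter_messages` are assumptions for this example.

```python
from email import message_from_string, policy


def iter_messages(flat_text, delimiter):
    """Yield one parsed message per delimiter-separated chunk of flat_text."""
    for chunk in flat_text.split(delimiter):
        # Strip surrounding whitespace so the headers start on the first line.
        chunk = chunk.strip()
        if chunk:  # skip empty chunks, e.g. after a trailing delimiter
            yield message_from_string(chunk, policy=policy.default)


# Usage: two messages stored in one string, separated by "|||".
flat = (
    "Subject: one\n\nfirst body\n"
    "|||\n"
    "Subject: two\n\nsecond body\n"
    "|||\n"
)
subjects = [str(m["Subject"]) for m in iter_messages(flat, "|||")]
print(subjects)
```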

| Port type | Number | Required | Description | Metadata |
|---|---|---|---|---|
| Input | 0 | no | For inputting email messages from a flat file | String field |
| Output | 0 | no | The content port | Any |
| Output | 1 | no | The attachment port | Any |

EmailReader Attributes

Whether an attribute is required depends on how the component is configured. See Ports: in Case Two, where no edge is connected to the input port, several attributes are required in order to connect to the external server. At minimum, the user must choose a protocol and enter a hostname for the server. Usually a username and password are also required.
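
The dependency between the input edge and the required attributes can be written down as a small check. This is a hedged sketch of the rule as described here, not the component's own validation; the `ReaderConfig` class and `missing_attributes` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReaderConfig:
    """Hypothetical container for a few EmailReader attributes."""
    input_edge_connected: bool
    field_mapping: Optional[str] = None
    server_type: Optional[str] = None  # "POP3" or "IMAP"
    server_name: Optional[str] = None


def missing_attributes(cfg: ReaderConfig) -> List[str]:
    """List the attributes that still need a value for this configuration."""
    missing = []
    if not cfg.field_mapping:
        # Field Mapping is the only attribute marked as required in the table.
        missing.append("Field Mapping")
    if not cfg.input_edge_connected:
        # Case Two: reading from an external server needs protocol and host.
        if cfg.server_type not in ("POP3", "IMAP"):
            missing.append("Server Type")
        if not cfg.server_name:
            missing.append("Server Name")
    return missing


# Usage: no input edge and no server details configured yet.
print(missing_attributes(ReaderConfig(input_edge_connected=False,
                                      field_mapping="mapping")))
```

User Name and Password are left out of the check because they are needed only when the server requires authorization.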

| Attribute | Req | Description | Possible values |
|---|---|---|---|
| Basic | | | |
| Server Type | | Protocol used to connect to a mail server. Options are POP3 and IMAP. In most cases, IMAP should be selected if possible, as it is an improvement over POP3. | POP3, IMAP |
| Server Name | | The hostname of the server. | e.g. imap.google.com |
| Server Port | | Specifies the port used to connect to an external server. If left blank, a default port will be used. | Integers |
| Security | | Specifies the security protocol used to connect to the server. | NONE, SSL, STARTTLS, SSL+STARTTLS |
| User Name | | Username to connect to the server (if authorization is required). | |
| Password | | Password to connect to the server (if authorization is required). | |
| Fetch Messages | | Filters messages based on their status. The option ALL reads every message located on the server, regardless of its status. NEW fetches only messages that have not been read. | NEW, ALL |
| Field Mapping | Yes | Defines how parts of the email (standard and user-defined) will be mapped to CloudConnect fields. See Mapping Fields. | |
| Advanced | | | |
| Temp File URL | | Specifies a directory for temporary storage of any files found inside attachments. The filenames can be obtained from the filename attribute of the output "attachment" port. The default directory is the current CloudConnect project's temporary directory, denoted ${DATATMP_DIR}. | |
| POP3 Cache File | | Specifies the URL of a file used to keep track of which messages have been read. By default, POP3 servers have no way of keeping track of read and unread messages. To fetch only unread messages, the component must download all of the message IDs from the server and compare them with a list of message IDs that have already been read. Only the messages that do not appear in this list are actually downloaded, which saves bandwidth. The file is a delimited text file that stores the unique IDs of messages that have already been read. Even if ALL messages is chosen, the user should still provide a cache file, as it will be populated by the messages read. Note: the POP3 cache file is universal; it can be shared among many inboxes, or the user can choose to maintain a separate cache for different mailboxes. | |
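
The POP3 cache logic amounts to a set difference between the IDs on the server and the IDs already recorded. The sketch below assumes a newline-delimited cache file (the actual delimiter is not stated here) and uses hypothetical helper names. With Python's `poplib`, the server-side IDs would typically come from `POP3.uidl()`; here they are passed in as a plain list so the example runs without a server.

```python
import tempfile
from pathlib import Path


def load_cache(path):
    """Read already-seen message IDs; a missing file means nothing was read."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def select_unread(server_ids, seen):
    """Keep only IDs that are not in the cache, preserving server order."""
    return [uid for uid in server_ids if uid not in seen]


def record_read(path, uids):
    """Append newly read IDs so the next run skips them."""
    with path.open("a") as fh:
        for uid in uids:
            fh.write(uid + "\n")


# Usage with a throwaway cache file: two IDs were read earlier, one is new.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "pop3.cache"
    record_read(cache, ["uid-1", "uid-2"])
    unread = select_unread(["uid-1", "uid-2", "uid-3"], load_cache(cache))
    record_read(cache, unread)
    print(unread)
```

Only the IDs returned by `select_unread` would need to be downloaded in full, which is where the bandwidth saving comes from.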

Advanced Description

Mapping Fields

If you edit the Field Mapping attribute, you will get the following simple dialog:

Figure 53.10. Mapping to CloudConnect fields in EmailReader


In its two tabs, Message and Attachments, you map incoming email fields to CloudConnect fields by simple drag and drop. The buttons on the right-hand side allow you to Cancel all mappings. Auto mapping is performed automatically when you first open this window. Finally, remember that you will only see metadata fields in Attachments if you are using the second output port (see Ports to learn why).

Note

User-defined Fields let you handle all fields that can occur besides the Standard ones. Example: custom fields in the email header.
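
As a rough illustration of standard versus user-defined fields, the sketch below reads a few standard headers plus any custom headers named by the caller, using Python's standard `email` package. The `STANDARD_FIELDS` table, the record field names, and the `X-` header names are assumptions for this example and do not reflect the component's actual field list.

```python
from email import message_from_string, policy

# Hypothetical mapping of record field names to standard header names.
STANDARD_FIELDS = {"subject": "Subject", "from": "From", "to": "To"}


def map_fields(raw, user_defined):
    """Map standard and user-defined headers of one email to a flat record."""
    msg = message_from_string(raw, policy=policy.default)
    mapping = {**STANDARD_FIELDS, **user_defined}
    record = {}
    for field, header in mapping.items():
        value = msg[header]
        # A header that is absent from the message maps to None.
        record[field] = str(value) if value is not None else None
    return record


# Usage: X-Priority is present in the message, X-Tracking-Id is not.
raw = (
    "Subject: Hi\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "X-Priority: 1\n"
    "\n"
    "body\n"
)
record = map_fields(raw, {"priority": "X-Priority", "tracking": "X-Tracking-Id"})
print(record)
```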

Tips & Tricks

Performance Bottlenecks