We assume that you have already learned what is described in:
If you want to find the right Reader for your purposes, see Readers Comparison.
XMLExtract reads data from XML files.
Component | Data source | Input ports | Output ports | Each to all outputs1) | Different to different outputs2) | Transformation | Transf. req. | Java | CTL |
---|---|---|---|---|---|---|---|---|---|
XMLExtract | XML file | 0-1 | 1-n | no | yes | no | no | no | no |
Legend
1) Component sends each data record to all connected output ports.
2) Component sends different data records to different output ports using return values of the transformation (DataGenerator and MultiLevelReader). See Return Values of Transformations for more information. XMLExtract and XMLXPathReader send data to ports as defined in their Mapping or Mapping URL attribute.
XMLExtract reads data from XML files using SAX technology. It can also read data from compressed files, the console, an input port, or a dictionary. This component is faster than XMLXPathReader, which can also read XML files.
Port type | Number | Required | Description | Metadata |
---|---|---|---|---|
Input | 0 | no | For port reading. See Reading from Input Port. | One field (byte, cbyte, string). |
Output | 0 | yes | For correct data records | Any 1) |
Output | 1-n | 2) | For correct data records | Any 1) (each port can have different metadata) |
Legend:
1): Metadata on each output port does not need to be the same. Each metadata can use Autofilling Functions.
2): Other output ports are required if the mapping sends data to them.
Attribute | Req | Description | Possible values |
---|---|---|---|
Basic | | | |
File URL | yes | Attribute specifying what data source(s) will be read (XML file, console, input port, dictionary). See Supported File URL Formats for Readers. | |
Charset | | Encoding of records which are read. | any encoding; the system default by default |
Mapping | 1) | Mapping of the input XML structure to output ports. See XMLExtract Mapping Definition for more information. | |
Mapping URL | 1) | Name of an external file, including its path, which defines the mapping of the input XML structure to output ports. See XMLExtract Mapping Definition for more information. | |
Namespace Bindings | | Allows using arbitrary namespace prefixes in Mapping. See Namespaces. | |
XML Schema | | URL of the file that should be used for creating the Mapping definition. See XMLExtract Mapping Editor and XSD Schema for more information. | |
Use nested nodes | | By default, nested elements are also mapped to output ports automatically. If set to false, an explicit <Mapping> tag must be created for each such nested element. | true (default), false |
Trim strings | | By default, white spaces at the beginning and the end of element values are removed. If set to false, they are not removed. | true (default), false |
Advanced | | | |
XML features | | Sequence of individual expressions of the form nameM:=true or nameN:=false, where each nameM is an XML feature that should be validated. The expressions are separated from each other by semicolons. See XML Features for more information. | |
Number of skipped mappings | | Number of mappings to be skipped continuously throughout all source files. See Selecting Input Records. | 0-N |
Max number of mappings | | Maximum number of records to be read continuously throughout all source files. See Selecting Input Records. | 0-N |
Legend:
1) One of these must be specified. If both are specified, Mapping URL has higher priority.
Example 53.3. Mapping in XMLExtract
<Mappings>
    <Mapping element="employee" outPort="0" xmlFields="salary" cloverFields="basic_salary">
        <Mapping element="child" outPort="1" parentKey="empID" generatedKey="parentID"/>
        <Mapping element="benefits" outPort="2" parentKey="empID;jobID" generatedKey="empID;jobID"
                 sequenceField="seqKey" sequenceId="Sequence0">
            <Mapping element="financial" outPort="3" parentKey="seqKey" generatedKey="seqKey"/>
        </Mapping>
        <Mapping element="project" outPort="4" parentKey="empID;jobID" generatedKey="empID;jobID">
            <Mapping element="customer" outPort="5" parentKey="projName;projManager;inProjectID;Start" generatedKey="joinedKey"/>
        </Mapping>
    </Mapping>
</Mappings>
XMLExtract Mapping Definition
Every Mapping definition (both the contents of the file specified in the Mapping URL attribute and the value of the Mapping attribute) consists of a pair of start and end <Mappings> tags. The <Mappings> tags have no attributes.
This pair of <Mappings> tags surrounds all of the nested <Mapping> tags. Each of these <Mapping> tags contains some XMLExtract Mapping Tag Attributes. See also XMLExtract Mapping Tags for more information.
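For orientation, a minimal Mapping definition might look as follows. This is only a sketch: the employee element and output port 0 are illustrative assumptions, not values required by XMLExtract.

```xml
<Mappings>
    <!-- The enclosing <Mappings> tag has no attributes -->
    <Mapping element="employee" outPort="0"/>
    <!-- Further <Mapping> tags can follow here or be nested inside others -->
</Mappings>
```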
Empty Mapping Tag (Without a Child)
<Mapping
element="[prefix:]nameOfElement"
XMLExtract Mapping Tag Attributes
/>
This corresponds to the following node of XML structure:
<[prefix:]nameOfElement>ValueOfTheElement</[prefix:]nameOfElement>
Non-Empty Mapping Tags (Parent with a Child)
<Mapping
element="[prefix:]nameOfElement"
XMLExtract Mapping Tag Attributes
>
(nested Mapping elements: only children, parents with one or more children, etc.)
</Mapping>
This corresponds to the following XML structure:
<[prefix:]nameOfElement
elementAttributes>
(nested elements: only children, parents with one or more children, etc.)
</[prefix:]nameOfElement>
The nested structure of <Mapping> tags copies the nested structure of XML elements in the input XML files. See the example below.
Example 53.4. From XML Structure to Mapping Structure
If XML Structure Looks Like This:
<[prefix:]nameOfElement>
    <[prefix1:]nameOfElement1>ValueOfTheElement11</[prefix1:]nameOfElement1>
    ...
    <[prefixK:]nameOfElementM>ValueOfTheElementKM</[prefixK:]nameOfElementM>
    <[prefixL:]nameOfElementN>
        <[prefixA:]nameOfElementE>ValueOfTheElementAE</[prefixA:]nameOfElementE>
        ...
        <[prefixR:]nameOfElementG>ValueOfTheElementRG</[prefixR:]nameOfElementG>
    </[prefixL:]nameOfElementN>
</[prefix:]nameOfElement>
Mapping Can Look Like This:
<Mappings>
    <Mapping element="[prefix:]nameOfElement" attributes>
        <Mapping element="[prefix1:]nameOfElement1" attributes11/>
        ...
        <Mapping element="[prefixK:]nameOfElementM" attributesKM/>
        <Mapping element="[prefixL:]nameOfElementN" attributesLN>
            <Mapping element="[prefixA:]nameOfElementE" attributesAE/>
            ...
            <Mapping element="[prefixR:]nameOfElementG" attributesRG/>
        </Mapping>
    </Mapping>
</Mappings>
However, Mapping does not need to copy all of the XML structure; it can start at a specified level inside the XML file. In addition, if the default setting of the Use nested nodes attribute is used (true), deeper nodes can be mapped without creating separate child <Mapping> tags for them.
Important | |
---|---|
Remember that mapping of nested nodes is possible only if their names are unique within their parent and confusion is not possible. |
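As an illustration of the default Use nested nodes behavior, consider the hypothetical input described in the comment below. The sketch assumes that the metadata on port 0 contains fields named name and city; under that assumption, the single <Mapping> tag is expected to fill both fields, including city, which sits one level deeper inside address and whose name is unique within its parent.

```xml
<!-- Hypothetical input:
     <employee>
         <name>John</name>
         <address><city>Prague</city></address>
     </employee>
     Assumed port 0 metadata: name, city. With Use nested nodes = true,
     no separate <Mapping> tag is needed for address or city. -->
<Mappings>
    <Mapping element="employee" outPort="0"/>
</Mappings>
```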
XMLExtract Mapping Tag Attributes
element
Required
Each mapping tag must contain one element attribute. Its value must be the name of a node of the input XML structure, optionally with a namespace prefix.
element="[prefix:]name"
outPort
Optional
Number of the output port to which data is sent. If it is not defined, no data is sent out from this level of Mapping.
If the <Mapping> tag does not contain any outPort attribute, it only serves to identify where the deeper XML nodes are located.
Example: outPort="2"
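A sketch of a <Mapping> tag used only as a locator, assuming a hypothetical input in which employee elements are nested inside a company element. Nothing is sent out for company itself; only the employee records reach port 2.

```xml
<Mappings>
    <!-- No outPort: this level only tells XMLExtract where employee elements are -->
    <Mapping element="company">
        <!-- Each employee element produces a record on output port 2 -->
        <Mapping element="employee" outPort="2"/>
    </Mapping>
</Mappings>
```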
Important | |
---|---|
The values from any
level can also be sent out using a higher parent
|
parentKey
The parentKey attribute serves to identify the parent of a child. It is a sequence of metadata fields of the nearest parent level, separated by semicolon, colon, or pipe.
These fields belong to the metadata on the port specified for the parent element and are filled with the corresponding values; the parentKey attribute only says which fields should be copied from the parent level to the child level as the identification.
For this reason, either the generatedKey attribute must list the same number of metadata fields with the same data types, or all values are concatenated to create a unique string value, in which case the generated key has only one field.
Example:
parentKey="first_name;last_name"
The values of these parent CloudConnect fields are copied into the CloudConnect fields specified in the generatedKey attribute.
generatedKey
The generatedKey attribute is filled with values taken from the parent element and thus specifies the parent of the child. It is a sequence of metadata fields of the specified child level, separated by semicolon, colon, or pipe.
These fields belong to the metadata on the port specified for this child element and are filled with the values of the parent-level fields listed in the parentKey attribute of the same <Mapping> tag. The attribute only says which fields should be copied from the parent level to the child level as the identification.
For this reason, either the parentKey attribute must list the same number of metadata fields with the same data types, or all values are concatenated to create a unique string value, in which case the generated key has only one field.
Example:
generatedKey="f_name;l_name"
The values of these CloudConnect fields are taken from the CloudConnect fields specified in the parentKey attribute.
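The following sketch puts the two example values together and also shows the concatenated single-field variant. The element names employee, child, and project, the output ports, and the parentID field are illustrative assumptions; port 0 metadata is assumed to contain first_name and last_name.

```xml
<Mappings>
    <!-- Parent level: port 0 metadata is assumed to contain first_name and last_name -->
    <Mapping element="employee" outPort="0">
        <!-- Same number of fields with the same types: first_name is copied to
             f_name and last_name to l_name in every child record on port 1 -->
        <Mapping element="child" outPort="1"
                 parentKey="first_name;last_name"
                 generatedKey="f_name;l_name"/>
        <!-- Single string field: the parent values are concatenated into parentID -->
        <Mapping element="project" outPort="2"
                 parentKey="first_name;last_name"
                 generatedKey="parentID"/>
    </Mapping>
</Mappings>
```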
sequenceField
Sometimes a pair of parentKey and generatedKey does not ensure unique identification of records (the parent-child relation); this is the case when one parent has multiple children of the same element name. In such a case, these children may be given numbers as the identification. By default (if not defined otherwise by a created sequence), children are numbered with integers starting from 1 with step 1.
This attribute is the name of the metadata field of the specified level into which the distinguishing numbers are written. It can serve as parentKey for the next nested level.
Example:
sequenceField="sequenceKey"
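A sketch modeled on Example 53.3, but without a sequenceId, so the default numbering (from 1, step 1) applies. The empID field and the port numbers are illustrative assumptions. The number written into sequenceKey on port 1 then serves as parentKey for the nested financial records.

```xml
<Mappings>
    <Mapping element="employee" outPort="0">
        <!-- An employee may have several <benefits> children; each one gets a
             number written into the sequenceKey field of the port 1 metadata -->
        <Mapping element="benefits" outPort="1"
                 parentKey="empID" generatedKey="empID"
                 sequenceField="sequenceKey">
            <!-- The sequence number links each financial record (port 2)
                 to its parent benefits record -->
            <Mapping element="financial" outPort="2"
                     parentKey="sequenceKey" generatedKey="sequenceKey"/>
        </Mapping>
    </Mapping>
</Mappings>
```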
sequenceId
Optional
If the children described under sequenceField should be numbered differently, a sequence can be used. With a defined sequence, these child elements can be numbered with a different starting value and a different step, and the values can be preserved between subsequent runs of the graph.
The value of this attribute is the ID of the sequence.
Example:
sequenceId="Sequence0"
Important | |
---|---|
Sometimes there may be a parent which has multiple children of the same element name. In such a case, these children cannot be identified using the parent information copied from |
xmlFields
If the names of XML nodes or attributes should be changed, it has to be done using a pair of xmlFields and cloverFields attributes.
The value is a sequence of element or attribute names on the specified level, separated by semicolon, colon, or pipe. The same number of names has to be given in the cloverFields attribute. Do not forget that the values have to correspond to the specified data types.
Example:
xmlFields="salary;spouse"
Moreover, you can reach beyond the current level of XML elements and their attributes: use the "../" string to reference "the parent of this element". See Source Tab for more information.
Important | |
---|---|
By default, XML names (element names and attribute names) are mapped to metadata fields by their name. |
cloverFields
If the names of XML nodes or attributes should be changed, it must be done using a pair of xmlFields and cloverFields attributes.
The value is a sequence of metadata field names on the specified level, separated by semicolon, colon, or pipe. The same number of names must be given in the xmlFields attribute. The values must also correspond to the specified data types.
Example:
cloverFields="SALARY;SPOUSE"
Important | |
---|---|
By default, XML names (element names and attribute names) are mapped to metadata fields by their name. |
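Combining the two examples above, the following sketch renames an element and an attribute at the same time. The input described in the comment and the data type of SALARY are assumptions made for illustration; names are paired by position, so salary goes to SALARY and spouse to SPOUSE.

```xml
<!-- Hypothetical input:
     <employee spouse="Jane Smith"><salary>1200</salary></employee>
     SALARY is assumed to be a numeric field, SPOUSE a string field. -->
<Mappings>
    <Mapping element="employee" outPort="0"
             xmlFields="salary;spouse"
             cloverFields="SALARY;SPOUSE"/>
</Mappings>
```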
skipRows
Optional
Number of elements to be skipped. By default, nothing is skipped.
Example: skipRows="5"
Important | |
---|---|
Remember that also nested (child) elements are skipped when their parent is skipped. |
numRecords
Optional
Number of elements to be read. By default, all of them are read.
Example: numRecords="100"
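A sketch using both attributes on one level. The employee and child elements and the key fields are illustrative assumptions. As noted above, the child elements of the five skipped employee elements are skipped together with their parents.

```xml
<Mappings>
    <!-- Skip the first 5 employee elements, then read at most 100 of them -->
    <Mapping element="employee" outPort="0" skipRows="5" numRecords="100">
        <!-- Children of skipped employee elements are skipped as well -->
        <Mapping element="child" outPort="1" parentKey="empID" generatedKey="parentID"/>
    </Mapping>
</Mappings>
```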
In addition to writing the mapping code yourself, you can set the XML Schema attribute. It is the URL of a file containing an XSD schema that can be used for creating the Mapping definition.
When using an XSD, the mapping can be performed visually in the Mapping dialog. It consists of two tabs: the Mapping tab and the Source tab. The Mapping attribute can be defined in the Source tab, while in the Mapping tab you can work with your XML Schema.
Note | |
---|---|
If you do not possess a valid XSD schema for your source XML, you can switch to the Mapping tab and click the button which attempts to "guess" the XSD structure from the XML. |
In the pane on the left-hand side of the Mapping tab, you can see a tree structure of the XML. Every element shows how many occurrences it has in the source file (e.g. [0:n]). In this pane, check the elements that should be mapped to the output ports.
At the top, you specify the Output port for each selected element by checking the check box. You can then choose from the list of output ports labeled portNumber(metadata), e.g. "3(customer)".
On the right-hand side, you can see both XML Fields and CloudConnect Fields. You either map them to each other according to their names (implicit mapping) or you map them yourself (explicit mapping). Please note that in XML Fields, not only the selected element but also its parent elements are visible (as long as the parents have some fields) and can be mapped. In the picture below, the "customer" element is selected, but we are allowed to leap over its parent element "project" to "employee", whose field "empID" is actually mapped. This lets you create the whole mapping much more easily than with the Parent key and Generated key properties.
In the Mapping tab, you can also specify all the ordinary mapping properties: Parent key, Generated key, Sequence id and/or Sequence field.
Once you define all elements, specify output ports, mapping and other properties, you can switch to the Source tab. The mapping code is displayed there. Its structure is the same as described in the preceding sections.
Note | |
---|---|
If you do not possess a valid XSD schema for your source XML, you will not be able to map elements visually and you have to do it here in Source. |
If you want to map an element to XML fields of its parents, use the "../" string (like in the file system) before the field name. Every "../" stands for "this element's parent", so "../../" would mean the element's parent's parent and so on. Examine the example below which relates to Figure 53.14, Parent Elements Visible in XML Fields. The "../../empID" is a field of "employee" as made available to the currently selected element "customer".
<Mapping element="employee">
    <Mapping element="project">
        <Mapping element="customer" xmlFields="name;../../empID" cloverFields="name;empId"/>
    </Mapping>
</Mapping>
Keep one thing in mind when referencing parent elements, particularly if you rely on the Use nested nodes property set to true: one "../" refers to the XML element defined in the direct parent <Mapping> of the <Mapping> that contains the "../" reference, even if that element is more than one level up in the XML.
For example, recall the mapping from the previous example. If we omit the project <Mapping> element, the parent field reference has to be changed accordingly:
<Mapping element="employee">
    <Mapping element="customer" xmlFields="name;../empID" cloverFields="name;empId"/>
</Mapping>
Since version 3.1.0, it is possible to map the value of an element using the '.' (dot) syntax. The dot can be used in the xmlFields attribute in the same way as any other XML element or attribute name. In the visual mapping editor, the dot is represented in the XML Fields tree as the element's contents.
The dot represents the element itself. Every other occurrence of the element's name in the mapping (as text, e.g. "customer") represents a subelement or attribute of that element.
The following chunk of code maps the value of the element customer to the metadata field customerValue, and the value of customer's parent element project to the metadata field projectValue.
<Mapping element="project">
    <Mapping element="customer" outPort="0" xmlFields=".;../." cloverFields="customerValue;projectValue"/>
</Mapping>
The element value consists of the text enclosed between the element's start and end tag only if it has no child elements. If the element has child element(s), then the element's value consists of the text between the element's start tag and the start tag of its first child element.
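To illustrate the rule for elements with children, the sketch below assumes a hypothetical input with mixed content. Following the rule above, and with Trim strings left at its default, the value of project is expected to be "Apollo" (the text before its first child element), while the value of customer is "ACME". Ports and field names are illustrative.

```xml
<!-- Hypothetical input:
     <project>Apollo<customer>ACME</customer>(ongoing)</project>
     project value:  "Apollo"  (text before the first child element)
     customer value: "ACME"    (no child elements, so the whole text) -->
<Mappings>
    <Mapping element="project" outPort="0" xmlFields="." cloverFields="projectValue">
        <Mapping element="customer" outPort="1" xmlFields="." cloverFields="customerValue"/>
    </Mapping>
</Mappings>
```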
Important | |
---|---|
Remember that element values are mapped to CloudConnect fields by their names. Thus, the value of the customer element is mapped to a metadata field named customer. However, if you want to rename the field, use the xmlFields and cloverFields attributes:
<Mapping ... xmlFields="customer" cloverFields="newFieldName" />
Moreover, when you have an XML file containing an element and an attribute of the same name:
<customer customer="JohnSmithComp"> ... </customer>
you can map both the element and the attribute value to two different fields:
<Mapping element="customer" outPort="2" xmlFields=".;customer" cloverFields="customerElement;customerAttribute"/>
You could even come across a more complicated situation stemming from the example above: the element has an attribute and a subelement, all of the same name. The only thing to do is to add another mapping at the end of the construct. Notice that you can optionally send the subelement to a different output port than its parent. The other option is to leave the mapping blank, but you have to handle the subelement somehow:
<Mapping element="customer" outPort="2" xmlFields=".;customer" cloverFields="customerElement;customerAttribute">
    <Mapping element="customer" outPort="4" /> <!-- customer's subelement, also called 'customer' -->
</Mapping>
Remember that the explicit mapping (renaming fields) shown in the examples has a higher priority than the implicit mapping. |
Source tab is the only place where templates can be used. Templates are useful when reading a lot of nested elements or recursive data in general.
A template consists of a declaration and a body. The body stretches from the declaration onward (up to a potential template reference, see below) and can contain arbitrary mapping. The declaration is an element containing the templateId attribute.
See an example template declaration:
<Mapping element="category" templateId="myTemplate">
    <Mapping element="subCategory" xmlFields="name" cloverFields="subCategoryName"/>
</Mapping>
To use a template, fill in the templateRef attribute with an existing templateId. A template has to be declared before it is referenced. The effect of using a template is that the whole mapping starting with the declaration is copied to the place where the template reference appears. The advantage is obvious: every time you need to change code that repeats often, you make the change in one place only, in the template.
See a basic example of how to reference a template in your mapping:
<Mapping templateRef="myTemplate" />
Furthermore, a template reference can appear inside a template declaration. The reference should be placed as the last element of the declaration. If you reference the same template that is being declared, you will create a recursive template.
You should always keep in mind what the source XML looks like. Remember that if you have n levels of nested data, you should set the nestedDepth attribute to n.
Look at the example:
<Mapping element="myElement" templateId="nestedTempl">
    <!-- ... some mapping ... -->
    <Mapping templateRef="nestedTempl" nestedDepth="3"/>
</Mapping> <!-- template declaration ends here -->
Note | |
---|---|
The following chunk of code:
<Mapping templateRef="unnestedTempl" nestedDepth="3" />
can be imagined as
<Mapping templateRef="unnestedTempl">
    <Mapping templateRef="unnestedTempl">
        <Mapping templateRef="unnestedTempl">
        </Mapping>
    </Mapping>
</Mapping>
and you can use both ways of nesting references. The latter one, with three nested references, can produce unexpected results when used inside a template declaration, though.
As we step deeper and deeper, each
<Mapping element="wrap">
    <Mapping element="realElement" templateId="unnestedTempl">
        <!-- ... some mapping ... -->
    </Mapping> <!-- template declaration ends here -->
</Mapping> <!-- end of wrap -->
<Mapping templateRef="unnestedTempl">
    <Mapping templateRef="unnestedTempl">
        <Mapping templateRef="unnestedTempl">
        </Mapping>
    </Mapping>
</Mapping> |
In summary, working with nestedDepth instead of nested template references always produces transparent results. Its use is recommended.
If you supply an XML Schema which has a namespace, the namespace is automatically extracted to Namespace Bindings and given a Name. The Name does not have to exactly match the namespace prefix in the input schema, though, as it is only a denotation. You can edit it anytime in the Namespace Bindings attribute as shown below:
After you open Mapping, namespace prefixes will appear before element and attribute names. If Name was left blank, you would see the namespace URI instead.
Note | |
---|---|
If your XSD contains two or more namespaces, mapping elements to the output in the visual editor is not supported. You have to switch to the Source tab and handle namespaces yourself. Use the 'Add' button in Namespace Bindings to prepare a namespace. You will then use it in the source code like this:
Name =
Value = lets you write
instead of
|
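As a hedged sketch of using a binding in the Source tab, assume a Namespace Bindings entry with the Name ns and the hypothetical Value http://www.example.com/customers. The prefix used in the mapping is the binding's Name, which does not have to match the prefix used in the input file.

```xml
<!-- Assumed Namespace Bindings entry (hypothetical):
     Name  = ns
     Value = http://www.example.com/customers
     Every customer element from that namespace is sent to port 0. -->
<Mappings>
    <Mapping element="ns:customer" outPort="0"/>
</Mappings>
```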