In this section of DB2
pureXML Cookbook, you'll learn about the advantages and disadvantages of XML shredding or
decomposing and different shredding methods. You'll find out when shredding is and isn't a good
option for your company and find definitions of partial shredding and hybrid XML storage.
TABLE OF CONTENTS
- Advantages and disadvantages of XML shredding
- How
to shred XML with the XMLTABLE function
- Shredding
XML docs into relational tables with annotated XML schemas
- Examples
of single and bulk XML shredding of XML documents
Chapter 11 Converting XML to Relational Data
This chapter describes methods to convert XML documents to rows in relational tables. This
conversion is commonly known as shredding or decomposing of XML documents. Given the
rich support for XML columns in DB2 you might wonder in which cases it can still be useful or
necessary to convert XML data to relational format. One common reason for shredding is that
existing SQL applications might still require access to the data in relational format. For example,
legacy applications, packaged business applications, or reporting software do not always understand
XML and have fixed relational interfaces. Therefore you might sometimes find it useful to shred all
or some of the data values of an incoming XML document into rows and columns of relational
tables.
In this chapter you learn:
- The advantages and disadvantages of shredding and of different shredding
methods (section 11.1)
- How to shred XML data to relational tables using INSERT statements
that contain the XMLTABLE function (section 11.2)
- How to use XML Schema annotations that map and shred XML documents to
relational tables (section 11.3)
11.1 ADVANTAGES AND DISADVANTAGES OF SHREDDING
The concept of XML shredding is illustrated in Figure 11.1. In this example, XML documents with
customer name, address, and phone information are mapped to two relational tables. The documents
can contain multiple phone elements because there is a one-to-many relationship between customers
and phones. Hence, phone numbers are shredded into a separate table. Each repeating element, such
as phone, leads to an additional table in the relational target schema. Suppose the customer
information can also contain multiple email addresses, multiple accounts, a list of most recent
orders, multiple products per order, and other repeating items. The number of tables required in
the relational target schema can increase very quickly. Shredding XML into a large number of tables
can lead to a complex and unnatural fragmentation of your logical business objects that makes
application development difficult and error-prone. Querying the shredded data or reassembling the
original documents may require complex multiway joins.
Figure 11.1 Shredding of an XML document
Depending on the complexity, variability, and purpose of your XML documents, shredding may or
may not be a good option. Table 11.1 summarizes the pros and cons of shredding XML data to
relational tables.
Table 11.1 When Shredding Is and Isn't a Good Option
Shredding Can Be Useful When… |
Shredding Is Not A Good Option When… |
- Incoming XML data is just feeding an existing relational database.
- The XML documents do not represent logical business objects that should be
preserved.
- Your primary goal is to enable existing relational applications to access XML
data.
- You are happy with your relational schema and would like to use it as much as
possible.
- The structure of your XML data is such that it can easily be mapped to
relational tables.
- Your XML format is relatively stable and changes to it are rare.
- You rarely need to reconstruct the shredded documents.
- Querying or updating the data with SQL is more important than insert
performance.
|
- Your XML data is complex and nested, and difficult to map to a relational
schema.
- Mapping your XML format to a relational schema leads to a large number of
tables.
- Your XML Schema is highly variable or tends to change over time.
- Your primary goal is to manage XML documents as intact business objects.
- You frequently need to reconstruct the shredded documents or parts of
them.
- Ingesting XML data into the database at a high rate is important for your
application.
|
In many XML application scenarios the structure and usage of the XML data does not lend itself
to easy and efficient shredding. This is the reason why DB2 supports XML columns that allow you to
index and query XML data without conversion. Sometimes you will find that your application
requirements can be best met with partial shredding or hybrid XML storage.
- Partial shredding means that only a subset of the elements or
attributes from each incoming XML document are shredded into relational tables. This is useful if a
relational application does not require all data values from each XML document. In cases
where shredding each document entirely is difficult and requires a complex relational target
schema, partial shredding can simplify the mapping to the relational schema significantly.
- Hybrid XML storage means that upon insert of an XML document into an
XML column, selected element or attribute values are extracted and redundantly stored in relational
columns.
More on enterprise data architecture
Read about Onefinestay's move to a graph
database
Learn about the big
data architecture integration options
Hear William McKnight discuss relational
vs. NoSQL databases
If you choose to shred XML documents, entirely or partially, DB2 provides you with a rich set of
capabilities to do some or all of the following:
- Perform custom transformations of the data values before insertion into
relational columns.
- Shred the same element or attribute value into multiple columns of the same
table or different tables.
- Shred multiple different elements or attributes into the same column of a
table.
- Specify conditions that govern when certain elements are or are not shredded.
For example, shred the address of a customer document only if the country is Canada.
- Validate XML documents with an XML Schema during shredding.
- Store the full XML document along with the shredded data.
DB2 9 for z/OS and DB2 9.x for Linux, UNIX, and Windows support two shredding methods:
- SQL INSERT statements that use the XMLTABLE function. This
function navigates into an input document and produces one or multiple relational rows for insert
into a relational table.
- Decomposition with an annotated XML Schema. Since an XML Schema defines the
structure of XML documents, annotations can be added to the schema to define how elements and
attributes are mapped to relational tables.
Table 11.2 and Table 11.3 discuss the advantages and disadvantages of the XMLTABLE
method and the annotated schema method.
Table 11.2 Considerations for the XMLTABLE Method
Advantages of the XMLTABLE Method |
Disadvantages of the XMLTABLE Method |
- It allows you to shred data even if you do not have an XML Schema.
- It does not require you to understand the XML Schema language or to
understand schema annotations for decomposition.
- It is generally easier to use than annotated schemas because it is based on
SQL and XPath.
- You can use familiar XPath, XQuery, or SQL functions and expressions to
extract and optionally modify the data values.
- It often requires no or little work during XML Schema evolution.
- The shredding process can consume data from multiple XML and relational
sources, if needed, such as values from DB2 sequences or look-up data from other relational
tables.
- It can often provide better performance than annotated schema
decompositions.
|
- For each target table that you want to shred into you need one
INSERT statement.
- You might have to combine multiple INSERT statements in a stored
procedure.
- There is no GUI support for implementing the INSERT statements and
the required XMLTABLE functions. You need to be familiar with XPath and SQL/XML.
|
Table 11.3 Considerations for Annotated Schema Decomposition
Advantages of the Annotated Schema Method |
Disadvantages of the Annotated Schema Method |
- The mapping from XML to relational tables can be defined using a GUI in IBM
Data Studio Developer.
- If you shred complex XML data into a large number of tables, the coding
effort can be lower than with the XMLTABLE approach.
- It offers a bulk mode with detailed diagnostics if some documents fail to
shred.
|
- It does not allow shredding without an XML Schema.
- You might have to manually copy annotations when you start using a new
version of your XML Schema.
- Despite the GUI support, you need to be familiar with the XML Schema language
for all but simple shredding scenarios.
- Annotating an XML Schema can be complex, if the schema itself is
complex.
|
Copyright info
This chapter is an excerpt from the new book DB2 pureXML Cookbook: Master the Power of the
IBM Hybrid Data Server, by Matthias Nicola and Pav Kumar-Chatterjee.
Published by IBM Press. isbn:0138150478
Copyright 2010 by International Business Machines Corporation.
More information is available on the publisher's
site as well as Safari
Books Online.
More on converting XML to relational data: