In this section of DB2 pureXML Cookbook, you'll learn about the advantages and disadvantages of XML shredding or decomposing and different shredding methods. You'll find out when shredding is and isn't a good option for your company and find definitions of partial shredding and hybrid XML storage.
Table of contents:
Advantages and disadvantages of XML shredding
How to shred XML with the XMLTABLE function
Shredding XML docs into relational tables with annotated XML schemas
Examples of single and bulk XML shredding of XML documents
Chapter 11
Converting XML to Relational Data
This chapter describes methods to convert XML documents to rows in relational tables. This
conversion is commonly known as shredding or decomposing of XML documents. Given
the rich support for XML columns in DB2 you might wonder in which cases it can still be useful
or necessary to convert XML data to relational format. One common reason for shredding is that
existing SQL applications might still require access to the data in relational format. For example,
legacy applications, packaged business applications, or reporting software do not always understand
XML and have fixed relational interfaces. Therefore you might sometimes find it useful to
shred all or some of the data values of an incoming XML document into rows and columns of
relational tables.
In this chapter you learn:
- The advantages and disadvantages of shredding and of different shredding methods
(section 11.1)
- How to shred XML data to relational tables using INSERT statements that contain the
XMLTABLE function (section 11.2)
- How to use XML Schema annotations that map and shred XML documents to relational
tables (section 11.3)
11.1 ADVANTAGES AND DISADVANTAGES OF SHREDDING
 |
| Copyright Info |
This chapter is an excerpt from the new book DB2 pureXML Cookbook: Master the Power of the IBM Hybrid Data Server, by Matthias Nicola and Pav Kumar-Chatterjee. Published by IBM Press. isbn:0138150478 Copyright 2010 by International Business Machines Corporation. More information is available on the publisher's site as well as Safari Books Online. |
|
|
 |
 |
The concept of XML shredding is illustrated in Figure 11.1. In this example, XML documents
with customer name, address, and phone information are mapped to two relational tables. The
documents can contain multiple phone elements because there is a one-to-many relationship between customers and phones. Hence, phone numbers are shredded into a separate table. Each repeating element, such as phone, leads to an additional table in the relational target schema.
Suppose the customer information can also contain multiple email addresses, multiple accounts,
a list of most recent orders, multiple products per order, and other repeating items. The number of
tables required in the relational target schema can increase very quickly. Shredding XML into a
large number of tables can lead to a complex and unnatural fragmentation of your logical business
objects that makes application development difficult and error-prone. Querying the shredded
data or reassembling the original documents may require complex multiway joins.
Figure 11.1 Shredding of an XML document
Depending on the complexity, variability, and purpose of your XML documents, shredding may
or may not be a good option. Table 11.1 summarizes the pros and cons of shredding XML data to
relational tables.
Table 11.1 When Shredding Is and Isn't a Good Option
| Shredding Can Be Useful When… | Shredding Is Not A Good Option When… |
- Incoming XML data is just feeding an existing relational database.
- The XML documents do not represent logical business objects that should be preserved.
- Your primary goal is to enable existing relational applications to access XML data.
- You are happy with your relational schema and would like to use it as much as possible.
- The structure of your XML data is such that it can easily be mapped to relational tables.
- Your XML format is relatively stable and changes to it are rare.
- You rarely need to reconstruct the shredded documents.
- Querying or updating the data with SQL is more important than insert performance.
| - Your XML data is complex and nested, and difficult to map to a relational schema.
- Mapping your XML format to a relational schema leads to a large number of tables.
- Your XML Schema is highly variable or tends to change over time.
- Your primary goal is to manage XML documents as intact business objects.
- You frequently need to reconstruct the shredded documents or parts of them.
- Ingesting XML data into the database at a high rate is important for your application.
|
In many XML application scenarios the structure and usage of the XML data does not lend itself
to easy and efficient shredding. This is the reason why DB2 supports XML columns that allow
you to index and query XML data without conversion. Sometimes you will find that your application
requirements can be best met with partial shredding or hybrid XML storage.
- Partial shredding means that only a subset of the elements or attributes from each
incoming XML document are shredded into relational tables. This is useful if a relational
application does not require all data values from each XML document. In cases
where shredding each document entirely is difficult and requires a complex relational
target schema, partial shredding can simplify the mapping to the relational schema
significantly.
- Hybrid XML storage means that upon insert of an XML document into an XML column,
selected element or attribute values are extracted and redundantly stored in relational
columns.
If you choose to shred XML documents, entirely or partially, DB2 provides you with a rich set of
capabilities to do some or all of the following:
- Perform custom transformations of the data values before insertion into relational
columns.
- Shred the same element or attribute value into multiple columns of the same table or different
tables.
- Shred multiple different elements or attributes into the same column of a table.
- Specify conditions that govern when certain elements are or are not shredded. For example,
shred the address of a customer document only if the country is Canada.
- Validate XML documents with an XML Schema during shredding.
- Store the full XML document along with the shredded data.
DB2 9 for z/OS and DB2 9.x for Linux, UNIX, and Windows support two shredding methods:
- SQL INSERT statements that use the XMLTABLE function. This function navigates into
an input document and produces one or multiple relational rows for insert into a relational
table.
- Decomposition with an annotated XML Schema. Since an XML Schema defines the
structure of XML documents, annotations can be added to the schema to define how elements
and attributes are mapped to relational tables.
Table 11.2 and Table 11.3 discuss the advantages and disadvantages of the XMLTABLE method
and the annotated schema method.
Table 11.2 Considerations for the XMLTABLE Method
| Advantages of the XMLTABLE Method | Disadvantages of the XMLTABLE Method |
- It allows you to shred data even if you do not have an XML Schema.
- It does not require you to understand the XML Schema language or to understand schema annotations for decomposition.
- It is generally easier to use than annotated schemas because it is based on SQL and XPath.
- You can use familiar XPath, XQuery, or SQL functions and expressions to extract and optionally modify the data values.
- It often requires no or little work during XML Schema evolution.
- The shredding process can consume data from multiple XML and relational sources, if needed, such as values from DB2 sequences or look-up data from other relational tables.
- It can often provide better performance than
annotated schema decompositions.
| - For each target table that you want to shred into you need one INSERT statement.
- You might have to combine multiple INSERT statements in a stored procedure.
- There is no GUI support for implementing the INSERT statements and the required XMLTABLE functions. You need to be familiar with XPath and SQL/XML.
|
Table 11.3 Considerations for Annotated Schema Decomposition
| Advantages of the Annotated Schema Method | Disadvantages of the Annotated Schema Method |
- The mapping from XML to relational tables can be defined using a GUI in IBM Data Studio Developer.
- If you shred complex XML data into a large number of tables, the coding effort can be lower than with the XMLTABLE approach.
- It offers a bulk mode with detailed diagnostics if some documents fail to shred.
| - It does not allow shredding without an XML Schema.
- You might have to manually copy annotations when you start using a new version of your XML Schema.
- Despite the GUI support, you need to be familiar with the XML Schema language for all but simple shredding scenarios.
- Annotating an XML Schema can be complex, if the schema itself is complex.
|
More on converting XML to relational data: