Home > Shredding XML docs into relational tables with annotated XML schemas
Chapter Download:
EMAIL THIS

Shredding XML docs into relational tables with annotated XML schemas

26 Oct 2009 | by Matthias Nicola and Pav Kumar-Chatterjee

Tips, expert advice and sample chapters
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google

DB2 Pure XML cover In this section of DB2 pureXML Cookbook you'll find out how to shred XML documents into relational tables with annotated XML schemas. You'll find examples of the 14 different types of annotations and learn how to define schemas visually in IBM Data Studio.


Table of contents:

Advantages and disadvantages of XML shredding
How to shred XML with the XMLTABLE function
Shredding XML docs into relational tables with annotated XML schemas
Examples of shredding XML documents singly and in bulk


11.3 SHREDDING WITH ANNOTATED XML SCHEMAS

This section describes another approach to shredding XML documents into relational tables. The approach is called annotated schema shredding or annotated schema decomposition because it is based on annotations in an XML Schema. These annotations define how XML elements and attributes in your XML data map to columns in your relational tables.

To perform annotated schema shredding, take the following steps:

  • Identify or create the relational target tables that will hold the shredded data.
  • Annotate your XML Schema to define the mapping from XML to the relational tables.
  • Register the XML Schema in the DB2 XML Schema Repository.
  • Shred XML documents with Command Line Processor commands or built-in stored procedures.

Assuming you have defined the relational tables that you want to shred into, let's look at annotating an XML Schema.

Copyright Info
This chapter is an excerpt from the new book DB2 pureXML Cookbook: Master the Power of the IBM Hybrid Data Server,
by Matthias Nicola and Pav Kumar-Chatterjee.
Published by IBM Press.
isbn:0138150478
Copyright 2010 by International Business Machines Corporation.
More information is available on the publisher's site as well as Safari Books Online.
11.3.1 Annotating an XML Schema
Schema annotations are additional elements and attributes in an XML Schema to provide mapping information. DB2 can use this information to shred XML documents to relational tables. The annotations do not change the semantics of the original XML Schema. If a document is valid for the annotated schema then it is also valid for the original schema, and vice versa. You can use an annotated schema to validate XML documents just like the original XML Schema. For an introduction to XML Schemas, see Chapter 16, Managing XML Schemas.

The following is one line from an XML Schema:

This line defines an XML element called street and declares that its data type is xs:string and that this element has to occur at least once. You can add a simple annotation to this element definition to indicate that the element should be shredded into the column STREET of the table ADDRESS. The annotation consists of two additional attributes in the element definition, as follows:

db2-xdb:rowSet="ADDRESS" db2-xdb:column="STREET"/>

The same annotation can also be provided as schema elements instead of attributes, as shown next. You will see later in Figure 11.8 why this can be useful.

< xs:element name="street" type="xs:string" minOccurs="1" >
  < xs:annotation >
    < xs:appinfo >
      < db2-xdb:rowSetMapping >
        < db2-xdb:rowSet >ADDRESS< /db2-xdb:rowSet >
        STREET< /db2-xdb:column >
      < /db2-xdb:rowSetMapping >
   < /xs:appinfo >
 < /xs:annotation >

< xs:element/ >

The prefix xs is used for all constructs that belong to the XML Schema language, and the prefix db2-xdb is used for all DB2-specific schema annotations. This provides a clear distinction and ensures that the annotated schema validates the same XML documents as the original schema.

There are 14 different types of annotations. They allow you to specify what to shred, where to shred to, how to filter or transform the shredded data, and in which order to execute inserts into the target tables. Table 11.4 provides an overview of the available annotations, broken down into logical groupings by user task. The individual annotations are further described in Table 11.5.

Table 11.4 Overview and Grouping of Schema Annotations
If You Want toUse This Annotation
Specify the target tables to shred intodb2-xdb:rowSet
db2-xdb:column
db2-xdb:SQLSchema
db2-xdb:defaultSQLSchema
Specify what to shreddb2-xdb:contentHandling
Transform data values while shreddingdb2-xdb:expression
db2-xdb:normalization
db2-xdb:truncate
Filter datadb2-xdb:condition
db2-xdb:locationPath
Map an element or attribute to multiple columnsdb2-xdb:rowSetMapping
Map several elements or attributes to the same columndb2-xdb:table
Define the order in which rows are inserted into the target table, to avoid referential integrity violationsdb2-xdb:rowSetOperationOrder
db2-xdb:order

Table 11.5 XML Schema Annotations
AnnotationDescription
db2-xdb:defaultSQLSchemaThe default relational schema for the target tables.
db2-xdb:SQLSchemaOverrides the default schema for individual tables.
db2-xdb:rowSetThe table name that the element or attribute is mapped to
db2-xdb:columnThe column name that the element or attribute is mapped to.
db2-xdb:contentHandlingFor an XML element, this annotation defines how to derive the value that will be inserted into the target column. You can chose the text value of just this element (text), the concatenation of this element's text and the text of all its descendant nodes (stringValue), or the serialized XML (including all tages) of this element and all descendants (serializeSubtree). If you omit this annotation, DB2 chooses an appropriate default based on the nature of the respective element.
db2-xdb:truncateSpecifies whether a value should be truncated if its length is greater than the length of the target column.
db2-xdb:normalizationSpecifies how to treat whitespace—valid values are whitespaceStrip, canonical, and original
db2-xdb:expressionSpecifies an expression that is to be applied to the data before insertion into the target table.
db2-xdb:locationPathFilters based on the XML context. For example, if it is a customer address then shred to the cust table; if it is an employee address then shred to the employee table.
db2-xdb:conditionSpecifies value conditions so that data is inserted into a target table only if all conditions are true.
db2-xdb:rowSetMappingEnables users to specify multiple mappings, to the same or different tables, for an element or attribute.
db2-xdb:tableMaps multiple elements or attributes to a single column.
db2-xdb:orderSpecifies the insertion order of rows among multiple tables.
db2-xdb:rowSetOperationOrderGroups together multiple db2-xdb:order annotations.

To demonstrate annotated schema decomposition we use the shredding scenario in Figure 11.1 as an example. Assume that the target tables have been defined as shown in Figure 11.1. An annotated schema that defines the desired mapping is provided in Figure 11.8. Let's look at the lines that are highlighted in bold font. The first bold line declares the namespace prefix db2-xdb, which is used throughout the schema to distinguish DB2-specific annotations from regular XML Schema tags. The first use of this prefix is in the annotation db2-xdb:defaultSQLSchema, which defines the relational schema of the target tables. The next annotation occurs in the definition of the element name. The two annotation attributes db2-xdb:rowSet="ADDRESS" and db2-xdb:column="NAME" define the target table and column for the name element. Similarly, the street and city elements are also mapped to respective columns of the ADDRESS table. The next two annotations map the phone number and the type attribute to columns in the PHONES table. The last block of annotations belongs to the XML Schema definition of the Cid attribute. Since the Cid attribute value becomes the join key between the ADDRESS and the PHONE table, it has to be mapped to both tables. Two row set mappings are necessary, which requires the use of annotation elements instead of annotation attributes. The first db2-xdb:rowSetMapping maps the Cid attribute to the CID column in the ADDRESS table. The second db2-xdb:rowSet Mapping assigns the Cid attribute to the CID column in the PHONES table.

Figure 11.8 Annotated schema to implement the shredding in Figure 11.1
Example of annotated schema to implement the shredding

11.3.2 Defining Schema Annotations Visually in IBM Data Studio
You can add annotations to an XML Schema manually, using any text editor or XML Schema editor. Alternatively, you can use the Annotated XSD Mapping Editor in IBM Data Studio Developer. To invoke the editor, right-click on an XML Schema name and select Open With, Annotated XSD Mapping Editor. A screenshot of the mapping editor is shown in Figure 11.9. The left side of the editor shows the hierarchical document structure defined by the XML Schema (Source). The right side shows the tables and columns of the relational target schema (Target). You can add mapping relationships by connecting source items with target columns. There is also a discover function to find probable relationships. Mapped relationships are represented in the mapping editor by lines drawn between source elements and target columns.

Figure 11.9 Annotated XSD Mapping Editor in Data Studio Developer
Example of annotated XSD Mapping Editor in Data Studio Developer

11.3.3 Registering an Annotated Schema
After you have created your annotated XML Schema you need to register it in the XML Schema Repository of the database. DB2's XML Schema Repository is described in detail in Chapter 16, Managing XML Schemas. For the annotated schema in Figure 11.8 it is sufficient to issue the REGISTER XMLSCHEMA command with its COMPLETE and ENABLE DECOMPOSITION options as shown in Figure 11.10. In this example the XML Schema is assumed to reside in the file /xml/myschemas/cust2.xsd. Upon registration it is assigned the SQL identifier db2admin. cust2xsd. This identifier can be used to reference the schema later. The COMPLETE option of the command indicates that there are no additional XML Schema documents to be added. The option ENABLE DECOMPOSITION indicates that this XML Schema can be used not only for document validation but also for shredding.

Figure 11.10 Registering an annotated XML schema
Registering an annotated XML schema

Figure 11.11 shows that you can query the DB2 catalog view syscat.xsrobjects to determine whether a registered schema is enabled for decomposition (Y) or not (N).

Figure 11.11 Checking the status of an annotated XML schema
Checking the status of an annotated XML schema

The DECOMPOSITION status of an annotated schema is automatically changed to X (inoperative) and shredding is disabled, if any of the target tables are dropped or a target column is altered. No warning is issued when that happens and subsequent attempts to use the schema for shredding fail. You can also use the following commands to disable and enable an annotated schema for shredding:

ALTER XSROBJECT cust2xsd DISABLE DECOMPOSITION;
ALTER XSROBJECT cust2xsd ENABLE DECOMPOSITION;

More on converting XML to relational data:



Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



RELATED CONTENT
Data warehouse strategy
Advantages and disadvantages of XML shredding
How to shred XML with the DB2 XMLTABLE function
Examples of single and bulk XML shredding of XML documents
Definition of primary, super, foreign and candidate key in the DBMS
What are some emerging data warehouse and DBMS trends?
Improving ODBC application performance and coding
Selecting ODBC functions for optimized SQL statements
Guidelines for managing data updates to optimize ODBC performance
How to capture metadata information, ETL rules with CA Erwin Data Modeler
Designing for performance: Strategic database application deployments

Data modeling tools and techniques
Understanding five major enterprise information management benefits
Advantages and disadvantages of XML shredding
How to shred XML with the DB2 XMLTABLE function
Examples of single and bulk XML shredding of XML documents
Improving ODBC application performance and coding
How to capture metadata information, ETL rules with CA Erwin Data Modeler
Data Warehouse Platforms Product Directory
Data models serve as blueprint for business intelligence, master data management projects
Similarities and differences between ROLAP, MOLAP and HOLAP
Data modeling for the business: What is a data model?

Enterprise data architecture best practices
Advantages and disadvantages of XML shredding
How to shred XML with the DB2 XMLTABLE function
Examples of single and bulk XML shredding of XML documents
Teradata takes a logical approach to data warehousing appliances
What is the difference between a logical and physical warehouse design?
What are some emerging data warehouse and DBMS trends?
Teradata VP talks data warehouse appliances, reveals cloud and SSD plans
Selecting ODBC functions for optimized SQL statements
Guidelines for managing data updates to optimize ODBC performance
Data Warehouse Platforms Product Directory

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
data modeling  (SearchDataManagement.com)
predictive modeling  (SearchDataManagement.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary




Data Compliance Articles and Research: Data Privacy, Financial Data Management, Healthcare Data
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts