The following excerpt about XML basics is taken from IBM DB2 Version 9 New Features, and is published with permission from McGraw-Hill; copyright 2007.
The basics: What is XML?
The Extensible Markup Language (XML) was born circa 1996. A "concept evolution" of previous markup languages, XML was created from a need to go beyond simply marking up display properties and to provide a data model for the business challenges introduced by technologies such as the World Wide Web (WWW), service-oriented architecture (SOA), and so on. Consider the following snippet of code from a Hypertext Markup Language (HTML) document:
<body>
  <h1>Books</h1>
  <p><ul>
    <li><i>A Pocket Guide to 200,000 Miles in a Year</i>
      <p><b>George Baklarz</b> ID=47 <b>Paul Zikopoulos</b> ID=58
      <p>35</p>
    </li>
  </ul>
  <metadata>excessive traveling angry spouse</metadata>
</body>
You can see the markup surrounding the information in this example does nothing other than tell an application (for instance, a Web browser) that can process this code how to display this data. HTML does nothing to describe the data, facilitate its interchange, and so on.
XML is a metadata language; it's designed to describe the data within the tags and the structural relationship between them. Think of it as a model by which you can dynamically and easily define your own markup language. Metadata is data that describes data, so you can think of XML as the metadata for your markup language. You can write your own data language based on XML, which provides an efficient mechanism to define, share, store, and even validate your data—that's a heck of a lot more than telling an application to display some text in boldface.
For example, suppose a language you create based on XML, called AUTHORXML, was used to describe the data in the previous example as such:
<book>
  <authors>
    <author id="47">George Baklarz</author>
    <author id="58">Paul Zikopoulos</author>
  </authors>
  <title>A Pocket Guide to 200,000 Miles in a Year</title>
  <price>35</price>
  <keywords>
    <keyword>excessive traveling</keyword>
    <keyword>angry spouse</keyword>
  </keywords>
</book>
You can see that this same information in XML has become data, not just formatted text. Imagine an application interchange program that can parse this document and pick out the names of the book's authors and its title. This sounds like such a simple capability, but it has radically changed the IT landscape, all because of XML.
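As a rough illustration of such a program, the sketch below parses the AUTHORXML document shown above and pulls out the authors, title, price, and keywords. It uses Python's standard xml.etree.ElementTree module; the choice of Python and the helper name summarize_book are assumptions made for this example, not something the book prescribes.

```python
import xml.etree.ElementTree as ET

# The AUTHORXML document from the text (AUTHORXML is the chapter's own
# hypothetical language; the variable and function names here are assumptions).
BOOK_XML = """<book>
  <authors>
    <author id="47">George Baklarz</author>
    <author id="58">Paul Zikopoulos</author>
  </authors>
  <title>A Pocket Guide to 200,000 Miles in a Year</title>
  <price>35</price>
  <keywords>
    <keyword>excessive traveling</keyword>
    <keyword>angry spouse</keyword>
  </keywords>
</book>"""


def summarize_book(xml_text):
    """Return the title, price, authors, and keywords as plain Python values."""
    book = ET.fromstring(xml_text)
    return {
        "title": book.findtext("title"),
        "price": int(book.findtext("price")),
        # Each author becomes an (id, name) pair; the id comes from the attribute.
        "authors": [(a.get("id"), a.text) for a in book.findall("authors/author")],
        "keywords": [k.text for k in book.findall("keywords/keyword")],
    }


summary = summarize_book(BOOK_XML)
print(summary["title"])
for author_id, name in summary["authors"]:
    print(author_id, name)
```

The same lookups are not possible against the HTML version without guessing which bold text is an author name, which is the point the comparison makes.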
XML provides a facility that allows you to exchange data among applications and systems without requiring changes to the application itself. And since this data-sharing ability is built on open standards, it means that you can reach across lines of business and value nets with minimal impediments.
Of course, the way the data looks to the end user is important, and you can use the related Extensible Stylesheet Language (XSL) technologies (translators, stylesheets, and so on) to shape the look of your data. Quite simply, while HTML stopped at the "glass" (in other words, at the desktop), XML leaps beyond this paradigm and into application enablement, data sharing, and more.
XML provides a paradigm that lets you define tags that describe the structure of your hierarchical data. Programmers like it because it's easy to use and flexible. When you use XML to host data, that data becomes easy to evolve, share, and validate; validation relies on another related standard named XML Schema Definition (hereafter referred to as XML Schema; more on this in a bit). You could summarize XML as a data model comprising nodes of several types linked through ordered parent/child relationships to form a hierarchy, or you could just call it a hierarchical data model.
Beyond the application of semantic awareness to the data within a tag, XML offers (as its name implies) extensibility. Flexibility is the key to XML—don't forget that fact when you're reading the remaining chapters in this part of the book. Using XML, you can easily evolve your data model to accommodate new data on the fly, in a minimal amount of time (try that with a relational schema). For example, many customers today have multiple phone numbers. Adding extra phone numbers to a customer document is simple in XML. In a relational database model, it could require a new table with foreign key relationships to maintain third normal form (3NF).
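The phone number example can be sketched in a few lines. The customer document, its element and attribute names, and the add_phone helper below are all hypothetical, chosen only to show that adding repeated data means appending elements rather than redesigning a schema. The sketch uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer document; the element and attribute names are
# assumptions for illustration and do not come from the book.
CUSTOMER_XML = (
    '<customer id="1001">'
    "<name>Pat Example</name>"
    '<phone type="work">555-0100</phone>'
    "</customer>"
)


def add_phone(customer, phone_type, number):
    """Append one more <phone> child to an existing <customer> element."""
    phone = ET.SubElement(customer, "phone", {"type": phone_type})
    phone.text = number
    return phone


customer = ET.fromstring(CUSTOMER_XML)
add_phone(customer, "mobile", "555-0101")
add_phone(customer, "home", "555-0102")

# Existing readers that look only at <name> are unaffected by the new children.
phones = [(p.get("type"), p.text) for p in customer.findall("phone")]
print(ET.tostring(customer, encoding="unicode"))
```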
XML is an open standard. Published standards tell you how to create these documents and the facilities that accompany them. This provides a technology that is assuredly easy to adopt, and you'll be able to find and share skill sets and applications built on it.
XML technology is well known to developers, but not so well known to database administrators (DBAs). We encourage DBAs to spend time investigating XML technology because a lot of data is being stored this way and, sooner or later, some of it will wind up under their control as data storage professionals (or they should be pushing for it to be).
The purpose of this chapter isn't to make you an XML expert, but rather to help you understand the terminology that surrounds XML, which will be helpful in understanding the XML technology in DB2 9.
Components of an XML document
XML documents include various components and related technologies (not all of which are covered in this chapter):
- Declarations: For example,
<?xml version='1.0' encoding='UTF-8'?>
- Start and end tags: For example,
<title> and </title>
- Attributes: For example,
id="58"
- Data: For example,
A Pocket Guide to 200,000 Miles in a Year
- Elements (nodes): For example,
<author id="58">Paul Zikopoulos</author>
- Comments: For example,
<!-- This is a comment -->
An XML document normally starts with an XML declaration that specifies the encoding scheme used, so that an XML parser can read it, transcode it, and store it in Unicode. While an XML document can be stored in any of a number of character encodings, XML parsers work with the XML data as Unicode. Other attributes can be used in this declaration as well. For example, the standalone attribute declares whether the XML document depends on external markup declarations (standalone='no') or not (standalone='yes').
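A minimal sketch of how the encoding named in the declaration guides a parser: the same one-element document is stored as two different byte sequences, and the parser returns the same Unicode text for both. The sample content and variable names are assumptions for illustration; the parser is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# The same logical document stored in two different encodings.
# "Caf\u00e9" contains one non-ASCII character, so the byte sequences differ.
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><title>Caf\u00e9</title>'
).encode("iso-8859-1")
utf8_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?><title>Caf\u00e9</title>'
).encode("utf-8")

# The parser reads the declaration, decodes the bytes accordingly, and
# hands back ordinary Unicode strings in both cases.
title_from_latin1 = ET.fromstring(latin1_bytes).text
title_from_utf8 = ET.fromstring(utf8_bytes).text

print(len(latin1_bytes), len(utf8_bytes))
print(title_from_latin1 == title_from_utf8)
```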
The term node is often associated with pieces of an XML document, and unfortunately, it's such an overloaded term that its use can get pretty confusing in the IT world. With respect to XML, you can use element and text nodes. At the bottom of the parsed XML representation in Figure 1-1 (in the next section) are leaf nodes that are considered text nodes (only elements have text nodes; attributes do not). Figure 1-1 shows both element nodes (<book>, <title>, and so on) and text nodes (A Pocket Guide to 200,000 Miles in a Year) that reside in an XML document.
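To make the element node and text node distinction concrete, the sketch below walks a small fragment of the book document and labels each node by type. It uses Python's standard xml.dom.minidom module, which follows the W3C DOM rather than the data model DB2 uses, so treat it as an approximation; the describe function is a hypothetical helper.

```python
from xml.dom import Node, minidom

# A fragment of the book document, written without whitespace between tags
# so that no whitespace-only text nodes appear in the tree.
doc = minidom.parseString(
    "<book>"
    "<title>A Pocket Guide to 200,000 Miles in a Year</title>"
    '<author id="47">George Baklarz</author>'
    "</book>"
)


def describe(node, depth=0):
    """Return indented lines naming each element node and text node."""
    lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append("  " * depth + "element: " + node.tagName)
        for child in node.childNodes:
            lines.extend(describe(child, depth + 1))
    elif node.nodeType == Node.TEXT_NODE:
        lines.append("  " * depth + "text: " + node.data)
    return lines


tree_lines = describe(doc.documentElement)
print("\n".join(tree_lines))

# The id attribute is reached through the element, not through its children.
author = doc.getElementsByTagName("author")[0]
print(author.getAttribute("id"))
```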
You may have noticed that you could choose to use an attribute or an element to represent some of your data. For example, consider the following line from the XML code shown earlier:
<author id="47">George Baklarz</author>
This XML fragment could have been defined like so:
<author> <name>George Baklarz</name> <id>47</id> </author>
Debate surrounds the decision of which approach (element or attribute) is the best way to represent this data, but that's outside the scope of this chapter.
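Whichever form you choose, the practical difference shows up in how an application reads the data. The sketch below reads the same author from both forms using Python's standard xml.etree.ElementTree module; the two helper functions are hypothetical and exist only to put the access patterns side by side.

```python
import xml.etree.ElementTree as ET

# The two representations of the same author shown in the text.
as_attribute = ET.fromstring('<author id="47">George Baklarz</author>')
as_elements = ET.fromstring(
    "<author><name>George Baklarz</name><id>47</id></author>"
)


def read_attribute_form(author):
    """The id is an attribute; the name is the element's own text."""
    return author.get("id"), author.text


def read_element_form(author):
    """Both the id and the name are child elements."""
    return author.findtext("id"), author.findtext("name")


print(read_attribute_form(as_attribute))
print(read_element_form(as_elements))
```

Both calls yield the same pair of values, so the choice mostly affects how the document can grow: a child element can later gain its own children or repeat, while an attribute can appear only once per element and holds a single value.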
This was first published in June 2007