
Shifting from a data model to XML messaging

This is the third article in a series that explores the use of XML to develop data standards in the insurance arena. This series addresses how to design XML messages and various approaches for maximizing implementation of those standards.

This article originally appeared on the BeyeNETWORK.

In my previous article, I explained that to develop an XML message that can be maintained over time, you must first model the information and only then design the message. In this article, we’ll look into how to make the transition from the data model to a message.

I’ve seen this done in a variety of ways, each with varying degrees of success. At the most basic level, you need to develop a series of principles – that can be applied consistently – to make the shift from a model to an XML message.

Attributes versus Elements
For instance, you have to decide what goes into an XML stream as an element versus an attribute. I saw one standard that called all of its data items attributes, and therefore ported them into XML as attributes.

Another company was completely attribute averse, and thus made a rule never to use them.

A third company chose to use attributes for XML references and code values, but nowhere else.

The most important message here is to pick a rule and make a decision.

One of the standards I worked with failed at this basic level. It created reusable data entities but fell short of determining when to use an attribute versus an element. As a result, in some cases, the entity type (e.g., Type of Party) was put in as an attribute.

However, in some cases (e.g., Type of Address), the entity type was put in as an element.

This just makes the standard more convoluted. Whatever the decision is, apply it consistently throughout.
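
To make the cost of this inconsistency concrete, here is a minimal Python sketch. The tag and attribute names (Policy, Party, Address, type, Type) and the values are hypothetical and are not taken from any actual standard. The point is only that a consumer must carry two code paths to answer one question.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: Party carries its type as an attribute,
# while Address carries its type as a child element.
INCONSISTENT = """
<Policy>
  <Party type="Person"><Name>Pat Doe</Name></Party>
  <Address><Type>Mailing</Type><City>Hartford</City></Address>
</Policy>
"""

def entity_type(node):
    """Return the entity type, checking both conventions.

    A consistently designed standard would need only one branch.
    """
    if "type" in node.attrib:        # attribute convention
        return node.attrib["type"]
    child = node.find("Type")        # element convention
    return child.text if child is not None else None

root = ET.fromstring(INCONSISTENT)
types = {node.tag: entity_type(node) for node in root}
print(types)
```

Every new entity that picks one convention or the other forces the reader of the standard to check which one applies before writing a mapping.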

Element Naming
Another basic decision point is if and how to reuse the data element name defined in the data model. Will the element names be all upper case or mixed case (mixed case, I hope!)? Do you take the data model name and use it exactly as defined as an element name? Do you accept abbreviations as part of the XML element name? If so, which ones are acceptable? Once again, consistency is the key.
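
One way to keep naming consistent is to write the rule down as code and generate element names from the model. The sketch below is only an illustration: the underscore-separated model names, the mixed-case output and the abbreviation list are all assumptions made for the example, not rules from any particular standard.

```python
# Hypothetical approved abbreviations; a real standard would publish its own list.
APPROVED_ABBREVIATIONS = {"IDENTIFIER": "Id", "NUMBER": "Num"}

def to_element_name(model_name):
    """Convert a data model name such as 'POLICY_EFFECTIVE_DATE' into a
    mixed-case XML element name, abbreviating only approved words."""
    parts = model_name.replace(" ", "_").split("_")
    return "".join(
        APPROVED_ABBREVIATIONS.get(part.upper(), part.capitalize())
        for part in parts
        if part
    )

print(to_element_name("POLICY_EFFECTIVE_DATE"))
print(to_element_name("party identifier"))
```

Because the rule lives in one function, nobody has to remember whether a given abbreviation is acceptable; anything not in the list is simply spelled out.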

Choosing between elements and attributes, between upper case and mixed case, and which abbreviations to allow are the simplest of these decision points, at least in concept.

The Dangers of Inheritance
Some other decisions, however, are much trickier to make and, if not fully understood, the ramifications can be profound.

I always err on the side of making the XML structure as simple to use as possible. This can put you at odds with the XML architecture gurus who want to adopt the really cool stuff, but the “cool” XML features often make the XML more difficult to comprehend and use.

This is most apparent when trying to utilize the inheritance features available in XML schemas to define specifically named aggregates. In two standards implementations I’ve witnessed in the insurance space, this concept is applied heavily.

One standard has used this ability by first creating abstract classes for every general concept. For instance, there may be an abstract class called “Party.” The standard then defines concrete classes that are derived from the abstract type. So while my schema has a class called “Party,” “Party” will never appear in a data stream as such because it is an abstract class. For it to appear in the data stream, I need a concrete class that inherits from the base class of Party. In this standard, the role of the party becomes the aggregate name. Thus, I have concrete classes called “Insured” or “Producer.”

At first glance, I really liked this. The role of the party is very clear, and I can use a tag name instead of a typecode to process this aggregate. The roles also become part of each individual message, making it obvious what roles apply to each message type. The processing seemed much clearer. Yet, something was nagging at me about this approach. I realized that the problem was the number of possibilities that one would have to accommodate.

The group that went down this path kept the usage at a relatively high level. If I was looking for something called Producer, I would still have to look at the actual role to determine if it was the primary writing producer, servicing producer, the agent or the agency, etc. It didn’t save me from having to look at role typecodes. It only segregated those parties that are producers from those that are insureds.

If you look at the number of roles for insurance, you’ll realize there are an awful lot of aggregates that are parties. This standard, which is not yet complete, currently has more than 50 aggregates derived from the party aggregate. That number is limited by keeping the definitions at a general level. Thus, a standard program that, for instance, is trying to populate a party table in your back-end system cannot look for the thing called “Party.” Instead, it must look for more than 50 different aggregates in order to find the parties. Another insurance standard has well over 100 roles for a party, exacerbating the problem even more.
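
The difference for a consuming program can be sketched in a few lines of Python. Both fragments below are hypothetical, as is the role attribute, and the sketch assumes a plain parser that only sees tag names and has no schema to tell it which types derive from Party.

```python
import xml.etree.ElementTree as ET

# Style A (hypothetical): each role is its own concrete aggregate.
ROLE_NAMED = ET.fromstring(
    "<Policy><Insured><Name>Pat Doe</Name></Insured>"
    "<Producer><Name>Acme Agency</Name></Producer></Policy>"
)
# Style B (hypothetical): one Party aggregate with a role code.
CODED = ET.fromstring(
    '<Policy><Party role="Insured"><Name>Pat Doe</Name></Party>'
    '<Party role="Producer"><Name>Acme Agency</Name></Party></Policy>'
)

# With style A the consumer maintains the list of derived names itself
# (more than 50 in the standard described above; only two shown here).
PARTY_TAGS = {"Insured", "Producer"}

def parties_role_named(root):
    return [(n.tag, n.findtext("Name")) for n in root if n.tag in PARTY_TAGS]

def parties_coded(root):
    # One tag name finds every party, whatever its role.
    return [(n.get("role"), n.findtext("Name")) for n in root.iter("Party")]

print(parties_role_named(ROLE_NAMED))
print(parties_coded(CODED))
```

Both functions return the same rows, but only the first one has to change whenever the standard adds another derived aggregate.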

The other critical issue is that the insurance industry itself is not standardized. Consequently, the chance of needing a party role type that is different or more specific than is available today is pretty high. However, there’s no way to indicate a new role using the existing structures because there is no “code list” to use to add a new code. You have to build a custom schema to add your unique type and deal privately with every trading partner to accept your new data stream, rather than just adding a new code value to indicate the party you need to include.

This exact same concept applies to message type identification.

One standard went with specifically named aggregates that included the full message definitions.

When they ended up with more than 1,000 messages, they changed course. They instead adopted the inheritance pattern that allows abstract classes to be defined and then reused for concrete message types. The end result? They still have more than 1,000 aggregates for specifically defined message types, but they have ensured consistency and reduced the redundancy in definitions by creating classes that are inherited down to the message. This is a definite improvement over the previous design that caused a lot of inconsistencies across messages.

However, it still does not address the bigger problems. First, I still have to account for the possibility of more than 1,000 specifically defined messages coming my way, only to report an error that says, “I don’t support that particular message.” Second, reusing mappings and XPath syntax and parsing routines across messages becomes increasingly difficult because the root element is different in every case, even though the data content inside may be the same. Third, and this is the biggest, I can’t add new types easily. To add a new type, I have to create a new schema that references the old, derive a new type and go from there.
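
The second problem is easy to demonstrate. In the Python sketch below, the message names, the Envelope wrapper and the inner structure are all invented for illustration. The same content sits under two differently named message elements, and a path written for one message finds nothing in the other.

```python
import xml.etree.ElementTree as ET

def envelope(body):
    # Hypothetical wrapper so that the message name is part of the path.
    return ET.fromstring(f"<Envelope>{body}</Envelope>")

CONTENT = "<Policy><Insured><Name>Pat Doe</Name></Insured></Policy>"
quote = envelope(f"<AutoQuoteRequest>{CONTENT}</AutoQuoteRequest>")
change = envelope(f"<AutoPolicyChangeRequest>{CONTENT}</AutoPolicyChangeRequest>")

# A mapping written against the quote message...
QUOTE_PATH = "AutoQuoteRequest/Policy/Insured/Name"
found_in_quote = quote.findtext(QUOTE_PATH)
# ...does not carry over, although the content is identical.
found_in_change = change.findtext(QUOTE_PATH)

# A wildcard step restores reuse, at the price of ignoring the message name.
WILDCARD_PATH = "*/Policy/Insured/Name"
found_with_wildcard = change.findtext(WILDCARD_PATH)

print(found_in_quote, found_in_change, found_with_wildcard)
```

The wildcard works here only because the two invented messages happen to share their inner structure; with more than 1,000 message definitions, that cannot be taken for granted.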

Considerations for Designing Reusable Standards
For any insurance standard to truly be successful, it must support the known requirements today and make it easy for companies to expand the standard to support tomorrow’s requirements (as well as the proprietary needs that will never make it into an industry standard). This ability allows companies to pick a single architecture and use it consistently – both internally and externally.

One insurance standard has seen success in this space. One of the key reasons to which I attribute its success is how transactions are defined. Instead of a specifically named aggregate, they made a controversial choice and went with a type code.

To process, rather than reading a tag with the message name, you read the code value for the element TransType to determine the message. This allowed for several things. First, it’s a simple routine to say, if the tc attribute on TransType is “1,” “2” or “3,” we support it; otherwise respond with an error stating, “I don’t support that particular message.” Second, I can reuse mappings and parsing subroutines easily because my hierarchy is always the same from the root element on down. Third, and this is the biggest, if I want to use the standard for a business process not yet defined, I simply add a new code value. I don’t have to create a new schema or jump through any other hoops. I just use a different code value. This allows me to use the standard for the transactions that are specifically defined and for all my unique internal proprietary purposes without having to do anything extra.
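
A rough Python sketch of that routine follows. Only TransType and its tc attribute come from the description above; the Message root, the Policy structure, the meaning assigned to each code and the proprietary code 9001 are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def insured_name(root):
    # The same path works for every transaction because the hierarchy
    # is identical from the root element on down.
    return root.findtext("Policy/Insured/Name")

# Hypothetical code values; a real standard publishes its own code list.
HANDLERS = {
    "1": lambda root: "quote for " + insured_name(root),
    "2": lambda root: "change for " + insured_name(root),
    "3": lambda root: "cancel for " + insured_name(root),
}

def process(xml_text):
    root = ET.fromstring(xml_text)
    trans_type = root.find("TransType")
    code = trans_type.get("tc") if trans_type is not None else None
    handler = HANDLERS.get(code)
    if handler is None:
        return "I don't support that particular message."
    return handler(root)

def message(tc):
    # Builds a sample message in the hypothetical layout used above.
    return (f'<Message><TransType tc="{tc}"/>'
            "<Policy><Insured><Name>Pat Doe</Name></Insured></Policy></Message>")

# Supporting a new proprietary transaction is one new table entry, not a new schema.
HANDLERS["9001"] = lambda root: "internal audit for " + insured_name(root)

print(process(message("1")))
print(process(message("42")))
print(process(message("9001")))
```

The unsupported case and the extension case both fall out of a single dictionary lookup, which is the simplicity the type code buys.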

In general, when defining successful standards, K.I.S.S. always prevails. We must define for the lowest common denominator. As part of keeping it simple, reusability is key. The ability to reuse the standard for other purposes, in the simplest manner possible, should be a compelling driver behind the development.

Next month, we’ll continue in this vein and look at other decision points in moving from a message-based standard to a model to achieve maximum use.
