
JSON format coexists with XML in association's data strategy

The JSON data-interchange format has increasingly found a home in Web applications. But XML is keeping its place in a publishing system at the American Psychological Association.

As Web and big data applications have flourished, the JSON format has gained wider use for handling data interchanges between different systems or Web servers and browsers. But there's still life in JSON's sometimes maligned predecessor, XML.

As an example, while JSON (short for JavaScript Object Notation) has become an important data format at a leading association for psychology professionals, educators and students, Extensible Markup Language (XML) continues to play a big role there for specialized data integration needs, according to IT architect and semantic data veteran Beverly Jamison.

"XML is used a lot in [data] interchange in the academic publishing world. And our business-to-business interactions still involve a lot of XML," said Jamison, who was senior director of IT architecture and publishing solutions at the American Psychological Association (APA) until the end of 2015. She left the Washington, D.C.-based organization at that point to become an independent IT consultant.

In an interview while she was still at the APA, Jamison said that over the past few years, JSON increasingly has helped the professional association deliver information to its almost 80,000 members more rapidly via the Web.

"Our outward-facing [user interfaces] now want to see JSON, as often as not," said Jamison, who managed the evolution of an academic publishing system that currently holds over 160,000 journal articles and 3 million abstracts -- including some citation references that go back more than 100 years.

When worlds collide on data formats

The JSON format allows developers a fair degree of independence in their designs, as it keeps data descriptions simple, and its upfront schema requirements are minimal. JSON came to the fore, at least in part, as an alternative to XML. Still, some academic documents benefit greatly from XML's markup capabilities, which bring more structure and stronger data definition to document elements. That makes XML a natural interchange mechanism for some of the APA's needs, Jamison said.

In addition, the two data formats aren't mutually exclusive in the APA's environment. In some cases, JSON is used to transport nested XML payloads that, in Jamison's words, "are unwrapped at the other end and resume their lives as XML [documents]."
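The wrap-and-unwrap pattern Jamison describes can be sketched in a few lines of Python. This is a minimal illustration, not the APA's implementation: the envelope keys (`format`, `payload`) and the article markup are assumptions invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical article fragment; element names are illustrative only.
xml_payload = '<article id="a1"><title>Memory &amp; Attention</title></article>'

# Sender: embed the XML document as a string inside a JSON envelope.
# The "format" and "payload" keys are assumptions, not a known APA schema.
envelope = json.dumps({"format": "xml", "payload": xml_payload})

# Receiver: unwrap the envelope and parse the payload back into XML.
received = json.loads(envelope)
root = ET.fromstring(received["payload"])

print(root.tag, root.get("id"), root.find("title").text)
```

Because the XML travels as an ordinary JSON string, the JSON layer handles the quoting and the payload arrives unchanged for an XML parser to pick up.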

And APIs written by Jamison and her team enable the publishing system to request items in either JSON or XML as needed for particular uses.

"What we like best about our content system is that it speaks both XML and JavaScript," she said. "It's the best of both worlds. The content only exists one time as a structure, but it can manifest itself as either XML or Java."
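The idea of content that exists once as a structure but can be requested in either format can be illustrated with a small sketch. It does not use MarkLogic's API; the record fields and the `render` helper are hypothetical names chosen for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are assumptions for illustration.
record = {"id": "a1", "title": "Memory and Attention", "year": 2015}

def to_json(rec):
    """Serialize the record as a JSON object."""
    return json.dumps(rec, sort_keys=True)

def to_xml(rec):
    """Serialize the same record as a flat XML element."""
    root = ET.Element("article", id=rec["id"])
    for key in ("title", "year"):
        ET.SubElement(root, key).text = str(rec[key])
    return ET.tostring(root, encoding="unicode")

def render(rec, fmt):
    """Return the record in the requested format ('json' or 'xml')."""
    renderers = {"json": to_json, "xml": to_xml}
    if fmt not in renderers:
        raise ValueError(f"unsupported format: {fmt}")
    return renderers[fmt](rec)

print(render(record, "json"))
print(render(record, "xml"))
```

The caller names the format it wants, and the single stored structure is serialized on demand rather than being kept in two copies.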

Software vendor MarkLogic's namesake database serves as the development and runtime platform for the APA's publishing system, according to Jamison. That started in 2008, when the association was well along in a migration from earlier data standards to wider use of XML.

The MarkLogic software was first developed more than a decade ago as a database for XML documents. But in order to meet the requirements of new data architectures, MarkLogic has added features beyond XML support to what is now categorized as a NoSQL database. In 2014, for example, it rolled out native JSON capabilities as part of a new MarkLogic 8 release.

Joe Pasqua, senior vice president of product strategy at MarkLogic, said that both JSON and XML are widely used by programmers, but for different things. JSON gets used in Web APIs and is good at representing programming objects; to him, it isn't as good at representing marked-up documents as XML is.
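The distinction Pasqua draws can be made concrete with a short Python sketch. The sample object and sentence are invented for illustration. A programming object round-trips through JSON unchanged, while a sentence with inline markup (text interleaved with elements) is something XML represents directly.

```python
import json
import xml.etree.ElementTree as ET

# A programming object maps directly onto JSON and back.
member = {"name": "A. Reader", "interests": ["cognition", "memory"]}
round_tripped = json.loads(json.dumps(member))

# Mixed content: running text interleaved with inline markup.
sentence = ET.fromstring("<p>The effect was <em>not</em> significant.</p>")
em = sentence.find("em")

# ElementTree exposes the text before, inside and after the inline element.
parts = [sentence.text, em.text, em.tail]
plain = "".join(sentence.itertext())

print(round_tripped == member)
print(parts)
print(plain)
```

JSON has no built-in notion of text that surrounds a child element, so a JSON encoding of the same sentence would need an ad hoc convention, such as an array of strings and objects.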

Going for a triple with semantic data

In August 2015, MarkLogic added enhanced semantic data processing support to version 8. As a result, the database now works with the Apache Jena and Eclipse Sesame semantic APIs. Both are designed to help developers work with Resource Description Framework (RDF) data, a graph-style structure that uses triples to convey the relationships between different data elements.

Triples depict data relationships using a subject-predicate-object representation; standalone RDF databases, a variant of graph database technology, have been developed to store triples, but MarkLogic also provides a native triple store as part of its database. That technology, along with the SPARQL semantic query language, allows APA teams to work within the MarkLogic environment to create and manage data graphs.
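To show what a subject-predicate-object representation looks like in practice, here is a toy in-memory triple store in Python. It is only a sketch: the identifiers and predicates are made up, and the `match` helper is a hypothetical stand-in for the pattern matching that a real triple store performs through SPARQL.

```python
# Hypothetical triples; identifiers and predicates are made up for illustration.
triples = {
    ("article:1", "cites", "article:2"),
    ("article:1", "hasAuthor", "author:smith"),
    ("article:2", "hasAuthor", "author:jones"),
    ("article:3", "cites", "article:2"),
}

def match(store, subject=None, predicate=None, obj=None):
    """Return sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

# Which articles cite article:2?
citing = [s for s, _, _ in match(triples, predicate="cites", obj="article:2")]
print(citing)
```

Leaving a position as a wildcard plays roughly the role of a variable in a SPARQL triple pattern, such as `?s` in `?s :cites :article2`.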

"The APA's metadata regarding research is very much about relationships. Triple stores seem a good way to capture and compute these relationships," Jamison said.

Going forward, the APA likely will continue to exploit a varied assortment of data formats, including the JSON format, XML, RDF and others. For the APA throughout, Jamison said, "the important thing has been to set up a very modular pipeline for that dataflow."
