Thursday, May 13, 2004

XML: Better Than the Hype


In all the time I've been in and around software, only one technology I have applied has consistently and repeatedly delivered more than it seemed to promise. That technology is XML. Moreover, I don't seem to be alone in finding that I get more benefits from XML than seems reasonable. This 'overdelivery' is sufficiently surprising and unusual that I've spent quite a lot of time trying to understand its cause.

The Content, the Whole Content and Nothing But the Content

One statement often made about XML is that it separates information about content (meaning, semantics) from form. This is frequently contrasted with HTML, in which the mark-up contains information about formatting intention and (particularly in the case of div- and span-heavy XHTML) document structure.

While this is true, it seems to miss the key point, which I would summarize as follows:

In designing a good XML format we seek to describe whatever objects are to be represented fully and naturally, without undue regard to immediate usage.

The point here is that when we set out to represent something in XML, we are usually motivated by a specific use that we have in mind for the data. For example, perhaps we wish to print some bus timetables, and this is motivating us to devise an XML representation for timetable data. The key is that we don't focus narrowly on the precise data that we expect to print in the timetable, still less on how we are going to break it up and lay it out. Rather, we try to find a clear, natural representation of the underlying timetable data in XML, and to ensure that we make this representation as rich and complete as is reasonable.

Repurposing

The benefit that we get from adopting this approach is that when we want to carry out some other task, there is a very good chance that we will be able to exploit exactly the same XML. This might be some other layout-oriented task (such as displaying the bus timetables on the web, or producing a pocket timetable), or something quite different such as building an online query service, calculating travel times, or journey planning.

In practice, I think it is this 'repurposability' of well-designed XML that is responsible for its repeatedly over-delivering. First, I get the project I actually wanted done cleanly, and then a few weeks, months or years later, I decide I need to exploit the same data differently, and this is very easy.
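To make the idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The timetable format, including every element and attribute name and all the data, is invented for illustration and is not any real timetable standard. The same document first feeds a print-style layout (the original purpose) and then a travel-time calculation (a later purpose), without any change to the XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical timetable format: all names and values are invented for illustration.
# Note that it records more than a printed timetable needs (coordinates, operator).
TIMETABLE_XML = """
<timetable route="42" operator="Example Buses">
  <stop id="s1" name="Station Road" lat="55.95" lon="-3.19"/>
  <stop id="s2" name="High Street" lat="55.96" lon="-3.20"/>
  <stop id="s3" name="Harbour" lat="55.98" lon="-3.17"/>
  <journey days="weekdays">
    <call stop="s1" time="08:00"/>
    <call stop="s2" time="08:12"/>
    <call stop="s3" time="08:30"/>
  </journey>
  <journey days="weekdays">
    <call stop="s1" time="09:00"/>
    <call stop="s2" time="09:10"/>
    <call stop="s3" time="09:27"/>
  </journey>
</timetable>
"""


def minutes(hhmm):
    """Convert an 'HH:MM' string to minutes after midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


def printed_rows(root):
    """Original purpose: one text row per stop, one column per journey."""
    rows = []
    for stop in root.findall("stop"):
        stop_id = stop.get("id")
        times = [c.get("time")
                 for c in root.findall(f"journey/call[@stop='{stop_id}']")]
        rows.append(f"{stop.get('name'):<14}" + "  ".join(times))
    return rows


def travel_times(root, origin, destination):
    """Later purpose: minutes from origin to destination, per journey."""
    result = []
    for journey in root.findall("journey"):
        calls = {c.get("stop"): minutes(c.get("time"))
                 for c in journey.findall("call")}
        if origin in calls and destination in calls:
            result.append(calls[destination] - calls[origin])
    return result


root = ET.fromstring(TIMETABLE_XML)
for row in printed_rows(root):
    print(row)
print(travel_times(root, "s1", "s3"))
```

Neither function knows about the other, and the second needed no changes to the data. The stop coordinates are not used here at all; they are the kind of detail that is worth recording anyway in case a later task such as journey planning needs them.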

Paper, Scissors, Stone, XML

There are obviously lots of other virtues of XML that contribute to its 'overdelivery'. Commonly cited examples are its internationalization, the ever-growing sea of high-quality, open-source XML processing software, and the rapidly growing XML-ization of the world; going with the flow has its rewards. But I think there are two other factors that are less widely appreciated, and perhaps more important.

The first of these is a sort of 'moral high ground'. Basically, if I need to exchange data with someone and I say my format's XML and theirs isn't, I win: almost everyone will take XML, because even if they don't have current capability to handle it, they feel they should have. And if we both have XML, we both win, because even though the formats are unlikely to be the same, XSLT usually makes conversion between different formats extremely easy (subject to their having similar expressiveness and concepts).

The way I think of this is in relation to Paper, Scissors, Stone. It's as if XML has changed the rules, so they now read

paper wraps stone
stone blunts scissors
scissors cut paper
XML annihilates paper, scissors and stone.

Cruel to be Kind

Finally, I think the other reason XML delivers such big payoffs is down to Tim Bray's success in insisting that all XML parsers reject non-well-formed XML. The consequence is that I hardly ever encounter a badly formed document, and when I do, as Tim himself explains over at ongoing ("XML Supports Constructive Finger-Pointing"), the guilty party always says mea culpa and fixes it. I know there are people who believe draconian error handling is unfriendly, but XML appears to me the closest thing we're ever likely to get to proof that they're wrong. Sometimes, you really do have to be cruel to be kind.
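As a small illustration of what this strictness looks like in practice, here is a sketch using Python's standard xml.etree.ElementTree parser. The helper name check_well_formed and the sample fragments are assumptions made up for this example. A conforming parser does not try to guess what a malformed document meant; it simply refuses it and reports an error.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message).

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Invented sample fragments: the second never closes its <call> element.
good = "<journey><call stop='s1' time='08:00'/></journey>"
bad = "<journey><call stop='s1' time='08:00'></journey>"

ok_good, _ = check_well_formed(good)
ok_bad, message = check_well_formed(bad)

print(ok_good)
print(ok_bad, message)
```

The error message gives the sender something specific to fix, which is what makes the finger-pointing constructive.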