“that error-handling be completely deterministic”

Back in January, since I was teaching another XML course, I reviewed the basis for draconian error handling in XML in light of the sea change in recent years towards HTML5-style, completely defined error recovery.

At the time of the draconian error handling decision, I was on the larger “W3C SGML Working Group” mailing list that provided input, clamour, and distraction to the core “W3C SGML Editorial Review Board” that did the work and made the decisions on the road to XML. I followed the discussions on the mailing list at the time (as much as humanly possible), and the message about this that stuck in my mind is the “ERB votes on error handling” message from Tim Bray on behalf of the ERB, particularly this section:

2. We have a strong political reality to deal with here in that for the first time, the big browser manufacturers have noticed XML and have together made a strong request: that error-handling be completely deterministic, and that browsers not compete on the basis of excellence in handling mangled documents. It was observed that if they wanted to do this, they could just do it; but then pointed out that this is exactly why standards exist – to codify the desired practices shared between competitors. In any case, if we want XML to succeed on the Web, it will be difficult to throw the first serious request from M & N back in their face.

The connection that I, and possibly many others, saw at the time between this summary, at two removes, of browser vendors asking “that error-handling be completely deterministic” and XML using draconian error handling may not have been much of a connection at all. Indeed, with the benefit of hindsight, there never was any connection. But if you were (or are) of the mindset that:

  • Well-formed XML doesn’t come with rules about what elements, etc., are allowed or not allowed in different places, so in the general case you can’t “just fix” a broken document so that it makes sense;
  • There are (or, in 1997, were to be) XML applications where “keep calm and carry on” after receiving a document with an error isn’t an option; and
  • The design goal of XML that “The number of optional features in XML is to be kept to the absolute minimum, ideally zero.” rules out including optional error recovery in the XML spec

then draconian error handling looks like the best/only alternative for XML. And binary yes/no error handling certainly is completely deterministic even if it may not have been exactly what the browser vendors had in mind at the time.
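As a small illustration of that binary yes/no behaviour, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` is hypothetical, chosen just for this example; the point is only that a conforming parser either accepts the document or stops with an error, with nothing in between.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, False otherwise.

    Hypothetical helper for illustration: the parser gives a yes/no
    answer and makes no attempt to repair the input.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Draconian handling: any well-formedness error is fatal.
        return False
    return True


print(is_well_formed("<doc><p>ok</p></doc>"))   # well-formed input
print(is_well_formed("<doc><p>broken</doc>"))   # mismatched end tag
```

Whatever the input, the same document always gets the same verdict, which is the sense in which this kind of error handling is deterministic.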

The question of whether draconian error handling is a good idea still comes up today. (In XML circles, anyway; in HTML circles, they’ve voted with their feet/keyboards and gone for the almost too completely defined error handling of HTML5.) It came up again at XML Prague 2012, where the circles overlapped briefly when Anne van Kesteren presented “What XML can learn from HTML; also known as XML5” and, as Eric van der Vlist notes in his blog, in the panel discussion that followed. The question still has no universally accepted answer: for instance, Eric in his post describes as irrational that others wouldn’t even discuss changing well-formedness requirements, and these two successive posts to the XML-ER list, by David Carlisle and David Lee, take diametrically opposite views about error recovery.

IMO, it’s too late to change the definition of well-formedness for XML 1.0, and the inroads that XML 1.1 hasn’t made illustrate the likely success of anything labelled XML 1.2 or even XML 2.0. Enter the XML Error Recovery Community Group (XML-ER). Set up in record time mid-conference at XML Prague 2012, its goal is to define a deterministic error-recovery mechanism for XML with two properties. First, parsing well-formed XML 1.0 with an XML-ER processor (if that’s what it’s eventually called) produces the same result as parsing with an XML 1.0 processor. Second, parsing non-well-formed XML (which, strictly speaking, by not being well-formed isn’t XML at all) produces what looks like the result of parsing well-formed XML with an XML 1.0 processor, where the fix-ups from non-well-formed to well-formed will be the same in every XML-ER processor.
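To make that goal concrete, here is a toy sketch in Python. It is not the XML-ER mechanism: the class `ToyRecoveringParser`, the function `recover`, and the fix-up rules (ignore stray end tags, auto-close whatever is still open) are all assumptions made purely for illustration. It leans on the lenient tokenizer in the standard library's `html.parser`, which lowercases tag names, so it only behaves sensibly on simple lowercase markup.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ToyRecoveringParser(HTMLParser):
    """Hypothetical toy recovery; NOT the XML-ER algorithm."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # assumed rule: silently drop a stray end tag
        # Assumed rule: close every open element up to and including `tag`.
        while self.open_tags:
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # ignore text outside the root element
            self.builder.data(data)

    def finish(self):
        self.close()  # flush any buffered text
        while self.open_tags:  # assumed rule: auto-close at end of input
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def recover(text):
    """Parse `text` leniently and return an ElementTree root element."""
    parser = ToyRecoveringParser()
    parser.feed(text)
    return parser.finish()


good = "<doc><p>one</p><p>two</p></doc>"
bad = "<doc><p>one<p>two</doc>"
# Well-formed input: same tree as the strict XML 1.0 parser.
print(ET.tostring(recover(good)) == ET.tostring(ET.fromstring(good)))
# Non-well-formed input: the same fix-up on every run.
print(ET.tostring(recover(bad)))
```

With these particular rules, the second `<p>` ends up nested inside the first rather than beside it. Whether that is the "right" repair is exactly the kind of decision a real specification would have to pin down so that every processor makes the same one.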

Where XML-ER finds its place in the XML stack is a bit up in the air at present: few people would want to parse with an XML 1.0 processor and then fall back to an XML-ER processor only if that failed, yet there are aspects of XML 1.0 parsing, such as reading and using any DTD, that won’t be handled by an XML-ER processor. Even so, it is the current best effort for moving to deterministic error handling that is more than a binary yes/no.