@lang in xsl:sort in XSLT 1.0 and XSLT 2.0

I recently tripped over a difference between XSLT 1.0 and XSLT 2.0 that is not currently listed in Appendix J, Changes from XSLT 1.0 (Non-Normative), of the XSLT 2.0 spec.

In XSLT 1.0, when @lang is not specified on xsl:sort, its value “should be determined from the system environment” (my emphasis is on the “should”). In practice, it seems that most xsl:sort elements in XSLT 1.0 stylesheets are written without a @lang, and people are used to XSLT processors “doing the right thing” when sorting text.

In XSLT 2.0, “if none of the collation, lang or case-order attributes is present, the collation is chosen in an implementation-defined way.” This is more honest than XSLT 1.0’s “should” (as was pointed out to me by Liam Quin), since “implementation-defined” really does have to be defined. However, that does mean the processor is not required to consider language when @lang is omitted.

Despite the three paragraphs written so far, I believe the XSLT 2.0 behaviour is adequate. If you’re writing an XSLT 2.0 stylesheet, you’re better off specifying a collation (implementation-specific as it unfortunately currently must be) than either omitting the attributes or specifying a single language. With a single language, you are trusting that your processor both agrees with you on the minutiae of sorting punctuation, etc., and correctly sorts the other languages that you weren’t able to specify.
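As a rough sketch of what specifying a collation looks like, the stylesheet below sorts on @name with an explicit collation attribute. The items and item element names are made up for illustration. The codepoint collation URI is used only because it is the one collation every XSLT 2.0 processor is required to recognise; the URI for a language-aware collation would have to come from your own processor’s documentation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "items" and "item" are hypothetical element names -->
  <xsl:template match="items">
    <xsl:copy>
      <xsl:for-each select="item">
        <!-- An explicit collation means the sort order no longer depends
             on the processor's implementation-defined default.
             Substitute your processor's own collation URI for
             language-aware sorting. -->
        <xsl:sort select="@name"
                  collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

When a collation attribute is present, the lang and case-order attributes on the same xsl:sort are ignored.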

However, where I tripped up (minor as it was) was when using an XSLT 1.0 stylesheet with an XSLT 2.0 processor. I was using an XSLT 2.0 processor because I’d added an XSLT 2.0 stylesheet to an existing sequence of XSLT 1.0 stylesheets (since the task, and the new stylesheet, was made much simpler with regex). The new stylesheet sequence had worked correctly in my tests, but results were in an unexpected order when my client tried the sequence on some new input, and could I please fix it?

Everybody’s assumption was that it was a problem with the new stylesheet, but happily it checked out okay. The cause was a long-forgotten xsl:sort in my client’s existing stylesheets:

<xsl:sort select="@name" order="ascending" data-type="text"/>

The previous XSLT 1.0 processor had been sorting according to the implicit language setting, and the XSLT 2.0 processor was sorting on Unicode codepoint, just as it said in the release notes. This hadn’t shown up in my tests because my data had been all in lowercase (which was the norm).
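To make the difference concrete, here is a small made-up input (not my client’s data) with one capitalised name. The comment gives the order I would expect from each kind of sort; the language-aware order assumes a typical English collation, which is ultimately up to the processor.

```xml
<?xml version="1.0"?>
<!-- Hypothetical test input; the names are illustrative only.
     Unicode codepoint order puts all of A-Z before a-z, so sorting
     on @name by codepoint gives: Banana, apple, cherry.
     A typical English-language collation gives: apple, Banana, cherry.
     With all-lowercase names the two orders coincide, which is why
     lowercase-only test data hides the difference. -->
<items>
  <item name="cherry"/>
  <item name="Banana"/>
  <item name="apple"/>
</items>
```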

There are two lessons: stylesheet writers should specify @lang if there’s any possibility that their XSLT 1.0 stylesheet will be run by an XSLT 2.0 processor, and I should use more varied test data with my client’s system, which I am now doing.
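For the xsl:sort in question, the change would look something like the following. The lang="en" value is my assumption for illustration; use whatever language matches the data actually being sorted.

```xml
<!-- lang="en" is an assumed value; pick the language of your sort keys -->
<xsl:sort select="@name" order="ascending" data-type="text" lang="en"/>
```

An XSLT 2.0 processor still chooses the actual collation in its own way, but with @lang present it is at least being asked for one suitable for that language rather than being left to its default.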

tdtd available from svn.menteith.com

The source code for the tdtd Emacs code for DTDs is now available for checkout from https://svn.menteith.com/trunk/tdtd and for browsing from http://www.menteith.com/browser/trunk/tdtd.

I didn’t think it would happen straight after setting up the Trac, but I wanted to use tdtd on a new (virtual) PC, and it was better to put the code into the repository and check it out onto the new PC than to scp a copy across and risk it getting out of sync with the other copies floating around on my computers. Of course, now those other copies have to be updated to use the Subversion version instead of either the RCS or the Ubuntu versions.

The website and repository for tdtd are provided by Menteith Consulting Ltd.

Changed webhosts

After quite a few years with Hostway, I’ve changed over to WebFaction.

The good news is that the menteith.com website is now a Trac so I can host the tdtd code in public for the first time (and possibly also host the xslide code here), plus I expect better spam filtering and the blog software is more fully featured.

The bad news is that URLs to existing pages have had to change because I’m using a Trac. I also couldn’t properly export my blog entries from the previous benighted blog software, so blog entry URLs changed too, and I lost the few non-spam comments that the entries had received.