@lang in xsl:sort in XSLT 1.0 and XSLT 2.0

I recently tripped over a difference between XSLT 1.0 and XSLT 2.0 that is not currently listed in Appendix J, Changes from XSLT 1.0 (Non-Normative), of the XSLT 2.0 spec.

In XSLT 1.0, when @lang is not specified on xsl:sort, its value “should be determined from the system environment” (the emphasis on “should” is mine). In practice, it seems that most xsl:sort elements in XSLT 1.0 stylesheets are written without @lang, and people are used to XSLT processors “doing the right thing” when sorting text.

In XSLT 2.0, “if none of the collation, lang or case-order attributes is present, the collation is chosen in an implementation-defined way.” This is more honest than XSLT 1.0’s “should” (as was pointed out to me by Liam Quin), since “implementation-defined” really does have to be defined. However, that does mean the processor is not required to consider language when @lang is omitted.

Despite the three paragraphs written so far, I believe the XSLT 2.0 behaviour is adequate. If you’re writing an XSLT 2.0 stylesheet, you’re better off specifying a collation (implementation-specific as it unfortunately currently must be) than either omitting the attributes or specifying a single language and trusting that your processor both agrees with you on the minutiae of sorting punctuation, etc., and correctly sorts the other languages that you weren’t able to specify.
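As a rough sketch of what specifying a collation can look like, the stylesheet below passes the collation URI in as a parameter. The parameter name sort-collation and the items/item element names are hypothetical. The default value is the Unicode codepoint collation, which is the only collation URI that every XSLT 2.0 processor is required to recognise; a language-aware URI would have to come from your processor's documentation.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: override it with a collation URI that your
       processor documents. The default is the standard codepoint collation. -->
  <xsl:param name="sort-collation"
      select="'http://www.w3.org/2005/xpath-functions/collation/codepoint'"/>

  <!-- Hypothetical input: an <items> element with <item name="..."/> children. -->
  <xsl:template match="items">
    <xsl:copy>
      <xsl:for-each select="item">
        <!-- collation is an attribute value template, so the URI can be
             supplied at run time instead of being hard-coded. -->
        <xsl:sort select="@name" collation="{$sort-collation}"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the URI in one parameter means the implementation-specific part of the stylesheet is confined to a single place if the processor changes.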

However, where I tripped up (minor as it was) was when using an XSLT 1.0 stylesheet with an XSLT 2.0 processor. I was using an XSLT 2.0 processor because I’d added an XSLT 2.0 stylesheet to an existing sequence of XSLT 1.0 stylesheets (since the task, and the new stylesheet, was made much simpler with regex). The new stylesheet sequence had worked correctly in my tests, but results were in an unexpected order when my client tried the sequence on some new input, and could I please fix it?

Everybody’s assumption was that it was a problem with the new stylesheet, but happily it checked out okay. The cause was a long-forgotten xsl:sort in my client’s existing stylesheets:

<xsl:sort select="@name" order="ascending" data-type="text"/>

The previous XSLT 1.0 processor had been sorting according to the implicit language setting, and the XSLT 2.0 processor was sorting on Unicode codepoint, just as it said in the release notes. This hadn’t shown up in my tests because my data had been all in lowercase (which was the norm).
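To make the codepoint ordering concrete, here is a minimal XSLT 2.0 sketch that sorts three made-up strings with the standard codepoint collation requested explicitly. The strings and the template name main are assumptions for illustration only; they are not taken from the client's data.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Matches the document node of any input; the name "main" is a
       hypothetical entry point for processors that can start at a
       named template. -->
  <xsl:template name="main" match="/">
    <xsl:for-each select="('apple', 'Banana', 'cherry')">
      <!-- Codepoint collation compares Unicode code points, so every
           uppercase ASCII letter sorts before every lowercase one. -->
      <xsl:sort select="."
          collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

This prints Banana, apple, cherry, one per line, because “B” (U+0042) comes before “a” (U+0061). A language-aware sort for English would normally put apple first, which is why all-lowercase test data hides the difference.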

The lessons are for stylesheet writers to specify @lang if there’s any possibility that their XSLT 1.0 stylesheet will be run by an XSLT 2.0 processor, and for me to use more complex test data with my client’s system, which I am doing.
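A minimal sketch of the first lesson, applied to an xsl:sort like the one in the client's stylesheet, might look like the following. The value lang="en" and the items/item element names are assumptions; use the language of your own data. Note that an XSLT 2.0 processor still chooses the actual collation in an implementation-defined way based on @lang, so it is worth checking what your processor does with it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: an <items> element with <item name="..."/> children. -->
  <xsl:template match="items">
    <xsl:copy>
      <xsl:for-each select="item">
        <!-- lang="en" is an assumption. Stating the language explicitly
             avoids relying on whatever the processor does when @lang is
             omitted. -->
        <xsl:sort select="@name" lang="en" order="ascending" data-type="text"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```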