Windows drive names with Cygwin xsltproc & xmllint

Cygwin may be the only way to stay sane while using Windows, but it has its own Unix-like notion for drive names, e.g., “/cygdrive/c/” instead of “c:”. Which is fine, except when you want to use both Java XML tools, which understand only the “c:” form, and Cygwin tools, which tend to understand only the “/cygdrive/c/” form.

The Cygwin xsltproc and xmllint complain when you use them with files containing Windows drive names in system identifiers, so the second time it happened, I wrote a simple XML catalog file to map the Windows drive names to the Cygwin paths.

Put this as the contents of /etc/xml/catalog (not catalog.xml!) and the Cygwin xsltproc, etc., will handle Windows drive names:

<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem
      systemIdStartString="file:///C:/"
      rewritePrefix="file:///cygdrive/c/"/>
</catalog>
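As a rough illustration of what the catalog entry does, consider a document whose DOCTYPE uses a Windows-style system identifier. The path "dtds/doc.dtd" and the element name "doc" are made up for this sketch. A catalog-aware parser should rewrite any system identifier that starts with "file:///C:/" so that it starts with "file:///cygdrive/c/" instead:

```xml
<?xml version="1.0"?>
<!-- Hypothetical document: the DTD path is an assumption for illustration. -->
<!-- With the rewriteSystem entry in place, the system identifier below
     should resolve as file:///cygdrive/c/dtds/doc.dtd -->
<!DOCTYPE doc SYSTEM "file:///C:/dtds/doc.dtd">
<doc/>
```

The entry matches on the start of the string only, so a system identifier written in some other form (for example, with backslashes or without the "file:///" prefix) would presumably need its own entry.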

You will have to add a suitable rewriteSystem for each additional drive that you use.
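For example, a catalog covering two drives might look like the following sketch. The D: drive is an assumption for illustration; substitute whichever drive letters you actually use:

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Original mapping for the C: drive -->
  <rewriteSystem
      systemIdStartString="file:///C:/"
      rewritePrefix="file:///cygdrive/c/"/>
  <!-- Assumed additional drive: one rewriteSystem per drive letter -->
  <rewriteSystem
      systemIdStartString="file:///D:/"
      rewritePrefix="file:///cygdrive/d/"/>
</catalog>
```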

Signs of the long-term Tokyo resident

In a week in Tokyo, I observed these signs that a foreigner may be a long-term resident:

  • Carries an umbrella on the same days as everyone else
  • Uses a “Suica” smartcard instead of individual tickets on JR trains
  • Carries a flip-style phone (though may not spend entire train trips sending and receiving messages, as many Japanese do)
    • Extra points are awarded for having trinkets dangling from the phone

My camera’s battery is flat, there’s nothing for me here

In Tokyo for the W3C Japanese Layout Taskforce meeting, I went to the first day of the Sumo basho with a friend. Firstly, I was amazed by the number of foreigners in the audience (as opposed to the number of foreigners competing), and secondly I was flabbergasted when one North American told his friends just as the top division was getting started that he was going to leave because his camera battery was going flat.

Whatever happened to experiencing something for its own sake? Is the experience only real if you can take photos of it? Is it only real if you can put photos of it on your social networking site? What about just staying with your friends while you all watch something unique to the country you are visiting and that you may never see again?

Code Complete, Second Edition

The only reason that I am not overawed by the depth of knowledge and sound advice in Code Complete, Second Edition (Steve McConnell, ISBN: 0-7356-1967-0) is that I bought the second edition because the first edition was so good.

Buying maple syrup in the fishmonger’s

In Montréal for Extreme Markup 2007, I went to Marché Atwater to buy some maple syrup. When I wanted to know the weight of a pack of 8 cans of syrup, the syrup seller took me and the 8-pack next door to the fishmonger’s, and the fishmonger put the cans on her scale. She even set the price to $2.20/kg so I could see the weight in pounds if I’d wanted.

Since it was at that point I agreed to make the purchase, I’d say I did buy the maple syrup in the fishmonger’s.

The curse of a good bug reporting system

A good bug reporting system, by being good, can make a project look bad.

In five-or-so years on SourceForge, xmlroff garnered 24 bug reports. In the couple of months since moving everything to xmlroff.org, xmlroff has already amassed over 60 Trac tickets.

It may look as if xmlroff is suddenly much buggier, but it’s due to finally having a bug reporting system that’s easy to use.

Because it’s easier to use, we use it more. There have been tickets for moving to xmlroff.org and for pie-in-the-sky ideas like a Texinfo-XML-to-FO stylesheet as well as for common-or-garden bugs. Since it’s also easy to link to bug reports, there are now more ticket numbers in the notes on test results and in commit messages.

The proliferating tickets and ticket references point to quality improving, not worsening. After all, we’ve also closed more tickets than xmlroff had bug reports while on SourceForge.

First xslide update in years

I recently checked into CVS the addition of a sub-menu for locating xsl:function elements. It is, as the title notes, the first update of the xslide XSL mode for Emacs in literally years.

xslide itself may be a dead end since it’s based on neither nXML nor Semantic, so it doesn’t quite work the same as the other modes that people use (nor does it matter that it predates nXML). I still use xslide (which is why I updated it), and I even copied it to make an Emacs mode for Ant build files, but it would be more useful if it could indicate when the XSLT stylesheet is not well-formed. Then again, perhaps that problem could be solved with Flymake mode and an XML parser.

@lang in xsl:sort in XSLT 1.0 and XSLT 2.0

I recently tripped over a difference between XSLT 1.0 and XSLT 2.0 that is not currently listed in Appendix J, Changes from XSLT 1.0 (Non-Normative), of the XSLT 2.0 spec.

In XSLT 1.0, when @lang is not specified on xsl:sort, its value “should be determined from the system environment” (the emphasis on “should” is mine). In practice, it seems that most xsl:sort in XSLT 1.0 stylesheets are written without a @lang and people are used to XSLT processors “doing the right thing” when sorting text.

In XSLT 2.0, “if none of the collation, lang or case-order attributes is present, the collation is chosen in an implementation-defined way.” This is more honest than XSLT 1.0’s “should” (as was pointed out to me by Liam Quin), since “implementation-defined” really does have to be defined. However, that does mean the processor is not required to consider language when @lang is omitted.

Despite the three paragraphs written so far, I believe the XSLT 2.0 behaviour is adequate. If you’re writing an XSLT 2.0 stylesheet, you’re better off specifying a collation (implementation-specific as it unfortunately currently must be) than either omitting the attributes or specifying a single language and trusting that your processor both agrees with you on the minutiae of sorting punctuation, etc., and correctly sorts the other languages that you weren’t able to specify.
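A minimal sketch of what specifying a collation can look like in an XSLT 2.0 stylesheet follows. The collation URI is the Saxon-style form and is an assumption here, since collation URIs are implementation-specific; check your own processor’s documentation for the URIs it accepts. The template and the “sort the children of the root element by @name” task are also made up for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Copy the root element and output its children sorted by @name. -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:for-each select="*">
        <!-- Assumed, processor-specific collation URI (Saxon-style). -->
        <xsl:sort select="@name"
                  collation="http://saxon.sf.net/collation?lang=en"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The cost is portability: a processor that does not recognise the URI will not sort the way you intended, so the stylesheet is tied to the processors whose collations you have named.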

However, where I tripped up (minor as it was) was when using an XSLT 1.0 stylesheet with an XSLT 2.0 processor. I was using an XSLT 2.0 processor because I’d added an XSLT 2.0 stylesheet to an existing sequence of XSLT 1.0 stylesheets (since the task, and the new stylesheet, were made much simpler with regex). The new stylesheet sequence had worked correctly in my tests, but results were in an unexpected order when my client tried the sequence on some new input, and could I please fix it?

Everybody’s assumption was that it was a problem with the new stylesheet, but happily it checked out okay. The cause was a long-forgotten xsl:sort in my client’s existing stylesheets:

<xsl:sort select="@name" order="ascending" data-type="text"/>

The previous XSLT 1.0 processor had been sorting according to the implicit language setting, and the XSLT 2.0 processor was sorting on Unicode codepoint, just as it said in the release notes. This hadn’t shown up in my tests because my data had been all in lowercase (which was the norm).

The lessons are for stylesheet writers to specify @lang if there’s any possibility that their XSLT 1.0 stylesheet will be run by an XSLT 2.0 processor and for me to use more complex test data with my client’s system, which I am doing.
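As a sketch of the first lesson, here is the same kind of xsl:sort with @lang added, wrapped in a minimal XSLT 1.0 stylesheet so that it stands alone. The value "en" and the surrounding template are assumptions for illustration, not the client’s actual stylesheet. An XSLT 2.0 processor still chooses the collation in an implementation-defined way, but with @lang present it at least has a language to base that choice on:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Copy the root element and output its children sorted by @name. -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:for-each select="*">
        <!-- lang="en" is an assumed value; use the language of your data. -->
        <xsl:sort select="@name" order="ascending"
                  data-type="text" lang="en"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Test data for a stylesheet like this would ideally mix uppercase and lowercase @name values, since all-lowercase data sorts the same way under a codepoint collation and hides the difference.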

tdtd available from svn.menteith.com

The source code for the tdtd Emacs code for DTDs is now available for checkout from https://svn.menteith.com/trunk/tdtd and for browsing from http://www.menteith.com/browser/trunk/tdtd.

I didn’t think it would happen straight after setting up the Trac, but I’d wanted to use tdtd on a new (virtual) PC, and it was preferable to put the code into the repository and check it out onto the new PC than to scp a copy onto the new PC and risk the copy becoming out of sync w.r.t. the other copies floating around on my computers. Of course, now those other copies have to be updated to use the Subversion version instead of either the RCS or the Ubuntu versions.

The website and repository for tdtd are provided by Menteith Consulting Ltd.