XML Summer School 2012

Inasmuch as last year’s sessions were both enjoyable and well received, I’m teaching two sessions at XML Summer School again in September this year.

One, in the XSLT and XQuery track, is an update of the Developing and Testing in XSLT talk, again alongside Jeni Tennison, that got us such a good review last year:

Unit tests, pro­fil­ing, debug­ging and, increas­ingly, test-driven devel­op­ment are part of the bread and but­ter of work­ing with other pro­gram­ming lan­guages but are not always so with XSLT or XQuery. In test-driven devel­op­ment, which is a fun­da­mental part of agile approaches to soft­ware devel­op­ment, the developers write tests that describe the desired beha­viour of their applic­a­tion, then write code that meets the tests. This style of devel­op­ment keeps code focused, avoids break­ing exist­ing code and facil­it­ates refactoring.

In this ses­sion, Jeni Ten­nison and Tony Gra­ham will describe both the state of the art in test­ing and debug­ging XSLT and XQuery and how test-driven devel­op­ment applies to XSLT and XQuery devel­op­ment. In par­tic­u­lar, they will focus on the use of the XSpec test­ing framework.

The other, in the Publishing track, is XML and Publishing Workflows:

Some formats are bet­ter or worse than oth­ers for cap­tur­ing and/or rep­res­ent­ing the inform­a­tion for pub­lish­ing pur­poses. Can you cre­ate and man­age life-cycle work­flows which ration­al­ise or reg­u­lar­ise mixes of formats using XSLT and other XML tool­sets? Should XML be the begin­ning of your pub­lish­ing work­flow, the hub format in the middle, the res­ult, or all three? How can XSLT and related tools be used to cover up the defi­cien­cies or excesses of the source XML? What are the argu­ments for mov­ing authors towards sub­mit­ting in XML (or not)? For mov­ing editors?

Incor­por­at­ing both live examples and war stor­ies, Tony Gra­ham will lead an exam­in­a­tion of XML in pub­lish­ing work­flows, the advant­ages and dis­ad­vant­ages of using XML at each stage, and some of the tools and tech­niques avail­able to you.

XML Sum­mer School 2012 is on Septem­ber 16–21 2012 at St Edmund Hall, Oxford Uni­ver­sity.

Posted in XML

XML Calabash Ant task

Inasmuch as I’d been threatening since the XML Summer School last year to do it, I’ve made a custom Ant task for running XML Calabash, currently only in my fork at git@github.com:MenteaXML/xmlcalabash1.git.

You can use this task to process:

  • A single input file to produce a single output file
  • A set of input files, processed one at a time, to produce a set of output files
  • Multiple input files as the input to one XProc input port processed to produce a single output file
  • Any of the above with additional input ports to each of which are applied one or more input files whose file names may be either fixed or mapped from the name(s) of the current main input file(s)
  • Any of the above with additional output ports whose file names may be either fixed or mapped from the name(s) of the current main input file(s)
  • Any of the above with Ant defaulting to not running the pipeline when the outputs are already up-to-date compared to the inputs and the pipeline

You can also specify options and parameters to be used by the pipeline. Continue reading

“that error-handling be completely deterministic”

Inasmuch as, back in January, I was teaching another XML course, I reviewed the basis for draconian error handling in XML in light of the sea change in recent years towards HTML5-style completely-defined error recovery.

At the time of the draconian error handling decision, I was on the larger “W3C SGML Working Group” mailing list that provided input, clamour, and distraction to the core “W3C SGML Editorial Review Board” that did the work and made the decisions on the road to XML.  I followed the discussions on the mailing list at the time (as much as humanly possible), and the message about this that stuck in my mind is the “ERB votes on error handling” message from Tim Bray on behalf of the ERB, particularly this section:

2. We have a strong political reality to deal with here in that for the first time, the big browser manufacturers have noticed XML and have together made a strong request: that error-handling be completely deterministic, and that browsers not compete on the basis of excellence in handling mangled documents.  It was observed that if they wanted to do this, they could just do it; but then pointed out that this is exactly why standards exist – to codify the desired practices shared between competitors.  In any case, if we want XML to succeed on the Web, it will be difficult to throw the first serious request from M & N back in their face.

Continue reading

Posted in XML

Flymake for RELAX NG compact syntax

Inasmuch as the Wisent parsing and other CEDET/Speedbar/Semantic goodness for RELAX NG compact syntax files that I’m currently working on may not be ready for prime time for a while, here’s something to add to your .emacs so `flymake' runs Jing in the background to find syntax errors in your RELAX NG compact syntax files:

(require 'flymake)
(defun flymake-rnc-init ()
  (let* ((temp-file (flymake-init-create-temp-buffer-copy
		     'flymake-create-temp-inplace))
     	 (local-file (file-relative-name
		      temp-file
		      (file-name-directory buffer-file-name))))
    (list "jing" (list "-c" local-file))))
(add-to-list 'flymake-allowed-file-name-masks
      '(".+\\.rnc$"
	flymake-rnc-init
	flymake-simple-cleanup
	flymake-get-real-file-name))
(add-hook 'rnc-mode-hook
	  'flymake-mode)

Schematron Testing Framework poster

Inasmuch as it exists as a PDF file, you, too, can have your own copy of my “Schematron Testing Framework” (stf) poster from XML Prague 2012.  I’m happy to say that I received constructive comments about stf from people at XML Prague 2012 who read the poster, and I’ll be looking at incorporating the feedback in the near future.

One suggestion, from George Bina, was to make a single “framework” file for running the tests – and including the test files in the framework file either directly or by using XInclude to refer to external test files – rather than the current decentralised approach.  A single framework file would make it easier to make a report of the results, unlike the the current approach where the idea is that the only report you really want to see is “<errors/>” when there are no more errors.  A single framework file could also become very large and hard to navigate when there’s lots of very similar tests in it.  What do you think?

Schematron Testing Framework

Inasmuch as a suite of Schematron tests contains many contexts where a bug in a document will make a Schematron assert fail or a report succeed, it follows that for any new test suite and any reasonably sized but buggy document set, there will straight away be many assert and report messages produced by the tests.  When that happens, how can you be sure your Schematron tests all worked as expected?  How can you separate the expected results from the unexpected?  What’s needed is a way to characterise the Schematron tests before you start as reporting only what they should, no more, and no less.

stf (https://github.com/MenteaXML/stf) is a XProc pipeline that runs a Schematron test suite on test documents (that you create) and winnows out the expected results and report just the unexpected.  stf uses a processing instruction (PI) in each of a set of (typically, small) test documents to indicate the test’s expected asserts and reports: the expected results are ignored, and all you see is what’s extra or missing.  And when you have no more unexpected results from your test documents, you’re ready to use the Schematron on your real documents. Continue reading

CSS for the hierarchically minded

Inasmuch as both HTML and XML markup – being descended from SGML – support a nested, hierarchical structure and as CSS allows, even promotes, a stream-of-consciousness style of coding, there can be a tension between the two approaches.

To put it another way:

  • If you want different styles in a couple of contexts that depend on the type of several levels of ancestor, then you get to put all those ancestors in the CSS selectors for each of those styles;
  • If you want to use the same colour in multiple different styles, then, by golly, you get to enter the same color value in each of them (and if you want to change it later, you get to find them all again to do it); and
  • If you want to use the same set of styles in multiple contexts – say, use rounded corners multiple places and with bigger radii on the outermost corners – then you get to repeat the same set of styles while jiggering their values every place that you want them.

The CSS soon gets to the point that only a machine can reliably work out the cascading and so we require tools such as Firebug to make sense of it and present it to us in ways that we can understand.

I have previously implemented a system for a client where the template CSS file is wrapped in an XML element and contains empty elements for each of the values of color properties so the all-XML processing system can ‘skin’ the stylesheet by substituting the preferred color values and outputting proper CSS on the way to making the HTML, but that was adding complexity, not taking it away.

Enter LESS (http://www.lesscss.org/), the “dynamic stylesheet language”. LESS is pretty much CSS as it should have been, since it elegantly solves the gripes listed above, and more besides. Continue reading

XML Summer School 2011 ends on high note

Inasmuch as my final “XSLT and XSL-FO toolbox of tips and tricks” session was well received, XML Summer School 2011 finished on a high note. My other sessions, “Developing and Testing in XSLT” with Jeni Tennison in the “XSLT/XQuery” track and a five-minute Ignite-format talk on EPUB, also went well, but it was that final talk in the “Publishing” track that got the most visible reactions. Continue reading