Multi-stage XSLT in Ant

Ant is good at re-running an XSLT transformation, or a series of transformations, when the XML source changes, but not so good at re-running when one of the sub-modules of a transformation’s stylesheet changes. It’s a simple thing, though, to generate on the fly a temporary build file containing paths that list the dependencies of each stylesheet, so that Ant can do the right thing.

The first requirement is knowing which top-level stylesheets you’ll be running. If you have an existing build file, you could write a stylesheet to find all the XSLT files referred to in all the xslt tasks. I tend to have a series of XSLT transformations that run one after the other as needed. (I hesitate to call it a ‘pipeline’, since that often means XProc, though see the XML Calabash Ant task if you want to run Calabash from Ant.) I specify the series as a property in an XML property file:

<entry key="stages">foo, bar, baz</entry>

The property file is in XML since that makes it easier to process and since representing non-ASCII (actually non-ISO-8859-1) characters is more straightforward in XML than in Java property files.
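
For context, a complete property file in this format might look like the sketch below. It assumes the standard XML layout that java.util.Properties reads; the comment text and the "title" entry are hypothetical and are there only to show a non-ASCII value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <!-- Optional free-text description of the file (hypothetical). -->
  <comment>Configuration for the multi-stage build</comment>
  <!-- Names of the XSLT stages, in the order they should run. -->
  <entry key="stages">foo, bar, baz</entry>
  <!-- Hypothetical entry: non-ASCII text needs no \uXXXX escapes here. -->
  <entry key="title">Café résumé</entry>
</properties>
```

The DOCTYPE is what makes an XML catalog useful later on: without one, an XML parser may try to fetch the DTD over the network.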

From the list of stages, you can find the corresponding XSLT files and generate a path for each by recursing through all their xsl:import and xsl:include to find the other modules on which each stylesheet depends:

<!-- Sequence of names of XSLT files (without '.xsl' or other
     extension) for 'process' target to run as stages. -->
<xsl:variable
    name="stages"
    select="tokenize(key('properties',
                         'stages',
                         $properties-xml-doc), ', *')"
    as="xs:string*" />

<xsl:variable
    name="all.stages"
    select="distinct-values($stages)"
    as="xs:string+" />

<xsl:comment>Paths for dependencies of XSLT files for stages.</xsl:comment>
<xsl:for-each select="$all.stages">
  <xsl:sort />
<xsl:comment>'<xsl:value-of select="."/>.xsl' and dependencies</xsl:comment>
<path id="stage.{.}.path">
  <pathelement location="${{xsl.dir}}/{.}.xsl" />
  <xsl:sequence select="m:dependencies(concat(., '.xsl'))" />
</path>
</xsl:for-each>
<xsl:comment> <xsl:value-of select="$xsl.dir" /> </xsl:comment>
</project>
</xsl:template>

<xsl:function name="m:dependencies">
  <xsl:param name="current" as="xs:string" />
  <xsl:variable
      name="current-doc"
      select="if (doc-available(concat('file://', $xsl.dir, '/', $current)))
                then document(concat('file://', $xsl.dir, '/', $current))
          else ()"
      as="document-node()?" />
  <xsl:for-each select="$current-doc/*/(xsl:import | xsl:include)">
    <xsl:text>&#xA;  </xsl:text>
    <pathelement location="${{xsl.dir}}/{@href}" />
    <xsl:sequence select="m:dependencies(@href)" />
  </xsl:for-each>
</xsl:function>

which produces:

<!-- Paths for dependencies of XSLT files for stages. -->

<!-- 'bar.xsl' and dependencies -->
<path id="stage.bar.path">
  <pathelement location="${xsl.dir}/bar.xsl"/>
  <pathelement location="${xsl.dir}/other.xsl"/>
</path>

<!-- 'baz.xsl' and dependencies -->
<path id="stage.baz.path">
  <pathelement location="${xsl.dir}/baz.xsl"/>
</path>

<!-- 'foo.xsl' and dependencies -->
<path id="stage.foo.path">
  <pathelement location="${xsl.dir}/foo.xsl"/>
  <pathelement location="${xsl.dir}/second.xsl"/>
  <pathelement location="${xsl.dir}/third.xsl"/>
</path>
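
The stylesheet excerpt above uses a key named “properties” and a variable $properties-xml-doc without showing how they are declared. A minimal sketch of declarations that would make the lookup work follows. The declarations are assumptions for illustration, and the sketch also assumes that the properties.xml parameter holds a Unix-style absolute path:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text" />

  <!-- Absolute path of the XML property file, passed in by Ant. -->
  <xsl:param name="properties.xml" as="xs:string" />

  <!-- Assumed declaration: the parsed property file. -->
  <xsl:variable
      name="properties-xml-doc"
      select="doc(concat('file://', $properties.xml))"
      as="document-node()" />

  <!-- Assumed declaration: index each 'entry' element by its
       'key' attribute. -->
  <xsl:key name="properties" match="entry" use="@key" />

  <!-- Named template that lists the stage names, one per line. -->
  <xsl:template name="build">
    <xsl:value-of
        select="tokenize(key('properties',
                             'stages',
                             $properties-xml-doc), ', *')"
        separator="&#xA;" />
  </xsl:template>

</xsl:stylesheet>
```

The three-argument form of key() is what lets the lookup run against the property file rather than against whatever document happens to be the context.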

The next part of the puzzle is being able to use these paths. From the list of stage names, you could generate a target for each and make each stage require its previous stage, but IMO it’s easier to generate one target that does one macro call for each stage:

<target name="process"><xsl:if test="exists($stages)">
<xsl:for-each select="$stages">
  <xsl:variable name="position" select="position()" as="xs:integer" />
  <stage name="{.}" number="{format-number($position, '00')}"
         previous="{if ($position = 1)
                      then '${source.dir}'
                    else concat('${stages.dir}/',
                                format-number($position - 1, '00'),
                                $stages[$position - 1])}"/></xsl:for-each>
  <copy todir="${{result.dir}}">
    <fileset
        dir="${{stages.dir}}/{format-number(count($stages), '00')}{$stages[last()]}"
        includes="*.xml"/>
  </copy></xsl:if>
</target>

which produces:

<target name="process">
  <stage name="foo" number="01" previous="${source.dir}"/>
  <stage name="bar" number="02" previous="${stages.dir}/01foo"/>
  <stage name="baz" number="03" previous="${stages.dir}/02bar"/>
  <copy todir="${result.dir}">
    <fileset dir="${stages.dir}/03baz" includes="*.xml"/>
  </copy>
</target>
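
For comparison, the alternative of one target per stage, with each stage depending on the one before it, would mean generating something like the following instead. This is only a sketch of that alternative output: the “stage.*” target names are hypothetical, and it assumes the same “stage” macro and properties as the generated “process” target above.

```xml
<!-- Fragment of a generated build file: one target per stage. -->
<target name="stage.foo">
  <stage name="foo" number="01" previous="${source.dir}"/>
</target>

<target name="stage.bar" depends="stage.foo">
  <stage name="bar" number="02" previous="${stages.dir}/01foo"/>
</target>

<target name="stage.baz" depends="stage.bar">
  <stage name="baz" number="03" previous="${stages.dir}/02bar"/>
</target>

<!-- 'process' only has to depend on the last stage. -->
<target name="process" depends="stage.baz">
  <copy todir="${result.dir}">
    <fileset dir="${stages.dir}/03baz" includes="*.xml"/>
  </copy>
</target>
```

This version lets you run the build up to a single stage by name, at the cost of more generated markup and a dependency chain to keep in step with the stage order.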

where “stage” is a macro that you put in your main build file and tailor as needed:

<macrodef name="stage"
      description="One stage of processing work">
  <attribute name="name" description="Name of this stage" />
  <attribute name="number" description="Number of this stage" />
  <attribute name="previous" description="Directory of previous stage" />
  <sequential>
    <echo>Stage @{number}: @{name}</echo>
    <mkdir dir="${stages.dir}/@{number}@{name}" />
    <dependset>
      <sources>
        <path refid="stage.@{name}.path"/>
      </sources>
      <targetfileset dir="${stages.dir}/@{number}@{name}"
                     includes="*.xml" />
    </dependset>
    <xslt basedir="@{previous}"
      includes="*.xml"
      destdir="${stages.dir}/@{number}@{name}"
      extension=".xml"
      style="${xsl.dir}/@{name}.xsl"
      classpath="${saxon.jar}">
      <factory name="net.sf.saxon.TransformerFactoryImpl">
        <attribute
            name="http://saxon.sf.net/feature/allow-external-functions"
            value="true"/>
        <attribute
            name="http://saxon.sf.net/feature/linenumbering"
            value="true"/>
      </factory>
    </xslt>
  </sequential>
</macrodef>

The dependset uses the generated path for the stage’s stylesheet: whenever any of the stylesheet modules is newer than the stage’s XML result files, it deletes those result files, which forces the stage to run again. The xslt task runs anyway for any source XML files that are newer than their corresponding result files.

If the stages’ stylesheets are to have stylesheet parameters passed to them, there are two options. Either the xslt task in the “stage” macro gets param elements for the union of all the parameters that the stylesheets need, or you can, as I’ve done at times, get Ant to write all its properties to an XML file and write functions that read the file and look up property values to use as the parameters’ default values.
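
The second approach could be sketched as follows. On the Ant side, the echoproperties task can write every property to an XML file. The destination file name here is hypothetical, and the directory has to exist before the task runs:

```xml
<!-- In the main build file, before the stages run: write all Ant
     properties to an XML file (hypothetical file name). -->
<echoproperties destfile="${stages.dir}/ant-properties.xml"
                format="xml" />
```

On the XSLT side, a shared module can then look values up. The sketch assumes that the file has a “properties” root element with “property” children carrying “name” and “value” attributes, which is how I understand the echoproperties XML output. The namespace URI, the ant-properties-uri parameter, the m:property function, and the doc.title property are all hypothetical names:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:m="urn:x-example:multi-stage"
    exclude-result-prefixes="xs m">

  <!-- URI of the file written by echoproperties (hypothetical
       parameter; the one value Ant still has to pass in). -->
  <xsl:param name="ant-properties-uri" as="xs:string" />

  <xsl:variable
      name="ant-properties"
      select="doc($ant-properties-uri)"
      as="document-node()" />

  <!-- Value of the named Ant property, or the empty sequence if
       the property is not set. -->
  <xsl:function name="m:property" as="xs:string?">
    <xsl:param name="name" as="xs:string" />
    <xsl:sequence
        select="$ant-properties/properties/property[@name = $name]/@value/string()" />
  </xsl:function>

  <!-- Example stylesheet parameter whose default comes from an Ant
       property, with a fallback when the property is missing. -->
  <xsl:param
      name="title"
      select="(m:property('doc.title'), 'Untitled')[1]"
      as="xs:string" />

</xsl:stylesheet>
```

Each stage stylesheet would import this module, so the macro passes one parameter instead of the union of every stylesheet’s parameters.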

Finally, the main build file needs to generate and import the build file containing the “process” target and the paths:

<!-- XML file of properties determining or describing
     configuration. -->
<property
    file="properties.xml"/>

<xmlcatalog id="xmlcatalog">
  <!-- XML catalog for property file DTD. -->
  <catalogpath>
    <pathelement
        location="${system.basedir.converted}/schema/catalog.xml"/>
  </catalogpath>
</xmlcatalog>

<tempfile property="build.temp"
          destdir="${system.basedir.converted}"
          prefix="build-"
          suffix=".xml"
          deleteonexit="${deleteonexit}"/>
<xslt basedir="${basedir}"
      in="${properties.xml}"
      out="${build.temp}"
      style="${xsl.dir}/build.xsl"
      force="true"
      classpath="${resolver.jar}:${saxon.jar}">
  <xmlcatalog refid="xmlcatalog" />
  <factory name="net.sf.saxon.TransformerFactoryImpl">
    <attribute
        name="http://saxon.sf.net/feature/initialTemplate" 
        value="build"/>
    <attribute
        name="http://saxon.sf.net/feature/allow-external-functions" 
        value="true"/>
  </factory>
  <param name="properties.xml"
         expression="${properties.xml}"/>
  <param name="xsl.dir"
         expression="${xsl.dir}"/>
</xslt>
<import file="${build.temp}" />

I seldom use tasks outside of a target, but the import task can only be used outside a target, and the xslt task that generates the build file to be imported has to run before it, so there wasn’t much choice.
