The Ground Cero Guide to XSL
Henrik Aasted Sorensen
1. Introduction
The following is a basic introduction to the Extensible Stylesheet Language (XSL). The tutorial is meant for beginners, preferably with a bit of experience in HTML. It's not a complete guide to XSL, but rather an attempt to guide the reader through the first basic steps of developing in XSL. After understanding the material covered here, it's my hope that the reader will feel better prepared to go and read some of the more advanced stuff on the topic.
The name of the tutorial may seem a bit odd at first. The explanation is that I once wrote "The Ground Cero Guide to C", and would like this to be the second tutorial in the "The Ground Cero Guide to" series. At the rate I'm putting these out, you should expect the third installment to arrive around 2009.
Any comments, suggestions or errors may be sent to me. I would also be very happy to hear from you even if you don't have anything in particular to say, but have read the tutorial.
As a quick sidenote, I would like to say beforehand that I do know that writing "XML-language" is the same as writing language-language, but I have yet to find an appropriate substitute. Please forgive that.
2. Short introduction to XML
XML is an abbreviation for eXtensible Markup Language. A markup language is (simply described) a collection of tags and text-values. The most well-known markup language today is HTML, which is used to lay out web pages.
XML enables people to design their own markup languages, tailored for storing any kind of information.
The following is an example XML-document:
<addresses>
    <person name="John Doe">
        <address>
            <street> Oakroad 5 </street>
            <city> Lyngby </city>
            <country> Denmark </country>            
        </address>
        <phone> 124-21424-21 </phone>
        <email> jd@example.com </email>
        <category>friend</category>
        <category>co-worker</category>
    </person>
</addresses>
Looking at this document quickly reveals its purpose: it's a markup-language for storing addresses.
Let's take a detailed view of one of the tags:
<person name="John Doe"> ... </person>
<person> is called the opening tag.
name="John Doe" inside the opening tag is called an attribute. The name of the attribute is name, and the value is John Doe.
... is the contents of the tag. This can be either more tags or text-data.
</person> is the closing tag.
A tag must always be closed to make a well-formed XML-document. If a tag doesn't have any contents, it's not necessary to write both an opening and a closing tag. <person/> is the same as writing <person></person>.
Comments in an XML-document have the same syntax as in an HTML-document: <!-- Comment goes here. -->
It's important to notice the structure of an XML-document. The outermost tag, the one containing all the other tags, is called the root-node. In the above example <addresses> is the root-node. Any tags in the contents of another tag are called the children of that tag. An XML-document can be visualized like this:
Each black dot is a tag. The top dot is the root-tag. This kind of structure is usually called a tree, because it has a root, which is on top just like a ... and... uh... It's a tree! The black dots are called nodes. Notice that all nodes, except the root, have one and only one parent. Knowing this way of visualizing an XML-document is very important once we start navigating it.
All XML-documents should start with the line
<?xml version="1.0" encoding="iso-8859-1"?>
Variations can be made to the encoding-attribute, but this will not be covered here.
Adding the line
<?xml-stylesheet href="stylesheet.xsl" type="text/xsl"?>
tells any program that renders the XML-file which XSL-stylesheet to use. Making XML-documents that contain this line will make it possible for Internet Explorer (versions 6 and above) to render the document according to the stylesheet. This is a simple and very effective way of trying out the examples in this tutorial.
These are the very basics of writing XML. So far it may not seem like much ... and truth be told it isn't. :) What makes XML powerful and useful are the tools that can be used in connection with it. This tutorial will focus on the XML-language XSL and its applications.
3. Introducing the Extensible Stylesheet Language (XSL)
XSL is probably the first XML-language to make it out of the computer science laboratories to the general public. Several wide-spread browsers already support it. Expect more exciting technologies to emerge in the coming years.
XSL is capable of transforming documents from one XML-language to another using an XSL-processor. The transformation is described in an XSL-file. By far the most widespread use of this today is the transformation into XHTML-documents, XHTML being the XML-version of HTML. Using the address-book example from before, this would allow us to define an XSL-document for making a web-version of the address-book. The advantage of doing it this way is the absolute separation of content and layout. In the address-book XML-file we keep only the data, while all information on how to show it is in another file. This also adds the possibility of using more than one XSL-sheet and thus rendering the address-book in different ways.
Although this tutorial will focus on transforming XML-documents into XHTML, it's important to emphasize that XSL can be used for transforming between any two XML-languages.
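XSL is actually made up of several languages. The two covered here are XSLT and XPath: XSLT is used for the transformation itself, while XPath is used for navigating the XML-tree. A third part, XSL-FO, describes page formatting and is not covered in this tutorial.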
I'll start with introducing XSLT and postpone XPath a bit.
4. XSLT
Namespaces
One of the first issues to address when working with XSLT is how to mix XML-languages. Imagine if the address-language from the above example contained a tag called <br>. It might cause a bit of trouble if we wanted to transform an address-document containing that tag into an XHTML-document, because XHTML contains a tag with the same name. To resolve this conflict, namespaces were introduced. Namespaces allow a document to contain tags from several different XML-languages.
Namespaces are defined in the root-tag of the document. A namespace is defined as:
<roottag xmlns:ns="URI">
xmlns: is the keyword indicating that we're registering a namespace. ns is a short name for the namespace, while the URI (Uniform Resource Identifier) is used to uniquely identify the XML-language used. URIs usually take the form of an ordinary web-address. This is done to avoid collisions in URIs. Anyone making a new XML-language should choose a URI from their own domain-address. The address doesn't necessarily need to point to anything, but it's convenient to let it point to the specification of the language. The XSLT-URI is "http://www.w3.org/1999/XSL/Transform" and the XHTML-URI is "http://www.w3.org/1999/xhtml". Namespaces are used by putting the short name, followed by a colon, in front of the name of each tag.
Example. The person-tag, if its namespace is called add, would look like:
<add:person name="John Doe"> ... </add:person>
Example of specifying namespaces for a document:
<add:addresses xmlns:add="http://www.phonedirectoy.org/XML" 
     xmlns="http://www.w3.org/1999/xhtml">
Notice that a short name for the XHTML-namespace is not defined. This makes it the default namespace. Any tag in the document without a short name in front of it will then be considered an XHTML-tag.
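To see how this looks in a whole document, here is a sketch of the address-book from section 2 with every tag placed in the address-namespace. The short name add and the URI are the ones used in the example above; the data is the same sample entry. Notice that the name-attribute does not need the short name in front of it.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Every tag carries the add: short name, which is bound to the
     URI of the address-language on the root-tag. -->
<add:addresses xmlns:add="http://www.phonedirectoy.org/XML">
    <add:person name="John Doe">
        <add:address>
            <add:street> Oakroad 5 </add:street>
            <add:city> Lyngby </add:city>
            <add:country> Denmark </add:country>
        </add:address>
        <add:phone> 124-21424-21 </add:phone>
        <add:email> jd@example.com </add:email>
        <add:category>friend</add:category>
        <add:category>co-worker</add:category>
    </add:person>
</add:addresses>
```

The XSLT-examples in the following sections match tags such as add:addresses and add:person, so they assume an address-document written this way.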
5. Basic XSLT
With that covered, let's move on to the basics of XSLT.
The root-tag of an XSLT-document is stylesheet.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns="http://www.w3.org/1999/xhtml"
                xmlns:add="http://www.phonedirectoy.org/XML">
    
This example shows the stylesheet-tag declaring three namespaces: XSL, XHTML and our address-language.
XSLT uses tag-matching rules to process a document. Each rule has its own tag, which must be a direct child of the root-tag.
<xsl:template match="add:addresses">
Will match the root-element of an address-document. The contents of a template is evaluated from the point of view of the node that was matched. The whole document is evaluated from the point of view of the root node, which is why it's necessary to make a template-rule for evaluating that.
The contents of the match-attribute is actually an XPath-expression, but to keep things a bit simple we'll initially stick with only making templates that match the names of tags directly. The section on XPath will broaden the possibilities.
The contents of the <xsl:template>-tag is a combination of XSL-tags and tags from the resulting XML-language. Any non-XSL-elements will be written to the destination document. Building on the initial example, we'll make a web-presentation of the address-book. You can download an extended address-book to experiment with here.
<xsl:template match="add:addresses">
    <html>
    <head><title> My addressbook </title></head>
    <body>
    <h1> My addressbook</h1>
    </body>
    </html>
</xsl:template>
    
The result of processing the address-document with this XSL-document will be a (very simple) web-page saying "My addressbook" twice: once in the title and once in the heading.
To add some information to the webpage, we need more XSL-tags. To process the children of the addresses-tag we add the <xsl:apply-templates>-tag, which will call the person-template with all the person-tags.
[...]
<h1> My addressbook</h1>
<xsl:apply-templates select="add:person"/>
</body>
[...]
This requires that we make another template-tag. This one must be capable of matching the person-tag.
<xsl:template match="add:person">
</xsl:template>
The contents of this tag should print some or all of the available information contained in the person-tag, which can be done like this:
<xsl:template match="add:person">
    <b><xsl:value-of select="@name"/></b><br/>
    Phone: <xsl:value-of select="add:phone"/>
</xsl:template>
This introduced the <xsl:value-of>-tag. This tag is used for printing the text-contents of a tag or an attribute to the resulting document. Notice the @ in front of the name-attribute. This signals to the XSL-processor that we're interested in an attribute and not a child-node.
A fancy feature of the resulting XHTML-document would be to make a link to a person's e-mail-address with the <a>-tag. Remember that the <a>-tag has the attribute href which contains the target URL of the link. XML doesn't allow tags inside attribute values, which prevents us from doing this:
<a href="<xsl:value-of select="email"/>"> <xsl:value-of select="email"/> </a>
One way around this is to use an XSL-variable.
<xsl:variable name="emailaddress" select="add:email"/>
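This will instantiate a variable named emailaddress which contains the node email. Inside an attribute value, an expression in curly braces is evaluated and replaced by its result, so the variable can be used in the href-attribute of the <a>-tag like this: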
<a href="mailto:{$emailaddress}"> <xsl:value-of select="add:email"/> </a>
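Putting the pieces of this section together, a complete stylesheet could look roughly like the following sketch. It only combines the fragments shown above; the <p>-tag around each person and the "E-mail:" label are my own additions for readability, and the address-document is assumed to use the add-namespace.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns="http://www.w3.org/1999/xhtml"
                xmlns:add="http://www.phonedirectoy.org/XML">

    <!-- Matches the root-element and writes the frame of the page. -->
    <xsl:template match="add:addresses">
        <html>
        <head><title> My addressbook </title></head>
        <body>
        <h1> My addressbook</h1>
        <!-- Hand every person-tag over to the template below. -->
        <xsl:apply-templates select="add:person"/>
        </body>
        </html>
    </xsl:template>

    <!-- Used once for each person-tag. -->
    <xsl:template match="add:person">
        <xsl:variable name="emailaddress" select="add:email"/>
        <p>
        <b><xsl:value-of select="@name"/></b><br/>
        Phone: <xsl:value-of select="add:phone"/><br/>
        E-mail: <a href="mailto:{$emailaddress}"><xsl:value-of select="add:email"/></a>
        </p>
    </xsl:template>

</xsl:stylesheet>
```

Saving this as stylesheet.xsl next to the address-document, and referring to it with the xml-stylesheet line from section 2, should be enough to try it out in a browser that supports XSL.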
Hopefully this is sufficient to get you started at basic XSL.
6. Introducing XPath
XPath is a language made for locating nodes in XML-documents. It's not in itself an XML-language, but is rather used as part of other XML-technologies (XSL, XPointer, XML Schema, XQuery and others). This introduction to XPath may be a bit biased towards its use in XSL.
An XPath-expression is built from steps separated by /. Each step consists of three parts:
axis::nodetest[filter]
axis defines which nodes, relative to the current node, should be part of the expression. The most common values for this are:
  • self - The current node.
  • child - All the children of the current node.
  • descendant - All nodes below the current node in the tree.
  • parent - The current node's parent.
  • ancestor - All nodes above the current node in the tree.
nodetest is a simple way of filtering the selected nodes with regard to either their name or their type. Possible values are:
  • name - Picks only nodes with the specified name. This was the nodetest used in the examples in the above section.
  • * - Picks any tag, regardless of its name.
  • text() - Picks only text-nodes.
  • node() - Picks any node, regardless of its type.
filter is used for fine-grained filtering of the selected nodes. A filter will usually be a boolean expression (i.e. an expression that will evaluate to either true or false). Another possibility is that the expression evaluates to a number. In that case the filter is true only for the node whose position among the selected nodes matches the number.
A few functions are available for these expressions:
  • position() - Returns the index of the current node among the selected nodes.
  • last() - Returns the index of the last node selected. The index of the first node is 1.
  • true() and false() - returns the boolean values true and false.
  • not(expr) - Returns the inverse of the boolean expression given as an argument.
I believe this calls for examples.
Imagine that the current node is addresses and that it has several children.
The following XPath-expression will select the first 5 persons in the address-book:
child::add:person[6 > position()]
An expression that will pick person number 4 from the address-book. This is also an example of a filter that is NOT a boolean expression.
child::add:person[4]
This expression selects all people who are listed with their e-mail-address in the address-book:
child::add:person/add:email/parent::node()
This expression will select all the co-workers in the address-book:
child::add:person/add:category[contains(text(),'co-worker')]/parent::node()
Notice how, after checking the category-tag, it's necessary to go back up one node to pick the person-nodes.
This one will select every second person in the book.
child::add:person[position() mod 2 = 0]
Any of these expressions can be put in the select-attribute of an <xsl:apply-templates>- or an <xsl:variable>-tag. It's possible to link together several expressions by using a | as a separator.
add:email | add:phone
This will match both email- and phone-nodes. Not all tags that utilize XPath make full use of several expressions like that. <xsl:apply-templates> processes every node that is selected, while <xsl:value-of> only prints the first of them.
The contains()-function is one of several functions capable of working on strings. Other functions are:
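  • substring(string, number1, number2) - Returns the part of the string that starts at position number1 and is number2 characters long. The first character has position 1.
  • string-length(string) - Returns the length of a string.
  • normalize-space(string) - Returns the string with all leading and trailing whitespace stripped, and with each run of whitespace inside it replaced by a single space.
Notice that substring() and normalize-space() return a new string and string-length() returns a number, while the contains()-function returns a boolean.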
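As a small sketch of how these functions might be used on the address-book, the stylesheet below prints the first letter of each name, the phone number without the surrounding whitespace, and the length of the name. It is only an illustration and assumes the namespaces from section 5; the layout of the output line is my own choice.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns="http://www.w3.org/1999/xhtml"
                xmlns:add="http://www.phonedirectoy.org/XML">

    <xsl:template match="add:person">
        <!-- First character of the name-attribute: start at position 1, length 1. -->
        <b><xsl:value-of select="substring(@name, 1, 1)"/></b>:
        <!-- The phone number with leading and trailing whitespace stripped. -->
        <xsl:value-of select="normalize-space(add:phone)"/>
        <!-- The number of characters in the name-attribute. -->
        (name length: <xsl:value-of select="string-length(@name)"/>)
        <br/>
    </xsl:template>

</xsl:stylesheet>
```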
A number of less readable, but easier to type, abbreviations have been made for the axes:
  • child - Abbreviated to nothing at all. This means that the default is to choose the children of the current node.
  • self - Abbreviated to .
  • parent - Abbreviated to ..
Computer-savvy people may notice the resemblance to filesystem-handling syntax.
Looking at these abbreviations, it's possible to change the co-worker-example above to:
add:person/add:category[contains(text(),'co-worker')]/..
7. More XSLT
Having touched upon XPath, let's move forward to slightly more advanced XSLT-topics.
Conditional processing
It's possible to use conditional processing in XSLT by applying the <xsl:if>-tag. This tag contains the test-attribute, which in turn contains an XPath-expression. This can be used to test for tree-structure, variable values and other things. A short example for the address-book:
<xsl:if test="add:email">
    ...
</xsl:if>
If this example is put around the e-mail-printing lines in the person-template, it will only print the e-mail-address if it is available. This way it's possible to have a really dynamic address-book where not all entries need to contain every piece of information: only the available information will be printed. This can also be achieved by making templates for every tag that will be printed.
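Filled in, the person-template with a conditional e-mail line could look something like this sketch. It reuses the variable technique from section 5 and assumes the same namespaces; the "E-mail:" label is my own addition.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns="http://www.w3.org/1999/xhtml"
                xmlns:add="http://www.phonedirectoy.org/XML">

    <xsl:template match="add:person">
        <b><xsl:value-of select="@name"/></b><br/>
        Phone: <xsl:value-of select="add:phone"/><br/>
        <!-- The test is true only if the person has an email-child. -->
        <xsl:if test="add:email">
            <xsl:variable name="emailaddress" select="add:email"/>
            E-mail: <a href="mailto:{$emailaddress}"><xsl:value-of select="add:email"/></a><br/>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```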
Iteration
The <xsl:for-each>-tag iterates through a collection of nodes selected by the select-attribute. The contents of the tag will be used as a template on each node. The <xsl:for-each> tag is in many ways very similar to the <xsl:apply-templates>-tag. Use the <xsl:for-each>-tag if you don't feel like making a complete template for the processing of some nodes or if you want to do processing that's very different from the one available in the template.
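As an illustration, the following sketch uses <xsl:for-each> to list the categories of each person without writing a separate category-template. The use of a <ul>-list is my own choice, and the namespaces are assumed to be those from section 5.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns="http://www.w3.org/1999/xhtml"
                xmlns:add="http://www.phonedirectoy.org/XML">

    <xsl:template match="add:person">
        <b><xsl:value-of select="@name"/></b>
        <ul>
            <!-- Inside the loop the current node is a category-tag,
                 so "." (the self-axis) refers to that tag. -->
            <xsl:for-each select="add:category">
                <li><xsl:value-of select="."/></li>
            </xsl:for-each>
        </ul>
    </xsl:template>

</xsl:stylesheet>
```

The same result could be had with <xsl:apply-templates select="add:category"/> and a template matching add:category; which one to pick is mostly a matter of taste for something this small.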
Sorting
The normal ordering of node-processing is what's called "document-ordering", which means that nodes are processed in the order that they appear in the XML-document. A different ordering of nodes can be achieved with the <xsl:sort>-tag. Putting an <xsl:sort>-tag inside an <xsl:for-each>- or <xsl:apply-templates>-tag will change the processing-order of the nodes.
Example:
<xsl:apply-templates select="add:person">
    <xsl:sort select="@name"/>
</xsl:apply-templates>
This will sort the persons in the address-book alphabetically by their name. Notice how the <xsl:apply-templates>-tag has both an opening and a closing tag in this example. Looking at this makes me realize that it would have been nice to split the name-attribute into a "first-name" and a "last-name" for more correct sorting. I'll leave that as an exercise to the reader. :) The <xsl:sort>-tag also has the attribute order, which can take the two values "ascending" or "descending".
If more <xsl:sort>-tags are used, the first one will be the primary sort key, the second will be the secondary sort key and so on. The following example will sort the people in the addressbook firstly by country and secondly by name.
<xsl:apply-templates select="add:person">
    <xsl:sort select="add:address/add:country"/>
    <xsl:sort select="@name"/>
</xsl:apply-templates>
Parameters
It may be relevant to be able to affect the processing from "outside", i.e. without changing the document. This can be done through parameters. The syntax of using parameters is strikingly similar to that of using variables:
<xsl:param name="parameter1">value</xsl:param>
A param-tag must be the direct child of the root-node. After instantiating a parameter, it can be used like a variable in the rest of the document. If the parameter is not set by the environment when calling the XSL-processor, the parameter will just take on the value in the tag. Passing parameters to an XSL-transformation is done differently according to which XSL-processor you're using. Look to the section on XSL with PHP to see how to do it in PHP.
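A sketch of how a parameter might be put to use: the stylesheet below only shows the persons belonging to one category. The parameter name cat and the default value friend are assumptions made for this example; the namespaces are those from section 5.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns="http://www.w3.org/1999/xhtml"
                xmlns:add="http://www.phonedirectoy.org/XML">

    <!-- Takes the value "friend" unless the environment passes another value. -->
    <xsl:param name="cat">friend</xsl:param>

    <xsl:template match="add:addresses">
        <html>
        <head><title> My addressbook </title></head>
        <body>
        <h1> My addressbook: <xsl:value-of select="$cat"/></h1>
        <!-- Select only the persons that have a category equal to the parameter. -->
        <xsl:apply-templates select="add:person[add:category = $cat]"/>
        </body>
        </html>
    </xsl:template>

    <xsl:template match="add:person">
        <b><xsl:value-of select="@name"/></b><br/>
    </xsl:template>

</xsl:stylesheet>
```

Calling the XSL-processor with cat set to co-worker would then list the co-workers instead, without any change to the stylesheet.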
8. XSL-processors
A number of XSL-processors exist. My experience lies in two of them: Internet Explorer 6 and the XSL-library of PHP.
Internet Explorer 6
IE6 has (reputedly good) support for XSL and XML-documents in general. If it receives an XML-document in a language it doesn't know, it will simply display it. I find that IE6 is very good for debugging stylesheets and documents, because it produces rather detailed and exact reports in case of an error in the documents. The problem is that IE6 is a rather new browser and not yet in widespread use, so it's hard to rely on the browser being able to do the XSL-transformation. The solution is to perform the transformation on the server instead and just send the result to the browser.
Support for doing this can be found in a lot of server-languages. I'll give a short introduction to using the PHP-version in the following.
PHP
Some PHP-servers come with the XSLT-library. It's unfortunately not a standard part of the PHP-package. The three most important functions in the library are:
xslt_create()
This function creates a handle to an XSL-processor. It takes no arguments.
xslt_process(XSL-handle, XML-file, XSL-file)
This function does the actual transformation. The first argument is the handle obtained from the previous function. The second and third arguments are the filenames of the XML-file and the XSL-file respectively. The resulting XML-document is returned from the function call.
xslt_free(XSL-handle)
This function releases the resources allocated in connection with the xslt_create()-function.
Passing parameters
It's possible to pass parameters to the XSL-document before the transformation. This can be done through the extended version of the xslt_process()-function
xslt_process(XSL-handle, XML-file, XSL-file, Returnvalue ,Argument-array,Parameter-array);
The first three parameters were described above. Returnvalue names where the resulting XML-document should be stored; if it is null, the result is returned from the function call as before. The Argument-array can be used if the XML- and XSL-documents are not kept in files, but are instead kept in variables. It is an associative array containing the XML- and XSL-documents. The parameters XML-file and XSL-file must then be substituted with "arg:/index", where index is the key of the document in the associative array. Parameter-array is also an associative array. Each key in it must match the name of an <xsl:param> in the XSL-document.
The following is an example of an XSL-transformation in PHP. The XML- and XSL-data are considered to be contained in the variables $xml and $xsl.
    // Create a new processor handle
    $th = xslt_create() or die("Can't create XSLT handle!");
    
    $args = array();
    $params = array();
    
    $args["xml"] = $xml;
    $args["xsl"] = $xsl;
    
    if (isset($cat)) {
        $params["cat"] = $cat;
    }
    
    // Perform the XSL transformation
    $trans = xslt_process($th, "arg:/xml", "arg:/xsl", null,$args,$params);
    if (!$trans) {
        print "Failure: Reason is that " . xslt_error($th) . " and the ";
        print "error code is " . xslt_errno($th);
    
    }
        
    echo $trans; // Output the transformed XML file
                        
    xslt_free($th); // Free up the resources
9. Goodbye
I hope that this tutorial has been of use to you. I suppose that if you're still reading, there is a fair chance that it has :). I feel that making webpages with XSL can often (always?) be far superior to making them with ordinary HTML, because the total separation of content and layout makes both far clearer. I hope that you're starting to feel the same way after reading through this.
If you want some simple files to experiment with, feel free to use my bookmarks-XML-file and the corresponding XSL-file which uses most of the techniques presented in this text.
The tutorial is, of course, also presented with XSL. If you want a look behind the scenes, download the XML-text, the index-XSL and the document-XSL.
If you've used this text and have any comments, please mail me.
Some suggestions for places to go from here: