XML Tutorial
Volume 7 : XSLT Basics
Tomoya Suzuki
Index
Effective Utilization of XML Data
What is XSLT?
Data Transformation using XSLT Stylesheets
Review Questions
Effective Utilization of XML Data
In our world today, the same information is published through a variety of different media. For example, in the mail order industry, customers can view product information in catalogs, or on websites via a PC or mobile phone. In every case, the product names, prices and other information are the same, even though the display format differs. Likewise, on a catalog sales website, the visual layout and other design elements are the same for every item, while a large volume of product information is accessible.
If we manage the product information data separately from its layout and other design factors, we can respond flexibly to demands for edits or updates. With XML, we can deal with the data itself, using tags as a means to label the data. At that point, all that is left is to designate the layout information using XML to define how we want the product information to be displayed.
Transforming Data in XML Format
Since data can be described by itself in XML, exactly how data is loaded and displayed is left to the application. Web system environments allowing stored data to be viewed via web browser have become increasingly popular. It is very convenient to transform data stored in XML format to HTML, and then view the data using a regular web browser.
So, how do we go about transforming data in XML format to HTML? There are basically two ways to accomplish this:
- (1) Use DOM, SAX, or another API to load the data in a program, and then output it in an arbitrary HTML format
- (2) Use XSLT to describe transformation rules in a stylesheet, outputting the results in HTML format
Under (1) above, loaded data can be freely processed into any format via programming. However, since the transformation process is described via program, when there is a change in display format and the output HTML must be changed, the program itself must be recoded. On the other hand, with (2) above, the transformation rules are written in the stylesheet. In this case, even if the HTML is changed, there is no need to rewrite the program itself. In an environment where designs change frequently, being able to modify transformation content using a stylesheet is extremely convenient.
What is XSLT?
At this point, let's provide a quick introduction to XSLT. The W3C has standardized specifications for transforming XML data and formatting it for display. The part covering XML structure transformation was developed first, and became a W3C Recommendation in November 1999 as the "XSLT (XSL Transformations)" specification. "XSL-FO (Formatting Objects)," the part covering formatting, became a Recommendation in October 2001 as part of the "XSL (Extensible Stylesheet Language)" specification.
Under XSLT, an XSLT stylesheet is used to describe transformation rules in XML format. The stylesheet is read by an application called an "XSLT Processor," which transforms a designated XML document. The transformation results are output in XML, HTML or text format.
Figure 2: Document Transformation via XSLT Processor
With XSL, document information is described in XML format (normally, an XML document is transformed via XSLT, and document information is added). This can then be loaded into an application called an "XSL Formatter" that provides an end-product (display, printed page) in a uniformly formatted layout.
By using these technologies, electronic data described in XML format can be viewed in a web browser or on printed media.
Data Transformation using XSLT Stylesheets
Now, let's actually use an XSLT stylesheet to transform XML data into HTML data. Here, we will take a look at an example that uses an XSLT stylesheet (LIST2) to transform an XML Document (LIST1) representing user information. LIST3 shows the actual data transformed into an HTML format. The second line of LIST1 is where the designated XSLT stylesheet (list2.xsl) is applied.
LIST1: Source XML Document (list1.xml)
01 <?xml version="1.0" ?>
02 <?xml-stylesheet type="text/xsl" href="list2.xsl"?>
03 <UserList>
04   <User>
05     <Name>John Smith</Name>
06     <Account>John</Account>
07   </User>
08 </UserList>
LIST2: XSLT Stylesheet (list2.xsl)
01 <?xml version="1.0" ?>
02 <xsl:stylesheet version="1.0"
03     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
04   <xsl:template match="/">
05     <html>
06       <body>
07         <h1>Welcome</h1>
08         Mr. <xsl:value-of select="UserList/User/Name" /><br/>
09       </body>
10     </html>
11   </xsl:template>
12 </xsl:stylesheet>
LIST3: Transformation Result (list3.html)
<html>
<body>
<h1>Welcome</h1>
Mr. John Smith<br>
</body>
</html>
Opening the XML document using Internet Explorer 5.x or later results in the following display:
Looking at LIST2, you can see that HTML tags and XSLT commands are used together. To distinguish which parts of the code are XSLT commands, we designate the following namespace:
http://www.w3.org/1999/XSL/Transform
An element that belongs to this namespace is an XSLT command. Writing the full namespace name for every XSLT command would be redundant, so we associate the namespace with a prefix. Any prefix can be used, but the prefix "xsl" is used most frequently with XSLT. The prefix and the element name are separated by a colon (:). In other words, the notation "xsl:…" separates transformation commands from output data. Since the XSLT stylesheet is an XML document, a root element is required. The root element of an XSLT stylesheet is the "xsl:stylesheet" element.
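To illustrate that the prefix itself is arbitrary, here is a sketch of LIST2 rewritten with a different prefix. The prefix "t" is an assumption chosen only for this illustration; what makes the elements XSLT commands is the namespace name the prefix is bound to, not the prefix.

```xml
<?xml version="1.0"?>
<!-- Same rules as LIST2, but the XSLT namespace is bound to the
     hypothetical prefix "t" instead of the conventional "xsl". -->
<t:stylesheet version="1.0"
    xmlns:t="http://www.w3.org/1999/XSL/Transform">
  <t:template match="/">
    <html>
      <body>
        <h1>Welcome</h1>
        <!-- t:value-of is still the XSLT value-of command -->
        Mr. <t:value-of select="UserList/User/Name"/><br/>
      </body>
    </html>
  </t:template>
</t:stylesheet>
```

An XSLT processor should treat this stylesheet the same way as LIST2. In practice, keeping the "xsl" prefix makes stylesheets easier for other people to read.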
The various commands for document transformation are described in the child elements of the xsl:stylesheet element. An element described directly below the xsl:stylesheet element is called a "top level element." In XSLT 1.0, the following elements (each written with the xsl prefix) can be coded as top level elements:
- key
- param
- template
- decimal-format
- namespace-alias
- preserve-space
- variable
- include
- output
- strip-space
- attribute-set
- import
The specific content to be output is called a "template" (lines 5 through 10 in LIST2), and a template is described inside a "template rule." The xsl:template element represents the template rule (lines 4 through 11 in LIST2).
Because it is not an XML document, an HTML document can be coded in different ways that would not be accepted as a well-formed XML document (attributes not surrounded by quotes; open tag with no closing tag, etc.). However, an XSLT stylesheet is an XML document, and as such, must be a well-formed document. Accordingly, the HTML br element <br> must always be notated as an empty element (<br/>) (line 8 in LIST2).
"XPath" specification for designating the transformation source XML
In order to transform a document from XML data, you need to do more than simply apply an XSLT stylesheet. You must clearly designate what is to be transformed (e.g. "transform A into B") and how. In other words, you must designate the source location. Accordingly, the W3C recommended "XPath (XML Path Language)" in November 1999 as a specification for designating a certain location in an XML document.
With XPath, the component elements of an XML document are treated as nodes. For example, the following figure represents the XML document in LIST4 expressed as a tree of XPath nodes.
LIST4: Catalog XML Document (list4.xml)
01 <?xml version="1.0" ?>
02 <Catalog>
03   <Title>XML Series Commemorative Goods</Title>
04   <Product ProductCode="p001">
05     <ProductName>XML Strap</ProductName>
06     <UnitPrice>700</UnitPrice>
07   </Product>
08 </Catalog>
Under XPath, the root node "/" sits at the top of the tree structure. Be careful to note the difference between the "root node" and the "root element." The root element is the root of the element tree structure. In an XML document, comments and processing instructions may also be coded outside the root element; the root node is the parent that holds all of these components, including the root element.
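The difference between the root node and the root element can be sketched with a variant of LIST4. The comments and the xml-stylesheet processing instruction below are assumptions added for illustration; they do not appear in LIST4.

```xml
<?xml version="1.0"?>
<!-- A comment outside the root element: a child of the root node "/" -->
<?xml-stylesheet type="text/xsl" href="list5.xsl"?>
<!-- The processing instruction above is also a child of the root node -->
<Catalog>
  <!-- Catalog is the root element; it is one more child of the root node -->
  <Title>XML Series Commemorative Goods</Title>
  <Product ProductCode="p001">
    <ProductName>XML Strap</ProductName>
    <UnitPrice>700</UnitPrice>
  </Product>
</Catalog>
```

In this sketch the root node has four children (two comments, one processing instruction, and the Catalog element), while the root element is Catalog alone. The XML declaration on the first line is not a node in the XPath tree.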
To designate a node, write the nodes separated by the "/" character, in order from the root node down through each hierarchy level. For example, the XPath notation for the ProductName element in LIST4 would be "/Catalog/Product/ProductName". Under XSLT, the node to be processed by a template rule is designated in the match attribute of the xsl:template element (line 4 of LIST5).
LIST5: XSLT Stylesheet (list5.xsl)
01 <?xml version="1.0" ?>
02 <xsl:stylesheet version="1.0"
03     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
04   <xsl:template match="/">
05     <html>
06       <body>
07         <h1><xsl:value-of select="Catalog/Title" /></h1>
08         Here is the featured product for today<br/>
09         <table border="1" width="200">
10           <tr><th>ProductName</th><th>Price</th></tr>
11           <xsl:apply-templates select="Catalog/Product"/>
12         </table>
13       </body>
14     </html>
15   </xsl:template>
16   <xsl:template match="Product">
17     <tr>
18       <td><xsl:value-of select="ProductName" /></td>
19       <td><xsl:value-of select="UnitPrice" /></td>
20     </tr>
21   </xsl:template>
22 </xsl:stylesheet>
Lines 4 through 15 in LIST5 represent the template rules regarding root node processing. The node that is subject to the template rule is called the "current node." In XPath, there is the "absolute path" where descriptions are made from the root node, and the "relative path" where descriptions are made relative to the current node.
If the current node in LIST4 is the Catalog element node, then the XPath notation for the ProductName element would be "Product/ProductName".
When designating an attribute, use "@attributename". With the Catalog element node as the current node, the XPath notation representing the ProductCode attribute in LIST4 would be "Product/@ProductCode".
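The relative, attribute, and absolute notations described above can be compared side by side in one template rule. The following is a minimal sketch that assumes LIST4 as the source document; the stylesheet itself is hypothetical and is not one of the article's listings.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Inside this rule the current node is the Catalog element node -->
  <xsl:template match="Catalog">
    <html>
      <body>
        <!-- Relative path to an element, starting from Catalog -->
        <p><xsl:value-of select="Product/ProductName"/></p>
        <!-- Relative path to an attribute: "@" designates ProductCode -->
        <p><xsl:value-of select="Product/@ProductCode"/></p>
        <!-- Absolute path: starts at the root node, so the result
             does not depend on the current node -->
        <p><xsl:value-of select="/Catalog/Product/UnitPrice"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Applied to LIST4, the three paragraphs should contain "XML Strap", "p001", and "700", in that order.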
"xsl:value-of" command for extracting data
The xsl:value-of command extracts data from the transformation source XML document and outputs it as part of the transformation result. In the select attribute of the xsl:value-of command, designate the data you want to extract in XPath notation.
On line 8 of LIST2, the text content "John Smith" of the node designated by "UserList/User/Name" is extracted, and then output as the transformation result.
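One detail worth knowing is what xsl:value-of does when the XPath notation designates more than one node. In XSLT 1.0, only the first node in document order is output. The sketch below assumes LIST1 extended with a hypothetical second user ("Jane Doe"); the position predicate "[2]" is XPath syntax that this article does not otherwise cover.

```xml
<?xml version="1.0"?>
<!-- Assumed source document (LIST1 plus a hypothetical second user):
     <UserList>
       <User><Name>John Smith</Name><Account>John</Account></User>
       <User><Name>Jane Doe</Name><Account>Jane</Account></User>
     </UserList>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <!-- Two Name nodes are designated; XSLT 1.0 outputs only
             the first one in document order: John Smith -->
        <p><xsl:value-of select="UserList/User/Name"/></p>
        <!-- A position predicate designates the second User: Jane Doe -->
        <p><xsl:value-of select="UserList/User[2]/Name"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

To output every matching node, a separate template rule is applied to each node with xsl:apply-templates.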
"xsl:template" element for describing transformation rules
The specific transformation process content is described as a template inside the xsl:template element. Designate in the match attribute the location within the source XML document that is the target for processing.
The XSLT processor first applies the template rule that processes the root node, executing the template within the applied template rules in order from top to bottom.
"xsl:apply-templates" command for applying other template rules
Multiple template rules can be created, similar to the subroutines of a traditional program. Use the xsl:apply-templates command to apply a different template rule from within a template rule, and use XPath notation in the select attribute to designate the location in the transformation source XML document.
<xsl:apply-templates select="XPath notation for selecting a location" />
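In LIST5, a separate template rule is applied to display the Product element data: line 11 designates the Product element nodes, and the template rule on lines 16 through 21 is applied to them.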
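The subroutine-like behavior is easier to see when the select attribute designates more than one node. The following sketch is a shortened variant of LIST5; the second Product ("XML Mug", code p002, price 900) is a hypothetical addition to LIST4 made only for this illustration.

```xml
<?xml version="1.0"?>
<!-- Assumed source: LIST4 with a hypothetical second Product added
     <Catalog>
       <Title>XML Series Commemorative Goods</Title>
       <Product ProductCode="p001">
         <ProductName>XML Strap</ProductName><UnitPrice>700</UnitPrice>
       </Product>
       <Product ProductCode="p002">
         <ProductName>XML Mug</ProductName><UnitPrice>900</UnitPrice>
       </Product>
     </Catalog>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <!-- Designates both Product nodes; the matching rule
               below is applied once per node, in document order -->
          <xsl:apply-templates select="Catalog/Product"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <!-- Inside this rule the current node is a single Product element -->
  <xsl:template match="Product">
    <tr>
      <td><xsl:value-of select="ProductName"/></td>
      <td><xsl:value-of select="UnitPrice"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

With the assumed source, the table should contain two rows: XML Strap / 700, followed by XML Mug / 900. Adding a third Product to the source requires no change to the stylesheet.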
Review Questions
Question 1
Select which of the following is incorrect as a method for using XSLT.
- (A) Transform XML data into HTML for display in a web browser
- (B) Output a portion of data extracted from source XML data in order to send XML data to another system
- (C) Output data in CSV format (text format) in order to send XML data to a system that does not support XML
- (D) Transform XML data into a zip format (compressed format) in order to send the data more efficiently
Comments
Under XSLT, data can be output in XML, HTML or text formats. A certain portion of source data can be copied in XML format. Data can also be output in CSV format by separating data using a comma (,) when you need to output it in text format. However, binary format files cannot be output. Accordingly, the correct answer to this question is D.
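The CSV case mentioned in the comments can be sketched as follows, assuming LIST1 as the source document. The stylesheet is hypothetical; it relies on the xsl:output top level element to request text output and on xsl:text to write the comma and the line break, neither of which is explained in detail in this article.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Request plain text output instead of XML or HTML -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="UserList/User"/>
  </xsl:template>

  <!-- One CSV line per User: Name,Account -->
  <xsl:template match="User">
    <xsl:value-of select="Name"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="Account"/>
    <!-- &#10; is a line feed character -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For LIST1, the result should be the single line "John Smith,John". A real CSV export would also need to handle values that themselves contain commas or quotes, which this sketch does not attempt.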
Question 2
Select which of the following is correct as an XSLT stylesheet notation.
- (A)
<?xml version="1.0" encoding="UTF-8"?>
<stylesheet version="1.0">
  <template match="/">
    <html>
      <body>
        <h1>Welcome</h1>
        Hello<value-of select="UserList/User/Name" /> <br/>
      </body>
    </html>
  </template>
</stylesheet>
- (B)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <html>
    <body>
      <h1>Welcome</h1>
      Hello<xsl:value-of select="UserList/User/Name" /><br/>
    </body>
  </html>
</xsl:stylesheet>
- (C)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h1>Welcome</h1>
        Hello<xsl:value-of select="UserList/User/Name" /><br>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
- (D)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h1>Welcome</h1>
        Hello<xsl:value-of select="UserList/User/Name" /><br/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Comments
With XSLT stylesheets, the namespace "http://www.w3.org/1999/XSL/Transform" must be declared, and each XSLT command must be written as an element belonging to that namespace. If the namespace is not designated, an element is not recognized as an XSLT command, even if it has the same element name as one (the first choice). The template is described as the child of the xsl:template element; it cannot be written directly beneath the xsl:stylesheet element (the second choice). In addition, since the XSLT stylesheet is an XML document, it must be a well-formed document, so the HTML <br> tag must be written as <br/> (the third choice). Accordingly, the correct answer is D.
Question 3
Select which of the following is the correct XPath notation corresponding to (1), when you want to transform the following XML document into HTML using an XSLT stylesheet. Select all that apply.
[XML Document]
<?xml version="1.0" encoding="UTF-8"?>
<Snack>
<Yesterday>
<Fruit>Banana</Fruit>
</Yesterday>
<Today>
<Fruit>Watermelon</Fruit>
</Today>
<Tomorrow>
<Fruit>Melon</Fruit>
</Tomorrow>
</Snack>
[XSLT Stylesheet]
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h1>Snack</h1>
<xsl:apply-templates select="Snack/Tomorrow"/>
</body>
</html>
</xsl:template>
<xsl:template match="Tomorrow">
Snack will be <xsl:value-of select=" (1) " /><br/>
</xsl:template>
</xsl:stylesheet>
[HTML Document]
<html>
<body>
<h1>Snack</h1>
Snack will be Melon<br>
</body>
</html>
- (A) /Snack/Tomorrow/Fruit
- (B) Snack/Tomorrow/Fruit
- (C) Tomorrow/Fruit
- (D) Fruit
Comments
With XPath, an absolute path is written beginning with the "/" character, while a path not beginning with the "/" character is a relative path. In this question, the Fruit element node below the Tomorrow element node is designated. Expressed as an absolute path, this is "/Snack/Tomorrow/Fruit". A relative path starts from the current node, which is the node designated in the match attribute of the xsl:template element. Here, the current node is the Tomorrow element node, so the relative path is "Fruit". Accordingly, the correct answers are A and D.
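For reference, the stylesheet from Question 3 with (1) filled in might look like the sketch below. It uses the relative path; the comment notes the absolute alternative. A space has been placed after "be" on the assumption that the output shown in the question ("Snack will be Melon") is the intended result.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h1>Snack</h1>
        <xsl:apply-templates select="Snack/Tomorrow"/>
      </body>
    </html>
  </xsl:template>

  <!-- The current node here is the Tomorrow element node -->
  <xsl:template match="Tomorrow">
    <!-- Relative path "Fruit"; the absolute path
         "/Snack/Tomorrow/Fruit" designates the same node -->
    Snack will be <xsl:value-of select="Fruit"/><br/>
  </xsl:template>
</xsl:stylesheet>
```

The other two choices, "Snack/Tomorrow/Fruit" and "Tomorrow/Fruit", would be evaluated relative to the Tomorrow element node and would look for child elements named Snack or Tomorrow beneath it, which do not exist, so nothing would be output for them.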
Tomoya Suzuki
Toshiba OA Consultant, Ltd. Training Solutions Engineering Department. Mr. Suzuki works mainly as an instructor in XML and programming language seminars. He laments that, for some reason, all of the rainy days during rainy season were on the weekends this year, preventing him from playing tennis, and causing him to suffer from a lack of exercise. He does predict, however, that he will have a deep, dark tan by the time this article is published. Don't forget the sun block, Tomoya.
The content presented here is an HTML version of an article that originally appeared in the September 2006 issue of DB Magazine published by Shoeisya.