XML Tutorial
Volume 1 : XML Basics

Yasuhiro Nonaka

This is the first installment of our "XML Master Basic V2" exam strategy course. In this series, we will cover the knowledge, key points, techniques, and approach needed to pass the Basic V2 exam. Our objective, however, is not merely to help you pass the exam: the series is also meant to give you a basic foundation in XML technology. We encourage anyone interested in learning about XML to take advantage of this training series. In this first installment, we begin with an overview of the XML Master exam and the most frequently encountered XML technologies.

Index

Introduction

An Overview of the XML Master Exam and Scope of Exam Question

Knowledge Tested in Section 1

Review Questions (Section 1: XML Overview)

Introduction

Every year, the "XML Master" IT certification becomes more popular as a means to prove one's technical skills related to XML. Why has the XML Master become so popular? Perhaps it's because XML technology has become so closely intertwined with our personal lives.

For example, XML is now the predominant file format for Office products, and "RSS" is another example of an XML data format. Leading DBMS, DTP, and other productivity software are also adopting XML support as a standard feature. With an understanding of XML, users of these programs can exchange XML data between applications from different vendors. Today, being able to use XML has become an indispensable skill for the IT professional.

The XML Master Certification Program

An Overview of the XML Master Exam and Scope of Exam Question

The XML Master was first introduced in October 2001 as a vendor-neutral certification program for objectively determining a professional's skill level in XML technology. In November 2004, the number of certified XML Masters surpassed 10,000, and the number of exam candidates continues to grow each year. The XML Master consists of two certification levels: the XML Master Basic ("Basic") and the XML Master Professional ("Professional").

The XML Master Basic certifies the basic technical XML competency of a professional. The Basic certification exam consists of foundational content related to XML and XML technologies (DTD, XML Schema, XSLT, Namespace).

The XML Master Professional, on the other hand, certifies the advanced technical skills required to construct XML applications. The Professional exam covers the technical knowledge required to use the APIs that process XML (DOM, SAX) and to construct XML systems.

The exam scope is divided into "Sections," each of which indicates the topics covered by exam questions. See the XML Master Website for more about the XML Master.

Knowledge Tested in Section 1

In this first installment of our series, we cover Section 1, "XML Overview," of the XML Master Basic exam scope. Questions in this section mainly deal with XML features and a general overview of typical XML technologies.

XML Features

A basic summary of the main features of XML follows:

  • Excellent for handling data with a complex structure or atypical data
  • Data described using markup language
  • Text data description
  • Human- and computer-friendly format
  • Handles data in a tree structure having one, and only one, root element
  • Excellent for long-term data storage and data reusability

Excellent for Handling Data with a Complex Structure or Atypical Data

Data managed in relational database (RDB) tables has a regular structure. One could say that nothing surpasses an RDB for handling data with this kind of structure.

However, not all of the various data that exists in the world today is of a structure that can be managed using tables. Most such data has either an extremely complex structure (system logs, e-mail data, etc.), or is atypical data (product manuals, specification sheets, etc.) that has no specific structure. What can be done to handle these data types without extensive manipulation? XML is a data format well-suited to handling these circumstances.

Text Data Description

XML allows for the description of data in a text format. Since XML uses text data, XML data created on a Windows platform can also be used in a UNIX system. Data can be delivered back and forth without having to take OS and systems differences into account.

Excellent for Long-Term Data Storage and Data Reusability

Data created with a specific application can become useless, or even impossible to access, if the application eventually becomes unusable or fails to maintain backward compatibility.

However, XML documents are text data, and do not rely on any particular application. The data can be stored for long periods of time with little fear of ever becoming unusable. Using "XSLT (XSL Transformations)" (note), an XML document can be transformed into a document of a different structure or format (HTML, CSV, etc.). This increases the reusability of XML documents, providing a "one-source, multi-use" solution.

Note: XSLT is a specification recommended by the W3C. An XSLT stylesheet and an XSLT processor can be used to transform an XML document (the source XML document) into a document having a different structure or text format (HTML, CSV, etc.). An XSLT stylesheet is a document in which XML document transformation rules are described in XML syntax. Creating XSLT stylesheets that adapt a single XML document to various purposes (e.g., Web browsers or mobile phones) allows for simplified data management and more efficient work processes.
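The "one-source, multi-use" idea can be sketched in a few lines. Note that this is not XSLT: Python's standard library has no XSLT processor, so hand-written ElementTree code stands in for the stylesheet here. The product data and the function names to_csv and to_html are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source XML document (the single "source").
source = ET.fromstring(
    "<products>"
    "<product><name>Pen</name><price>120</price></product>"
    "<product><name>Ink</name><price>480</price></product>"
    "</products>"
)

def to_csv(root):
    """Render each <product> as one comma-separated line."""
    return "\n".join(
        f"{p.findtext('name')},{p.findtext('price')}"
        for p in root.findall("product")
    )

def to_html(root):
    """Render the same products as an HTML list."""
    items = "".join(
        f"<li>{escape(p.findtext('name'))}: {escape(p.findtext('price'))}</li>"
        for p in root.findall("product")
    )
    return f"<ul>{items}</ul>"

# Two different outputs ("multi-use") from the one source document.
print(to_csv(source))
print(to_html(source))
```

With XSLT, each of these two functions would instead be a separate stylesheet applied to the same source document by an XSLT processor.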

Human- and Computer-Friendly Format

CSV is a typical example of a data format expressed using text. Data in CSV format is easily understood by programs designed to process CSV data, but only appears as a continuous string of characters to the human eye. Data expressed in XML, however, is "marked up" so it is not only an easy format for computer processing, but can be read and understood by humans.
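The contrast can be illustrated with a small sketch that reads the same record from both formats. The record itself (an ID, a name, and a stock count) and the element names are assumptions made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same hypothetical record, once as CSV and once as XML.
csv_text = "1001,Widget,25\n"
xml_text = (
    "<product>"
    "<id>1001</id>"
    "<name>Widget</name>"
    "<stock>25</stock>"
    "</product>"
)

# CSV: the program must already know that the third column is the stock count.
row = next(csv.reader(io.StringIO(csv_text)))
csv_stock = int(row[2])

# XML: the tag names carry the meaning, for the program and for a human reader.
product = ET.fromstring(xml_text)
xml_stock = int(product.findtext("stock"))

print(csv_stock, xml_stock)
```

Both reads yield the same value, but only the XML text tells a reader what the value 25 means.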

Data Described using Markup Language

With XML, each individual piece of information is "marked up" (a marker shows the meaning of the associated data) with a tag that attaches meaning to the information. The unit of data to which a meaning has been attached is called an "element." An "element" consists of a "start tag", "content," and an "end tag."

[Figure: an element, consisting of a start tag, content, and an end tag]

When required, an "attribute" can be described in the start tag of an element, allowing more detailed information to be assigned to the data.

[Figure: an attribute described in the start tag of an element]
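As a minimal sketch of these terms, the following Python snippet parses a single element and reads back its name, content, and attribute. The price element and its currency attribute are hypothetical examples, not taken from the exam material.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: start tag (with an attribute), content, end tag.
source = '<price currency="JPY">1500</price>'

element = ET.fromstring(source)

name = element.tag                  # element name used in the start and end tags
content = element.text              # text between the start tag and the end tag
currency = element.get("currency")  # attribute value described in the start tag

print(name, content, currency)
```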

Handles data in a tree structure having one, and only one, root element

With XML, a hierarchical element structure can be created by nesting elements. Under the XML specification, one, and only one, root element (the outermost element in an XML document) must exist, giving XML a single "tree structure" that always has a single root element at the top. A collection of compiled data starting with a root element is called an "XML Document."

[Figure: XML document example]
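The single-root rule can be checked with any XML parser. The sketch below uses Python's ElementTree; the helper name is_well_formed and the sample order data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One root element (<order>) containing nested child elements.
single_root = "<order><item>pen</item><item>ink</item></order>"

# Two top-level elements: there is no single root, so parsing fails.
two_roots = "<item>pen</item><item>ink</item>"

print(is_well_formed(single_root))
print(is_well_formed(two_roots))
```

The first document is accepted; the second is rejected because the parser finds a second element after the root element has already ended.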

Typical XML Related Technologies and Overview

The XML specification only stipulates the syntax for XML. Accordingly, the following XML related technologies play an important role in actually using XML:

DOM (Document Object Model)

The DOM specification is recommended by the W3C (World Wide Web Consortium), the standards organization for Internet-related technologies.

Under DOM, XML documents and HTML documents are modeled as a tree structure (DOM Tree) when being processed. External applications use this DOM Tree and the nodes it contains (element nodes, attribute nodes, etc.) to perform operations on XML and HTML documents. The DOM specification defines different "levels" of DOM: the higher the level, the more advanced the features it supports. With DOM, the entire XML document is first loaded into memory, which makes it easy to create new nodes, or to move or delete existing ones. Writing out the final DOM Tree held in memory creates or updates an XML document.

The drawbacks of DOM include the fact that an entire XML document must be analyzed when first constructing the DOM Tree, and the fact that the entire XML document must be loaded into memory. The larger the XML document, the greater the overhead in terms of processing and memory used.
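A minimal sketch of this style of processing, using xml.dom.minidom from Python's standard library (a lightweight DOM implementation, used here only as one possible example). The books document is hypothetical.

```python
from xml.dom import minidom

# Hypothetical document; minidom loads all of it into memory as a DOM tree.
doc = minidom.parseString(
    "<books><book>XML Basics</book><book>Old Notes</book></books>"
)
root = doc.documentElement

# Create a new element node with a text node and append it to the tree.
new_book = doc.createElement("book")
new_book.appendChild(doc.createTextNode("DOM in Practice"))
root.appendChild(new_book)

# Delete an existing node (the second <book> in document order).
old_book = root.getElementsByTagName("book")[1]
root.removeChild(old_book)

# Serialize the updated tree back to XML text.
updated_xml = root.toxml()
print(updated_xml)
```

Because the whole tree is in memory, nodes can be added and removed in any order before the result is written out.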

SAX (Simple API for XML)

SAX is a specification created through the XML-DEV mailing list, rather than being a W3C-recommended specification. Processing via SAX is light and quick in contrast to DOM. SAX reads an XML document in order from top to bottom. It is an event-driven API: each time it detects an element's start tag, an end tag, or text, it notifies the application with an event. The application processes the events it receives to acquire the data in the XML document.

The benefit of SAX is that processing can be conducted while analyzing the XML document. There is no need to load the entire document first, as with DOM. On the other hand, not having an API for XML document updating means that any XML document updates have to be handled from within the application. Also, SAX only reads an XML document in order from top to bottom, so forward or backward referencing of XML documents must be performed by the application. SAX is especially suited for searching for and extracting data from XML documents.
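The event-driven style can be sketched with Python's xml.sax module. The handler class TitleCollector and the sample document are assumptions for illustration; the handler extracts the text of every title element as the corresponding events arrive.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # a list only while inside a <title> element

    def startElement(self, name, attrs):
        # Event: a start tag was detected.
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Event: text was detected (it may arrive in several chunks).
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # Event: an end tag was detected.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None

handler = TitleCollector()
xml.sax.parseString(
    b"<books><book><title>XML Basics</title></book>"
    b"<book><title>SAX Notes</title></book></books>",
    handler,
)
print(handler.titles)
```

Note that the application itself has to remember where it is in the document (here, the _buffer field), since SAX keeps no tree to refer back to.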

Main XML Related Technologies

Category               Technology  Description
Structure Definitions  DTD         Schema definition language defined under the XML 1.0 specification
                       XML Schema  Schema definition language defining XML document structure
Selection Method       XPath       Language designating syntax that indicates the appropriate path within a tree structure in order to select elements and/or attributes within an XML document
Transformation         XSLT        Language specifying the mechanism for transforming an XML document into an XML document of different structure or structured data
Display                CSS         A specification defining the layout for XML and HTML displayed through a Web browser
                       XSL-FO      Language for describing the layout of an XML document
Protocol               SOAP        XML-based messaging protocol
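To give a feel for XPath, the sketch below selects elements and filters on an attribute using path expressions. It relies on Python's ElementTree, which supports only a limited subset of XPath, and the catalog document is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an attribute on each <book> element.
catalog = ET.fromstring(
    "<catalog>"
    '<book lang="en"><title>XML Basics</title></book>'
    '<book lang="ja"><title>XML Nyumon</title></book>'
    "</catalog>"
)

# Select the <title> of every <book> whose lang attribute equals "en".
english_titles = [t.text for t in catalog.findall("./book[@lang='en']/title")]

# Select every <title> element anywhere below the root.
all_titles = [t.text for t in catalog.findall(".//title")]

print(english_titles)
print(all_titles)
```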

Review Questions (Section 1: XML Overview)

Question 1

Which of the following correctly describes a feature of XML? Select all that apply.

  A. Element names that can be used in an XML document are designated in the XML specification
  B. An XML document can be written as text data
  C. XML documents are excellent for long-term storage and data reusability
  D. When creating an XML document, an XML parser must be used

Comments

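The features of XML include the fact that XML documents can be written as text data and that XML is excellent for long-term storage and data reuse. By contrast, the XML specification does not designate element names (document authors define their own), and an XML document can be created without an XML parser, for example in a text editor. Accordingly, the correct answers are B and C.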

Question 2

Which of the following is an API for accessing XML documents? Select all that apply.

  A. SAX
  B. DOM
  C. SOAP
  D. XPath

Comments

SOAP is a specification for linking systems through the transmission of messages. XPath is a specification for describing a method for selecting a certain node from an XML document tree structure. Neither is an API for accessing XML documents. Accordingly, the correct answers are A and B.

Yasuhiro Nonaka

Infoteria Corporation Training Department. After working in systems development for more than a decade, Mr. Nonaka now mainly oversees the text development and revision of the Infoteria certification training course corresponding to the "XML Master Basic V2" exam.

Mr. Nonaka's most recent goal is to visit the health club twice per week, up from his current pace of 0 to 1 times per month.


The content presented here is an HTML version of an article that originally appeared in the February 2007 issue of DB Magazine published by Shoeisya.
