XML Tutorial - XML Master Professional Application Developer Edition
Volume 3 : DOM/SAX Programming
Tatsuya Kimura
Section 2 Major Points for Study
Section 2 presents simple XML documents together with Java programs that use the DOM API or the SAX API, and asks the exam taker to identify the processing results. The programs presented in exam questions are quite simple; however, you must have an accurate understanding of how the DOM and SAX APIs behave when handling XML data.
Questions related to DOM Programs
The following is the same question presented under "Example of Question Appearing on the Exam" in the first of this series of tutorials.
Example of a DOM Program Question Appearing on the Exam - (1)
Load the following [XML Document], and process the document using [Processing by DOM]. Select the answer that best describes the results expressed under XML 1.0. Ignore line returns and indents.
[XML Document]
<parent>
<child1>DATA1</child1>
<child2>DATA2</child2>
</parent>
[Processing by DOM]
Process XML using the following code:
Document output = updateXML(doc);
Here, the variable doc references the Document instance of the loaded XML Document. Assume that there is no runtime error. The method updateXML code is as follows:
public static Document updateXML(Document doc) {
    Node child1 = doc.getElementsByTagName("child1").item(0);
    Element parent = doc.getDocumentElement();
    parent.appendChild(child1);
    return doc;
}
Option
A.
<parent>
<child1>DATA1</child1>
<child2>DATA2</child2>
</parent>
B.
<parent>
<child2>DATA2</child2>
<child1>DATA1</child1>
</parent>
C.
<parent>
<child1>DATA1</child1>
<child1>DATA1</child1>
<child2>DATA2</child2>
</parent>
D.
<parent>
<child1>DATA1</child1>
<child2>DATA2</child2>
<child1>DATA1</child1>
</parent>
Answer
B
Commentary
To answer this question correctly, you must understand the behavior of the appendChild method provided by the DOM API. appendChild adds the node passed as its argument as the last child of the node on which it is called. If that node already exists in the document tree, it is first removed from its current position, so the node is moved rather than copied. Now, let's walk through the code of the updateXML method in order.
The updateXML method first obtains the child1 element via the getElementsByTagName method. getElementsByTagName returns a list (NodeList) of all descendant elements that have the tag name given in the argument, in document order; item(0) selects the first match.
Next, the root element (the parent element) is obtained via the getDocumentElement method, and the appendChild method is called on it. This adds the child1 element as the last child of the root element (in other words, after the child2 element); at that point, the child1 element is detached from its original position.
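The move behavior described above can be checked with a short program. The following is a minimal sketch, assuming the standard JAXP parser bundled with the JDK; the class name AppendChildMoveDemo and the helper childOrderAfterUpdate are hypothetical names introduced only for this illustration, and the document is written on one line so that no whitespace-only text nodes are involved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the exam question.
public class AppendChildMoveDemo {

    // Same logic as the updateXML method in the question.
    static Document updateXML(Document doc) {
        Node child1 = doc.getElementsByTagName("child1").item(0);
        Element parent = doc.getDocumentElement();
        parent.appendChild(child1); // child1 is already in the tree, so it is moved
        return doc;
    }

    // Returns the names of the root element's children after the update.
    static String childOrderAfterUpdate() {
        // Assumption: single-line input, so there are no whitespace-only text nodes.
        String xml = "<parent><child1>DATA1</child1><child2>DATA2</child2></parent>";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = updateXML(doc).getDocumentElement();
            StringBuilder names = new StringBuilder();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (names.length() > 0) {
                    names.append(',');
                }
                names.append(n.getNodeName());
            }
            return names.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childOrderAfterUpdate()); // prints child2,child1
    }
}
```

Only one child1 element remains after the call, which rules out the options that show child1 twice.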
From the statement "Assume that there is no runtime error," we understand that the Java program contains no simple mistakes (such as errors in element names or XML namespaces).
If the XML document to be processed contains indents, as in this question, and the DOM parser is configured to retain whitespace-only text nodes, the processing results will reflect those text nodes:
<parent>
<child2>DATA2</child2>
<child1>DATA1</child1></parent>
However, this question tells you to ignore line returns and indents. Accordingly, the correct answer is B, regardless of the DOM parser settings. Questions may or may not tell you to ignore line returns and indents; be sure to carefully read the entire text of each question.
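To see where the whitespace-only text nodes end up, the child nodes of the parent element can be listed before and after the move. This is a minimal sketch, assuming the JDK's default DocumentBuilderFactory settings (no DTD, so whitespace-only text nodes are retained) and a two-space indent in the input; WhitespaceNodesDemo and childNodeNames are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the exam question.
public class WhitespaceNodesDemo {

    // The question's document, with an assumed two-space indent.
    static final String XML =
            "<parent>\n"
          + "  <child1>DATA1</child1>\n"
          + "  <child2>DATA2</child2>\n"
          + "</parent>";

    // Lists the node names of parent's children, optionally after moving child1.
    static String childNodeNames(boolean moveChild1) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Element parent = doc.getDocumentElement();
            if (moveChild1) {
                parent.appendChild(doc.getElementsByTagName("child1").item(0));
            }
            StringBuilder names = new StringBuilder();
            for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (names.length() > 0) {
                    names.append(',');
                }
                names.append(n.getNodeName()); // text nodes are named "#text"
            }
            return names.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childNodeNames(false)); // #text,child1,#text,child2,#text
        System.out.println(childNodeNames(true));  // #text,#text,child2,#text,child1
    }
}
```

After the move, child1 is the very last child, with no text node behind it, which is why the end tag of parent follows child1 directly in the serialized form shown above.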
DOM-related questions that show up on the XML Master Exam are based on DOM Level 2. DOM Level 2 does not specify how an XML document is loaded, so, as was the case in this question, I will not address the loading step here.
The practice question above only asks about the behavior of the appendChild method. You may also see questions asking how XML namespaces are affected when processing an XML document that uses namespaces. Let's look at such a question, applying the previous practice question as our starting point.
Example of a DOM Program Question Appearing on the Exam - (2)
Load the following [XML Document], and process the document using [Processing by DOM]. Select the answer that best describes the results expressed under XML 1.0. Ignore line returns and indents.
[XML Document]
<parent>
<child1>DATA1</child1>
<child2 xmlns="urn:xmlmaster:sample">DATA2</child2>
</parent>
[Processing by DOM]
Process XML using the following code:
Document output = updateXML(doc);
Here, the variable doc references the Document instance of the loaded XML Document. Assume that there is no runtime error. The method updateXML code is as follows:
public static Document updateXML(Document doc) {
    Node child1 = doc.getElementsByTagNameNS(null, "child1").item(0);
    Node child2 = doc.getElementsByTagNameNS("urn:xmlmaster:sample", "child2").item(0);
    child2.appendChild(child1);
    return doc;
}
Option
A.
<parent>
<child2 xmlns="urn:xmlmaster:sample">DATA2<child1>DATA1</child1></child2>
</parent>
B.
<parent>
<child2 xmlns="urn:xmlmaster:sample">DATA2<child1 xmlns="">DATA1</child1></child2>
</parent>
C.
<parent>
<child2 xmlns="urn:xmlmaster:sample"><child1>DATA1</child1>DATA2</child2>
</parent>
D.
<parent>
<child2 xmlns="urn:xmlmaster:sample"><child1 xmlns="">DATA1</child1>DATA2</child2>
</parent>
Answer
B
Commentary
To solve this question, you must understand not only the behavior of the appendChild method, but also how XML namespaces are handled when an XML document is processed. When a node is moved using the DOM API, the namespace of the node does not change. Answers A and B both match the result of the appendChild call (child1 becomes the last child of child2, after the text DATA2). However, answer A violates the namespace rule just mentioned, because in answer A the child1 element inherits the default namespace declaration and therefore belongs to the namespace urn:xmlmaster:sample. Therefore, the correct answer is B.
DOM Level 2 Core specifies the following with respect to XML namespaces when an XML document is manipulated via the DOM API.
When a node is moved by a DOM operation, the namespace of the node does not change. In addition, the implementation is not required to add the namespace declarations that become necessary as a result of moving the node.
Accordingly, when the resulting DOM tree is serialized, an XML document like answer A may be output, depending on the DOM parser (or execution environment).
However, the question here asks you to select the answer that best describes the results expressed under XML 1.0.
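The namespace rule can be observed directly on the DOM tree, without relying on any particular serializer. The sketch below assumes the JDK's JAXP parser with namespace awareness switched on; NamespaceMoveDemo and describeMovedChild1 are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the exam question.
public class NamespaceMoveDemo {

    // Moves child1 under child2 and reports the namespaces seen in the DOM tree.
    static String describeMovedChild1() {
        String xml = "<parent><child1>DATA1</child1>"
                + "<child2 xmlns=\"urn:xmlmaster:sample\">DATA2</child2></parent>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the ...NS methods
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Same calls as the updateXML method in the question.
            Node child1 = doc.getElementsByTagNameNS(null, "child1").item(0);
            Node child2 = doc.getElementsByTagNameNS("urn:xmlmaster:sample", "child2").item(0);
            child2.appendChild(child1);

            return "parent=" + child1.getParentNode().getLocalName()
                    + ", parentNS=" + child1.getParentNode().getNamespaceURI()
                    + ", child1NS=" + child1.getNamespaceURI()
                    + ", lastChild=" + child2.getLastChild().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints parent=child2, parentNS=urn:xmlmaster:sample, child1NS=null, lastChild=child1
        System.out.println(describeMovedChild1());
    }
}
```

The moved child1 element still reports no namespace even though its new parent is in urn:xmlmaster:sample, which is the situation that answer B expresses with xmlns="".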
Questions related to SAX Programs
Next, I'll cover two practice questions related to SAX programs. As with DOM, the SAX program practice question shows the XML document and the SAX program that processes the document, asking you to select the processing results from a list of possible choices.
Example of a SAX Program Question Appearing on the Exam - (1)
Select the answer that best describes the output result (print method output) when processing the following [XML Document] by the method shown in [Processing by SAX]. Assume that indents (line returns, tabs or other ignorable whitespace) in the XML document are taken into consideration.
[XML Document]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ELEMENT root (body*)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST body section CDATA "0">
]>
<root>
<body/>
<body section="1">XML Programming</body>
</root>
[Processing by SAX]
Use the following ContentHandlerImpl class to process the XML document via SAX API. The SAX parser is configured to perform a validity check. Assume that there are no runtime errors.
class ContentHandlerImpl extends DefaultHandler {
    public void characters(char[] ch, int start, int length) throws SAXException {
        System.out.print("[Text]");
    }
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        System.out.print("[Blank]");
    }
}
Option
- A. [Blank][Blank][Text][Blank]
- B. [Text][Text][Text][Text]
- C. [Text][Text][Text]
- D. [Text]
Answer
A
Commentary
This example tests your understanding of ignorableWhitespace method.
First, let's address the conditions provided in the question. This example asks you to take indents (line returns, tabs or other ignorable whitespace) within the XML document into consideration. Also, from the statement "Assume that there are no runtime errors," we know that the XML document violates neither well-formedness constraints nor validity constraints. When reading a question, be careful not to overlook conditions like these. Now, let's walk through the practice question.
The SAX parser calls the characters method for text data contained in an element. Here, however, the SAX parser is configured to perform a validity check. When the DTD declares that an element contains child elements only (in other words, the element cannot contain character data), the parser calls the ignorableWhitespace method for the whitespace (line returns or spaces) that appears between the child elements.
Tracing the XML document on this basis, we first see that "[Blank]" is output twice for the whitespace before and after the empty body element *1. Next, "[Text]" is output for "XML Programming" (character data), which is contained in the second body element. Finally, "[Blank]" is output once for the whitespace (line return) following the second body element. Accordingly, the correct answer is A.
*1 To be precise, the empty body element is preceded by both a line return and whitespace (a two-character space indent). Whether this combination of line return and whitespace is reported as one, two, or even three calls depends on the implementation of the SAX parser. Here, we count it as one.
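Because the number of callbacks per run of whitespace is implementation dependent, a robust way to observe this behavior is to collect the characters themselves rather than count calls. The following is a minimal sketch, assuming the JDK's bundled SAX parser and a two-space indent in the input; IgnorableWhitespaceDemo and collect are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the exam question.
public class IgnorableWhitespaceDemo {

    // The question's document, with an assumed two-space indent.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          + "<!DOCTYPE root [\n"
          + "<!ELEMENT root (body*)>\n"
          + "<!ELEMENT body (#PCDATA)>\n"
          + "<!ATTLIST body section CDATA \"0\">\n"
          + "]>\n"
          + "<root>\n"
          + "  <body/>\n"
          + "  <body section=\"1\">XML Programming</body>\n"
          + "</root>";

    // Returns { text passed to characters, text passed to ignorableWhitespace }.
    static String[] collect() {
        StringBuilder text = new StringBuilder();
        StringBuilder blank = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                blank.append(ch, start, length);
            }
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // surface validity errors instead of ignoring them
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // perform the validity check
            factory.newSAXParser().parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return new String[] { text.toString(), blank.toString() };
    }

    public static void main(String[] args) {
        String[] result = collect();
        System.out.println("characters: [" + result[0] + "]");
        System.out.println("ignorable whitespace length: " + result[1].length());
    }
}
```

The indentation inside root arrives through ignorableWhitespace, while only the content of the second body element arrives through characters.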
Example of a SAX Program Question Appearing on the Exam - (2)
Select the answer that best describes the output result (print method output) when processing the following [XML Document] by the method shown in [Processing by SAX]. Ignore indents (line returns, tabs or other ignorable whitespace) within the XML document.
[XML Document]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ELEMENT root (body*)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST body section CDATA "0">
]>
<root>
<body/>
<body section="1"><!--Comment--><![CDATA[DOM&SAX]]>Programming</body>
</root>
[Processing by SAX]
Use the following ContentHandlerImpl class to process the XML document via SAX API. The SAX parser does not perform a validity check. Assume that there are no runtime errors.
class ContentHandlerImpl extends DefaultHandler {
    public void characters(char[] ch, int start, int length) throws SAXException {
        System.out.print(new String(ch, start, length));
    }
}
Option
- A. 01CommentDOM&SAXProgramming
- B. 1CommentDOM&SAXProgramming
- C. CommentDOM&SAXProgramming
- D. DOM&SAXProgramming
- E. Programming
Answer
D
Commentary
To answer this question, you must understand which text strings in the XML document are passed to the characters method of the SAX API. The following parts of the XML document are not passed to the characters method:
- Declarations outside the root element (the XML declaration and the document type declaration)
- Attribute values
- Comments
What the characters method does receive is the content of the body element, including the text string defined within the CDATA section (which begins with the string <![CDATA[ and ends with the string ]]>). Accordingly, the correct answer is D.
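The rules above can be confirmed by accumulating everything that reaches the characters method. This is a minimal sketch, assuming the JDK's bundled SAX parser with validation left off; CharactersDemo and collectText are hypothetical names. Whitespace is stripped from the result because the question says to ignore indents.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the exam question.
public class CharactersDemo {

    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          + "<!DOCTYPE root [\n"
          + "<!ELEMENT root (body*)>\n"
          + "<!ELEMENT body (#PCDATA)>\n"
          + "<!ATTLIST body section CDATA \"0\">\n"
          + "]>\n"
          + "<root>\n"
          + "  <body/>\n"
          + "  <body section=\"1\"><!--Comment--><![CDATA[DOM&SAX]]>Programming</body>\n"
          + "</root>";

    // Returns the text passed to characters, with all whitespace removed.
    static String collectText() {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            // Default factory settings: no validity check is performed.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        // Ignore indents, as the question instructs.
        return text.toString().replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(collectText()); // prints DOM&SAXProgramming
    }
}
```

Neither the attribute values (including the default value "0") nor the comment text appear in the result, while the CDATA section content is delivered as ordinary character data.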
* * *
In this volume, we discussed major points related to DOM/SAX programming questions that come up in Section 2 of the XML Master Exam. An understanding of XML namespaces, whitespace, and how to handle XML documents containing entity references (which we didn't discuss here) is necessary not only for the exam but is also vital for practical work applications, so I strongly recommend you make a thorough study of this area.