Traversing and Modifying XML Files with DOM and XPath

You'll remember from the start of this chapter that the construct of XML ensures that a parent/child hierarchy exists between all elements and attributes within an XML document. This gives XML documents an inherent logical structure that allows each construct in the document to be parsed into nodes.

XPath is a language that allows you to locate the component parts in an XML document by specifying a path to each node in the tree. With XPath, you can build an expression called a location path. A location path is similar to the file path you use to locate a file on your PC, starting at a given location, stepping through each directory in the file tree until you reach the file you want.

You compose your Xpath expression with location steps, evaluating the nodes at each step until all the location steps have been evaluated. The final step results in a node set that can be extracted, modified, deleted, and so on.

Take a moment to go through a few examples where combining DOM and XPath allows you to find, extract, and edit the data in a DOMDocument.

Keep in mind that the concepts presented here are key to your ability to program against the XML parts that make up your Excel file. Each one of the examples demonstrated here is employed later in this chapter when programming the Open XML file formats.

Return all Employee IDs

The first thing you will notice in the following example is that in addition to declaring a DOMDocument object, you are also declaring IXMLDOMNode and IXMLDOMNodeList. The IXMLDOMNode object is used to hold and pass a single node, whereas the IXMLDOMNodeList object is used to hold and pass a collection of nodes.

In this procedure, your goal is to return all Employee IDs contained in the chosen XML document. First, load the document in a DOMDocument object, as demonstrated in the previous section. Next, pass an Xpath expression to the SelectNodes method of the DOMDocument:

"/EmployeeSales/Employee/Empid"

This expression steps through each element node in the document until it reaches the Empid element. As you can see, the XPath starts with the forward slash (/). A single forward slash establishes the root element of the document as the starting point for the location steps in the XPath expression. Each single forward slash from there steps through the hierarchy of elements until the one you need (the Empid) is reached. Once the destination path is reached, the SelectNodes method executes the query, trapping each element node in the xmlnodes object variable. It then iterates through xmlnodes to retrieve each single element node and output it:

Sub FindNode() Dim oXmlDoc As DOMDocument Dim oXmlNode As IXMLDOMNode Dim oXmlNodes As IXMLDOMNodeList

'Load your XML Document into a DOM Document Set oXmlDoc = New DOMDocument oXmlDoc.async = False oXmlDoc.Load (ThisWorkbook.Path & "\EmployeeSales.xml")

'Find and select the all EMPID nodes in the document

Set oXmlNodes = oXmlDoc.SelectNodes("/EmployeeSales/Employee/Empid")

'Iterate through the nodes and output each Empid For Each oXmlNode In oXmlNodes Debug.Print oXmlNode.Text Next End Sub

Return all Nodes for Any Employee with an Invoice Amount over $3000

In this procedure, the goal is to return all the element nodes for any employees who have an invoice over $3,000:

"//Employee[InvoiceAmount>3000]"

Notice in this expression the use of the double forward slashes (//). The double forward slash is an abbreviated syntax that can be used in any expression to jump directly to the desired node. This allows programmers to avoid the redundancy of declaring each step in the node hierarchy. In this example, you are selecting all Employee elements in the document that meet a specific criterion. You can pass a criterion through XPath by using a predicate placed in brackets ([ ]), as shown in previous procedure. Here, you are limiting the employees that are returned to only those whose InvoiceAmount element node has a value greater than 3000:

Sub FindNode() Dim oXmlDoc As DOMDocument Dim oXmlNode As IXMLDOMNode Dim oXmlNodes As IXMLDOMNodeList

'Load your XML Document into a DOM Document Set oXmlDoc = New DOMDocument oXmlDoc.async = False oXmlDoc.Load (ThisWorkbook.Path & "\EmployeeSales.xml") 'Find and select the all Employee that meet the criteria

Set oXmlNodes = oXmlDoc.SelectNodes("//Employee[InvoiceAmount>3000]")

'Iterate through the nodes and output each Employee For Each oXmlNode In oXmlNodes Debug.Print oXmlNode.Text Next End Sub

Trap the Node that Contains the FirstName Mike

This procedure uses the same concepts to trap the node that contains the FirstName Mike. Note that you are using the SelectSingleNode method to trap and return one specific node. Other things to notice here are the use of the predicate text() followed by the criterion wrapped in single quotes, not double quotes. Because the iXMLDOMNode object does not hold a collection of nodes, there is no need to iterate through the nodes; you can simply output the node:

Sub FindNode()

Dim oXmlDoc As DOMDocument

Dim oXmlNode As IXMLDOMNode

Set oXmlDoc = New DOMDocument oXmlDoc.async = False oXmlDoc.Load (ThisWorkbook.Path & "\EmployeeSales.xml")

Set oXmlNode = oXmlDoc.SelectSingleNode("//FirstName[text()='Mike']") Debug.Print oXmlNode.XML End Sub

Find and Edit the Node that Contains the FirstName Mike

In this procedure, you go a step further and actually change the text of the returned node. Because you are working with an in-memory version of the XML document, you must use the Save method of the DOMDocument to save changes back to the XML document:

Sub ChangeNode() Dim oXmlDoc As DOMDocument Dim oXmlNode As IXMLDOMNode

'Create a test copy of the EmployeeSales.xml file

FileCopy ThisWorkbook.Path & "\EmployeeSales.xml", _ ThisWorkbook.Path & "\EmployeeSalesTest.xml"

'Load your XML Document into a DOM Document Set oXmlDoc = New DOMDocument oXmlDoc.async = False oXmlDoc.Load (ThisWorkbook.Path & "\EmployeeSalesTest.xml")

'Find and select the all Employee that meet the criteria

Set oXmlNode = oXmlDoc.SelectSingleNode("//FirstName[text()='Mike']")

'Edit the text and save back to the XML document oXmlNode.Text = "Michael"

oXmlDoc.Save ThisWorkbook.Path & "\EmployeeSalesTest.xml" End Sub

Find and Delete all Nodes for the Employee Bullen

In this example, first find the nodes where the employee's last name is Bullen. Once the correct set of nodes has been identified, iterate through the collection to remove each node. Here's how it works. Use each node as a starting point to find its parent by using the parentnode property. For instance, xmlnode.parentnode will return the parent of the xmlnode in focus. Once the parent is identified, use the RemoveChild method to remove the given xmlnode. After iterating through all the nodes in the collection, save the changes back to the XML document:

Sub DeleteNode() Dim oXmlDoc As DOMDocument Dim oXmlNode As IXMLDOMNode Dim oXmlNodes As IXMLDOMNodeList

'Load your XML Document into a DOM Document Set oXmlDoc = New DOMDocument oXmlDoc.async = False oXmlDoc.Load (ThisWorkbook.Path & "\EmployeeSalesTest.xml")

'Find and select the all Employee that meet the criteria

Set oXmlNodes = oXmlDoc.SelectNodes("//Employee[LastName='Bullen']")

'Find and select the all Employee that meet the criteria For Each oXmlNode In oXmlNodes oXmlNode.parentnode.RemoveChild oXmlNode Next oXmlDoc.Save ThisWorkbook.Path & "\EmployeeSalesTest.xml" End Sub

+2 -3

Responses

  • annett
    How to traverse directory in access xml dom?
    8 months ago

Post a comment