Title Using the XPath notation to locate data in an XML document.
Summary XPath is like the fully qualified path to a file in a directory, except that it points to a piece of data in an XML document. It is very useful for extracting data that is tagged with predefined descriptors in the document.
Contributor John McTainsh
Published 8-Jun-2001
Last updated 8-Jun-2001

Description.

As XML has become more popular, the need to extract information from it in a simple way has arisen. One simple way to extract data from an XML document is to use the XPath notation. With XPath, an element is referenced much as you would reference a file in a subdirectory. Consider the following XML.
<People>
	<SillyPeople>
		<MyPerson Name="Mr Wal Mart">This is a very rich man.</MyPerson>
		<MyPerson Name="Tod">Not so rich</MyPerson>
	</SillyPeople>
	<BrilliantPeople>
		<MyPerson Name="John McTainsh">This is a very cool person.</MyPerson>
	</BrilliantPeople>
</People>

In the above XML, Tod's data is located by the XPath //People/SillyPeople/MyPerson[@Name='Tod']. Each name between slashes is an element to step into, and the [@Name='Tod'] predicate keeps only the MyPerson element whose Name attribute equals Tod. It looks a bit cryptic to start with, but with time it will grow on you. The code presented here shows how to extract data using this method and points out a few traps for young players.
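To make the directory analogy concrete, here is a minimal sketch in standard C++ that walks a hand-built copy of the sample document one step at a time. It is not MSXML and not an XPath engine: the Element struct and the functions childrenNamed, sillyPersonText, makePerson and makeSamplePeople are hypothetical names invented for this illustration. The sketch also assumes People is the root element, which is what //People resolves to in this particular document.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory element for illustration; not an MSXML type.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::string text;
    std::vector<Element> children;
};

// Return the direct children of 'parent' whose element name matches.
std::vector<const Element*> childrenNamed(const Element& parent,
                                          const std::string& name) {
    std::vector<const Element*> found;
    for (const Element& child : parent.children) {
        if (child.name == name) {
            found.push_back(&child);
        }
    }
    return found;
}

// Hand-written walk equivalent to
//   /People/SillyPeople/MyPerson[@Name='<wantedName>']
// returning the text of every matching MyPerson element.
std::vector<std::string> sillyPersonText(const Element& root,
                                         const std::string& wantedName) {
    std::vector<std::string> texts;
    if (root.name != "People") {
        return texts;  // first step failed: the root is not People
    }
    for (const Element* group : childrenNamed(root, "SillyPeople")) {
        for (const Element* person : childrenNamed(*group, "MyPerson")) {
            // Predicate [@Name='...']: keep only a matching Name attribute.
            auto attr = person->attributes.find("Name");
            if (attr != person->attributes.end() && attr->second == wantedName) {
                texts.push_back(person->text);
            }
        }
    }
    return texts;
}

// Build one MyPerson element with a Name attribute and some text.
Element makePerson(const std::string& name, const std::string& text) {
    assert(!name.empty());
    Element person;
    person.name = "MyPerson";
    person.attributes["Name"] = name;
    person.text = text;
    return person;
}

// Build a copy of the sample People document shown above.
Element makeSamplePeople() {
    Element silly;
    silly.name = "SillyPeople";
    silly.children.push_back(makePerson("Mr Wal Mart", "This is a very rich man."));
    silly.children.push_back(makePerson("Tod", "Not so rich"));

    Element brilliant;
    brilliant.name = "BrilliantPeople";
    brilliant.children.push_back(makePerson("John McTainsh", "This is a very cool person."));

    Element root;
    root.name = "People";
    root.children.push_back(silly);
    root.children.push_back(brilliant);
    return root;
}
```

Calling sillyPersonText(makeSamplePeople(), "Tod") yields exactly one string, while asking for "John McTainsh" yields nothing, because that element sits under BrilliantPeople rather than SillyPeople. A real XPath engine does the same kind of stepping and filtering for you from the expression alone.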

Where to find out about XPath.

XPath stands for XML Path Language. It is defined by the W3C at http://www.w3.org/TR/xpath

What we need to get started.

Microsoft provides a COM component, the Microsoft XML Parser (MSXML), to create and parse XML. To use it, add the following code to stdafx.h. It is very important to note that we are using MSXML Parser 3 and the MSXML2 namespace.

#import "msxml3.dll"
using namespace MSXML2;

MSXML Parser 3.0 can be downloaded from Microsoft for free.

Because we are working with COM, we also need to call CoInitialize(NULL); at start-up and CoUninitialize(); at shutdown.

Extracting the data.

The following code segment extracts the data at //People/SillyPeople/MyPerson[@Name='Tod'] in these steps:

  • Creates an XML Document 2 smart pointer.
  • Loads it with an XML document. Note: async is set to false, so the load completes before the call returns.
  • Sets the selection language to XPath.
  • Requests a list of nodes that match the XPath search criteria. This may be more than one item.
  • Iterates through the items, displaying the data.
    try
    {
        // Create the XML Document
        IXMLDOMDocument2Ptr pXMLDoc(__uuidof(MSXML2::DOMDocument)); 

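        // Load the XML from a string (use load() for a file).
        // VERIFY, not ASSERT: ASSERT compiles to nothing in release
        // builds, so the loadXML call inside it would never run.
        pXMLDoc->put_async(VARIANT_FALSE);
        VERIFY( pXMLDoc->loadXML(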
        _T( "<People>"
            "   <SillyPeople>"
            "       <MyPerson Name=\"Mr Wal Mart\">Rich man.</MyPerson>"
            "       <MyPerson Name=\"Tod\">Is a clown.</MyPerson>"
            "   </SillyPeople>"
            "   <BrilliantPeople>"
            "       <MyPerson Name=\"John McTainsh\">Is cool.</MyPerson> "
            "   </BrilliantPeople>"
            "</People>") ) );   

        // Very important to set the language: MSXML 3 defaults to the
        // older XSLPattern syntax unless XPath is requested.
        pXMLDoc->setProperty( _T("SelectionLanguage"), _T("XPath") );

        // Get the list of items we are looking for.
        // Swap in the commented-out query to list every MyPerson under SillyPeople.
        //bstr_t bsLookFor( _T("//SillyPeople/MyPerson") );
        bstr_t bsLookFor( _T("//People/SillyPeople/MyPerson[@Name='Tod']") );
        IXMLDOMNodeListPtr pNodeList = pXMLDoc->documentElement->selectNodes( bsLookFor );

        int nList = pNodeList->length;
        TRACE( _T("Looking for = %s\n"), (LPTSTR)bsLookFor );
        TRACE( _T("Found %d item(s)\n"), nList );

        // Iterate through each item found
        for( int n = 0; n < nList; n++ )
        {
            IXMLDOMNodePtr pNode = pNodeList->item[n];
            bstr_t bsNodeText = pNode->text;
            TRACE( _T("Text = %s\n"), (LPTSTR)bsNodeText );
        }
    }
    catch(_com_error &e)
    {
        // Display any com error in a MessageBox.
        bstr_t bstrSource(e.Source());
        bstr_t bstrDescription(e.Description());
        CString sErr, sOutMessage;
        sErr.Format( _T("Code = 0x%08lx\n"), e.Error());
        sOutMessage += sErr;
        sErr.Format( _T("Code meaning = %s\n"), e.ErrorMessage());
        sOutMessage += sErr;
        sErr.Format( _T("IErrorInfo.Source = %s\n"), (LPTSTR)bstrSource );
        sOutMessage += sErr;
        sErr.Format( _T("IErrorInfo.Description = %s"), (LPTSTR)bstrDescription );
        sOutMessage += sErr;
        AfxMessageBox( sOutMessage );
    }
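One trap in the load step deserves a closer look. MFC's ASSERT expands to nothing in release builds, so a call with side effects placed inside it, such as loadXML, is skipped there, whereas VERIFY still evaluates its argument. The sketch below illustrates the mechanism in portable C++. The two SKETCH_ macros and fakeLoadXml are hypothetical stand-ins that mimic the release-build behaviour; they are not the real MFC macros or the real MSXML call.

```cpp
#include <cassert>

// Hypothetical stand-ins that mimic the release-build (non-_DEBUG)
// behaviour of MFC's ASSERT and VERIFY; they are not the real macros.
#define SKETCH_RELEASE_ASSERT(expr) ((void)0)
#define SKETCH_RELEASE_VERIFY(expr) ((void)(expr))

// Counts how often the pretend load has run.
static int g_loadCalls = 0;

// Hypothetical stand-in for a call with a side effect, such as loadXML.
bool fakeLoadXml() {
    ++g_loadCalls;
    return true;
}

// Number of loads performed when the call is wrapped in the ASSERT stand-in.
int loadsUnderAssert() {
    g_loadCalls = 0;
    SKETCH_RELEASE_ASSERT(fakeLoadXml());  // expression discarded, load never runs
    return g_loadCalls;
}

// Number of loads performed when the call is wrapped in the VERIFY stand-in.
int loadsUnderVerify() {
    g_loadCalls = 0;
    SKETCH_RELEASE_VERIFY(fakeLoadXml());  // expression is still evaluated
    return g_loadCalls;
}

// Quick check of the two cases.
void checkTrap() {
    assert(loadsUnderAssert() == 0);
    assert(loadsUnderVerify() == 1);
}
```

In a release build the document would stay empty under ASSERT and the query would simply find nothing, which is hard to spot because no error is raised.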