It all started so innocently. I wanted to test-drive the creation of a JSP tag, and I didn't want to create a fragile test that did string comparisons on the output. Remembering Alex Chaffee's XPath Explorer, I found:

Maybe you're using HTTPUnit to unit test your Web site, and you're sick of using the W3C DOM classes to painstakingly walk down your DOM tree. You can use XPath to jump immediately to the value you're looking for and assert that it's present. Using Jaxen, that's as easy as...

    assertXPathEquals("book list should contain title " + expectedTitle,

    void assertXPathEquals(String message, String xpath,
       String expected, WebResponse response) throws Exception
    {
        String value = new XPath( xpath ).valueOf( response.getDOM() );
        assertEquals(message, expected, value);
    }

Do you realize how many packages define a class or interface named XPath? The paths of XML remind me of the old Colossal Cave Adventure Game. You are in a maze of twisty passages, all alike. There are so many XPath libraries for Java that it took me a while to track things down. Some of the libraries come only in source form and have to be compiled.
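
One more XPath type lives in the JDK itself (javax.xml.xpath.XPath, available since Java 5). As a rough sketch of the quoted assertion idea using only that built-in API rather than Jaxen: the class name, the sample markup, the book title, and the XPath expression below are all made up for illustration, and the input is assumed to be well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is invented for this sketch.
final class XPathAssertSketch {

    // Evaluates the XPath against the document and returns its string value
    // (the text of the first matching node, or "" when nothing matches).
    static String valueOf(String xpath, Document doc) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
    }

    // Same shape as the quoted assertXPathEquals, but without a JUnit dependency.
    static void assertXPathEquals(String message, String xpath,
            String expected, Document doc) throws Exception {
        String value = valueOf(xpath, doc);
        if (!expected.equals(value)) {
            throw new AssertionError(message + ": expected <" + expected
                    + "> but was <" + value + ">");
        }
    }

    // Strict XML parsing; real-world HTML would need a forgiving parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up page content and title, purely for demonstration.
        Document doc = parse("<html><body><ul>"
                + "<li>Refactoring</li><li>Test-Driven Development</li>"
                + "</ul></body></html>");
        assertXPathEquals("book list should contain title",
                "/html/body/ul/li[1]", "Refactoring", doc);
        System.out.println("assertion passed");
    }
}
```

The helper throws AssertionError directly so the sketch stays free of third-party jars; inside a JUnit test you would call assertEquals, as the quoted version does.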

Then there are multiple DOM parsers. HttpUnit comes with two, JTidy and NekoHTML.

So, what works?

The only combination I've found to work, so far, is the JTidy parser and the DOMXPath of Jaxen. Since I'm already using HttpUnit, I'm using HttpUnit's interface to JTidy for parsing strings.

    public Node parse(String htmlContent) throws MalformedURLException, IOException, SAXException {
        HTMLParser parser = HTMLParserFactory.getHTMLParser();
        // This class itself acts as the DocumentAdapter that receives the parse result.
        DocumentAdapter documentAdapter = this;
        parser.parse(new URL("file:/inMemory.html"), htmlContent, documentAdapter);
        return documentAdapter.getRootNode();
    }
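
If the markup under test is known to be well-formed XHTML, the JDK's own DocumentBuilder can also parse an in-memory string. This is a swapped-in alternative to the HttpUnit and JTidy route, not a replacement for it: a strict XML parser rejects the tag soup that JTidy would repair. The class and method names here are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name; a sketch using only JDK XML classes.
final class InMemoryParseSketch {

    // Parses a string of well-formed markup into a DOM Document.
    static Document parse(String markup)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(markup)));
    }

    // Reports whether the strict parser accepts the markup at all.
    static boolean isWellFormed(String markup) {
        try {
            parse(markup);
            return true;
        } catch (SAXException e) {
            return false; // e.g. an unclosed <br> or mismatched tags
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><p>hello</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName());
        // Typical hand-written HTML fails here, which is where JTidy earns its keep.
        System.out.println(isWellFormed("<html><body><br></body></html>"));
    }
}
```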

You can also use HttpUnit to fetch a page and grab the DOM from the WebResponse.

        WebConversation wc = new WebConversation();
        WebRequest req = new GetMethodWebRequest(url);
        WebResponse resp = wc.getResponse(req);
        Document rootNode = resp.getDOM();

Either way, you can easily retrieve the string contents of the first node indicated by the XPath,

        XPath xpath = makeXpath(xpathString);
        String contents = xpath.stringValueOf(rootNode);

the node itself,

        XPath xpath = makeXpath(xpathString);
        Object node = xpath.selectSingleNode(rootNode);

or, in cases where the XPath expression matches multiple nodes, get all of them

        XPath xpath = makeXpath(xpathString);
        List nodes = xpath.selectNodes(rootNode);

where, for Jaxen, the method makeXpath() is

    private XPath makeXpath(String xpathString) throws JaxenException {
        return new DOMXPath(xpathString);
    }
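
For comparison, the same three kinds of query can be sketched against the JDK's built-in javax.xml.xpath API instead of Jaxen. The method names below merely mirror the Jaxen calls used above; the class name and the sample markup are assumptions, and the input must be well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; JDK-only counterparts of the three Jaxen queries.
final class XPathQuerySketch {

    private static XPath makeXpath() {
        return XPathFactory.newInstance().newXPath();
    }

    // Counterpart of stringValueOf(): string value of the first matching node.
    static String stringValueOf(String xpathString, Node root) throws Exception {
        return makeXpath().evaluate(xpathString, root);
    }

    // Counterpart of selectSingleNode(): the first matching node.
    static Node selectSingleNode(String xpathString, Node root) throws Exception {
        return (Node) makeXpath().evaluate(xpathString, root, XPathConstants.NODE);
    }

    // Counterpart of selectNodes(): every matching node, copied into a List.
    static List<Node> selectNodes(String xpathString, Node root) throws Exception {
        NodeList matches = (NodeList) makeXpath()
                .evaluate(xpathString, root, XPathConstants.NODESET);
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            nodes.add(matches.item(i));
        }
        return nodes;
    }

    // Strict XML parsing of an in-memory string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<ul><li>one</li><li>two</li></ul>");
        System.out.println(stringValueOf("/ul/li", doc));
        System.out.println(selectSingleNode("/ul/li[2]", doc).getTextContent());
        System.out.println(selectNodes("/ul/li", doc).size());
    }
}
```

The return-type constant passed to evaluate() selects between the node and node-list forms, while the two-argument evaluate() returns the string value.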

See also: HtmlTestingUsingXpath/MyHtmlParser, HtmlTestingUsingXpath/MyHtmlParserTest, HtmlTestingUsingXpathAndHtmlUnit

iDIAcomputing: HtmlTestingUsingXpath (last edited 2009-07-27 18:25:09 by localhost)