XPath doesn't want to select my tags

I have the following HTML code which contains an address:

<html>
<body>
    <div>
        <h2>Address</h2>
        <p>
            Rotes Rathaus<br />
            Rathausstrasse<br />
            10178 Berlin<br />
            Germany<br />
        </p>
    </div>
</body>
</html>

I'm trying to find the paragraph node that wraps the address (in my case, the <p> tag), and all I have is an array of parts of the address (e.g. 'Rathaus', 'Berlin', '10178').

I'm querying the DOM using the following XPath selector:

//*[contains(text(),'Rathaus')]

This works great and returns me the <p> node. However, when I search by postal code, I don't get any matches:

//*[contains(text(),'10178')]

What do I need to do in order to solve this? Please note that the location of the address can be anywhere on the page.

Best regards, Nicolas

Answers


Solution

Use

//*[text()[contains(.,'10178')]]

and the p element will be selected as a result. It means:

look for any element node anywhere in the document, but only if there is at least one child text node whose string value contains "10178".

On the other hand, your original expression:

//*[contains(text(),'10178')]

means:

look for any element node anywhere in the document, but only if the first of its child text nodes contains the string "10178".

Explanation

You are surprised by the result because of the way functions work in XPath 1.0. A function like contains() expects a string as its first argument. If it is handed a node-set instead, the node-set is converted to a string by taking the string value of its first node in document order, and the remaining nodes are ignored.
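The difference between the two expressions can be reproduced outside an XPath engine. The following is only a sketch: it uses Python's standard xml.dom.minidom and emulates the two predicates by hand rather than evaluating XPath, and the helper names (child_texts, first_text_contains, any_text_contains) are made up for this illustration.

```python
from xml.dom import minidom

# The document from the question, parsed as XML (it happens to be well-formed).
PAGE = """<html>
<body>
    <div>
        <h2>Address</h2>
        <p>
            Rotes Rathaus<br />
            Rathausstrasse<br />
            10178 Berlin<br />
            Germany<br />
        </p>
    </div>
</body>
</html>"""

doc = minidom.parseString(PAGE)


def child_texts(element):
    """String values of the element's direct child text nodes, in document order."""
    return [n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE]


def first_text_contains(needle):
    """Emulates //*[contains(text(),needle)]: only the first text node is examined."""
    matches = []
    for element in doc.getElementsByTagName("*"):
        texts = child_texts(element)
        first = texts[0] if texts else ""  # an empty node-set converts to ""
        if needle in first:
            matches.append(element.tagName)
    return matches


def any_text_contains(needle):
    """Emulates //*[text()[contains(.,needle)]]: every child text node is tested."""
    return [
        element.tagName
        for element in doc.getElementsByTagName("*")
        if any(needle in text for text in child_texts(element))
    ]


print(first_text_contains("Rathaus"))  # ['p']
print(first_text_contains("10178"))    # []
print(any_text_contains("10178"))      # ['p']
```

"Rathaus" is found by the first function only because it sits in the first text node of p; "10178" sits in a later one and is found only when every text node is tested.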

The other thing you need to understand is that text that is separated by child elements ends up in separate text nodes. So, the text content of p is actually cut up into several text nodes, because of the intervening br elements.

You can check this by evaluating an expression like

//p/node()           |  Find `p` elements anywhere in the document and return all nodes
                        that are their children, regardless of the type of node.

on the document you have shown, and it will return (individual results separated by -------):

            Rotes Rathaus
-----------------------
<br/>
-----------------------

            Rathausstrasse
-----------------------
<br/>
-----------------------

            10178 Berlin
-----------------------
<br/>
-----------------------

            Germany
-----------------------
<br/>
-----------------------

As you can see, the textual content of p is stored in a separate text node whenever there is a br in between. At this point you should realise that your original expression would have worked if "10178" had happened to be in the first text node, instead of the third. And perhaps you can guess what //p/text()[3] would yield?
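To inspect the individual text nodes yourself, a small script is enough. This sketch assumes Python's standard xml.dom.minidom and only the p fragment from the question; it lists the direct child text nodes with XPath-style (1-based) positions, which is roughly what //p/text() selects.

```python
from xml.dom import minidom

# Only the paragraph from the question, as a well-formed XML fragment.
FRAGMENT = """<p>
    Rotes Rathaus<br />
    Rathausstrasse<br />
    10178 Berlin<br />
    Germany<br />
</p>"""

p = minidom.parseString(FRAGMENT).documentElement

# Keep only the direct child text nodes and skip the br elements.
text_nodes = [n for n in p.childNodes if n.nodeType == n.TEXT_NODE]

# XPath positions are 1-based, so text()[1] corresponds to text_nodes[0].
for position, node in enumerate(text_nodes, start=1):
    print(position, repr(node.data.strip()))
```

With this parser, the line break between the last br and the closing tag also shows up as a fifth, whitespace-only text node.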


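Last hint: This changes with XPath 2.0, where node-sets are replaced by true sequences of items. There, handing contains() a sequence of more than one item is a type error (unless XPath 1.0 compatibility mode is on) instead of silently using the first item, so the predicate form text()[contains(.,'10178')] is still the one to use.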
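Since the question starts from an array of address parts, the same predicate can be repeated once per part. The sketch below is an assumption about how that might be done, not part of the original answer: build_xpath is a hypothetical helper that only assembles an XPath 1.0 string requiring every part to appear in some child text node of the same element. It refuses parts containing a single quote, because XPath 1.0 string literals have no escape mechanism.

```python
def build_xpath(parts):
    """Build an XPath 1.0 expression that requires every part to occur
    in at least one child text node of the same element."""
    if not parts:
        raise ValueError("at least one address part is required")
    predicates = []
    for part in parts:
        if "'" in part:
            # No escaping exists in XPath 1.0 literals; this sketch simply refuses.
            raise ValueError("parts containing a single quote are not supported")
        predicates.append("text()[contains(.,'" + part + "')]")
    return "//*[" + " and ".join(predicates) + "]"


expression = build_xpath(["Rathaus", "Berlin", "10178"])
print(expression)
```

The resulting string can be handed to whatever XPath 1.0 engine is already in use. Joining with "and" demands that all parts match; joining with "or" instead would accept an element that matches any single part.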

