How to get the full contents of a node using XPath and lxml?

I am using lxml's xpath function to retrieve parts of a webpage. I am trying to get the contents of a <font> tag, which includes HTML tags of its own. If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]

I get the right number of nodes, but they are returned as lxml objects (<Element font at 0x101fe5eb0>).

If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/text()

I get exactly what I want, except that I don't get any of the HTML code which is contained within the <font> nodes.

If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/node()

I get a mixture of text and lxml elements! (e.g. something something <Element a at 0x102ac2140> something)

Is there any way to use a pure XPath query to get the contents of the <font> nodes, or even to force lxml to return a string of the contents from the .xpath() method, rather than an lxml object?

Note that I'm returning a list of many nodes from the XPath query so the solution needs to support that.

Just to clarify, I want to return something something <a href="url">inside</a> something from something like:

<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>

Answers


I'm not sure I understand -- is this close to what you are looking for?

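import io
import lxml.etree as le

content = '''\
<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>
'''
doc = le.parse(io.StringIO(content))

# Select the child elements of the matching <font> nodes.
xpath = '//font[@face="verdana" and @color="#ffffff" and @size="2"]/child::*'
x = doc.xpath(xpath)
# tostring() serializes each element together with its tail text;
# encoding="unicode" makes it return str rather than bytes.
print([le.tostring(el, encoding="unicode") for el in x])
# ['<a href="url">inside</a> something']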

Is there any way to use a pure XPath query to get the contents of the <font> nodes, or even to force lxml to return a string of the contents from the .xpath() method, rather than an lxml object?

Short answer: No.

XPath doesn't operate on "tags" but on nodes.

The selected nodes are represented as instances of specific objects in the language that is hosting XPath.
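
To illustrate, here is a small sketch of what lxml hands back for each kind of node. The sample markup is adapted from the question, with some leading text added as an assumption so that both text and element nodes appear:

```python
from lxml import etree

# Sample markup adapted from the question (the leading text is an assumption).
content = ('<font face="verdana" color="#ffffff" size="2">'
           'something <a href="url">inside</a> something</font>')
root = etree.fromstring(content)

elements = root.xpath('//font')       # element nodes come back as lxml Element objects
texts = root.xpath('//font/text()')   # text nodes come back as Python strings
mixed = root.xpath('//font/node()')   # both kinds, in document order

for node in mixed:
    kind = "element" if etree.iselement(node) else "text"
    print(kind, repr(node))
```

The XPath expression only selects nodes. Turning a selected element back into markup is a separate serialization step done by the host library.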

In case you need the string representation of a particular node's markup, such objects typically support an outerXml property -- check the documentation of the hosting environment (lxml in this case).

As @Robert-Rossney pointed out in his comment: lxml's tostring() function is the equivalent of other environments' outerXml property.

