Is there a better way to give elements knowledge of their parents and XPath in xml.etree.ElementTree?
I have the following code which works:
```python
import xml.etree.ElementTree as etree


def get_path(self):
    path = self.tag
    # Add a 1-based positional index when same-tag siblings exist
    if self.parent is not None:
        sibs = self.parent.findall(self.tag)
        if len(sibs) > 1:
            path = path + '[%s]' % (sibs.index(self) + 1)
    # Walk up the parent links, prepending each ancestor's tag
    current_node = self
    while True:
        parent = current_node.parent
        if parent is None:  # explicit test: a childless Element is falsy
            break
        path = parent.tag + '/' + path
        current_node = parent
    return path


# Monkey patch the pure-Python Element class
etree._Element.get_path = get_path
etree._Element.parent = None


class XmlDoc(object):
    def __init__(self):
        self.root = etree.Element('root')
        self.doc = etree.ElementTree(self.root)

    def SubElement(self, parent, tag):
        # Record the parent on every node created through this wrapper
        new_node = etree.SubElement(parent, tag)
        new_node.parent = parent
        return new_node


doc = XmlDoc()
a1 = doc.SubElement(doc.root, 'a')
a2 = doc.SubElement(doc.root, 'a')
b = doc.SubElement(a2, 'b')

print etree.tostring(doc.root), '\n'

print 'element:'.ljust(15), a1
print 'path:'.ljust(15), a1.get_path()
print 'parent:'.ljust(15), a1.parent, '\n'

print 'element:'.ljust(15), a2
print 'path:'.ljust(15), a2.get_path()
print 'parent:'.ljust(15), a2.parent, '\n'

print 'element:'.ljust(15), b
print 'path:'.ljust(15), b.get_path()
print 'parent:'.ljust(15), b.parent
```
Which results in this output:

```
<root><a /><a><b /></a></root>

element:        <Element a at 87e3d6c>
path:           root/a
parent:         <Element root at 87e3cec>

element:        <Element a at 87e3fac>
path:           root/a
parent:         <Element root at 87e3cec>

element:        <Element b at 87e758c>
path:           root/a/b
parent:         <Element a at 87e3fac>
```
Now this is drastically changed from the original code, but I'm not allowed to share that.
The functions themselves aren't too inefficient, but there is a dramatic performance decrease when switching from cElementTree to ElementTree. I expected that, but from my experiments it seems that monkey patching cElementTree is impossible, so I had to switch.
What I need to know is whether there is a way to add a method to cElementTree, or whether there is a more efficient way of doing this, so I can gain some of my performance back.
Just to let you know, as a last resort I am thinking of adding selective static typing and compiling with Cython, but for certain reasons I really don't want to do that.
Thanks for taking a look.
EDIT: Sorry for the wrong use of the term late binding. Sometimes my vocabulary leaves something to be desired. What I meant was "monkey patching."
EDIT: @Corley Brigman, Guy: Thank you very much for your answers, which do address the question. However (and I should have stated this in the original post), I had already completed this project once using lxml, which is a wonderful library that made coding a breeze. Due to new requirements, this now needs to be implemented as an add-on to a product called Splunk. That ties me to the Python 2.7 interpreter shipped with Splunk and eliminates the possibility of adding third-party libraries, with the exception of Django.
If you need parents, use lxml instead - it tracks parents internally, and is still C behind the scenes so it's very fast.
However, be aware that there is a tradeoff in tracking parents: a given node can only have a single parent. This isn't usually a problem, but if you do something like the following, you will get different results in cElementTree vs. lxml:

```python
# Element, SubElement and dump imported from either
# xml.etree.cElementTree or lxml.etree
p = Element('x')
q = Element('y')
r = SubElement(p, 'z')
q.append(r)
```

cElementTree:

```
dump(p)
<x><z /></x>
dump(q)
<y><z /></y>
```

lxml:

```
dump(p)
<x/>
dump(q)
<y>
  <z/>
</y>
```

As you can see, in cElementTree the element r ends up in both trees (the same object is referenced as a child of both p and q), while in lxml it is reparented: appending it to q moves it out of p.
There are probably only a small number of use cases where this matters, but it is something to keep in mind.
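For reference, lxml exposes its parent tracking through `getparent()` on elements and `getpath()` on the element tree. The following is only a minimal sketch, assuming lxml is installed; the tree mirrors the one in the question and the variable names are illustrative:

```python
from lxml import etree  # third-party; assumes lxml is installed

# Same shape as the question's document: <root><a/><a><b/></a></root>
root = etree.Element('root')
a1 = etree.SubElement(root, 'a')
a2 = etree.SubElement(root, 'a')
b = etree.SubElement(a2, 'b')

tree = root.getroottree()

parent_of_b = b.getparent()    # the second <a> element
parent_of_root = root.getparent()  # None: the root has no parent
path_of_a1 = tree.getpath(a1)  # '/root/a[1]'
path_of_b = tree.getpath(b)    # '/root/a[2]/b'
```

No monkey patching or wrapper class is needed here, since the parent link is maintained by the library itself.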
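You can just use XPath on a tree parsed with lxml, for example:

```python
import lxml.html

def get_path(doc):
    # doc is a tree parsed with lxml, e.g. lxml.html.fromstring(markup)
    # "//b//*" selects every element nested anywhere below a <b> element
    for e in doc.xpath("//b//*"):
        print e
```

This should work, but I didn't test it.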
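If lxml is off the table (as in the question's second edit), one stdlib-only option that avoids monkey patching is to build a child-to-parent dictionary once and look parents up from it. This is only a sketch under assumptions: `build_parent_map` and `get_path` are illustrative names, the map has to be rebuilt if the tree changes, and although the same calls exist in `xml.etree.cElementTree` on Python 2.7, that combination is untested here.

```python
import xml.etree.ElementTree as etree  # assumption: cElementTree offers the same API on 2.7


def build_parent_map(root):
    """Map every element below root to its parent in a single pass."""
    return dict((child, parent) for parent in root.iter() for child in parent)


def get_path(element, parent_map):
    """Build a path such as root/a[2]/b by walking up parent_map."""
    steps = []
    node = element
    while node is not None:
        parent = parent_map.get(node)  # None for the root element
        step = node.tag
        if parent is not None:
            same_tag = [sib for sib in parent if sib.tag == node.tag]
            if len(same_tag) > 1:
                # XPath positions are 1-based
                step += '[%d]' % (same_tag.index(node) + 1)
        steps.append(step)
        node = parent
    return '/'.join(reversed(steps))


root = etree.fromstring('<root><a/><a><b/></a></root>')
parent_map = build_parent_map(root)
a1, a2 = root.findall('a')
b = a2.find('b')
path_of_b = get_path(b, parent_map)
```

Unlike the `get_path` in the question, this version also indexes ancestors, so the two `<a>` elements stay distinguishable in the path of `b`. Nothing is stored on the elements themselves, which is why it does not depend on being able to patch the element class.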