Extracting attributes out of the XML using Python

I want to extract the attributes from an XML document with the code below. The XML is:

xml = """<graphics type='xxx' port='0' autoport='xxx' listen='0.0.0.0'>
  <listen type='address' address='0.0.0.0'/>
</graphics>"""

and the code is:

def xml_to_dict(xmlk):
    d = {}
    if xmlk.text:
        d[xmlk.tag] = xmlk.text
    else:
        d[xmlk.tag] = {}
    children = list(xmlk)  # getchildren() was removed in Python 3.9
    if children:
        d[xmlk.tag] = list(map(xml_to_dict, children))
    return d

xml_to_dict(xyz)

Output: {'graphics': [{'listen': {}}]}

I have tried xmlk.attrib instead of xmlk.tag, but to no avail. Does anybody know how to do this?

Answers


I'd suggest lxml, but the following code works with either lxml or ElementTree.

You also need to tweak your algorithm a bit:

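from xml.etree import ElementTree as etree
tree = etree.fromstring(xml)

def xml_to_dict(tree):
    d = {}
    # Check for children first: an element with children usually has
    # whitespace-only text, which would otherwise count as text.
    if len(tree) > 0:
        d[tree.tag] = list(map(xml_to_dict, tree))
    elif tree.text and tree.text.strip():
        d[tree.tag] = tree.text
    else:
        d[tree.tag] = {}
    return d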

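That gives you the nested tag structure from your output. Attributes are still not included; each element exposes them as a mapping in tree.attrib.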


from lxml import etree
dict(etree.fromstring(xml).items())

outputs

{'autoport': 'xxx', 'type': 'xxx', 'port': '0', 'listen': '0.0.0.0'}
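
The standard library's xml.etree.ElementTree offers the same attribute access, so lxml is not strictly required for this. The following is a minimal sketch, assuming the XML from the question with straight quotes; the variable names (root_attrs, port, child_attrs) are only illustrative and do not come from the answers here.

```python
import xml.etree.ElementTree as ET

xml = """<graphics type='xxx' port='0' autoport='xxx' listen='0.0.0.0'>
  <listen type='address' address='0.0.0.0'/>
</graphics>"""

root = ET.fromstring(xml)

# Attributes of the root element as a plain dict (a copy of root.attrib)
root_attrs = dict(root.attrib)

# A single attribute, with a fallback value if it is missing
port = root.get('port', 'unknown')

# Attributes of every direct child, paired with the child's tag name
child_attrs = [(child.tag, dict(child.attrib)) for child in root]

print(root_attrs)
print(port)
print(child_attrs)
```

Iterating over an element yields its direct children, so the list comprehension only looks one level down; root.iter() would be the call to use for all descendants.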

It is not clear what output format you'd like to use. Here's one possibility that tries to stay close to your code:

empty = lambda s: not (s and s.strip())

def xml_to_dict(root):
    assert empty(root.tail), 'tail is not supported'
    d = dict(root.attrib)  # copy, so the element's own attributes are not modified
    assert root.tag not in d, 'tag and attribute name conflict'
    if len(root) > 0:  # has children
        assert empty(root.text), 'text and children conflict'
        d[root.tag] = list(map(xml_to_dict, root))  # list() so it also works on Python 3
    elif not empty(root.text):
        d[root.tag] = root.text
    return d

It is not reversible in the general case: the original XML cannot always be rebuilt from the resulting dict.

Example
import pprint
import xml.etree.ElementTree as etree

xml = """<graphics type='xxx' port='0' autoport='xxx' listen='0.0.0.0'>
  <listen type='address' address='0.0.0.0'/>
 <value>1</value>
 <blank/>
</graphics>
"""
pprint.pprint(xml_to_dict(etree.fromstring(xml)))
Output
{'autoport': 'xxx',
 'graphics': [{'address': '0.0.0.0', 'type': 'address'}, {'value': '1'}, {}],
 'listen': '0.0.0.0',
 'port': '0',
 'type': 'xxx'}

Note: the <listen> tag name is not present in the graphics list, and <blank/> is reduced to {} in it.
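
If the child tag names need to survive, one option is to keep the tag, attributes, text and children under separate keys instead of merging them into one dict. This is only a sketch of that idea: the function name element_to_dict and the key names 'tag', 'attrib', 'text' and 'children' are my own choices, not something from the answers above, and tail text is still ignored.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Convert an element to a dict, keeping tag, attributes and children apart."""
    text = (elem.text or '').strip()
    return {
        'tag': elem.tag,
        'attrib': dict(elem.attrib),   # copy, so the element is not mutated
        'text': text or None,          # whitespace-only text is treated as absent
        'children': [element_to_dict(child) for child in elem],
    }

doc = ET.fromstring(
    "<graphics port='0'><listen type='address'/><blank/></graphics>"
)
result = element_to_dict(doc)

# The child tag names are preserved, including the empty <blank/> element
print([child['tag'] for child in result['children']])
```

The result is more verbose than the flat dict, but a tag can no longer collide with an attribute of the same name, and an empty element keeps its identity.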

