ElementTree and unicode

I have this character (è) in an XML file:

<data>
  <products>
      <color>fumè</color>
  </products>
</data>

I try to generate an instance of ElementTree with the following code:

string_data = open('file.xml')
x = ElementTree.fromstring(unicode(string_data.encode('utf-8')))

and I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 185: ordinal not in range(128)

(NOTE: The position is not exact, I sampled the xml from a larger one).

How can I solve it? Thanks

Answers


You do not need to decode XML for ElementTree to work. XML carries its own encoding information (defaulting to UTF-8) and ElementTree does the work for you, outputting unicode:

>>> data = '''\
... <data>
...   <products>
...       <color>fumè</color>
...   </products>
... </data>
... '''
>>> x = ElementTree.fromstring(data)
>>> x[0][0].text
u'fum\xe8'

If your data is contained in a file or file-like object, just pass the filename or file object directly to the ElementTree.parse() function:

x = ElementTree.parse('file.xml')
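To illustrate the point about letting the parser handle the encoding, here is a minimal sketch, assuming Python 3 and a sample file written in ISO-8859-1 with a matching XML declaration. The helper name parse_color_from_file and the temporary file are made up for this example; only ElementTree.parse() and ElementTree.fromstring() come from the answer above.

```python
import os
import tempfile
import xml.etree.ElementTree as ElementTree

# Sample document; the declaration tells the parser which encoding the bytes use.
XML_TEXT = (
    u'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    u'<data>\n'
    u'  <products>\n'
    u'      <color>fum\xe8</color>\n'
    u'  </products>\n'
    u'</data>\n'
)


def parse_color_from_file(path):
    """Hypothetical helper: parse the file and return the first <color> text."""
    tree = ElementTree.parse(path)  # parse() returns an ElementTree, not an Element
    return tree.getroot().find('products/color').text


# Write the sample to disk as raw ISO-8859-1 bytes (no manual decoding later).
fd, sample_path = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'wb') as handle:
    handle.write(XML_TEXT.encode('iso-8859-1'))

# Both calls receive undecoded data and rely on the XML declaration.
color_from_file = parse_color_from_file(sample_path)
raw_bytes = XML_TEXT.encode('iso-8859-1')
color_from_bytes = ElementTree.fromstring(raw_bytes).find('products/color').text

os.remove(sample_path)
```

In both cases the text comes back as a unicode string containing è, without any call to encode() or unicode() on the input.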

If you stumbled upon this problem while using Requests (HTTP for Humans), note that response.text decodes the response by default. Use response.content instead to get the undecoded data, so ElementTree can decode it itself. Just remember to use the correct encoding.

More info: http://docs.python-requests.org/en/latest/user/quickstart/#response-content
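A rough sketch of that idea, assuming Python 3. The helper parse_xml_response and the FakeResponse class are hypothetical: FakeResponse only stands in for a Requests response so the example runs without a network connection, and the URL in the trailing comment is a placeholder.

```python
import xml.etree.ElementTree as ElementTree


def parse_xml_response(response):
    """Hypothetical helper: parse XML from a Requests-style response object."""
    # response.content holds the undecoded bytes; ElementTree takes the
    # encoding from the XML declaration (UTF-8 when none is given).
    return ElementTree.fromstring(response.content)


class FakeResponse(object):
    """Stand-in for a Requests response so the sketch runs offline."""

    def __init__(self, content):
        self.content = content


# UTF-8 bytes with no declaration, as a server might send them.
body = u'<data><products><color>fum\xe8</color></products></data>'.encode('utf-8')
root = parse_xml_response(FakeResponse(body))
color = root.find('products/color').text

# With the real library this would look roughly like (placeholder URL):
#   response = requests.get('http://example.com/file.xml')
#   root = parse_xml_response(response)
```

Passing response.text here instead would hand the parser an already decoded string, which is the situation the answer above recommends avoiding.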

