Python convert binary file into string while ignoring non-ascii characters

I have a binary file and I want to extract all ASCII characters while ignoring the non-ASCII ones. Currently I have:

with open(filename, 'rb') as fobj:
   text = fobj.read().decode('utf-16-le')
   file = open("text.txt", "w")
   file.write("{}".format(text))
   file.close()

However, I get an error when writing to the file:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

How can I get Python to ignore non-ASCII characters?

Answers


Use the built-in ASCII codec and tell it to ignore any errors, like:

with open(filename, 'rb') as fobj:
    text = fobj.read().decode('utf-16-le')

# Encode to ASCII, dropping anything that cannot be represented,
# and write the resulting bytes in binary mode.
with open("text.txt", "wb") as out:
    out.write(text.encode('ascii', 'ignore'))
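
Since the input is an arbitrary binary file, the decode step itself can fail too, for example when the data has an odd number of bytes. The sketch below assumes you are happy to drop undecodable input as well. The helper name `ascii_only` is hypothetical, not something from the question:

```python
def ascii_only(data, encoding='utf-16-le'):
    """Return only the ASCII characters found in `data`, as bytes.

    `ascii_only` is a hypothetical helper name. It assumes the input
    is (mostly) text in `encoding`; anything that cannot be decoded or
    cannot be represented in ASCII is silently dropped.
    """
    text = data.decode(encoding, 'ignore')   # skip undecodable input
    return text.encode('ascii', 'ignore')    # skip non-ASCII characters


# UTF-16-LE text with one stray trailing byte appended.
raw = u'hello \u00a0 there'.encode('utf-16-le') + b'\x00'
result = ascii_only(raw)
```

Passing 'ignore' to both calls means bad input never raises, at the cost of losing data silently, so only use it when that loss is acceptable.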

You can try out the 'ignore' error handler in the Python interpreter:

>>> s = u'hello \u00a0 there'
>>> s
u'hello \xa0 there'

In Python 2, simply converting it with str() raises an exception, because str() implicitly encodes to ASCII:

>>> str(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 6: ordinal not in range(128)

...as does just trying to encode that unicode string to ASCII:

>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 6: ordinal not in range(128)

...but telling the codec to ignore the characters it can't handle works okay:

>>> s.encode('ascii', 'ignore')
'hello  there'
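
'ignore' is one of several standard error handlers that encode() accepts. As a rough comparison, here is the same string run through three of them. The variable names are only for illustration:

```python
from __future__ import print_function  # keeps print() consistent on Python 2

s = u'hello \u00a0 there'

ignored = s.encode('ascii', 'ignore')            # drop the character entirely
replaced = s.encode('ascii', 'replace')          # substitute a '?'
escaped = s.encode('ascii', 'backslashreplace')  # keep a visible escape sequence

for name, value in [('ignore', ignored),
                    ('replace', replaced),
                    ('backslashreplace', escaped)]:
    print(name, repr(value))
```

'replace' or 'backslashreplace' can be a better fit when you want to see where characters were lost rather than have them vanish.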
