Removing control characters from a string in python

I currently have the following code

def removeControlCharacters(line):
    i = 0
    for c in line:
        if (c < chr(32)):
            line = line[:i - 1] + line[i+1:]
            i += 1
    return line

This is just does not work if there are more than one character to be deleted.

Answers


You could use str.translate with the appropriate map, for example like this:

>>> mpa = dict.fromkeys(range(32))
>>> 'abc\02de'.translate(mpa)
'abcde'

There are hundreds of control characters in unicode. If you are sanitizing data from the web or some other source that might contain non-ascii characters, you will need Python's unicodedata module. The unicodedata.category(…) function returns the unicode category code (e.g., control character, whitespace, letter, etc.) of any character. For control characters, the category always starts with "C".

This snippet removes all control characters from a string.

import unicodedata
def remove_control_characters(s):
    return "".join(ch for ch in s if unicodedata.category(ch)[0]!="C")

Examples of unicode categories:

>>> from unicodedata import category
>>> category('\r')      # carriage return --> Cc : control character
'Cc'
>>> category('\0')      # null character ---> Cc : control character
'Cc'
>>> category('\t')      # tab --------------> Cc : control character
'Cc'
>>> category(' ')       # space ------------> Zs : separator, space
'Zs'
>>> category(u'\u200A') # hair space -------> Zs : separator, space
'Zs'
>>> category(u'\u200b') # zero width space -> Cf : control character, formatting
'Cf'
>>> category('A')       # letter "A" -------> Lu : letter, uppercase
'Lu'
>>> category(u'\u4e21') # 両 ---------------> Lo : letter, other
'Lo'
>>> category(',')       # comma  -----------> Po : punctuation
'Po'
>>>

Need Your Help

How to make a tkinter GUI output screen following a database search

python user-interface tkinter

Hi I have created a database and have used a query which all works on python. Now I am trying to create a GUI for the program. The query depends on a user input, I have so far managed to create a m...

How to convert IETF BCP 47 language identifier to ISO-639-2?

python ios iso-639-2 ietf-bcp-47

I am writing a server API for an iOS application. As a part of the initialization process, the app should send the phone interface language to server via an API call.