How to tell sed “do not remove some characters”?
I have a text file containing Arabic characters and some other characters (punctuation marks, numbers, English characters, ... ). How can I tell sed to remove all the characters in the file, except Arabic ones? In short I can say that we typically tell sed to remove/replace some specific characters and print others, but now I am looking for a way to tell sed just print my desired characters, and remove all other characters.
Answers
With GNU sed, you should be able to specify characters by their hex or octal code. You can use those in a character class:
sed 's/[\x00-\x7F]//g'     # hex notation
sed 's/[\o000-\o177]//g'   # octal notation
You should also be able to achieve the same effect with the tr command:
tr -d '[\000-\177]'
Both methods assume that your input file is UTF-8 encoded. In UTF-8, every byte of a multi-byte character has its highest bit set, so you can simply strip everything that is a standard 7-bit ASCII character. Note that this keeps every non-ASCII character, not only Arabic ones.
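As a quick check of the reasoning above, here is a minimal sketch using tr on a sample line. The function name `strip_ascii` and the sample strings are made up for illustration, and `LC_ALL=C` is an added assumption to force byte-wise processing whatever the user's locale is.

```shell
# Hypothetical helper: delete every ASCII byte (0x00-0x7F) from stdin.
# LC_ALL=C makes tr work on raw bytes regardless of the user's locale.
strip_ascii() {
    LC_ALL=C tr -d '\000-\177'
}

# Arabic letters survive; ASCII letters, digits, spaces and the newline do not.
printf 'abc, مرحبا 123!\n' | strip_ascii
echo    # the newline was deleted too, so add one back for display

# Caveat: any non-ASCII character survives, not only Arabic ones.
printf 'café\n' | strip_ascii
echo
```

Because the newline (octal 012) is itself an ASCII byte, the output ends up on a single line. If line breaks should be kept, a range such as `'\000-\011\013-\177'` leaves them alone.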
To remove everything except a well-defined set of characters, use a negated character class:
sed 's/[^characters you want to keep]//g'
Using a pattern like [^…]\+ might improve the performance of the regex, since each run of unwanted characters is then removed in a single match.
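The negated class and its `\+` variant can be compared with a small sketch. The keep-set here is the ASCII digits, chosen only so that the result does not depend on the locale. The function names and the sample string are hypothetical, and `\+` assumes GNU sed.

```shell
# Hypothetical helper: remove every character that is NOT in the keep-set.
# The keep-set here is 0-9; replace it with the characters you want to keep.
keep_digits() {
    sed 's/[^0-9]//g'
}

# Same result, but each run of unwanted characters is removed in one match.
# \+ ("one or more") is a GNU extension to basic regular expressions.
keep_digits_runs() {
    sed 's/[^0-9]\+//g'
}

printf 'tel: +1 (555) 010-9999\n' | keep_digits        # prints 15550109999
printf 'tel: +1 (555) 010-9999\n' | keep_digits_runs   # prints 15550109999
```

For Arabic text, the keep-set would be the Arabic letters themselves, for example a range such as `[^ء-ي]`. Whether sed treats that as a range of characters rather than of bytes depends on running in a UTF-8 locale and on that locale's collation rules, so treat it as an assumption to verify on your system.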