Apostrophe within Python lookbehind assertion
I'm trying to use a Python regular expression to get the first token of a character-separated string. I don't want backslash-escaped separators to count as real separators, so I'm using a negative lookbehind assertion. When the separator is a comma, it works as expected:
>>> import re
>>> re.match("(.*?)(?<!\\\\),.*", "Hello\, world!,This is a comma separated string,Third value").group(1)
'Hello\\, world!'
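A short Python 3 sketch of why the comma case works: since `\,` is not a recognized escape sequence, the backslash stays in the data. Writing both the pattern and the sample input as raw strings (`r"..."`) makes the backslashes explicit instead of relying on that fallback, and recent Python 3 releases warn about unrecognized escapes such as `\,` anyway.

```python
import re

# Raw string: the backslash before the first comma is kept as a literal character.
text = r"Hello\, world!,This is a comma separated string,Third value"
print("\\" in text)  # True: the backslash really is in the data

# Same pattern as in the question, written as a raw string so that \\ reaches
# the regex engine as an escaped (literal) backslash.
m = re.match(r"(.*?)(?<!\\),.*", text)
print(m.group(1))  # Hello\, world!
```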
However, the exact same code with the comma replaced by an apostrophe does not work at all:
>>> import re
>>> re.match("(.*?)(?<!\\\\)'.*", "Hello\' world!'This is an apostrophe separated string'Third value").group(1)
'Hello'
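Inspecting the apostrophe input directly (again a Python 3 sketch) shows where the difference lies: the string literal contains no backslash at all, so the lookbehind has nothing to reject.

```python
import re

s = "Hello\' world!'This is an apostrophe separated string'Third value"

# "\'" is the escape sequence for a plain apostrophe, so no backslash survives.
print("\\" in s)  # False
print(s[:14])     # Hello' world!'

# The first apostrophe (right after "Hello") is not preceded by a backslash,
# so the lazy group stops there.
m = re.match(r"(.*?)(?<!\\)'.*", s)
print(m.group(1))  # Hello
```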
I'm using Python 2.7.2, but I get the same behavior with Python 3 (tested on Ideone). The Python re documentation does not list ' as a special character, so I'm really wondering: why is my ' treated differently?
(And please, no comments along the lines of "Who would want an apostrophe-separated file?" Well... I do.)
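"\'" is actually an escape sequence:
\'  Single quote (')
You can confirm that the two are the same character:
>>> ord("\'") == ord("'")
True
In other words, "\'" is equivalent to "'". This makes sense, because \' is the escape you use to put an apostrophe inside a single-quoted string:
>>> 'i\'ll'
"i'll"
As you can see, "\'" doesn't actually contain a \ at all, so your lookbehind (?<!\\\\) never sees a backslash and the very first apostrophe is taken as a separator. The comma case works only because \, is not a recognized escape sequence, so Python leaves the backslash in the string. When you change the input to "\\'", the string contains a real backslash before the apostrophe, and the pattern matches as intended.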
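A minimal sketch of the fix, assuming Python 3: write the pattern and the data as raw strings (or double the backslash, "\\'") so that a real backslash reaches the regex engine. The helper name `first_token` is hypothetical, and `re.escape` is used so that separators with a special regex meaning (such as `.` or `|`) are matched literally.

```python
import re


def first_token(text, sep):
    """Return the text before the first `sep` not preceded by a backslash.

    Hypothetical helper for illustration; `sep` is a single character.
    """
    # Raw string: (?<!\\) is a negative lookbehind for one literal backslash.
    pattern = r"(.*?)(?<!\\)" + re.escape(sep)
    m = re.match(pattern, text, re.DOTALL)
    # No unescaped separator found: the whole string is the first token.
    return m.group(1) if m else text


data = r"Hello\' world!'This is an apostrophe separated string'Third value"
# The raw string and the doubled-backslash literal are the same string.
print(data == "Hello\\' world!'This is an apostrophe separated string'Third value")  # True

print(first_token(data, "'"))                     # Hello\' world!
print(first_token(r"Hello\, world!,Second", ","))  # Hello\, world!
print(first_token("no separator here", "'"))      # no separator here
```

Like the pattern in the question, this treats any separator that directly follows a backslash as escaped, so an escaped backslash right before a real separator (`\\'` in the data) is not handled.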