Regex match of apostrophe in AutoHotkey script
I have an AutoHotkey script which looks up a word in a bilingual dictionary when I double-click any word on a webpage. If I click on something like "l'homme", the l' is copied to the clipboard along with the homme. I want the script to strip out everything up to and including the apostrophe.
I can't get AutoHotkey to match the apostrophe. Below is a sample script which prints the character codes of the first four characters. If I double-click "l'homme" on this page, it prints: 108,8217,104,111. The second value is clearly not the ASCII code for an apostrophe (39). I think it most probably has something to do with the HTML representation of an apostrophe, but I haven't been able to get to the bottom of it. I've tried AutoHotkey's Transform, HTML command without any luck.
I've tried both the Unicode and non-Unicode versions of AutoHotkey. I've saved the script as UTF-8.
#Persistent
return

OnClipboardChange:
;debugging info:
c1 := Asc(SubStr(clipboard,1,1))
c2 := Asc(SubStr(clipboard,2,1))
c3 := Asc(SubStr(clipboard,3,1))
c4 := Asc(SubStr(clipboard,4,1))
Msgbox 0,info, char1: %c1% `nchar2: %c2% `nchar3: %c3% `nchar4: %c4%
;the line below is what I want to use, but it doesn't find a match
stripToApostrophe:= RegExReplace(clipboard,".*’")
There is the standard quote ' (code 39) and there is the "curly" quote ’ (U+2019, decimal 8217, which is the value you are seeing).
Your regex might have to be
".*['’]"
to cover both cases.
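Maybe you'd like to make it non-greedy too, if a word can have more than one apostrophe and you only want to remove the text up to the first one:
".*?['’]"
Note that RegExReplace replaces every match by default, so anchor the pattern with ^ or pass a Limit of 1 if only the first match should be removed.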
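To make the greedy versus non-greedy difference concrete, here is a minimal sketch. It assumes a Unicode build of AutoHotkey v1.1 (AutoHotkey_L); the sample word and the variable names are only illustrative. The quote characters are built with Chr() so that the result does not depend on how the script file was saved.

```autohotkey
; Sketch only: assumes a Unicode build of AutoHotkey v1.1 (AutoHotkey_L).
apos  := Chr(0x27)      ; straight apostrophe '
curly := Chr(0x2019)    ; curly apostrophe (right single quotation mark)

; Illustrative word with two apostrophes, one of each kind.
word := "qu" . curly . "aujourd" . apos . "hui"

; Character class that matches either kind of apostrophe.
quoteClass := "[" . apos . curly . "]"

; Greedy: .* runs to the LAST apostrophe, leaving "hui".
greedy := RegExReplace(word, ".*" . quoteClass)

; Non-greedy and anchored with ^: only the text up to the FIRST
; apostrophe is removed, leaving "aujourd'hui".
lazy := RegExReplace(word, "^.*?" . quoteClass)

MsgBox % "greedy: " . greedy . "`nlazy: " . lazy
```

Without the ^ anchor, the non-greedy pattern would match again after the first replacement and the result would also be "hui".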
Interesting. I tried this:
w1 := "l’homme"
w2 := "l'homme"
c1 := Asc(SubStr(w1,2,1))
c2 := Asc(SubStr(w2,2,1))
v1 := RegExReplace(w1, ".*?['’]")
v2 := RegExReplace(w2, ".*?['’]")
MsgBox 0,info, %c1% - %c2% - %v1% - %v2%
return
And got back 146 - 39 - homme - homme. I'm editing in Notepad. Is it possible that the ’ in our regex, which we think is character 8217, is actually saved as 146 when we paste it into the script?
Apparently Unicode support was added only in AutoHotkey_L. Using it, I believe the correct regex should be either
".*?(" Chr(0x0027) "|" Chr(0x0092) "|" Chr(0x2019) ")"
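As a rough sketch of how that pattern could be used, the function below wraps it in a helper. The name StripToApostrophe is hypothetical, and the code assumes a Unicode build of AutoHotkey v1.1 (AutoHotkey_L). The ^ anchor is an addition to the pattern above, so that only the text up to the first apostrophe is removed.

```autohotkey
; Hypothetical helper; assumes a Unicode build of AutoHotkey v1.1 (AutoHotkey_L).
StripToApostrophe(text) {
    ; Build the alternatives from code points so the encoding of the
    ; script file does not affect which characters end up in the pattern.
    quotes := Chr(0x0027) . "|" . Chr(0x0092) . "|" . Chr(0x2019)
    ; ^ anchors at the start, so only one replacement can occur.
    return RegExReplace(text, "^.*?(" . quotes . ")")
}

; Example calls, one with a curly and one with a straight apostrophe.
w1 := StripToApostrophe("l" . Chr(0x2019) . "homme")
w2 := StripToApostrophe("l" . Chr(0x27) . "homme")
MsgBox % w1 . " - " . w2
```

Inside the OnClipboardChange label from the question, the same helper could be called as StripToApostrophe(clipboard). Text that contains no apostrophe is returned unchanged.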