Negative backreferences in MySQL REGEXP
MySQL manual is not very detailed about what expressions it supports, so I am not sure if the following is possible with MySQL at all.
I am trying to create a query with RLIKE which matches the following.
The task is to get from SQL all the sentences which contains at least any two words from the given sentence.
Let's say, I have some certain words to use in regex:
I have following sentences in the database:
hello from dog hello hello cat dog says hello dog dog goes away big bad dog
From those all I want to match only
hello from dog dog says hello
For now I have it like this:
SELECT * FROM test WHERE test RLIKE '(hello|dog).*(hello|dog)'
The problem is - I get also those unneeded
hello hello cat dog dog goes away
So I guess, I need a backreference right before the second (hello|dog).
In pseudo code it would look like this:
RLIKE '(hello OR dog) anything can be here (hello OR dog, but not the word which already was in the previous group)'
so it could be like:
'(hello|dog).*(negative backreference to the 1st group goes here)(hello|dog)'
Can such negative backreference be done in MySQL regex? Or maybe there is some better way to write the regex which does the same thing, but also considering that the query will get generated by some C++ code, so it shouldn't be too complex to generate?
MySQL uses a Posix Extended Regular Expression engine (POSIX ERE) and therefore doesn't support backreferences at all. It also doesn't support lookaround which you would need in order to construct a single regex that could handle this.
Therefore you have to spell out all possible combinations:
Of course, this will get unwieldy if the number of match candidates increases, so regular expressions are not the right tool for this in MySQL, unless you can install/use LIB_MYSQLUDF_PREG.