Need regex to extract fields from string
I need to extract the title, location, and price from a string like this:
10' Starcraft pop up camper (Newport) $5500
It should be obvious which is which.
However, there are also cases like this:
10' (approx.) Starcraft pop up camper (Drigg's Town, PA) $5500
When I use a simple regex, I can match the first string correctly, but not the second:
^(?<title>.+?) \((?<area>.+?)\) \$(?<price>[\d]+)$
I'm pretty sure lookaheads/backreferences can handle this, but I don't know how. Can someone help me out with an explanation? (And maybe point me to an easy-to-read article on the subject.)
With only 2 examples to go on, the best I can suggest is to change the lazy quantifier in the title capturing group to a greedy one:
^(?<title>.+) \((?<area>.+?)\) \$(?<price>[\d]+)$
          ^^
          Here
Effectively, the area capturing group now captures the text inside the last pair of parentheses (), provided that it is followed by text that the price capturing group can match.
The greedy quantifier in title consumes as much text as possible, which forces the area capturing group to start at the last possible opening parenthesis.
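To see the difference between the lazy and greedy title, here is a small sketch in Python. Python is my choice for illustration only (the question does not name a language), and it spells named groups as (?P<name>...) instead of (?<name>...). The helper extract and the constant names are made up for this example.

```python
import re

# Same patterns as in the post, with Python's (?P<name>...) group syntax.
LAZY_TITLE = re.compile(r"^(?P<title>.+?) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")
GREEDY_TITLE = re.compile(r"^(?P<title>.+) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")

# The two sample strings from the question.
SAMPLES = [
    "10' Starcraft pop up camper (Newport) $5500",
    "10' (approx.) Starcraft pop up camper (Drigg's Town, PA) $5500",
]


def extract(pattern, text):
    """Return the named groups as a dict, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.groupdict() if m else None


for text in SAMPLES:
    print("lazy  :", extract(LAZY_TITLE, text))
    print("greedy:", extract(GREEDY_TITLE, text))
```

On the second sample the lazy version does match, but it stops the title at 10' and lets area swallow everything from approx. up to PA. The greedy version gives the title 10' (approx.) Starcraft pop up camper and the area Drigg's Town, PA.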
Another way is to make sure the sub-pattern in the area capturing group cannot match ( or ):
^(?<title>.+) \((?<area>[^()]+)\) \$(?<price>[\d]+)$
          ^^            ^^^^^^
          Here          Here
I also removed the lazy quantifier from the area group, since it is now redundant: [^()]+ cannot run past a parenthesis, so the only way to match is with ( immediately before and ) immediately after the text captured by the area capturing group.
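A quick way to check the redundancy claim is to run the pattern with and without the lazy quantifier on area. Again a Python sketch with (?P<name>...) groups; the names NO_PARENS, NO_PARENS_LAZY and fields are hypothetical, and the third sample string is invented to show what happens when the area itself contains parentheses.

```python
import re

# Area restricted to non-parenthesis characters, greedy and lazy variants.
NO_PARENS = re.compile(r"^(?P<title>.+) \((?P<area>[^()]+)\) \$(?P<price>[\d]+)$")
NO_PARENS_LAZY = re.compile(r"^(?P<title>.+) \((?P<area>[^()]+?)\) \$(?P<price>[\d]+)$")


def fields(pattern, text):
    """Return the named groups as a dict, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.groupdict() if m else None


second = "10' (approx.) Starcraft pop up camper (Drigg's Town, PA) $5500"
# Invented sample: parentheses inside the area, which this pattern does not allow.
nested = "Starcraft pop up camper (Newport (north)) $5500"

print(fields(NO_PARENS, second))
print(fields(NO_PARENS_LAZY, second) == fields(NO_PARENS, second))
print(fields(NO_PARENS, nested))
```

Both variants give the same groups for the second sample, and the invented nested sample is rejected outright instead of being split incorrectly.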
Both solutions above assume that the area will never contain parentheses (). The pattern is going to be slightly more complicated if you want to allow that.