Try the following following expression to find a pattern of: "a" (optional) followed by up to two words, followed by "b", followed by "c".
The input is "x1 x2 x3 a b c" and therefore the match is "x3 a b c", since it satisfied the condition of up to two words before the "b".
Match match = Regex.Match("x1 x2 x3 a b c", @"((a\s+)?(\w+\s+){0,2}b\s+c)");
However, turn the condition of up to two words to be lazy, and you get the match "x2 x3 a b c", which does not qualify the expression at all (lazy or not).
Match match = Regex.Match("x1 x2 x3 a b c", @"((a\s+)?(\w+\s+){0,2}?b\s+c)");
So why do we get "x2 x3 a b c" as a result of this regex?
This appears to be a Regex bug. There are 2 groups in the match, this non-greedy pattern is producing. The first group is the entire string "a x1 x2 x3 b c". The second group is "x3". But examine the second group's "Captures" collection and you will see 3 captures, namely x1, x2, x3. Therefore, (\w+\s+){0,2}? captures 3 instead of at most two words. Hence, I believe it's a bug.
Given the pattern, "a\s(\w+\s+){0,2}?b\s+c", the following strings should produce the following results
"a a b c" should match "a b c" at index=2 (but it matches "a a b c" instead)
"a x1 b c" should match "a x1 b c"
"a x1 x2 b c" should match "a x1 x2 b c"
"a x1 x2 x3 b c" should not match (but it does)
"a a x1 b c" should match "a x1 b c" at Index=2 (but it matches "a a x1 b c" instead)
"a a a x1 b c" should match "a x1 b c" at index=4 (but it matches "a a a x1 b c" instead)
"a a x1 x2 b c" should match "a x1 x2 b c" at index=2 (but it matches "a a x1 x2 b c" instead)
"a a a x1 x2 b c" should match "a x1 x2 b c" at index=4 (but it matches "a a x1 x2 b c" at index=2 instead)
The other pattern ("a\s(\w+\s+){0,2}b\s+c") works properly, i.e.,
"a a b c" should match "a a b c"
"a x1 b c" should match "a x1 b c"
"a x1 x2 b c" should match "a x1 x2 b c"
"a x1 x2 x3 b c" should not match
"a a x1 b c" should match "a a x1 b c"
"a a a x1 b c" should match "a a x1 b c" at index=2
"a a x1 x2 b c" should match "a x1 x2 b c" at index=2
"a a a x1 x2 b c" should match "a x1 x2 b c" at index=4
At first, I thought {0,2}? made no sense, but given the strings like "a a x1 b c", there is certainly a use for a non-greedy {n,m}.
No comments:
Post a Comment