Closed Bug 351344 Opened 18 years ago Closed 16 years ago

Perl RegExp test 257: /((\3|b)\2(a)){2,}/.exec('bbaababbabaaaaabbaaaabba')

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: bc, Unassigned)

References

()

Details

expected bbaaaabba,bba,b,a (MSIE6 returns null) actual abba,bba,b,a
My understanding of this is: The first time through, \3 is the empty string. So we're looking for /((|b)\2(a))/ the first time, which is either /(()(a))/ or /((b)b(a))/. The first string that matches that is the first three characters, "bba". The second and subsequent times \3 is "a" and \2 is "b", so we're looking for /((a|b)b(a))/, either "aba" or "bba". The former matches, so we're up to "bbaaba" as a match, then the latter matches, giving us "bbaababba". After that nothing matches and we're done. (If, the first time through, we had matched "a" instead of "bba", then the second and subsequent times, \3 would be "a", and \2 would be the empty string, so we'd be looking for /((a|b)(a))/ the first time, which is either /((a)(a))/ or /((b)(a))/.) Am I wrong? (I'm saying the expected result is "bbaababba", not "bbaaaabba". I agree with the other parts of the expected results.)
What Mozilla returns makes sense if we assume that the captures are replaced with each match, as would seem to be suggested by: /((a|b)\2)+/.exec("aaaabbb") ...though I don't understand the difference between these two: /((a|b)\2)+/.exec("aaabbb") /((a|b)\2)+/.exec("aaaabbb")
The spewage from perl re debug: Compiling REx `((\3|b)\2(a)){2,}' size 27 Got 220 bytes for offset annotations. first at 7 1: CURLYX[0] {2,32767}(26) 3: OPEN1(5) 5: OPEN2(7) 7: BRANCH(10) 8: REF3(13) 10: BRANCH(13) 11: EXACT <b>(13) 13: CLOSE2(15) 15: REF2(17) 17: OPEN3(19) 19: EXACT <a>(21) 21: CLOSE3(23) 23: CLOSE1(25) 25: WHILEM[1/1](0) 26: NOTHING(27) 27: END(0) floating "a" at 1..2147483647 (checking floating) minlen 2 Offsets: [27] 14[4] 0[0] 1[1] 0[0] 2[1] 0[0] 2[1] 3[2] 0[0] 5[1] 6[1] 0[0] 7[1] 0[0] 8[2] 0[0] 10[1] 0[0] 11[1] 0[0] 12[1] 0[0] 13[1] 0[0] 17[0] 17[0] 18[0] Guessing start of match, REx "((\3|b)\2(a)){2,}" against "bbaababbabaaaaabbaaaabba"... Found floating substr "a" at offset 2... Guessed: match at offset 0 Matching REx "((\3|b)\2(a)){2,}" against "bbaababbabaaaaabbaaaabba" Setting an EVAL scope, savestack=3 0 <> <bbaababbabaa> | 1: CURLYX[0] {2,32767} 0 <> <bbaababbabaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 0 <> <bbaababbabaa> | 3: OPEN1 0 <> <bbaababbabaa> | 5: OPEN2 0 <> <bbaababbabaa> | 7: BRANCH Setting an EVAL scope, savestack=13 0 <> <bbaababbabaa> | 8: REF3 failed... 0 <> <bbaababbabaa> | 11: EXACT <b> 1 <b> <baababbabaa> | 13: CLOSE2 1 <b> <baababbabaa> | 15: REF2 2 <bb> <aababbabaa> | 17: OPEN3 2 <bb> <aababbabaa> | 19: EXACT <a> 3 <bba> <ababbabaa> | 21: CLOSE3 3 <bba> <ababbabaa> | 23: CLOSE1 3 <bba> <ababbabaa> | 25: WHILEM[1/1] 1 out of 2..32767 cc=bffff434 3 <bba> <ababbabaa> | 3: OPEN1 3 <bba> <ababbabaa> | 5: OPEN2 3 <bba> <ababbabaa> | 7: BRANCH Setting an EVAL scope, savestack=23 3 <bba> <ababbabaa> | 8: REF3 4 <bbaa> <babbabaa> | 13: CLOSE2 4 <bbaa> <babbabaa> | 15: REF2 failed... 3 <bba> <ababbabaa> | 11: EXACT <b> failed... failed... Clearing an EVAL scope, savestack=13..23 failed... failed... Setting an EVAL scope, savestack=3 1 <b> <baababbabaa> | 1: CURLYX[0] {2,32767} 1 <b> <baababbabaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 1 <b> <baababbabaa> | 3: OPEN1 1 <b> <baababbabaa> | 5: OPEN2 1 <b> <baababbabaa> | 7: BRANCH Setting an EVAL scope, savestack=13 1 <b> <baababbabaa> | 8: REF3 failed... 1 <b> <baababbabaa> | 11: EXACT <b> 2 <bb> <aababbabaa> | 13: CLOSE2 2 <bb> <aababbabaa> | 15: REF2 failed... failed... failed... Setting an EVAL scope, savestack=3 2 <bb> <aababbabaa> | 1: CURLYX[0] {2,32767} 2 <bb> <aababbabaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 2 <bb> <aababbabaa> | 3: OPEN1 2 <bb> <aababbabaa> | 5: OPEN2 2 <bb> <aababbabaa> | 7: BRANCH Setting an EVAL scope, savestack=13 2 <bb> <aababbabaa> | 8: REF3 failed... 2 <bb> <aababbabaa> | 11: EXACT <b> failed... failed... failed... Setting an EVAL scope, savestack=3 3 <bba> <ababbabaa> | 1: CURLYX[0] {2,32767} 3 <bba> <ababbabaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 3 <bba> <ababbabaa> | 3: OPEN1 3 <bba> <ababbabaa> | 5: OPEN2 3 <bba> <ababbabaa> | 7: BRANCH Setting an EVAL scope, savestack=13 3 <bba> <ababbabaa> | 8: REF3 failed... 3 <bba> <ababbabaa> | 11: EXACT <b> failed... failed... failed... Setting an EVAL scope, savestack=3 4 <bbaa> <babbabaa> | 1: CURLYX[0] {2,32767} 4 <bbaa> <babbabaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 4 <bbaa> <babbabaa> | 3: OPEN1 4 <bbaa> <babbabaa> | 5: OPEN2 4 <bbaa> <babbabaa> | 7: BRANCH Setting an EVAL scope, savestack=13 4 <bbaa> <babbabaa> | 8: REF3 failed... 4 <bbaa> <babbabaa> | 11: EXACT <b> 5 <bbaab> <abbabaa> | 13: CLOSE2 5 <bbaab> <abbabaa> | 15: REF2 failed... failed... failed... Setting an EVAL scope, savestack=3 5 <bbaab> <abbabaa> | 1: CURLYX[0] {2,32767} 5 <bbaab> <abbabaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 5 <bbaab> <abbabaa> | 3: OPEN1 5 <bbaab> <abbabaa> | 5: OPEN2 5 <bbaab> <abbabaa> | 7: BRANCH Setting an EVAL scope, savestack=13 5 <bbaab> <abbabaa> | 8: REF3 failed... 5 <bbaab> <abbabaa> | 11: EXACT <b> failed... failed... failed... Setting an EVAL scope, savestack=3 6 <baaba> <bbabaaa> | 1: CURLYX[0] {2,32767} 6 <baaba> <bbabaaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 6 <baaba> <bbabaaa> | 3: OPEN1 6 <baaba> <bbabaaa> | 5: OPEN2 6 <baaba> <bbabaaa> | 7: BRANCH Setting an EVAL scope, savestack=13 6 <baaba> <bbabaaa> | 8: REF3 failed... 6 <baaba> <bbabaaa> | 11: EXACT <b> 7 <aabab> <babaaaa> | 13: CLOSE2 7 <aabab> <babaaaa> | 15: REF2 8 <ababb> <abaaaaa> | 17: OPEN3 8 <ababb> <abaaaaa> | 19: EXACT <a> 9 <babba> <baaaaab> | 21: CLOSE3 9 <babba> <baaaaab> | 23: CLOSE1 9 <babba> <baaaaab> | 25: WHILEM[1/1] 1 out of 2..32767 cc=bffff434 9 <babba> <baaaaab> | 3: OPEN1 9 <babba> <baaaaab> | 5: OPEN2 9 <babba> <baaaaab> | 7: BRANCH Setting an EVAL scope, savestack=23 9 <babba> <baaaaab> | 8: REF3 failed... 9 <babba> <baaaaab> | 11: EXACT <b> 10 <abbab> <aaaaabb> | 13: CLOSE2 10 <abbab> <aaaaabb> | 15: REF2 failed... failed... Clearing an EVAL scope, savestack=13..23 failed... failed... Setting an EVAL scope, savestack=3 7 <aabab> <babaaaa> | 1: CURLYX[0] {2,32767} 7 <aabab> <babaaaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 7 <aabab> <babaaaa> | 3: OPEN1 7 <aabab> <babaaaa> | 5: OPEN2 7 <aabab> <babaaaa> | 7: BRANCH Setting an EVAL scope, savestack=13 7 <aabab> <babaaaa> | 8: REF3 failed... 7 <aabab> <babaaaa> | 11: EXACT <b> 8 <ababb> <abaaaaa> | 13: CLOSE2 8 <ababb> <abaaaaa> | 15: REF2 failed... failed... failed... Setting an EVAL scope, savestack=3 8 <ababb> <abaaaaa> | 1: CURLYX[0] {2,32767} 8 <ababb> <abaaaaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 8 <ababb> <abaaaaa> | 3: OPEN1 8 <ababb> <abaaaaa> | 5: OPEN2 8 <ababb> <abaaaaa> | 7: BRANCH Setting an EVAL scope, savestack=13 8 <ababb> <abaaaaa> | 8: REF3 failed... 8 <ababb> <abaaaaa> | 11: EXACT <b> failed... failed... failed... Setting an EVAL scope, savestack=3 9 <babba> <baaaaab> | 1: CURLYX[0] {2,32767} 9 <babba> <baaaaab> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 9 <babba> <baaaaab> | 3: OPEN1 9 <babba> <baaaaab> | 5: OPEN2 9 <babba> <baaaaab> | 7: BRANCH Setting an EVAL scope, savestack=13 9 <babba> <baaaaab> | 8: REF3 failed... 9 <babba> <baaaaab> | 11: EXACT <b> 10 <abbab> <aaaaabb> | 13: CLOSE2 10 <abbab> <aaaaabb> | 15: REF2 failed... failed... failed... Setting an EVAL scope, savestack=3 10 <abbab> <aaaaabb> | 1: CURLYX[0] {2,32767} 10 <abbab> <aaaaabb> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 10 <abbab> <aaaaabb> | 3: OPEN1 10 <abbab> <aaaaabb> | 5: OPEN2 10 <abbab> <aaaaabb> | 7: BRANCH Setting an EVAL scope, savestack=13 10 <abbab> <aaaaabb> | 8: REF3 failed... 10 <abbab> <aaaaabb> | 11: EXACT <b> failed... failed... failed... Setting an EVAL scope, savestack=3 11 <bbaba> <aaaabba> | 1: CURLYX[0] {2,32767} 11 <bbaba> <aaaabba> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 11 <bbaba> <aaaabba> | 3: OPEN1 11 <bbaba> <aaaabba> | 5: OPEN2 11 <bbaba> <aaaabba> | 7: BRANCH Setting an EVAL scope, savestack=13 11 <bbaba> <aaaabba> | 8: REF3 failed... 11 <bbaba> <aaaabba> | 11: EXACT <b> failed... failed... failed... Setting an EVAL scope, savestack=3 12 <babaa> <aaabbaa> | 1: CURLYX[0] {2,32767} 12 <babaa> <aaabbaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 12 <babaa> <aaabbaa> | 3: OPEN1 12 <babaa> <aaabbaa> | 5: OPEN2 12 <babaa> <aaabbaa> | 7: BRANCH Setting an EVAL scope, savestack=13 12 <babaa> <aaabbaa> | 8: REF3 failed... 12 <babaa> <aaabbaa> | 11: EXACT <b> failed... failed... failed... Setting an EVAL scope, savestack=3 13 <abaaa> <aabbaaa> | 1: CURLYX[0] {2,32767} 13 <abaaa> <aabbaaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 13 <abaaa> <aabbaaa> | 3: OPEN1 13 <abaaa> <aabbaaa> | 5: OPEN2 13 <abaaa> <aabbaaa> | 7: BRANCH Setting an EVAL scope, savestack=13 13 <abaaa> <aabbaaa> | 8: REF3 failed... 13 <abaaa> <aabbaaa> | 11: EXACT <b> failed... failed... failed... Setting an EVAL scope, savestack=3 14 <baaaa> <abbaaaa> | 1: CURLYX[0] {2,32767} 14 <baaaa> <abbaaaa> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 14 <baaaa> <abbaaaa> | 3: OPEN1 14 <baaaa> <abbaaaa> | 5: OPEN2 14 <baaaa> <abbaaaa> | 7: BRANCH Setting an EVAL scope, savestack=13 14 <baaaa> <abbaaaa> | 8: REF3 failed... 14 <baaaa> <abbaaaa> | 11: EXACT <b> failed... failed... failed... Setting an EVAL scope, savestack=3 15 <aaaaa> <bbaaaab> | 1: CURLYX[0] {2,32767} 15 <aaaaa> <bbaaaab> | 25: WHILEM[1/1] 0 out of 2..32767 cc=bffff434 15 <aaaaa> <bbaaaab> | 3: OPEN1 15 <aaaaa> <bbaaaab> | 5: OPEN2 15 <aaaaa> <bbaaaab> | 7: BRANCH Setting an EVAL scope, savestack=13 15 <aaaaa> <bbaaaab> | 8: REF3 failed... 15 <aaaaa> <bbaaaab> | 11: EXACT <b> 16 <aaaab> <baaaabb> | 13: CLOSE2 16 <aaaab> <baaaabb> | 15: REF2 17 <aaabb> <aaaabba> | 17: OPEN3 17 <aaabb> <aaaabba> | 19: EXACT <a> 18 <aaabba> <aaabba> | 21: CLOSE3 18 <aaabba> <aaabba> | 23: CLOSE1 18 <aaabba> <aaabba> | 25: WHILEM[1/1] 1 out of 2..32767 cc=bffff434 18 <aaabba> <aaabba> | 3: OPEN1 18 <aaabba> <aaabba> | 5: OPEN2 18 <aaabba> <aaabba> | 7: BRANCH Setting an EVAL scope, savestack=23 18 <aaabba> <aaabba> | 8: REF3 19 <aaabbaa> <aabba> | 13: CLOSE2 19 <aaabbaa> <aabba> | 15: REF2 20 <aaabbaaa> <abba> | 17: OPEN3 20 <aaabbaaa> <abba> | 19: EXACT <a> 21 <aaabbaaaa> <bba> | 21: CLOSE3 21 <aaabbaaaa> <bba> | 23: CLOSE1 21 <aaabbaaaa> <bba> | 25: WHILEM[1/1] 2 out of 2..32767 cc=bffff434 Setting an EVAL scope, savestack=41 21 <aaabbaaaa> <bba> | 3: OPEN1 21 <aaabbaaaa> <bba> | 5: OPEN2 21 <aaabbaaaa> <bba> | 7: BRANCH Setting an EVAL scope, savestack=51 21 <aaabbaaaa> <bba> | 8: REF3 failed... 21 <aaabbaaaa> <bba> | 11: EXACT <b> 22 <aaabbaaaab> <ba> | 13: CLOSE2 22 <aaabbaaaab> <ba> | 15: REF2 23 <aaabbaaaabb> <a> | 17: OPEN3 23 <aaabbaaaabb> <a> | 19: EXACT <a> 24 <aaabbaaaabba> <> | 21: CLOSE3 24 <aaabbaaaabba> <> | 23: CLOSE1 24 <aaabbaaaabba> <> | 25: WHILEM[1/1] 3 out of 2..32767 cc=bffff434 Setting an EVAL scope, savestack=69 24 <aaabbaaaabba> <> | 3: OPEN1 24 <aaabbaaaabba> <> | 5: OPEN2 24 <aaabbaaaabba> <> | 7: BRANCH Setting an EVAL scope, savestack=79 24 <aaabbaaaabba> <> | 8: REF3 failed... 24 <aaabbaaaabba> <> | 11: EXACT <b> failed... Clearing an EVAL scope, savestack=69..79 restoring \1 to 21(21)..24 restoring \2 to 21(21)..22 restoring \3 to 23(23)..24 failed, try continuation... 24 <aaabbaaaabba> <> | 26: NOTHING 24 <aaabbaaaabba> <> | 27: END Match successful! true Freeing REx: `"((\\3|b)\\2(a)){2,}"'
(In reply to comment #1) > My understanding of this is: > > The second and subsequent times \3 is "a" and \2 is "b", so we're looking for You are forgetting RepeatMatcher step 4 in ECMAScript 15.10.2.5 "For every integer k that satisfies parenIndex < k and k ≤ parenIndex+parenCount, set cap[k] to undefined." So for the second and subsequent times the capture registers are undefined: \3 and \2 should match the empty string again. This is different from Perl. See also bug 476146 which has another example of the same problem and where the IMHO correct expected output is given. Also you can see there that the regexp matches at a different starting place if a quantifier is made non-greedy.
> /((\3|b)\2(a)){2,}/.exec('bbaababbabaaaaabbaaaabba') Expected by Bob Clary: ["bbaaaabba", "bba", "b", "a"] Perl, PCRE, .NET, Java, etc.: ["bbaaaabba", "bba", "b", "a"] Opera 9.52: ["bbaa", "a", "", "a"] IE6: null Firefox 3, Chrome 1: ["abba", "bba", "b", "a"] ES3: ["abba", "bba", "b", "a"] Erik Corry is correct. This is not a bug in Firefox. Relevent portions of ECMA-262 v3 are quoted below: 15.10.2.5: Step 4 of the RepeatMatcher clears Atom's captures each time Atom is repeated. 15.10.2.9: If the regular expression has n or more capturing parentheses but the nth one is undefined because it hasn't captured anything, then the backreference always succeeds. --- JavaScript regular expressions follow the two referenced behavioral rules that are the cause of the confusion here and prevent this regex and subject string from working like most other regex flavors. The first issue is what is matched by backreferences to capturing groups that have not yet participated in a match. ES3 dictates that such backreferences match the empty string, or in other words, they always match successfully. I've opened an ES-Harmony ticket regarding this (since I don't like this behavior and I think there will be minimal backcompat issues if this is changed) at http://bugs.ecmascript.org/ticket/376 The second difference with ES3-flavor regexes involves the value remembered by capturing groups nested within a repeated, outer group. With most regex flavors, the value remembered by a capturing group within a repeated grouping is whatever the group matched the last time it participated in the match. So, the value of backreference 1 after /(?:(a)|(b))+/ is used to match "ab" would be "a". However, according to ECMA-262 v3, the value of backreferences to nested groups is reset every time the outer group is repeated. Hence, /(?:(a)|(b))+/ would still match "ab", but backreference 1 after the match is complete would reference a non-participating capturing group, which in JavaScript would match an empty string within the regex itself and be returned as `undefined` in e.g. the array returned by `RegExp.prototype.exec`. I've not personally opened a ticket on this bit of bahavior because although I don't like it and find it considerably less intuitive and useful than the typical behavior, I think the arguments against this are more based on personal preference. However, changing this would also be unlikely to cause backcompat issues since IE gets this second rule wrong. Once again, this bug report is invalid according to ES3.
the initial reason for this bug report was for checking perl compatibility and deciding what other perl compatible changes to make. I'll go ahead and mark this invalid unless someone decides that we should change to match perl.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
Please reopen. Opera has it right, FF has it wrong. The perl behaviour is not right, but nor is the current FF behaviour. To illustrate this, add a ? to make the {} non-greedy: javascript:alert(/((\3|b)\2(a)){2,}?/.exec('bbaababbabaaaaabbaaaabba')) now FF behaves correctly, matching Opera (and newest Chrome beta). This is inconsistent since a greedy/nongreedy distinction should never affect whether or not a regexp matches at a given index, yet that is what is happening here (the ? makes it match at position 0, where the greedy version doesn't match at position 0). I can't work out how to mark this bug reopened. See also the bug I marked as a duplicate.
let's reopen bug 476146.
Need a ruling from an ECMA-262 Ed. 3 RegExp language lawyer, or an appeals-court judge at this point (since Steve Levithan is already a judge in my book). I'm not that judge, so cc'ing Waldemar. I'm also interested in getting future ES-Harmony fixed if possible (if browser implementations do not agree, but we can agree on better future semantics). Steve has http://bugs.ecmascript.org/ permission, Erik please mail me if you would like ticket editing/filing access too. Bugzilla is good for conform-to-spec and (this trumps the spec often) conform-to-web-de-facto-standard bugs, but we generally try not to forge new standards by ourselves, so please use http://bugs.ecmascript.org. /be
My bad! Thanks for the vote of confidence, Brendan, but Erik is correct here. I'd stepped through what I expected to happen at the beginning of the string, but missed the fact that after \3 and \2 match the empty string at the beginning of the subject string, trying and failing to match "a" should cause backtracking into the second alternative in capturing group 2 ("b"), resulting in the eventual outcome returned by Opera.
(In reply to comment #10) > > I'm also interested in getting future ES-Harmony fixed if possible (if browser > implementations do not agree, but we can agree on better future semantics). I don't think the spec is a problem, though it is a little strange the way it differs from perl here. The ES3 spec is apparently implemented faithfully by Opera, and FF gets it right too, apart from this relatively minor bug. The Safari people are completely redoing their implementation and so are Google (where I happen to work but for whom I cannot speak). > Bugzilla is good for conform-to-spec and (this trumps the spec often) > conform-to-web-de-facto-standard bugs, but we generally try not to forge new > standards by ourselves, so please use http://bugs.ecmascript.org. > > /be (In reply to comment #10) > Need a ruling from an ECMA-262 Ed. 3 RegExp language lawyer, or an > appeals-court judge at this point (since Steve Levithan is already a judge in > my book). I'm not that judge, so cc'ing Waldemar. > > I'm also interested in getting future ES-Harmony fixed if possible (if browser > implementations do not agree, but we can agree on better future semantics). > Steve has http://bugs.ecmascript.org/ permission, Erik please mail me if you > would like ticket editing/filing access too. > > Bugzilla is good for conform-to-spec and (this trumps the spec often) > conform-to-web-de-facto-standard bugs, but we generally try not to forge new > standards by ourselves, so please use http://bugs.ecmascript.org. > > /be
You need to log in before you can comment on or make changes to this bug.