Closed
Bug 351344
Opened 18 years ago
Closed 16 years ago
Perl RegExp test 257: /((\3|b)\2(a)){2,}/.exec('bbaababbabaaaaabbaaaabba')
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
INVALID
People
(Reporter: bc, Unassigned)
References
()
Details
expected bbaaaabba,bba,b,a (MSIE6 returns null)
actual abba,bba,b,a
Comment 1•17 years ago
|
||
My understanding of this is:
The first time through, \3 is the empty string. So we're looking for /((|b)\2(a))/ the first time, which is either /(()(a))/ or /((b)b(a))/. The first string that matches that is the first three characters, "bba".
The second and subsequent times \3 is "a" and \2 is "b", so we're looking for /((a|b)b(a))/, either "aba" or "bba". The former matches, so we're up to "bbaaba" as a match, then the latter matches, giving us "bbaababba". After that nothing matches and we're done.
(If, the first time through, we had matched "a" instead of "bba", then the second and subsequent times, \3 would be "a", and \2 would be the empty string, so we'd be looking for /((a|b)(a))/ the first time, which is either /((a)(a))/ or /((b)(a))/.)
Am I wrong? (I'm saying the expected result is "bbaababba", not "bbaaaabba". I agree with the other parts of the expected results.)
Comment 2•17 years ago
|
||
What Mozilla returns makes sense if we assume that the captures are replaced with each match, as would seem to be suggested by:
/((a|b)\2)+/.exec("aaaabbb")
...though I don't understand the difference between these two:
/((a|b)\2)+/.exec("aaabbb")
/((a|b)\2)+/.exec("aaaabbb")
Comment 3•17 years ago
|
||
The spewage from perl re debug:
Compiling REx `((\3|b)\2(a)){2,}'
size 27 Got 220 bytes for offset annotations.
first at 7
1: CURLYX[0] {2,32767}(26)
3: OPEN1(5)
5: OPEN2(7)
7: BRANCH(10)
8: REF3(13)
10: BRANCH(13)
11: EXACT <b>(13)
13: CLOSE2(15)
15: REF2(17)
17: OPEN3(19)
19: EXACT <a>(21)
21: CLOSE3(23)
23: CLOSE1(25)
25: WHILEM[1/1](0)
26: NOTHING(27)
27: END(0)
floating "a" at 1..2147483647 (checking floating) minlen 2
Offsets: [27]
14[4] 0[0] 1[1] 0[0] 2[1] 0[0] 2[1] 3[2] 0[0] 5[1] 6[1] 0[0] 7[1] 0[0] 8[2] 0[0] 10[1] 0[0] 11[1] 0[0] 12[1] 0[0] 13[1] 0[0] 17[0] 17[0] 18[0]
Guessing start of match, REx "((\3|b)\2(a)){2,}" against "bbaababbabaaaaabbaaaabba"...
Found floating substr "a" at offset 2...
Guessed: match at offset 0
Matching REx "((\3|b)\2(a)){2,}" against "bbaababbabaaaaabbaaaabba"
Setting an EVAL scope, savestack=3
0 <> <bbaababbabaa> | 1: CURLYX[0] {2,32767}
0 <> <bbaababbabaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
0 <> <bbaababbabaa> | 3: OPEN1
0 <> <bbaababbabaa> | 5: OPEN2
0 <> <bbaababbabaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
0 <> <bbaababbabaa> | 8: REF3
failed...
0 <> <bbaababbabaa> | 11: EXACT <b>
1 <b> <baababbabaa> | 13: CLOSE2
1 <b> <baababbabaa> | 15: REF2
2 <bb> <aababbabaa> | 17: OPEN3
2 <bb> <aababbabaa> | 19: EXACT <a>
3 <bba> <ababbabaa> | 21: CLOSE3
3 <bba> <ababbabaa> | 23: CLOSE1
3 <bba> <ababbabaa> | 25: WHILEM[1/1]
1 out of 2..32767 cc=bffff434
3 <bba> <ababbabaa> | 3: OPEN1
3 <bba> <ababbabaa> | 5: OPEN2
3 <bba> <ababbabaa> | 7: BRANCH
Setting an EVAL scope, savestack=23
3 <bba> <ababbabaa> | 8: REF3
4 <bbaa> <babbabaa> | 13: CLOSE2
4 <bbaa> <babbabaa> | 15: REF2
failed...
3 <bba> <ababbabaa> | 11: EXACT <b>
failed...
failed...
Clearing an EVAL scope, savestack=13..23
failed...
failed...
Setting an EVAL scope, savestack=3
1 <b> <baababbabaa> | 1: CURLYX[0] {2,32767}
1 <b> <baababbabaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
1 <b> <baababbabaa> | 3: OPEN1
1 <b> <baababbabaa> | 5: OPEN2
1 <b> <baababbabaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
1 <b> <baababbabaa> | 8: REF3
failed...
1 <b> <baababbabaa> | 11: EXACT <b>
2 <bb> <aababbabaa> | 13: CLOSE2
2 <bb> <aababbabaa> | 15: REF2
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
2 <bb> <aababbabaa> | 1: CURLYX[0] {2,32767}
2 <bb> <aababbabaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
2 <bb> <aababbabaa> | 3: OPEN1
2 <bb> <aababbabaa> | 5: OPEN2
2 <bb> <aababbabaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
2 <bb> <aababbabaa> | 8: REF3
failed...
2 <bb> <aababbabaa> | 11: EXACT <b>
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
3 <bba> <ababbabaa> | 1: CURLYX[0] {2,32767}
3 <bba> <ababbabaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
3 <bba> <ababbabaa> | 3: OPEN1
3 <bba> <ababbabaa> | 5: OPEN2
3 <bba> <ababbabaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
3 <bba> <ababbabaa> | 8: REF3
failed...
3 <bba> <ababbabaa> | 11: EXACT <b>
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
4 <bbaa> <babbabaa> | 1: CURLYX[0] {2,32767}
4 <bbaa> <babbabaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
4 <bbaa> <babbabaa> | 3: OPEN1
4 <bbaa> <babbabaa> | 5: OPEN2
4 <bbaa> <babbabaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
4 <bbaa> <babbabaa> | 8: REF3
failed...
4 <bbaa> <babbabaa> | 11: EXACT <b>
5 <bbaab> <abbabaa> | 13: CLOSE2
5 <bbaab> <abbabaa> | 15: REF2
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
5 <bbaab> <abbabaa> | 1: CURLYX[0] {2,32767}
5 <bbaab> <abbabaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
5 <bbaab> <abbabaa> | 3: OPEN1
5 <bbaab> <abbabaa> | 5: OPEN2
5 <bbaab> <abbabaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
5 <bbaab> <abbabaa> | 8: REF3
failed...
5 <bbaab> <abbabaa> | 11: EXACT <b>
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
6 <baaba> <bbabaaa> | 1: CURLYX[0] {2,32767}
6 <baaba> <bbabaaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
6 <baaba> <bbabaaa> | 3: OPEN1
6 <baaba> <bbabaaa> | 5: OPEN2
6 <baaba> <bbabaaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
6 <baaba> <bbabaaa> | 8: REF3
failed...
6 <baaba> <bbabaaa> | 11: EXACT <b>
7 <aabab> <babaaaa> | 13: CLOSE2
7 <aabab> <babaaaa> | 15: REF2
8 <ababb> <abaaaaa> | 17: OPEN3
8 <ababb> <abaaaaa> | 19: EXACT <a>
9 <babba> <baaaaab> | 21: CLOSE3
9 <babba> <baaaaab> | 23: CLOSE1
9 <babba> <baaaaab> | 25: WHILEM[1/1]
1 out of 2..32767 cc=bffff434
9 <babba> <baaaaab> | 3: OPEN1
9 <babba> <baaaaab> | 5: OPEN2
9 <babba> <baaaaab> | 7: BRANCH
Setting an EVAL scope, savestack=23
9 <babba> <baaaaab> | 8: REF3
failed...
9 <babba> <baaaaab> | 11: EXACT <b>
10 <abbab> <aaaaabb> | 13: CLOSE2
10 <abbab> <aaaaabb> | 15: REF2
failed...
failed...
Clearing an EVAL scope, savestack=13..23
failed...
failed...
Setting an EVAL scope, savestack=3
7 <aabab> <babaaaa> | 1: CURLYX[0] {2,32767}
7 <aabab> <babaaaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
7 <aabab> <babaaaa> | 3: OPEN1
7 <aabab> <babaaaa> | 5: OPEN2
7 <aabab> <babaaaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
7 <aabab> <babaaaa> | 8: REF3
failed...
7 <aabab> <babaaaa> | 11: EXACT <b>
8 <ababb> <abaaaaa> | 13: CLOSE2
8 <ababb> <abaaaaa> | 15: REF2
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
8 <ababb> <abaaaaa> | 1: CURLYX[0] {2,32767}
8 <ababb> <abaaaaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
8 <ababb> <abaaaaa> | 3: OPEN1
8 <ababb> <abaaaaa> | 5: OPEN2
8 <ababb> <abaaaaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
8 <ababb> <abaaaaa> | 8: REF3
failed...
8 <ababb> <abaaaaa> | 11: EXACT <b>
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
9 <babba> <baaaaab> | 1: CURLYX[0] {2,32767}
9 <babba> <baaaaab> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
9 <babba> <baaaaab> | 3: OPEN1
9 <babba> <baaaaab> | 5: OPEN2
9 <babba> <baaaaab> | 7: BRANCH
Setting an EVAL scope, savestack=13
9 <babba> <baaaaab> | 8: REF3
failed...
9 <babba> <baaaaab> | 11: EXACT <b>
10 <abbab> <aaaaabb> | 13: CLOSE2
10 <abbab> <aaaaabb> | 15: REF2
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
10 <abbab> <aaaaabb> | 1: CURLYX[0] {2,32767}
10 <abbab> <aaaaabb> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
10 <abbab> <aaaaabb> | 3: OPEN1
10 <abbab> <aaaaabb> | 5: OPEN2
10 <abbab> <aaaaabb> | 7: BRANCH
Setting an EVAL scope, savestack=13
10 <abbab> <aaaaabb> | 8: REF3
failed...
10 <abbab> <aaaaabb> | 11: EXACT <b>
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
11 <bbaba> <aaaabba> | 1: CURLYX[0] {2,32767}
11 <bbaba> <aaaabba> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
11 <bbaba> <aaaabba> | 3: OPEN1
11 <bbaba> <aaaabba> | 5: OPEN2
11 <bbaba> <aaaabba> | 7: BRANCH
Setting an EVAL scope, savestack=13
11 <bbaba> <aaaabba> | 8: REF3
failed...
11 <bbaba> <aaaabba> | 11: EXACT <b>
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
12 <babaa> <aaabbaa> | 1: CURLYX[0] {2,32767}
12 <babaa> <aaabbaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
12 <babaa> <aaabbaa> | 3: OPEN1
12 <babaa> <aaabbaa> | 5: OPEN2
12 <babaa> <aaabbaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
12 <babaa> <aaabbaa> | 8: REF3
failed...
12 <babaa> <aaabbaa> | 11: EXACT <b>
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
13 <abaaa> <aabbaaa> | 1: CURLYX[0] {2,32767}
13 <abaaa> <aabbaaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
13 <abaaa> <aabbaaa> | 3: OPEN1
13 <abaaa> <aabbaaa> | 5: OPEN2
13 <abaaa> <aabbaaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
13 <abaaa> <aabbaaa> | 8: REF3
failed...
13 <abaaa> <aabbaaa> | 11: EXACT <b>
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
14 <baaaa> <abbaaaa> | 1: CURLYX[0] {2,32767}
14 <baaaa> <abbaaaa> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
14 <baaaa> <abbaaaa> | 3: OPEN1
14 <baaaa> <abbaaaa> | 5: OPEN2
14 <baaaa> <abbaaaa> | 7: BRANCH
Setting an EVAL scope, savestack=13
14 <baaaa> <abbaaaa> | 8: REF3
failed...
14 <baaaa> <abbaaaa> | 11: EXACT <b>
failed...
failed...
failed...
Setting an EVAL scope, savestack=3
15 <aaaaa> <bbaaaab> | 1: CURLYX[0] {2,32767}
15 <aaaaa> <bbaaaab> | 25: WHILEM[1/1]
0 out of 2..32767 cc=bffff434
15 <aaaaa> <bbaaaab> | 3: OPEN1
15 <aaaaa> <bbaaaab> | 5: OPEN2
15 <aaaaa> <bbaaaab> | 7: BRANCH
Setting an EVAL scope, savestack=13
15 <aaaaa> <bbaaaab> | 8: REF3
failed...
15 <aaaaa> <bbaaaab> | 11: EXACT <b>
16 <aaaab> <baaaabb> | 13: CLOSE2
16 <aaaab> <baaaabb> | 15: REF2
17 <aaabb> <aaaabba> | 17: OPEN3
17 <aaabb> <aaaabba> | 19: EXACT <a>
18 <aaabba> <aaabba> | 21: CLOSE3
18 <aaabba> <aaabba> | 23: CLOSE1
18 <aaabba> <aaabba> | 25: WHILEM[1/1]
1 out of 2..32767 cc=bffff434
18 <aaabba> <aaabba> | 3: OPEN1
18 <aaabba> <aaabba> | 5: OPEN2
18 <aaabba> <aaabba> | 7: BRANCH
Setting an EVAL scope, savestack=23
18 <aaabba> <aaabba> | 8: REF3
19 <aaabbaa> <aabba> | 13: CLOSE2
19 <aaabbaa> <aabba> | 15: REF2
20 <aaabbaaa> <abba> | 17: OPEN3
20 <aaabbaaa> <abba> | 19: EXACT <a>
21 <aaabbaaaa> <bba> | 21: CLOSE3
21 <aaabbaaaa> <bba> | 23: CLOSE1
21 <aaabbaaaa> <bba> | 25: WHILEM[1/1]
2 out of 2..32767 cc=bffff434
Setting an EVAL scope, savestack=41
21 <aaabbaaaa> <bba> | 3: OPEN1
21 <aaabbaaaa> <bba> | 5: OPEN2
21 <aaabbaaaa> <bba> | 7: BRANCH
Setting an EVAL scope, savestack=51
21 <aaabbaaaa> <bba> | 8: REF3
failed...
21 <aaabbaaaa> <bba> | 11: EXACT <b>
22 <aaabbaaaab> <ba> | 13: CLOSE2
22 <aaabbaaaab> <ba> | 15: REF2
23 <aaabbaaaabb> <a> | 17: OPEN3
23 <aaabbaaaabb> <a> | 19: EXACT <a>
24 <aaabbaaaabba> <> | 21: CLOSE3
24 <aaabbaaaabba> <> | 23: CLOSE1
24 <aaabbaaaabba> <> | 25: WHILEM[1/1]
3 out of 2..32767 cc=bffff434
Setting an EVAL scope, savestack=69
24 <aaabbaaaabba> <> | 3: OPEN1
24 <aaabbaaaabba> <> | 5: OPEN2
24 <aaabbaaaabba> <> | 7: BRANCH
Setting an EVAL scope, savestack=79
24 <aaabbaaaabba> <> | 8: REF3
failed...
24 <aaabbaaaabba> <> | 11: EXACT <b>
failed...
Clearing an EVAL scope, savestack=69..79
restoring \1 to 21(21)..24
restoring \2 to 21(21)..22
restoring \3 to 23(23)..24
failed, try continuation...
24 <aaabbaaaabba> <> | 26: NOTHING
24 <aaabbaaaabba> <> | 27: END
Match successful!
true
Freeing REx: `"((\\3|b)\\2(a)){2,}"'
Comment 4•16 years ago
|
||
(In reply to comment #1)
> My understanding of this is:
>
> The second and subsequent times \3 is "a" and \2 is "b", so we're looking for
You are forgetting RepeatMatcher step 4 in ECMAScript 15.10.2.5 "For every integer k that satisfies parenIndex < k and k ≤ parenIndex+parenCount, set cap[k] to undefined." So for the second and subsequent times the capture registers are undefined: \3 and \2 should match the empty string again.
This is different from Perl. See also bug 476146 which has another example of the same problem and where the IMHO correct expected output is given. Also you can see there that the regexp matches at a different starting place if a quantifier is made non-greedy.
Comment 6•16 years ago
|
||
> /((\3|b)\2(a)){2,}/.exec('bbaababbabaaaaabbaaaabba')
Expected by Bob Clary: ["bbaaaabba", "bba", "b", "a"]
Perl, PCRE, .NET, Java, etc.: ["bbaaaabba", "bba", "b", "a"]
Opera 9.52: ["bbaa", "a", "", "a"]
IE6: null
Firefox 3, Chrome 1: ["abba", "bba", "b", "a"]
ES3: ["abba", "bba", "b", "a"]
Erik Corry is correct. This is not a bug in Firefox. Relevent portions of ECMA-262 v3 are quoted below:
15.10.2.5: Step 4 of the RepeatMatcher clears Atom's captures each time Atom is repeated.
15.10.2.9: If the regular expression has n or more capturing parentheses but the nth one is undefined because it hasn't captured anything, then the backreference always succeeds.
---
JavaScript regular expressions follow the two referenced behavioral rules that are the cause of the confusion here and prevent this regex and subject string from working like most other regex flavors. The first issue is what is matched by backreferences to capturing groups that have not yet participated in a match. ES3 dictates that such backreferences match the empty string, or in other words, they always match successfully. I've opened an ES-Harmony ticket regarding this (since I don't like this behavior and I think there will be minimal backcompat issues if this is changed) at http://bugs.ecmascript.org/ticket/376
The second difference with ES3-flavor regexes involves the value remembered by capturing groups nested within a repeated, outer group. With most regex flavors, the value remembered by a capturing group within a repeated grouping is whatever the group matched the last time it participated in the match. So, the value of backreference 1 after /(?:(a)|(b))+/ is used to match "ab" would be "a". However, according to ECMA-262 v3, the value of backreferences to nested groups is reset every time the outer group is repeated. Hence, /(?:(a)|(b))+/ would still match "ab", but backreference 1 after the match is complete would reference a non-participating capturing group, which in JavaScript would match an empty string within the regex itself and be returned as `undefined` in e.g. the array returned by `RegExp.prototype.exec`. I've not personally opened a ticket on this bit of bahavior because although I don't like it and find it considerably less intuitive and useful than the typical behavior, I think the arguments against this are more based on personal preference. However, changing this would also be unlikely to cause backcompat issues since IE gets this second rule wrong.
Once again, this bug report is invalid according to ES3.
Reporter | ||
Comment 7•16 years ago
|
||
the initial reason for this bug report was for checking perl compatibility and deciding what other perl compatible changes to make. I'll go ahead and mark this invalid unless someone decides that we should change to match perl.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
Comment 8•16 years ago
|
||
Please reopen. Opera has it right, FF has it wrong. The perl behaviour is not right, but nor is the current FF behaviour. To illustrate this, add a ? to make the {} non-greedy:
javascript:alert(/((\3|b)\2(a)){2,}?/.exec('bbaababbabaaaaabbaaaabba'))
now FF behaves correctly, matching Opera (and newest Chrome beta). This is inconsistent since a greedy/nongreedy distinction should never affect whether or not a regexp matches at a given index, yet that is what is happening here (the ? makes it match at position 0, where the greedy version doesn't match at position 0). I can't work out how to mark this bug reopened. See also the bug I marked as a duplicate.
Reporter | ||
Comment 9•16 years ago
|
||
let's reopen bug 476146.
Comment 10•16 years ago
|
||
Need a ruling from an ECMA-262 Ed. 3 RegExp language lawyer, or an appeals-court judge at this point (since Steve Levithan is already a judge in my book). I'm not that judge, so cc'ing Waldemar.
I'm also interested in getting future ES-Harmony fixed if possible (if browser implementations do not agree, but we can agree on better future semantics). Steve has http://bugs.ecmascript.org/ permission, Erik please mail me if you would like ticket editing/filing access too.
Bugzilla is good for conform-to-spec and (this trumps the spec often) conform-to-web-de-facto-standard bugs, but we generally try not to forge new standards by ourselves, so please use http://bugs.ecmascript.org.
/be
Comment 11•16 years ago
|
||
My bad! Thanks for the vote of confidence, Brendan, but Erik is correct here.
I'd stepped through what I expected to happen at the beginning of the string, but missed the fact that after \3 and \2 match the empty string at the beginning of the subject string, trying and failing to match "a" should cause backtracking into the second alternative in capturing group 2 ("b"), resulting in the eventual outcome returned by Opera.
Comment 12•16 years ago
|
||
(In reply to comment #10)
>
> I'm also interested in getting future ES-Harmony fixed if possible (if browser
> implementations do not agree, but we can agree on better future semantics).
I don't think the spec is a problem, though it is a little strange the way it differs from perl here. The ES3 spec is apparently implemented faithfully by Opera, and FF gets it right too, apart from this relatively minor bug. The Safari people are completely redoing their implementation and so are Google (where I happen to work but for whom I cannot speak).
> Bugzilla is good for conform-to-spec and (this trumps the spec often)
> conform-to-web-de-facto-standard bugs, but we generally try not to forge new
> standards by ourselves, so please use http://bugs.ecmascript.org.
>
> /be
(In reply to comment #10)
> Need a ruling from an ECMA-262 Ed. 3 RegExp language lawyer, or an
> appeals-court judge at this point (since Steve Levithan is already a judge in
> my book). I'm not that judge, so cc'ing Waldemar.
>
> I'm also interested in getting future ES-Harmony fixed if possible (if browser
> implementations do not agree, but we can agree on better future semantics).
> Steve has http://bugs.ecmascript.org/ permission, Erik please mail me if you
> would like ticket editing/filing access too.
>
> Bugzilla is good for conform-to-spec and (this trumps the spec often)
> conform-to-web-de-facto-standard bugs, but we generally try not to forge new
> standards by ourselves, so please use http://bugs.ecmascript.org.
>
> /be
You need to log in
before you can comment on or make changes to this bug.
Description
•