Regular expression "(A)?(A.*)" picks 'A' twice. var str = "ABCDEF"; var regexp = /^(A)?(A.*)$/; regexp.exec(str); Actual Result: RegExp.$1 = 'A'; RegExp.$2 = 'ABCDEF'; Expected Result: RegExp.$1 = ''; RegExp.$2 = 'ABCDEF';
Using Mozilla 2001062113 WinNT. The corrected testcase has the test function called from button onClick() rather than from form onSubmit(). This is necessary to run the testcase on the Bugzilla server. It is very interesting to run the testcase with the test strings "AAA", then "AA", then "A": regexp = /^(A)?(A.*)$/; str = "AAA" NN4.7 $1=A, $2=AA Moz: $1=A, $2=AA str = "AA" NN4.7 $1=A, $2=A Moz: $1=A, $2=A str = "A" NN4.7 $1= , $2=A Moz: $1=A, $2=A It is only when there is a single "A" that things go wrong in Moz -
Here is how Perl handles this: pattern /(A)?(A.*)/ string AAA $1=A $2=AA string AA $1=A $2=A string A $1= $2=A
Certainly Mozilla (and JS1.5) are not handling the string "A" correctly. But I'm puzzled the way NN4.7, (and IE4.7), and Perl handle it. Since the regexp is /^(A)?(A.*)$/, and we do a greedy match from left to right, why do they assign the match to the SECOND subexpression, and not the FIRST? i.e. for the string "A", shouldn't we expect $1=A, $2= ???????????????????????????????????????????????????????
Possible answer: NN4.7, IE4.7 and Perl are not being "greedy". "No match" is a successful outcome for the ? operator, since it means "match 0 or 1 time". So if you're not being greedy, you can match 0 times and move on to the next subexpression to the right...
Note NN4.7, IE4.7 and Perl ARE being "greedy" when the string is "AAA" or "AA". pattern /(A)?(A.*)/ string AAA $1=A $2=AA string AA $1=A $2=A They "consume" as much as they can (1 character) in the first subexpression, and give only the remainder to the second one. But when the string is "A", they do not consume as much as they can in the first subexpression. The first subexpression accepts less: a 0-character match, to let the second subexpression match. "Altruistic" behavior to allow other parts of the pattern to match ...
Testcase added to JS test suite: js/tests/ecma_3/RegExp/regress-87231.js Note: depends on latest version of: js/tests/ecma_3/RegExp/shell.js
Note: this testcase is also failing in Rhino...
Rhino fix checked in.
Rhino fix verified on WinNT. Testcase passes in both rhino, rhinoi shells -
Created attachment 53792 [details] [diff] [review] Reset paren contents after failed greedy child attempt
cc'ing reviewers -
Created attachment 60740 [details] [diff] [review] Here's the patch applied to 3.49 Seems like there were some problems applying the earlier patch
Created attachment 60742 [details] [diff] [review] Here's the patch applied to 3.49 Once again, with context.
Created attachment 62276 [details] [diff] [review] Patched patch I messed up shifting to later revision, trying again,
Comment on attachment 62276 [details] [diff] [review] Patched patch r=khanson
Comment on attachment 62276 [details] [diff] [review] Patched patch firstname.lastname@example.org
Another patch with r= and sr=, but not checked in, and it's now < 4 hours till 0.9.8 freezes. What's the hold-up? /be
Fix checked in.
Marking Verified FIXED. The above testcase now passes in the debug and optimized JS shell on Linux, WinNT, and Mac9.1.