Closed Bug 87231 Opened 23 years ago Closed 23 years ago

Regular expression "(A)?(A.*)" picks 'A' twice

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9.8

People

(Reporter: kazhik, Assigned: rogerl)

Details

(Whiteboard: [fixed in Rhino 2001-08-27])

Attachments

(3 files, 3 obsolete files)

Regular expression "(A)?(A.*)" picks 'A' twice.

var str = "ABCDEF";
var regexp = /^(A)?(A.*)$/;
regexp.exec(str);

Actual Result: RegExp.$1 = 'A'; RegExp.$2 = 'ABCDEF';
Expected Result: RegExp.$1 = ''; RegExp.$2 = 'ABCDEF';
Attached file Testcase
Using Mozilla 2001062113 WinNT.

The corrected testcase has the test function called from button onClick()
rather than from form onSubmit(). This is necessary to run the testcase
on the Bugzilla server. 

It is very interesting to run the testcase with the test strings 
"AAA", then "AA",  then "A": 

regexp = /^(A)?(A.*)$/;      
str = "AAA"

NN4.7 $1=A, $2=AA
Moz:  $1=A, $2=AA


str = "AA"

NN4.7 $1=A, $2=A
Moz:  $1=A, $2=A


str = "A"

NN4.7 $1= , $2=A
Moz:  $1=A, $2=A


It is only when there is a single "A" that things go wrong in Moz - 
Here is how Perl handles this:

pattern  /(A)?(A.*)/

string   AAA
$1=A
$2=AA

string   AA
$1=A
$2=A

string   A
$1=
$2=A
Certainly Mozilla (and JS1.5) are not handling the string "A" correctly.


But I'm puzzled the way NN4.7, (and IE4.7), and Perl handle it. 
Since the regexp is /^(A)?(A.*)$/, and we do a greedy match from 
left to right, why do they assign the match to the SECOND subexpression, 
and not the FIRST? 

i.e. for the string "A", shouldn't we expect $1=A, $2=

???????????????????????????????????????????????????????
Possible answer: NN4.7, IE4.7 and Perl are not being "greedy".

"No match" is a successful outcome for the ? operator, since it means 
"match 0 or 1 time". So if you're not being greedy, you can match 0 times
and move on to the next subexpression to the right...
Note NN4.7, IE4.7 and Perl ARE being "greedy" when the string is "AAA" or "AA". 


pattern  /(A)?(A.*)/

string   AAA
$1=A
$2=AA

string   AA
$1=A
$2=A


They "consume" as much as they can (1 character) in the first 
subexpression, and give only the remainder to the second one.

But when the string is "A", they do not consume as much as they
can in the first subexpression. The first subexpression accepts
less: a 0-character match, to let the second subexpression match. 

"Altruistic" behavior to allow other parts of the pattern to match ...
Testcase added to JS test suite:

              js/tests/ecma_3/RegExp/regress-87231.js

Note: depends on latest version of:

              js/tests/ecma_3/RegExp/shell.js
Note: this testcase is also failing in Rhino...
Whiteboard: [need fix in Rhino, too]
Rhino fix checked in.
Status: NEW → ASSIGNED
Rhino fix verified on WinNT. Testcase passes in both rhino, rhinoi shells -
Whiteboard: [need fix in Rhino, too] → [fixed in Rhino 2001-08-27]
cc'ing reviewers -
Attached patch Here's the patch applied to 3.49 (obsolete) — Splinter Review
Seems like there were some problems applying the earlier patch
Attached patch Here's the patch applied to 3.49 (obsolete) — Splinter Review
Once again, with context.
Attachment #60740 - Attachment is obsolete: true
Attached patch Patched patchSplinter Review
I messed up shifting to later revision, trying again,
Attachment #53792 - Attachment is obsolete: true
Attachment #60742 - Attachment is obsolete: true
Comment on attachment 62276 [details] [diff] [review]
Patched patch

r=khanson
Attachment #62276 - Flags: review+
Another patch with r= and sr=, but not checked in, and it's now < 4 hours till
0.9.8 freezes.  What's the hold-up?

/be
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla0.9.8
Marking Verified FIXED. The above testcase now passes in the
debug and optimized JS shell on Linux, WinNT, and Mac9.1.
Status: RESOLVED → VERIFIED
Flags: testcase+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: