Closed
Bug 209919
Opened 21 years ago
Closed 21 years ago
Regular expression differs from IS to Mozilla
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: sagdjb, Assigned: rogerl)
References
Details
(Keywords: js1.5)
Attachments
(1 file)
733 bytes,
text/html
|
Details |
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)
Build Identifier:
Then trying to execute the regular expression: /^\-?(\d{1,}|\.{0,})*(\,\d{1,})?
$/ which accepts input like 100.00 100,00 1.000,00
This work in IE5+, but returns false when executing it in javascript to the
mozilla browser.
testexp = /^\-?(\d{1,}|\.{0,})*(\,\d{1,})?$/;
if (!testexp.test(this.typedvalue)
{
alert('error in input');
}
Is this due to error in regular expression, or error in javascript engine?
Reproducible: Always
Steps to Reproduce:
1. execute regular expression
Actual Results:
It returns false or valid input
Expected Results:
return true
This is because {0,} translates to {0,0} in Mozilla, which is wrong.
Workaround: Use '*' instead.
- if (num == ren->u.range.max) break;
+ if (num && num == ren->u.range.max) break;
at http://lxr.mozilla.org/seamonkey/source/js/src/jsregexp.c#1714 solves the
problem for me, but I don't know if it's a proper fix.
The patch for bug 85721 seems to fix this too (uses -1 instead of 0 as internal
representation of infinity). BTW, why can't it go in? There'll hardly be as many
regressions as we have open bugs waiting to be fixed by it.
Related to bug 191479.
Simple testcase: javascript:/^x{0,}$/.test("x")
Comment 3•21 years ago
|
||
Confirming bug based on comment 1. Thank you, for the analysis!
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 2000 → All
Hardware: PC → All
Comment 4•21 years ago
|
||
re = /^\-?(\d{1,}|\.{0,})*(\,\d{1,})?$/;
str = '100.00';
str = '100,00';
str = '1.000,00';
Here is the result of |re.exec(str)| in each case:
Current Mozilla Perl and IE6 patch in bug 85721
null ["100.00", , ] ["100.00", "00", , ]
["100,00", "100", ",00"] ["100,00", , ",00"] ["100,00", "100", ",00"]
null ["1.000,00", , ",00"] ["1.000,00", "000", ",00"]
It's a bit tricky to read, since some submatches themselves contain commas.
I put in double-quotes to make things clearer.
At any rate, it looks like the patch in bug 85721, while it corrects
the error zack-weg@gmx.de pointed out above, is not completely right.
The first submatch should be |undefined| in all three examples -
OS: All → Windows 2000
Hardware: All → PC
Comment 5•21 years ago
|
||
Comment 6•21 years ago
|
||
Testcase added to JS testsuite:
mozilla/js/tests/ecma_3/RegExp/regress-209919.js
Currently failing in Rhino (where 85721 has already been applied)
in the same manner as indicated above for that patch.
And of course failing in SpiderMonkey as in Mozilla above -
Updated•21 years ago
|
Depends on: RegExpPerf
Keywords: js1.5
OS: Windows 2000 → All
Whiteboard: [QA note: fix needed in Rhino as well as SpiderMonkey]
Then, the remaining problem with the patch for bug 85721 is basically this:
/(a|b*)*/.exec( "a" );
which should (?) give [ "a", "" ], but does [ "a", "a" ] (same as current
behavior). [ "a", undefined ] would clearly be wrong.
Comment 8•21 years ago
|
||
rogerl, can you fix this for 1.5alpha in the big patch for 85721? I will review
the next patch in that bug, or the current one if this bug can't be fixed by a
small iteration.
/be
Comment 9•21 years ago
|
||
cc'ing Waldemar - what should this output?
/(a|b*)*/.exec( "a" );
[ "a", "a" ]
[ "a", "" ]
[ "a", undefined ]
Both Perl and IE6 output "" for the backreference (a|b*) in this example.
I thought JS should output |undefined| because the ECMA-262 Ed. 3 spec
says that backreferences must hold |undefined| if not used.
See bug 123437 (SpiderMonkey) and bug 123439 (Rhino).
But now I see zack-weg@gmx.de's point: the backreference (a|b*)
can successfully match the empty string "" at the $ position of "a".
Comment 10•21 years ago
|
||
Regarding comments 7 and 9, the engine is correct as it stands. The only
correct result for:
/(a|b*)*/.exec("a")
is ["a","a"]
The reason for this is ECMA-262 15.10.2.5, third algorithm, step 2.1: "If min is
zero and y's endIndex is equal to x's endIndex, then return failure."
This step of the spec is necessary to prefent the regular expression engine from
falling into infinite loops on patterns such as (a*)*, trying to match
infinitely many empty strings. That step states that, once the minimum repeat
count (which is 0 for *, 1 for +, etc.) has been satisfied, an atom being
repeated must not match the empty string.
If this step in the spec were not there, then the correct behavior for
/(a|b*)*/.exec("a")
would be falling into an infinite loop matching infinitely many empty strings (*
is greedy). There is no situation in which the correct behavior for the pattern
above would be returning ["a",""].
Note also the caveat about the minimum repeat count being satisfied. The
specified result for the slightly modified pattern:
/(a|b*){5,}/.exec("a")
is ["a",""] because the parentheses now match "a" followed by four empty strings.
Comment 11•21 years ago
|
||
Regarding comment 4, the ECMA 262 Edition 3 semantics specify the following:
/^\-?(\d{1,}|\.{0,})*(\,\d{1,})?$/.exec(str)
str = '100.00' should produce ["100.00", "00", undefined]
str = '100,00' should produce ["100,00", "100", ",00"]
str = '1.000,00' should produce ["1.000,00", "000", ",00"]
With the patch in bug 85721 our engine is now correct on these test cases.
Comment 12•21 years ago
|
||
Waldemar: thanks; I have corrected the above testcase:
mozilla/js/tests/ecma_3/RegExp/regress-209919.js
It passes in SpiderMonkey with the patch for bug 85721 applied,
and in Rhino, where 85721 has already been applied.
SUMMARY: this is another bug dependent on bug 85721. Once that
fix goes in, this and several other regexp bugs will go away -
Whiteboard: [QA note: fix needed in Rhino as well as SpiderMonkey]
Comment 13•21 years ago
|
||
waldemar:
> This step of the spec is necessary to prefent the regular expression engine from
> falling into infinite loops on patterns such as (a*)*, trying to match
> infinitely many empty strings. That step states that, once the minimum repeat
> count (which is 0 for *, 1 for +, etc.) has been satisfied, an atom being
> repeated must not match the empty string.
Does this even mean, that
/(b*)*/.exec("a")
should give ["", undefined] instead of ["", ""] ?
The patch for bug 85721 changes the behavior from the latter to the former.
Note that this is incompatible with Perl, which matches the empty string
once before terminating an otherwise infinite loop.
Comment 14•21 years ago
|
||
Yes,
/(b*)*/.exec("a")
should return ["", undefined].
Comment 15•21 years ago
|
||
Fixed by bug 85721's landing.
/be
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•