Closed Bug 209919 Opened 21 years ago Closed 20 years ago

Regular expression differs from IS to Mozilla

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: sagdjb, Assigned: rogerl)

References

Details

(Keywords: js1.5)

Attachments

(1 file)

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)
Build Identifier: 

Then trying to execute the regular expression: /^\-?(\d{1,}|\.{0,})*(\,\d{1,})?
$/ which accepts input like 100.00 100,00 1.000,00

This work in IE5+, but returns false when executing it in javascript to the 
mozilla browser.

testexp = /^\-?(\d{1,}|\.{0,})*(\,\d{1,})?$/;
if (!testexp.test(this.typedvalue)
{
  alert('error in input');
}

Is this due to error in regular expression, or error in javascript engine?

Reproducible: Always

Steps to Reproduce:
1. execute regular expression

Actual Results:  
It returns false or valid input

Expected Results:  
return true
This is because {0,} translates to {0,0} in Mozilla, which is wrong.
Workaround: Use '*' instead.

-          if (num == ren->u.range.max) break;
+          if (num && num == ren->u.range.max) break;

at http://lxr.mozilla.org/seamonkey/source/js/src/jsregexp.c#1714 solves the
problem for me, but I don't know if it's a proper fix.
The patch for bug 85721 seems to fix this too (uses -1 instead of 0 as internal
representation of infinity). BTW, why can't it go in? There'll hardly be as many
regressions as we have open bugs waiting to be fixed by it.

Related to bug 191479.
Simple testcase: javascript:/^x{0,}$/.test("x")
Confirming bug based on comment 1.  Thank you, for the analysis!
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 2000 → All
Hardware: PC → All
re = /^\-?(\d{1,}|\.{0,})*(\,\d{1,})?$/;

str = '100.00';
str = '100,00';     
str = '1.000,00';    

Here is the result of |re.exec(str)| in each case:


 Current Mozilla           Perl and IE6            patch in bug 85721
      null                 ["100.00", , ]          ["100.00", "00", , ]         
["100,00", "100", ",00"]   ["100,00", , ",00"]     ["100,00", "100", ",00"]     
      null                 ["1.000,00", , ",00"]   ["1.000,00", "000", ",00"]


It's a bit tricky to read, since some submatches themselves contain commas.
I put in double-quotes to make things clearer.

At any rate, it looks like the patch in bug 85721, while it corrects
the error zack-weg@gmx.de pointed out above, is not completely right.
The first submatch should be |undefined| in all three examples -
OS: All → Windows 2000
Hardware: All → PC
Attached file HTML testcase
Testcase added to JS testsuite:

      mozilla/js/tests/ecma_3/RegExp/regress-209919.js

Currently failing in Rhino (where 85721 has already been applied)
in the same manner as indicated above for that patch. 

And of course failing in SpiderMonkey as in Mozilla above -
Depends on: RegExpPerf
Keywords: js1.5
OS: Windows 2000 → All
Whiteboard: [QA note: fix needed in Rhino as well as SpiderMonkey]
Then, the remaining problem with the patch for bug 85721 is basically this:

/(a|b*)*/.exec( "a" );

which should (?) give [ "a", "" ], but does [ "a", "a" ] (same as current
behavior). [ "a", undefined ] would clearly be wrong.
rogerl, can you fix this for 1.5alpha in the big patch for 85721?  I will review
the next patch in that bug, or the current one if this bug can't be fixed by a
small iteration.

/be
cc'ing Waldemar -  what should this output?

/(a|b*)*/.exec( "a" );

[ "a", "a" ] 
[ "a", "" ]
[ "a", undefined ]


Both Perl and IE6 output "" for the backreference (a|b*) in this example. 
I thought JS should output |undefined| because the ECMA-262 Ed. 3 spec 
says that backreferences must hold |undefined| if not used.
See bug 123437 (SpiderMonkey) and bug 123439 (Rhino).

But now I see zack-weg@gmx.de's point: the backreference (a|b*)
can successfully match the empty string "" at the $ position of "a".
Regarding comments 7 and 9, the engine is correct as it stands.  The only
correct result for:

  /(a|b*)*/.exec("a")

is ["a","a"]

The reason for this is ECMA-262 15.10.2.5, third algorithm, step 2.1: "If min is
zero and y's endIndex is equal to x's endIndex, then return failure."

This step of the spec is necessary to prefent the regular expression engine from
falling into infinite loops on patterns such as (a*)*, trying to match
infinitely many empty strings.  That step states that, once the minimum repeat
count (which is 0 for *, 1 for +, etc.) has been satisfied, an atom being
repeated must not match the empty string.

If this step in the spec were not there, then the correct behavior for

  /(a|b*)*/.exec("a")

would be falling into an infinite loop matching infinitely many empty strings (*
is greedy).  There is no situation in which the correct behavior for the pattern
above would be returning ["a",""].

Note also the caveat about the minimum repeat count being satisfied.  The
specified result for the slightly modified pattern:

  /(a|b*){5,}/.exec("a")

is ["a",""] because the parentheses now match "a" followed by four empty strings.
Regarding comment 4, the ECMA 262 Edition 3 semantics specify the following:

/^\-?(\d{1,}|\.{0,})*(\,\d{1,})?$/.exec(str)

str = '100.00'    should produce ["100.00", "00", undefined]
str = '100,00'    should produce ["100,00", "100", ",00"]
str = '1.000,00'  should produce ["1.000,00", "000", ",00"]

With the patch in bug 85721 our engine is now correct on these test cases.
Waldemar: thanks; I have corrected the above testcase:

      mozilla/js/tests/ecma_3/RegExp/regress-209919.js

It passes in SpiderMonkey with the patch for bug 85721 applied,
and in Rhino, where 85721 has already been applied.

SUMMARY: this is another bug dependent on bug 85721. Once that
fix goes in, this and several other regexp bugs will go away -
Whiteboard: [QA note: fix needed in Rhino as well as SpiderMonkey]
waldemar:
> This step of the spec is necessary to prefent the regular expression engine from
> falling into infinite loops on patterns such as (a*)*, trying to match
> infinitely many empty strings.  That step states that, once the minimum repeat
> count (which is 0 for *, 1 for +, etc.) has been satisfied, an atom being
> repeated must not match the empty string.

Does this even mean, that

  /(b*)*/.exec("a")

should give ["", undefined] instead of ["", ""] ?
The patch for bug 85721 changes the behavior from the latter to the former.

Note that this is incompatible with Perl, which matches the empty string
once before terminating an otherwise infinite loop.
Yes,

  /(b*)*/.exec("a")

should return ["", undefined].
Fixed by bug 85721's landing.

/be
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: