Closed Bug 1010154 Opened 10 years ago Closed 10 years ago

regular expression error

Categories

(Core :: JavaScript Engine, defect)

29 Branch
defect
Not set
normal

Tracking

()

VERIFIED DUPLICATE of bug 998785

People

(Reporter: evgpanchenko, Unassigned)

References

()

Details

(Keywords: regression, testcase)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:26.0) Gecko/20100101 Firefox/26.0 (Beta/Release)
Build ID: 20140421221237

Steps to reproduce:

var a = '';
// 17 nbsp is ok!
for(var i=0; i<18; i++) a+= "\xa0";
a = a+'-'+a+'-';
var r = /(\s|\u00A0)+$/g;
a.replace(r, '');



Actual results:

InternalError: an error occurred while executing regular expression


Expected results:

"                 -                 -"
OS: Mac OS X → Windows XP
Summary: regexpr error → regular expression error
This works on 28, but breaks on 29-32.
Status: UNCONFIRMED → NEW
Component: Untriaged → JavaScript Engine
Ever confirmed: true
OS: Windows XP → All
Product: Firefox → Core
Hardware: x86 → All
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
I'm not sure this is a duplicate. Bug 998785 comment 6 and comment 1 say that there used to not be a match (for the case in bug 998785). That is not the case with this testcase; it used to match just fine. Equally, the string length here really doesn't go as far that I would expect an overflow to occur (but then, I don't know YARR very well). Are we just being super-cautious now and therefore missing cases where we'd otherwise match (relatively quickly) ?

(FWIW, to the reporter: the following regexp would work for your usecase and probably doesn't have this problem: /[\s\u00a0]+$/g)
Flags: needinfo?(bzbarsky)
> it used to match just fine

Really?  The testcase as written is a no-op, since it never examines the return value of the replace() call, and since there are no trailing \xa0 in the string anyway, so the replace() call is a no-op.  Here's a somewhat modified testcase that actually has the replace() call do something and then checks whether it did it right:

  var a = '';
  for(var i=0; i<18; i++) a+= "\xa0";
  a = a+'-'+a+'-'+a;
  var r = /(\s|\u00A0)+$/g;
  var b = a.replace(r, '');
  alert(b.length + " -- " + b.charCodeAt(b.length-1));

The correct value for the alert is obviously "38 -- 45".  Firefox 28 and earlier alert "56 -- 160".  As in, the replace call didn't actually do any replacing.

> Equally, the string length here really doesn't go as far that I would expect an overflow
> to occur

The YARR interpreter fails out after 1e6 char match operations.  We used to treat that as "doesn't match".  Now we throw.

In JS regexps, \s matches \xa0.  So this testcase has an alternation in which both branches match all the \xa0 chars in the string.

Matching this regexp involves starting at char 0 and trying to match it there.  This involves making a choice of alternation for each \xa0 as we go, then discovering we didn't match because '-' doesn't match '$'.  Then YARR backtracks and tries the other alternation choice, which also matches until we get to the '-'.  Since 'a' starts with 18 \xa0 chars, that means 2^18 match operations where we try to start the match at the first char, 2^17 where we try to start at the second char, etc.  In all, we end up with 2^19 - 1 match operations for which we try to start somewhere in the first run of whitespace in 'a'.  And another 2^19 - 1 match operations in the second run.  2^20 - 2 = 1048574 > 1e6, so the whole thing fails out.

All of which is to say that this is a _really_ crappy regexp for the job at hand; it should just be /\s+$/.  The fact that YARR fails on it is not exactly good, of course...
Flags: needinfo?(bzbarsky)
One other note on "relatively quickly".  That replace() call takes about 40ms on my hardware before it returns (in Fx28) or throws (Fx29).  Not surprising, since my clock is 2.6GHz and 1e6 operations per 40ms is 2.5e9 operations per second...
And one last note: in Chrome, which doesn't have the exception issue at hand, increasing the loop termination number by 1 doubles the execution time for the regexp, as expected.  If I set the termination number to 25 instead of 18, the replace() call takes several seconds in Chrome.
(I mid-aired with bz, but since I already wrote a comment, I'll post it anyway.)

(In reply to :Gijs Kruitbosch from comment #3)
> I'm not sure this is a duplicate. Bug 998785 comment 6 and comment 1 say
> that there used to not be a match (for the case in bug 998785). That is not
> the case with this testcase; it used to match just fine.

No, it didn't: the test case in comment 0 doesn't match anything: the result of the `a.replace()` call is identical to `a`.

> Equally, the string
> length here really doesn't go as far that I would expect an overflow to
> occur (but then, I don't know YARR very well).

While the length of the string matters to a small extent, it's the pattern that is the issue. The patch in bug 953013 strictly only handles cases where Yarr already bailed out. The only change is that we now bail out loudly instead of silently returning a wrong result.

> Are we just being
> super-cautious now and therefore missing cases where we'd otherwise match
> (relatively quickly) ?

No, see above.
Thanks for the patient explanations! :-)

Marking verified dup.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.