Closed Bug 1092449 Opened 10 years ago Closed 8 years ago

Regexp performance and crash since irregexp

Categories

(Core :: JavaScript Engine, defect)

32 Branch
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: addons.mozilla, Unassigned)

References

Details

(Keywords: regression)

Attachments

(2 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.1.17 (KHTML, like Gecko) Version/7.1 Safari/537.85.10

Steps to reproduce:

Here is a new test case related to bug 1084280 and bug 1091459 and not solved by the patches provided in these tickets. 
takes 58ms in Safari, 157ms in FF31, freezes FF between 32 and 36 and crashes FF Nightly.
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
Isn't this the sort of regexp with quantified capturing parens for which Safari and pre-irregexp Firefox will just return false even if it matches?

I just tried the testcase in Chrome and it hangs the renderer process there.

Obviously crashing is no good, nor is hanging without a slow script dialog.
I guess the previous regexp engine YARR was better in some cases than irregexp.
Result with the old regexp engine:
Executed in 147ms; Text length: 2096 chars.

There was a 1st regression:
good=2014-01-10
bad1=2014-01-11
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=37516445a0b5&tochange=e89afc241513
Till Schneidereit — Bug 952898 - String.prototype.startsWith and .endsWith should throw when called with a regexp as first argument. r=jwalden

After 2014-01-11, the testcase doesn't work anymore and sends an error about regex in the console.

Then, irregexp has been implemented, it fixes the testcase but with an horrible perf drop.
bad1=2014-05-16
bad2=2014-05-17
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=58c5a3427997&tochange=2893f60d5903
Blocks: 976446
Keywords: regression
Attached file Firefox regexp test page 2.html (obsolete) —
Another version of the first attachment which does replacements instead of just a test.
(In reply to Boris Zbarsky [:bz] from comment #1)
> Isn't this the sort of regexp with quantified capturing parens for which
> Safari and pre-irregexp Firefox will just return false even if it matches?
> 
> I just tried the testcase in Chrome and it hangs the renderer process there.
> 
> Obviously crashing is no good, nor is hanging without a slow script dialog.

It does return good results in previous versions. I just attached another version which does a replace and works fine on FF31 and Safari.
Sorry. This is the test case with replacements.( Used to work before FF32.)
Attachment #8515524 - Attachment is obsolete: true
> It does return good results in previous versions.

Sometimes.  And sometimes not.  It depends on the string.

That's what comment 2 is showing as well: this regexp is hitting the iteration limit and quitting partway through.
The regexp is used on a csv format. It used to work 100%, even on 10MB csv files with thousands of records (it has been working for years). However, in versions after FF31 the regexp hangs when the pattern doesn't match, whatever the size of the file. If you have an example of a string that gives wrong results with this regexp before FF31 or on Safari, I am interested.

If you look at the 'while' loop of the last test case (attachment 8515525 [details]), all replacements are made without problems until the very last loop when all separators have already been replaced with ##, then the program hangs, which it did not before and still doesn't in Safari. 

Same thing with the test() method of attachment 8515402 [details], the regexp used to return false, as it should, when it was not matching, now if there is no match, it just freezes.
(In reply to OW from comment #7)
> The regexp is used on a csv format. It used to work 100%, even on 10MB csv
> files with thousands of records (it has been working for years). 

And all those years, it worked out of sheer luck in the face of a correctness issue in the regexp implementation.

> However, in
> versions after FF31 the regexp hangs when the pattern doesn't match,
> whatever the size of the file. If you have an example of a string that gives
> wrong results with this regexp before FF31 or on Safari, I am interested.

It's not trivial to create failing test cases for a specific expression. However, if you look at bug 1010154 comment 4, you can see an example of string.replace giving incorrect results for a very simple regexp. If you're interested in details behind this, check bug 953013 comment 5. If you're truly dedicated, you can check out the source code for a version of Firefox that still used Yarr, compile the JS shell (or the browser), start it in a debugger and run your test in that. If you set a breakpoint in YarrInterpreter::matchDisjunction[1], you'll see the error bailout happening that I described in the above-mentioned comment.

Both bugs I linked to (and the ones blocking bug 953013) contain lots more details and discussions about this.

> If you look at the 'while' loop of the last test case (attachment 8515525 [details]
> [details]), all replacements are made without problems until the very last
> loop when all separators have already been replaced with ##, then the
> program hangs, which it did not before and still doesn't in Safari. 

Safari is still using Yarr, so they still have the potentially incorrect behavior here. In Safari, bz's test case from bug 1010154 comment 4 still returns the wrong result. That's why your expression will work in Safari but not in any other browser.

> Same thing with the test() method of attachment 8515402 [details], the
> regexp used to return false, as it should, when it was not matching, now if
> there is no match, it just freezes.

Does it really freeze for you? If so, that is a problem. For me, it shows the slow script dialog after 15 seconds, as expected.

I know it's hard to accept this, both because the issue is so subtle and because it's annoying to deal with. Nevertheless, we cannot (and don't want to) go back to the old broken behavior, because it can cause wrong results. 

[1] http://mxr.mozilla.org/mozilla-esr24/source/js/src/yarr/YarrInterpreter.cpp#1111
(In reply to Till Schneidereit [:till] from comment #8)
> It's not trivial to create failing test cases for a specific expression.
> However, if you look at bug 1010154 comment 4, you can see an example of
> string.replace giving incorrect results for a very simple regexp. 
This comment says: "All of which is to say that this is a _really_ **** regexp for the job at hand; it should just be /\s+$/.  The fact that YARR fails on it is not exactly good, of course..."
Agreed: both sides of the pipe are not an alternation as one matches the other.
It is not the same case in the present test case: /((?:\r|^)(?:"(?:[^"\r]|"")*"|[^"\r,]*)*), */g 
So if there is a reason why the engine could loop until the maximum number of iterations in the example you gave but not in this one as the terms of the alternation are exclusive.
(In reply to OW from comment #9)
> (In reply to Till Schneidereit [:till] from comment #8)
> > It's not trivial to create failing test cases for a specific expression.
> > However, if you look at bug 1010154 comment 4, you can see an example of
> > string.replace giving incorrect results for a very simple regexp. 
> This comment says: "All of which is to say that this is a _really_ ****
> regexp for the job at hand; it should just be /\s+$/.  The fact that YARR
> fails on it is not exactly good, of course..."
> Agreed: both sides of the pipe are not an alternation as one matches the
> other.
> It is not the same case in the present test case:
> /((?:\r|^)(?:"(?:[^"\r]|"")*"|[^"\r,]*)*), */g 
> So if there is a reason why the engine could loop until the maximum number
> of iterations in the example you gave but not in this one as the terms of
> the alternation are exclusive.

It's not the same case, agreed. It does, however, result in the same failure condition in the engine. If you don't believe me, please verify or prove me wrong using the steps I described in comment 8. I don't have the time to figure out the exact steps the engine takes until it - incorrectly - bails out. I know that it does, though. Whether there could be a way to execute this regexp in a reasonable amount of time and memory depends on the exact failure conditions. I have a hunch that there isn't, but I'd love to be proven wrong. Certainly no engine (that I'm aware of) has such an implementation.

Look at it this way: there is one browser that processes your regexp quickly, and that one browser has potentially incorrect behavior. All browsers with correct behavior in this area fail to process your regexp. It doesn't make sense to ask these browsers' developers to change to incorrect behavior because of that.
(And fwiw, neither Chrome nor IE even show the slow script dialog for this expression. IE just executes it until you close the tab and Chrome offers to kill the entire page after a while.)
(In reply to OW from comment #0)
> Steps to reproduce:
> 
> Here is a new test case related to bug 1084280 and bug 1091459 and not
> solved by the patches provided in these tickets. 
> takes 58ms in Safari, 157ms in FF31, freezes FF between 32 and 36 and
> crashes FF Nightly.

This doesn't crash for me, does it still crash for you on a current nightly?
See Also: → 1025347
I have tested your issue on Mac OS X 10.10 with the latest Nightly build (20160314030215) and I can't to  reproduce it. For me, it shows the slow script dialog after 5-10 seconds, as expected. I don't get any crash. Please retest the issue on the latest Firefox version, and see if you can reproduce it.
Flags: needinfo?(addons.mozilla)
Hi,
Marking this as Resolved: Incomplete due to the lack of response from the reporter.
Reporter, please feel free to reopen it if you are still having this issue on the latest Firefox version.
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(addons.mozilla)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: