Closed Bug 1031668 Opened 11 years ago Closed 11 years ago

Javascript RegExp hangs on specific arguments

Categories

(Core :: JavaScript Engine, defect)

32 Branch
x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: marc.chevrier, Unassigned)

Details

Attachments

(1 file)

31.11 KB, application/x-javascript
Details
Attached file regexp-freeze.js
Steps to reproduce: Launch Firefox 32 (Not reproducible with Firefox 30 or 31) execute small javascript (see in attachment) in javascript console OR install Linkificator add-on (https://addons.mozilla.org/en-US/firefox/addon/linkificator/?src=ss) and open following page: https://bugzilla.mozilla.org/show_bug.cgi?id=617564 Actual results: Firefox is completely frozen due to endless execution inside RegExp.exec function. Expected results: Firefox must not freeze in these contexts.
I get a slow script dialog in both cases. That is what should happen here: the regular expression that's executed here has exponential complexity and, given enough input, cannot run to completion in any regexp engine. The reason it "worked" in previous versions of Firefox is that our old regexp engine, Yarr, simply returned a non-match after a short amount of time, making it seem like the expression was processed successfully. For Firefox 32, we have switched to another regexp engine, irregexp, that doesn't have this bug. Marc, if you don't get the slow script dialog, can you try running in safe mode or, better yet, in a clean profile and see if that still reproduces?
Yes, I get also a dialog about slow script but the regexp does not work only in a specific case (i.e. with string 'Yes I got error still. It gave me this: Error: uncaught exception: [Exception... "Component returned failure code: 0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE) [nsIJSCID.getService]" nsresult: "0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE)" location: "JS frame :: javascript:%20window.alert(Components.classes["@mozilla.org/alerts-service;1"].getService(Components.interfaces.nsIAlertsService)); :: :: line 1" data: no]' as described in the small script attached) and works as expected on other texts and matches what it is expected to match! So, I understand the different behavior (regexp engine switch between Firefox 31 and 32) but for me, there is a problem with this new engine because does not work in some specific cases...
Regular expressions are weird beasts. Only that one works on almost all inputs doesn't mean that it works on all of them. To give a very simplified analog: imagine you have a loop over a list. The loop's result tells you something about the list, say whether it satisfies some criteria, so true or false. All the lists that loop deals with have a limited number of entries. All except for one, which is, say, 100 times longer than the others. What our old regexp engine did was effectively very similar to doing some iterations of that loop and then at some point saying "oh well, this is taking awfully long, so let's just say this list doesn't match the criteria, ok?" And so it returns false. Now it's true that in most cases we have seen of this behavior, false actually is the correct answer, so everything *seems* fine. That is what you are seeing. However, we have also seen cases where false is the wrong answer, so the engine produces incorrect results. See bug 1010154 comment 4 for an example of this. The thing is, it's impossible to know which cases would be fine and which wouldn't - we didn't finish processing the data to find the answer to that, after all. So all in all, while it's annoying I'm afraid you'll have to rework that regular expression. You might be able to optimize it by removing some more capturing groups. But really, I'm almost entirely positive that the code would be faster and much, much more understandable if you changed it to a very small parser that explicitly iterates over the string and identifies parts of it. At the very least, you could split your expression into multiple smaller ones, with each of them being responsible for parsing some part of the entire string. I'm going to mark this bug as invalid because there really isn't anything we can do, I hope you understand that based on my explanation above.
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
I just realized that my analog is worse than I thought, so here's a quick followup: the problem has nothing to do with how large the string is your processing. It's fairly simple to construct a regular expression that zips through megabytes of data in a few milliseconds, but if you pass in a string consisting of 100 "a" characters, it would literally take days to finish. As I said, regular expressions are weird beasts.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: