Closed
Bug 330569
Opened 19 years ago
Closed 18 years ago
s.replace(/<!--(.*|\n)*-->/, "") hangs Firefox (exponentially explosive regexp)
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
People
(Reporter: antonglv, Assigned: crowderbt)
References
Details
(Keywords: hang, testcase)
Attachments
(4 files, 7 obsolete files)
450 bytes,
text/html
|
Details | |
86 bytes,
text/html
|
Details | |
7.05 KB,
patch
|
crowderbt
:
review+
|
Details | Diff | Splinter Review |
5.35 KB,
patch
|
mrbkap
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru-RU; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru-RU; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
to reproduce use this code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=windows-1250">
<meta name="generator" content="PSPad editor, www.pspad.com">
<title></title>
</head>
<body>
<!-- hello -->
<script language="JavaScript">
var s = document. body. innerHTML;
var d = s. replace (/<!--(.*|\n)*-->/, "");
alert (d);
</script>
</body>
</html>
Reproducible: Always
Steps to Reproduce:
1. Open code in the Firefox.
2.
3.
Comment 2•19 years ago
|
||
This is an exponentially explosive regexp, given SpiderMonkey's (and ECMA's) naive non-deterministic finite automaton. It should be caught by some static or fast dynamic checks, which mrbkap was going to add. I suspect this is a dup of a bug on his list.
/be
Whiteboard: DUPEME
Updated•19 years ago
|
Summary: s.replace(/<!--(.*|\n)*-->/, ""); method hangs up Firefox (and also IE6, but no Opera 8.5, 9) → s.replace(/<!--(.*|\n)*-->/, "") hangs Firefox (exponentially explosive regexp)
Cant we confirm this bug?
===
BTW, following can be used to strip comments from html
var s = document. body. innerHTML;
var d = s. replace (/<!--([^-]|-[^-])*-->/g, "");
alert (d);
Basically the same as bug 307456.
Reduced testcase:
function test( re, n ) {
for ( var i= 0; i <= n; ++i ) {
var t= Date.now();
re.test( Array( i+1 ).join() );
print( i, Date.now() - t );
}
}
test( /(?:,*)*x/, 22 );
In bug 307456 it is
test( /(?:,|,)*x/, 22 );
Even worse:
test( /(?:,|,|,|,|,)*x/, 10 );
Comment 5•19 years ago
|
||
midair!
Related bugs:
Bug 310405, Bug 265353, Bug 307456. These should all be consolidated. mrbkap? Care to lump these all into one place?
Comment 6•18 years ago
|
||
Comment 7•18 years ago
|
||
Nominating: this hang makes it difficult to test the RegExp implementation in Gecko.
Flags: blocking1.9?
Updated•18 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee | ||
Comment 8•18 years ago
|
||
The perl implementors have attempted to solve a majority of problematic expressions like this by caching the results of previously visited "nodes". A node is described by the program-counter as it executes the regexp, plus the position in input, plus some additional state (ie., the external count for {x,y} style expressions). If all of these things match and have already resulted in a failure, they consider the current path a failure without re-traversing it. This might not be too hard to implement for us, I'm checking it out.
Assignee | ||
Comment 9•18 years ago
|
||
Also need to track info about backreferences in the state table.
Assignee | ||
Comment 10•18 years ago
|
||
Bug 310405 is definitely -not- related to this, btw.
Assignee | ||
Comment 11•18 years ago
|
||
Unless bug 310405 is going to be re-described to be "more things in spidermonkey need to check the branch-callback"
Assignee | ||
Updated•18 years ago
|
Status: NEW → ASSIGNED
Comment 12•18 years ago
|
||
Brian, can you take this one?
/be
Assignee | ||
Updated•18 years ago
|
Assignee: general → crowder
Status: ASSIGNED → NEW
Assignee | ||
Updated•18 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 13•18 years ago
|
||
mrbkap: Would you mind describing the specific heuristics you had in mind for this? I come back to this exponential regexp problem from time to time only to be discouraged by how limited our regexp code seems to be; perhaps your perspective is more optimistic.
Assignee | ||
Updated•18 years ago
|
OS: Windows XP → All
Hardware: PC → All
Assignee | ||
Comment 15•18 years ago
|
||
So the idea here is, rather than secretly compensate for bad regular expressions, we give the user the ability to turn on or off an extension of the regex engine that causes it to give up and throw an exception for an expression which is behaving with worse than O(n^3) complexity. Actually, we offer a bit more room than that, since we count actual backtracks, not total states generated. This makes backtracking itself a tiny tiny bit more expensive, but shouldn't dramatically impact the overall performance of the regex engine (as Perl's solution does). This could also prove to be a useful tool for implementors of JS regex.
A nice enhancement to this (though I'm not sure how it would be undertaken) would be to allow the exception to be caught and still somehow resume the regex operation. I'm not sure if that's possible or not. In any case, it is highly likely that a badly-behaved expression like this could be re-architected to perform in closer-to-linear-time.
An arbitrary fudge factor of a few hundred thousand backtrack states is also permitted. Perhaps this is better done another way, also? An enhancement might be to make this fudge factor tunable by embedders.
Attachment #252372 -
Flags: review?(mrbkap)
Comment 16•18 years ago
|
||
Comment on attachment 252372 [details] [diff] [review]
exeption on exponential regex (> O(n^3))
This is basically what I had in mind. Perhaps reverse the conditions in the if statement to avoid the function call?
Attachment #252372 -
Flags: review?(mrbkap) → review+
Assignee | ||
Comment 17•18 years ago
|
||
Should be better: saves a function-call on every backtrack. Thanks to Waldo for suggestion.
Attachment #252372 -
Attachment is obsolete: true
Attachment #252376 -
Flags: review?(mrbkap)
Assignee | ||
Comment 18•18 years ago
|
||
heh, mrbkap: Waldo beat you to it by like four seconds. :) Thanks for the feedback. Any thoughts on either the tuning or the "resuming" feature?
Comment 19•18 years ago
|
||
Comment on attachment 252376 [details] [diff] [review]
Removing a function call
>Index: jsregexp.c
> return NULL;
> }
>+ gData->backTrackCount++;
>+ if (gData->explosiveRELimit &&
Nit: I think surrounding style is to explicitly compare ints with 0. Also, want to add some newlines around this to allow it to breath some?
>+ gData.explosiveRELimit = 0;
>+ if (JS_GetOptions(cx) & JSOPTION_THROW_ON_EXPLOSIVE_RE) {
>+ gData.explosiveRELimit = length * length * length; /* O(n^3) */
>+ if (gData.explosiveRELimit < 400000) /* Fudge for short strings */
>+ gData.explosiveRELimit = 400000;
>+ }
You could move the gData.explosiveRELimit into an else clause if you think it's worth it.
I don't think having a 'resume' option is worth the work it'd be. I think it would be possible (just store the gData somewhere useful and properly restore all of the required things), but let's see if it's really needed.
Attachment #252376 -
Flags: review?(mrbkap) → review+
Assignee | ||
Comment 20•18 years ago
|
||
With mrbkap's nits addressed. I sort of liked leaving out the != 0 since it looks more like testing to see if a feature is enabled, but I'll go with house style here. I also liked the "limit = 0" setting early, since it looks like init code that must be run (ie., the limit MUST be set to something), but again I'm not attached. I'll land this on the trunk this evening, if no one else has feedback, and then start looking at Firefox preferences for actually enabling it?
What about ditching the "magic number" of 400000. Would I be better off with a per-context setting in the API? JS_SetRegexMinLimit(cx, 400000) or something instead?
Attachment #252376 -
Attachment is obsolete: true
Attachment #252383 -
Flags: review+
Comment 21•18 years ago
|
||
(In reply to comment #20)
I'm not married to my suggestions, so don't take them as demands. One thing to note is that if the then clause of an if statement is braced, the else clause should be too, even if it's unnecessary.
> What about ditching the "magic number" of 400000. Would I be better off with a
> per-context setting in the API? JS_SetRegexMinLimit(cx, 400000) or something
> instead?
I hope Brendan will chime in on this question.
Updated•18 years ago
|
Whiteboard: DUPEME
Assignee | ||
Comment 22•18 years ago
|
||
Comment on attachment 252383 [details] [diff] [review]
v3, nits picked
Brendan: mind taking a look at this and addressing some of the comment questions? Thanks
Attachment #252383 -
Flags: review+ → review?(brendan)
Comment 23•18 years ago
|
||
Comment on attachment 252383 [details] [diff] [review]
v3, nits picked
>+ {"strict", JSOPTION_STRICT},
>+ {"werror", JSOPTION_WERROR},
>+ {"atline", JSOPTION_ATLINE},
>+ {"xml", JSOPTION_XML},
>+ {"explosive_re_throws", JSOPTION_THROW_ON_EXPLOSIVE_RE },
Would want a shorter option name, and to match the string to the #define name. How about JSOPTION_RELIMIT?
>+MSG_DEF(JSMSG_EXPLOSIVE_REGEX, 219, 0, JSEXN_INTERNALERR, "regular expression is exponential")
We don't know it's "exponential", just higher order than n^3 by backtrack count. Plus users don't know what exponential means. How about "regular expression too complex"? Hey, we already have that one: JSMSG_REGEXP_TOO_COMPLEX.
>+ size_t backTrackCount; /* how many times we've backtracked */
>+ size_t explosiveRELimit; /* max backtrack states to generate */
Names should be siblins: backTraceLimit for backTrackCount (/me wishes rogerl has lowercased the T, but no point in changing now).
>+ if (gData->explosiveRELimit != 0 &&
I agree with your original inclination to test as a boolean, so do that if you still prefer.
>@@ -3216,16 +3229,17 @@ InitMatch(JSContext *cx, REGlobalData *g
> JS_ARENA_ALLOCATE_CAST(gData->backTrackStack, REBackTrackData *,
> &gData->pool,
> INITIAL_BACKTRACK);
> if (!gData->backTrackStack)
> goto bad;
>
> gData->backTrackSP = gData->backTrackStack;
> gData->cursz = 0;
>+ gData->backTrackCount = 0;
Note backTrackCount zero-init'ed in InitMatch (a single-use static subroutine, but let that pass).
>@@ -3287,16 +3301,24 @@ js_ExecuteRegExp(JSContext *cx, JSRegExp
> gData.cpbegin = cp;
> gData.cpend = cp + length;
> cp += start;
> gData.start = start;
> gData.skipped = 0;
>
> JS_InitArenaPool(&gData.pool, "RegExpPool", 8096, 4);
> x = InitMatch(cx, &gData, re);
>+
>+ if (JS_GetOptions(cx) & JSOPTION_THROW_ON_EXPLOSIVE_RE) {
>+ gData.explosiveRELimit = length * length * length; /* O(n^3) */
>+ if (gData.explosiveRELimit < 400000) /* Fudge for short strings */
>+ gData.explosiveRELimit = 400000;
>+ } else
>+ gData.explosiveRELimit = 0;
>+
First, I liked your original zero-init before the if. Second, per above comment, this belongs in InitMatch, so that gData->backTrackLimit can be initialized right after gData->backTrackCount. There's no point in splitting across InitMatch and js_ExecuteRegExp arbitrarily this way. Possibly there's no point in InitMatch (but perhaps it saves some goto out mess to have a single failure return value -- anyway leave that for a later cleanup).
/be
Assignee | ||
Comment 24•18 years ago
|
||
> >+MSG_DEF(JSMSG_EXPLOSIVE_REGEX, 219, 0, JSEXN_INTERNALERR, "regular expression is exponential")
>
> We don't know it's "exponential", just higher order than n^3 by backtrack
> count. Plus users don't know what exponential means. How about "regular
> expression too complex"? Hey, we already have that one:
> JSMSG_REGEXP_TOO_COMPLEX.
I noticed this existing message when I first wrote the code, but I was worried about people writing very simple exponential expressions and boggling at the "too complex" message... I was hoping to provide something more informative (though I agree the word "exponential" is liable to be daunting). Can you think of a good compromise there, or is it better just to use the existing message regardless?
Comment 25•18 years ago
|
||
Either "too complex" is enough, or it's confusing, but anything longer would be more confusing. "likely to be too slow" comes to mind, but really ;-).
Question: are we sure other browsers don't do better here? ECMA does not bind us so much as real-world inter-operation requirements. I'd like to see results for all the bad examples listed here and in related bugs. Please post 'em here for IE, Opera, and Safari at least.
/be
Assignee | ||
Comment 26•18 years ago
|
||
The first attachment here definitely hangs IE 6 (my IE is still unusable), I will try with IE7 shortly. Safari fairly quickly displays a somewhat mysterious dialog box which contains URL, the source of the <SCRIPT> element and an "OK" button, and seems to suspend execution, halting it entirely after "OK" is pressed. Opera next.
Assignee | ||
Comment 27•18 years ago
|
||
Second case also hangs IE6, but resolves very quickly in Safari, I'm not totally clear on why.
First and second case both hang IE7.
First case yields a "stop scripts on this page" dialog from Opera, and the second case totally hangs it.
Comment 28•18 years ago
|
||
(In reply to comment #26)
> Safari fairly quickly displays a somewhat
> mysterious dialog box which contains URL, the source of the <SCRIPT> element
> and an "OK" button, and seems to suspend execution, halting it entirely after
> "OK" is pressed.
There's nothing mysterious about the dialog. It's an alert() showing exactly what it's supposed to show. Read the script again :)
Safari passes.
Assignee | ||
Comment 29•18 years ago
|
||
Then Safari's results are incorrect. Using the non-explosive equivalent provided in comment #3 here (whose expression should, imo, produce identical results), the results I get show the script with the actual regexp itself REMOVED (which is also what Safari shows, when using the non-explosive regexp). However, using the attachment from comment 0 in Safari, the result I get still shows the <!--...--> part of the regular expression from the script text. My guess is that Safari is just bailing early without producing correct results (probably also bailing early in the "quick brown fox" case, too).
Comment 30•18 years ago
|
||
Safari is correct to leave the <!--...--> part in the script in the attachment in comment 0, since the regular expression is not global and only matches the HTML comment. But if I take out the HTML comment, I see what appears to be "bailing early", like you describe.
What is special about \n? If I replace that with "z" or "\t", Safari works correctly.
Comment 31•18 years ago
|
||
Comment on attachment 252383 [details] [diff] [review]
v3, nits picked
New patch coming?
/be
Attachment #252383 -
Flags: review?(brendan) → review-
Assignee | ||
Comment 32•18 years ago
|
||
> There's no point in splitting across InitMatch and js_ExecuteRegExp
> arbitrarily this way.
It's not arbitrary, I need to know the length of the particular input string for this match execution, so that I can use it in my big-O math. Also, in theory, this option setting could change in between executions of the same regular expression, which may be a good thing (run regex, catch exception, change option, try again?)?
Everything else I agree with and have changed.
How do you feel in general about the heuristic itself, Brendan? Is O(n^3) backtracks adequate? It seems to be more than enough in my tests.... Also, how about my minimum "fudge"-factor for short strings (400000 backtracks)?
Attachment #252383 -
Attachment is obsolete: true
Attachment #255370 -
Flags: review?(brendan)
Assignee | ||
Comment 33•18 years ago
|
||
Oh, actually... ignore my last remark, I will add an argument to InitMatch. Patch shortly.
Assignee | ||
Comment 34•18 years ago
|
||
This moves some init code into InitMatch(), as suggested
Attachment #255370 -
Attachment is obsolete: true
Attachment #255372 -
Flags: review?(brendan)
Attachment #255370 -
Flags: review?(brendan)
Comment 35•18 years ago
|
||
(In reply to comment #32)
> How do you feel in general about the heuristic itself, Brendan? Is O(n^3)
> backtracks adequate? It seems to be more than enough in my tests....
Who knows if it's enough? We would have to test the web, approximated by shipping in a Firefox product release.
Is Safari just doing its own version of the slow script dialog, or is it short-circuiting to the correct answer?
So long as we are not losing compared to other browsers, this patch *may* be ok. But it's really impossible to say without more exhaustive testing against real-world input.
/be
Comment 36•18 years ago
|
||
Comment on attachment 255372 [details] [diff] [review]
one more try
>- {"strict", JSOPTION_STRICT},
>- {"werror", JSOPTION_WERROR},
>- {"atline", JSOPTION_ATLINE},
>- {"xml", JSOPTION_XML},
>- {0, 0}
>+ {"strict", JSOPTION_STRICT},
>+ {"werror", JSOPTION_WERROR},
>+ {"atline", JSOPTION_ATLINE},
>+ {"xml", JSOPTION_XML},
>+ {"relimit", JSOPTION_RELIMIT },
>+ {0, 0}
Restore old indentation to minimize patch.
>+#define JSOPTION_RELIMIT \
>+ JS_BIT(9) /* Throw exception on any
>+ regular expression which
>+ backtracks more than n^3
>+ times, where n is length
>+ of the input string */
No need for backslash continuation.
>+ size_t backTrackCount; /* how many times we've backtracked */
>+ size_t backTrackLimit; /* max backtrack states to generate */
Not max, limit (see below) -- max is maximum legal value, limit is fencepost.
>+
>+ /* Potentially detect explosive regex here. */
>+ gData->backTrackCount++;
>+ if (gData->backTrackLimit &&
>+ gData->backTrackCount > gData->backTrackLimit) {
You want >= here, consider basis case (1).
>+ if (gData->backTrackLimit < 400000) /* Fudge for short strings */
>+ gData->backTrackLimit = 400000;
Hoist magic number into a #define at file scope just above the enclosing function.
Minusing to get off my review radar (trying to trim backlog), but this is close and I'm in favor of trying it out for 1.9.
/be
Attachment #255372 -
Flags: review?(brendan) → review-
Assignee | ||
Comment 37•18 years ago
|
||
Safari uses PCRE and I talked to some of the webkit guys the other night on IRC; it's not modified. PCRE bails at (iirc) 10000000 backtracks, hardcoded, and without respect to the size of the input string.
Assignee | ||
Comment 38•18 years ago
|
||
As a note: this code DOESN'T impose a hard-limit ala PCRE, because the branch callback largely serves that purpose. Adding one would be trivial, of course.
Attachment #255372 -
Attachment is obsolete: true
Attachment #255400 -
Flags: review?(brendan)
Comment 39•18 years ago
|
||
Comment on attachment 255400 [details] [diff] [review]
patch cleanups recommended by brendan
>+#define MIN_BACKTRACK_LIMIT 400000
Blank line here, per prevailing style.
> static REMatchState *
>-InitMatch(JSContext *cx, REGlobalData *gData, JSRegExp *re)
>+InitMatch(JSContext *cx, REGlobalData *gData, JSRegExp *re, size_t length)
r=me with that, thanks.
/be
Attachment #255400 -
Flags: review?(brendan) → review+
Assignee | ||
Comment 40•18 years ago
|
||
I will land this later tonight when I have time to babysit it.
Attachment #255400 -
Attachment is obsolete: true
Attachment #255401 -
Flags: review+
Comment 41•18 years ago
|
||
So if we want this on for 1.9, we need a patch in the DOM to turn on JSOPTION_RELIMIT. And to be extra nice, reflect that as a boolean property of the options object defined in window, which reflects the javascript.options.strict and javascript.options.werror boolean prefs -- so I guess there ought to be a pref for relimit too.
This should all re-use existing code in dom/src/base/nsJSEnvironment.cpp IIRC.
/be
Comment 42•18 years ago
|
||
I bet sheppy didn't know about the pref-defaulted, page-settable options described in comment 41...
/be
Assignee | ||
Comment 43•18 years ago
|
||
js.c 3.132
jsapi.h 3.139
jsregexp.c 3.133
Leaving the bug open until I have done the DOM patch.
Assignee | ||
Comment 44•18 years ago
|
||
This is a first stab at implementing the dom side of enabling this (and providing a settable option). Not tested, do not review.
Assignee | ||
Comment 45•18 years ago
|
||
This patch allows the relimit setting to be tweaked from about:config and defaults it to false.
Attachment #256678 -
Attachment is obsolete: true
Attachment #256689 -
Flags: superreview?
Attachment #256689 -
Flags: review?(mrbkap)
Assignee | ||
Updated•18 years ago
|
Attachment #256689 -
Flags: superreview? → superreview?(jst)
Assignee | ||
Comment 46•18 years ago
|
||
Still might be worth considering some sort of very large cap on total backtracks, as well, since a long input string at O(n^3) could still take a very long time.
Comment 47•18 years ago
|
||
We don't want too many knobs on the code. Won't the operation-counter stuff igor added help run the branch callback soon enough, for a long input string and regexp not growing faster than O(n^3)?
/be
Assignee | ||
Comment 48•18 years ago
|
||
Yeah, I even made that comment myself, earlier. It just seems like the script callback takes a -very- long time to prompt me, and we could beat it with a hard-limit. I agree about not kludging this code up too much, though.
Comment 49•18 years ago
|
||
(In reply to comment #48)
> Yeah, I even made that comment myself, earlier.
Ah, so you did. Therefore...
> It just seems like the script
> callback takes a -very- long time to prompt me, and we could beat it with a
> hard-limit. I agree about not kludging this code up too much, though.
Something is wrong with the operation counter stuff and how it iteracts with the branch callback's slow-script watchdog. The latter takes some number of calls (256 IIRC) to snapshot wall time. Then it needs to be called some larger 2^n times to actually compare current time to the snapshot. Can you dig into this? I hope it's easy to adjust things so we get timely slow-script dialogs for regexp operation count limiting. Thanks,
/be
Comment 50•18 years ago
|
||
Comment on attachment 256689 [details] [diff] [review]
dom patch, and default setting
>Index: dom/src/base/nsJSEnvironment.cpp
>+ if (((optbit & (optbit - 1)) == 0 && optbit <= JSOPTION_WERROR) ||
>+ optbit == JSOPTION_RELIMIT) {
I find myself wondering if the crazy (optbit & (optbit - 1)) stuff is really more clear than just |optbit == JSOPTION_STRICT || optbit == JSOPTION_WERROR || optbit == JSOPTION_RELIMIT|.
Attachment #256689 -
Flags: review?(mrbkap) → review+
Comment 51•18 years ago
|
||
I wanted to chime in and ask -- why am I cc'd on this bug? Is there something that needs documenting here? I don't see anything obviously in need of writing up.
Assignee | ||
Comment 52•18 years ago
|
||
(In reply to comment #42)
> I bet sheppy didn't know about the pref-defaulted, page-settable options
> described in comment 41...
>
> /be
sheppy: This was for you.
Comment 53•18 years ago
|
||
I saw that comment, but I still have no idea what if anything needs documenting here.
Assignee | ||
Comment 54•18 years ago
|
||
jst: ping?
Comment 55•18 years ago
|
||
Comment on attachment 256689 [details] [diff] [review]
dom patch, and default setting
sr=jst, sorry for the delay :(
Attachment #256689 -
Flags: superreview?(jst) → superreview+
Assignee | ||
Comment 56•18 years ago
|
||
nsJSEnvironment.cpp: 1.319
all.js: 3.668
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Comment 57•18 years ago
|
||
/cvsroot/mozilla/js/tests/js1_5/extensions/regress-330569.js,v <-- regress-330569.js
initial revision: 1.1
Flags: in-testsuite+
Comment 58•18 years ago
|
||
verified fixed linux, windows, mac* shell 20070406
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•