Closed Bug 13350 Opened 25 years ago Closed 21 years ago

DOM needs to police JS infinite loops, schedule garbage collection

Categories

(Core :: DOM: Core & HTML, defect, P3)

defect

Tracking

()

VERIFIED FIXED
mozilla1.7alpha

People

(Reporter: brendan, Assigned: nallen)

References

(Blocks 1 open bug, )

Details

(Keywords: arch, hang, testcase, Whiteboard: [lm][front-end: bug 78089])

Attachments

(6 files, 6 obsolete files)

The DOM code should install a branch callback that does something to let the user hit Stop if a script does "too many" backward branches; it shoud also run the JS GC now and then (every 0 mod 4096 backward branches, the MozillaClassic code would call JS_MaybeGC). I wouldn't hold anything called Beta for this, but it's a must-have for Mozilla 5.0 and derivatives. /be
*** Bug 4042 has been marked as a duplicate of this bug. ***
Target Milestone: M14
Target Milestone: M14 → M12
Since we're defining JS_THREADSAFE, we need to use JS_BeginRequest and JS_EndRequest around all outermost calls into the JS engine from native code, in order to keep the GC from racing with any thread except possibly the one whose branch callback is running the GC from a safe point (a backward branch). This is higher priority than I thought given other threads using JS (see alecf's post at news://news.mozilla.org/38037DB8.9FE79EAF%40netscape.com for more). /be
Target Milestone: M12 → M13
Target Milestone: M13 → M14
Not gonna make M13. /be
Is this gonna make M14? I fear we might be running GC too frequently now, as evidenced by news://news.mozilla.org/slrn88u5b3.pa.jlnance%40bessie.lisanance.org .
Patching to cope with performance problem: the DOM calls JS_GC for every 20 scripts or event handlers. Moving the mouse over XUL UI and HTML elements causes lots of events. Do the math! /be
Pav, can you try the patch out? /be
I checked in the performance fix (still need qfy or jprof data on it). Vidur, can you take this bug back? My sabbatical is about to start for real. /be
Back to vidur. /be
Assignee: brendan → vidur
Status: ASSIGNED → NEW
Brendan's patch didn't make it into M14 - my bad. :( I've discussed this one with Johnny Stenback who plans to work on this in M15. Brendan, feel free to take this one back if you'd like after checking with Johnny on status.
Status: NEW → ASSIGNED
Target Milestone: M14 → M15
*** Bug 30830 has been marked as a duplicate of this bug. ***
I see that brendan's GC optimizing patch is checked in but there's still the branch callback issue, trying for M16...
Target Milestone: M15 → M16
Keywords: nsbeta2
Blocks: 40901
No longer blocks: 40901
*** Bug 40901 has been marked as a duplicate of this bug. ***
It sounds like the GC scheduling issue has not been fully resolved. Brendan/vidur: is this *absolutely* essential for beta2? I'd suggest pushing post beta to cope with other (more pressing) issues.
Whiteboard: [need info]
I'm having a hard time making this call. I don't think it's a nsbeta2 show-stopper. On the other hand, I think the handling of the JS infinite loop case could be messy and I'd hate to see it pushed off yet again. Johnny has been thinking about this problem though and might be able to come up with something shortly. So how about nsbeta2 with a 6/15 deadline?
Setting to [nsbeta2+][6/15].
Whiteboard: [need info] → [nsbeta2+][6/15]
Taking this off Vidurs list.
Assignee: vidur → jst
Status: ASSIGNED → NEW
I have what I believe will fix this bug, I implemented a JS branch callback that goes the GC:ing and opens a dialog that lets the user terminate looping scripts if the script takes too long to finish. I'm not quite ready to check it in tho, need to do more testing and possibly some tuning of some parameters, could we move this to 6/22? (I'll attach the patch)
Status: NEW → ASSIGNED
Target Milestone: M16 → M17
I thought the general consensus was that we shouldn't use the dialog route, but rather let the user signal us through the stop button (or ESC keyboard analogue) that they wanted to terminate script execution. That's what we did in 4.x, after using the dialog in 3.x.
Yes, we did discuss the stop button option but from emailing with brendan and danm it quickly became apparent that doing what we did in 4.x is *far* more complicated than opening the dialog. And, because of the current thread model in mozilla there's simply *no* way I would get the stop button route to work for nsbeta2, to do that we'd haveto pull all kinds of nasty black magic eventloop tricks, and I don't have neither the knowledge nor time to do that XP for beta2 (it would according to danm not be pretty at all). The beauty of using a modal dialog is that all the event loop fun already exists and works XP in mozilla. Also, we'd need to make this work not only in mozilla but also when Gecko is embedded in other apps, in which case there might not even be a stop button. We will, at some point, need to change how the embedded app is notified about a looping script, currently I use nsIPrompt but we'd probably need a different interface for this purpose only, but that's a later problem...
*** Bug 41586 has been marked as a duplicate of this bug. ***
Cleaning up status whiteboard by marking beta2 minus (6/15 has passed) Sadly there is too much broad doomage with other beta2 bugs to get this into beta2, but clearly it should be in beta3. When you are confident of a patch, nominate for beta3 and be sure to note fix in hand.
Whiteboard: [nsbeta2+][6/15] → [nsbeta2-]
I'm nominating this again because we need the tunable parameters for when to GC and (more important) when to put up the "stop long-running script" dialog to get wide exposure, and we need another beta to finalize these parameters. Jst's work restores the Nav2/Nav3 algorithm, modulo changes in the JS engine. To get Nav4 stop button behavior will require multithreading the layout engine, or something almost as hard. So we have the final plan for Mozilla 1.0 done (in fact, jst made his deadline but was held up by a DOM WG trip, I heard). What's more, the risk of this change is extremely low, compared to the risk that we'll add this for beta3, find the parameters need to be re-tuned, and do without another beta to finalize them. So I'm asking the PDT to give this a thumb's up, rather than make a valid fix wait months. This is the kind of patch I'd approve for checkin now, if it had come from a non-netscape.com contributor. /be
Whiteboard: [nsbeta2-]
Putting on [NEED INFO] radar. PDT needs to know impact to user and risk of fix to make a call on this bug. How much testing has been done since 6/15...and what are the risks of checking this in?
Whiteboard: [NEED INFO]
As brendan says above the risk of this change is extremely low, the worst that can happen as a result of this is that a mozilla opens a dialog saying there's a script on the page that appears to be looping. For this dialog to open there needs to be a script on the page that causes ~4 million branch callbacks, I've been running for a while with some debugging code that shows the number of branch callbacks done in every script execution, and I've never seen a script that has caused more than 100 branch callbacks (except when I've intentionally executed infinite loop scripts that result in the dialog popping up), there's still a long way to 4 million. I've had different versions of this fix in my tree for weeks, and I'm very confident about this fix. This fix is absolutely necessary for the final product and the risk in checking this in now, is a lot smaller than the risk in checkin this in later in the development cycle.
Putting on [nsbeta2+] radar for beta2 fix. Check it in.
Whiteboard: [NEED INFO] → [nsbeta2+]
I just checked in the fix for this, marking FIXED.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Windows 95, build ID 2000062420 Works (pops up a dialog asking to continue or abort the script) for: while (true) {} but doesn't (freezes, need to ctrl-alt-del kill Moz) for: var i=0; while (true) { i++; } I'll attach a testcase. Reopening.
How long did you wait? while(1); may take half as long to execute ~4M backward branches as while(1)i++; because the latter has at least one bytecode in its loop body. Patience! Or are you arguing for a lower bound on branches? This is the kind of tuning feedback we need, but what's a more realistic infinitely looping program? Both while(1); and var i=0;while(true)i++ are unlikely to be in real-world scripts, outside of denial of service attacks. /be
Some crude speed measurements on a Pentium 233MHz, 64 MB: while(true) {} : 6 seconds while(true) {i++;} : 600 seconds (yes, 10 minutes) I guess there's no one size fits all lower bound for branch callbacks. And you're right, you won't find while(true){} kind of scripts in pages which are nice, but "DOM needs to police JS infinite loops" is what this bug was about, the not so nice scripts, accidental (e.g. a missing return) or on purpose. You shouldn't have to kill your browser just because you navigated to an ill-behaving page. Being able to stop executing the bad script (without having to wait 10 minutes before being able to ;) ) is more graceful.
As a comparison, with IE5 on the same machine: while(true) {} : 3 seconds while(true) {i++;} : 6 seconds
Peter, try an optimized version of the JS shell for comparison (build it on Windows via cd js\src; nmake /f js.mak "CFG=jsshell - Win32 Release"; and run via Release\jsshell; on Linux, use Makefile.ref after setting BUILD_OPT in your environment). I get times similar to the ones you report for IE, on my home machine (733MHz Pentium). The JS engine has not been optimized for time much, so I don't expect that we beat IE, but we don't lose by as much as your test that took 10 minutes shows. That, I believe, must be due to some high overhead in the DOM code, involving GetWindowProperty, perhaps nsJSUtils::nsGetNativeThis (which need not be called for property gets such as the variable i in while(true){i++;} -- but which is currently called for all gets), or nsJSUtils::nsCallJSScriptObjectGetProperty(), or both. It would be good to run jprof or quantify and find the unduly high cost. Anyone? /be
I should say that I hacked js.c to define a branch callback that stops scripts at 0x400000 backward branches or returns. I get about 2 and 5 seconds, using my watch, on my 733MHz machine, for 'while(true);' and 'var i=0;while(true)i++;' respectively. Attached next is my patch to js.c. /be
Brendan, I was using one of the nightly builds on Windows, which are optimized builds as I understand things. Just for "fun" I've tried these same tests on my Pentium 133 MHz 48 MB laptop running Linux, Moz build ID 2000062509: while(true) {} : 9 seconds while(true) {i++;} : >3060 seconds (>51 minutes) If you feel Mozilla, after optimisation, will adequately be policing all JS inifinite loops, then I guess we can close this bug report. I think however this is a problem inherent to the current solution. After 0x400000 branch callbacks, the amount of time passed depends on the time the instructions need, which depends on the speed of the machine they're executed on. This can of course be reduced by speeding up the JS engine, but there will always be instructions which take a long time to execute, which to a user will look like a frozen Mozilla. I also highly doubt the magic number (0x400000) can be adjusted adequately to cover all possible cases reasonably. I wish I could offer a solution instead of sounding like I'm nagging all the time (hey, who still _uses_ 133MHz machines ;-) )...
disttsc: I didn't say anything about optimization solving the 10 minues (or 51 minutes on a slower machine) problem. That's likely a bug in the form of high overhead in dom/src/base/nsJSWindow.cpp:GetWindowProperty (specifically, in the static methods it calls, unnecessarily, for all property accesses). Oh, and of course SetWindowProperty does likewise, and is used for 'var i=0;while(1)i++;' in the loop body (i++ does a Get and then a Set). My point was that the js shell (optimized, of course -- no point in measuring only debug, which is built by default) does not suffer the slowdown introduced by adding i++ to the loop body. That exculpates the JS engine, I believe, and points the finger at the DOM glue code, specifically [GS]etWindowProperty. Your point that other infinite loops with slower bodies could be constructed is well-taken. Perhaps the DOM branch callback could dead-reckon by looking at time using PR_Now, every 0x40000 or so branches. If "too much" wall clock time has passed, put up the dialog. jst, what do you think? In any event, we need to profile mozilla to see why while(1)i++ is so badly slow compared to the js shell testcase. /be
Brendan, my apologies, I misread you there... s/speeding up the JS engine/reducing the high overhead in the DOM code/ I'm glad you saw the point of my argument despite it. Unfortunately I don't have the time to do builds here, but if you could send me something so I can do comparison tests under Windows 95 and Linux, I'd be happy to take the time. Shall we open another bug report for "Mozilla slower than JS shell testcase for while(1)i++;" and not clutter this one?
*** Bug 31730 has been marked as a duplicate of this bug. ***
A script in this document has been running for more than 30 seconds. This may be the result of an error in the document. Do you want to continue running this script? [/] Ask me again if the script doesn't finish soon ( Stop ) ((Continue))
Whiteboard: [nsbeta2+] → [nsbeta2+][ETA 7/14]
Keywords: nsbeta3
Whiteboard: [nsbeta2+][ETA 7/14] → [nsbeta2-][ETA 7/14]
We will worry about this in beta 3... Marking beta2- because we would not pull beta 2 off the wire for this.
*** Bug 47348 has been marked as a duplicate of this bug. ***
Marking nsbeta3+...
Status: REOPENED → ASSIGNED
Whiteboard: [nsbeta2-][ETA 7/14] → [nsbeta2-][ETA 7/14][nsbeta3+]
The fix that Johnny is suggesting here is to show the dialog when the time between branch callbacks exceeds a certain upper limit. Currently, the dialog is shown when the number of branch callbacks exceeds a certain upper limit. For certain loops, it takes an unacceptably long time to reach that upper limit and the dialog does not show up "early" enough. Thoughts?
Nisheeth: if the idea is to check PR_Now or equivalent every 0x40000 or similar branch callbacks, and put up the dialog if too much wall-clock time has past, I am all for it. Jst, how about a patch? /be
I have a patch in one of my trees hidden on a server somewhere back home in finland, I'll dig it up as soon as I get to this bug...
This bug has been marked "future" because the original netscape engineer working on this is over-burdened. If you feel this is an error, that you or another known resource will be working on this bug,or if it blocks your work in some way -- please attach your concern to the bug for reconsideration, but do not clear the nsbeta3- nomination.
Whiteboard: [nsbeta2-][ETA 7/14][nsbeta3+] → [nsbeta2-][nsbeta3-][lm]
Target Milestone: M17 → Future
Code review on the latest patch: - MAYBE_STOP_BRANCH_TIME_MASK is not (power-of-two - 1), nor is it the same as the MAYBE_GC mask as its comment claims: #define MAYBE_STOP_BRANCH_TIME_MASK 0x0000dfff // Same as the above for now... Was it tuned empirically? Could be good, just needs a better comment if not a different value. - mFirstBranchCallback is initialized with ' = 0' but that won't work on all platforms. Use LL_ZERO. - mFirstBranchCallback is reset to PR_Now when an "infinite" loop is declared. But that might be many seconds before another script starts on the context, eh? Oh wait -- ScriptEvaluated sets mBranchCallbackCount = 0 which will cause the first subsequent branch callback to resample PR_Now. I don't see the need for ctx->mFirstBranchCallback = PR_Now(); in the "infinite" case -- in fact PR_Now was just called a few instructions earlier, in the LL_SUB macro's second argument. Either cache that in a local variable, or just strike the above assignment from the "infinite" loop case. Looking to r,a=brendan@mozilla.org a new patch -- let's get this in. /be
Attached patch Updated patch. (obsolete) — Splinter Review
Depends on: 52579
Brandan, I just attached an updated patch that should clean things up alot and fix the issues you mentioned with the last patch. The patch is set up to let the user press 'Cancel' in the dialog to let the script continue for as long as it took to get the dialog in the first case, that's why I'm resetting ctx->mFirstBranchCallback more often than you might think is necessary. The parameters are empirically tuned and I'm guessing they could use some more tuning now that Vidur made accessing global properties much faster, but I didn't have time to do any tuning yet. OTOH, even if the patch goes in as is, the only downside to having the parameters as they are right now is that we call PR_Now() every 4096th branch callback, and that's probably not expencive enough to worry about, is it? (scripts that actually executes 4096 branches seem to be pretty rare) Comments? (btw, this patch depends on bug 52579 being fixed)
CC:ing jband.
Keywords: dom0
Keywords: dom0
Keywords: nsbeta2, nsbeta3arch, helpwanted
Whiteboard: [nsbeta2-][nsbeta3-][lm] → [lm]
Blocks: 30942
This bug has a patch that is waiting for approval if i'm not mistaken. As there are lots of bugs depending on this in the depency tree - let's get this patch in.
Keywords: patch
hwaara: don't worry about the bugs "blocked" by bug 30942, those aren't real depenencies.
Looks pretty good, but one nit: + // Filter out even more of the calls to this callback + if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_TIME_MASK) + return JS_TRUE; + + PRTime duration; + LL_SUB(duration, PR_Now(), ctx->mFirstBranchCallback); + // Filter out most of the calls to this callback that make it this far - if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK) + if ((ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK) && + LL_CMP(duration, <, MAX_SCRIPT_RUN_TIME)) { + // If we get here we have not yet executed enough branches nor spent too + // much time running this script to notify the user so we simply return + // here... return JS_TRUE; + } It confused me that you did the LL_SUB, then checked the COUNT_MASK next. How about: + // Filter out even more of the calls to this callback + if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_TIME_MASK) + return JS_TRUE; + if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK) + return JS_TRUE; + + PRTime duration; + LL_SUB(duration, PR_Now(), ctx->mFirstBranchCallback); + // Filter out most of the calls to this callback that make it this far - if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK) + if (LL_CMP(duration, <, MAX_SCRIPT_RUN_TIME)) { + // If we get here we have not yet executed enough branches nor spent too + // much time running this script to notify the user so we simply return + // here... return JS_TRUE; + } instead? /be
*** Bug 83854 has been marked as a duplicate of this bug. ***
*** Bug 86321 has been marked as a duplicate of this bug. ***
Seeing that the last proposed patch for this bug was posted more than 3 months ago, and yet it has not been fixed yet, I think the severity for this bug should be at least <i>major</i>, the reason for this being that the more people actually start using Mozilla for doing real work, the more of them will likely stumble on this. It's quite annoying when you have a few browser windows open, and then the browser crashes because of a typo in the code.
You are correct. My bug was a Major/Critical level one, and dupped against this, thus this has to be equivalent. Marking Major and adding hang keyword. I'm also nominating for 0.9.3/1.0 since I don't think we should ship with a browser that can be hung by a web author's possibly malicious javascript (nscatfood implied, and at a pinch dataloss, but I won't add that one myself).
Severity: normal → major
This is *so* not nsCatFood, this really doesn't prevent you from using the product axcept if you end up on a malicious page that exploits this problem. FYI, it's possible (but more complicated) to hang IE, and probably every other browser out there too (well, NS4.x is pretty good about not hanging in situations like these). The reason the attached patch hasn't been checked in is that I haven't had the time to make sure that it doesn't kick in in cases where it shouldn't, and the attached patch is not perfect, so the problem won't go away completely even if we check in the fix. If someone wants to fix this, be my guest, but I don't have time to spare on this any time soon (and I have more important bugs to fix), targetting for mozilla1.0, but don't hold your breath.
Keywords: nsCatFood
Target Milestone: Future → mozilla1.0
*** Bug 99889 has been marked as a duplicate of this bug. ***
*** Bug 100504 has been marked as a duplicate of this bug. ***
*** Bug 101187 has been marked as a duplicate of this bug. ***
*** Bug 114007 has been marked as a duplicate of this bug. ***
*** Bug 114227 has been marked as a duplicate of this bug. ***
Any news here? Too many dups at last time. Visit this link to see how lite it is to hang mozilla: http://www.cyberarmy.com/crash.shtml
I think the resason this "bug" should be fixed is because it is just not one browser window that gets frozen, it is the whole Mozilla program. That means that all other browsers will also freeze. Email will freeze. Any mail you were writting gets lost. Webpages you were making will not be saved. Calendar and IRC will not work either. It does not matter if IE handles this better or Opera or whatever. It should be a required feature by the 1.0 release that we detect and thwart any attempts, premeditated or not, at hanging the browser.
Don't expect this to be fully fixed for mozilla1.0, if ever. There are just too many possible DOS attacks in mozilla, and all other browsers as well, to expect to get those fixed any time soon. It's just not worth the enormous amount of work that it takes to plug all holes, but I'll leave this bug at mozilla1.0 for now since we might be able to improve some things w/o spending too much time on it...
Posted on bugtraq today by Pavel Titov: <script> for(i=0;i<100000000;i++) { document.write("<img src=http://fakehost.com/"+i+".gif>"); } </script> Browser reaction Mozilla 0.9.6 (build 2001112009) @ Win2K - freezes, ~98% CPU utilization and big memory leak (bigger than 1Mb per second).
I'm not surprised, at all.
*** Bug 120554 has been marked as a duplicate of this bug. ***
Blocks: 51139
jst, should your time-based policing go in for 0.9.9? /be
*** Bug 127247 has been marked as a duplicate of this bug. ***
Should't memory bloat caused by code inside a loop also be policed? For instance: var a = "TESTTESTTESTTESTTESTTESTTEST"; while (1) { a += "TESTTESTTESTTESTTESTTESTTEST"; } Will quickly bloat the memory to an unreasonable amount, and then Mozilla will become as sluggish as a sloth. The chances of the dialog coming up in that case is very small if Mozilla doesn't catch something like that before it does a lot of damage. Chances are, someone wouldn't in their right mind loop thousands of times increasing the length of a string each time and therefore this isn't a general document memory issue imho.
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+, topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword. Please send any questions or feedback about this to adt@netscape.com. You can search for "Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
Not so fast, there's a patch in this bug. /be
Keywords: mozilla1.0mozilla1.0+
Target Milestone: mozilla1.2 → mozilla1.0
*** Bug 129742 has been marked as a duplicate of this bug. ***
*** Bug 126444 has been marked as a duplicate of this bug. ***
*** Bug 131226 has been marked as a duplicate of this bug. ***
I tested with 2002.03.26 nightly build on Win2k. Mozilla freezes on page: http://www.cyberarmy.com/crash.shtml but IE6 on WinXP does not freeze. I think we must fix crash bugs (like this) before 1.0
*** Bug 136324 has been marked as a duplicate of this bug. ***
This testcase is from bug 136324. The policing code doesn't catch it because the script is waiting for user input before the next step.
Arthur: the case where javascript is waiting for user input is bug 61098 or bug 59314.
Attachment #78308 - Attachment is obsolete: true
*** Bug 137702 has been marked as a duplicate of this bug. ***
Blocks: 77271
Whiteboard: [lm] → [lm][wgate]
Comment 41 suggests that we fix the dialog which comes up occasionally.
When we do catch a script running too long, we need to decide what to do, the options i'm thinking of are: A. stop this script now (it's run too long) B. stop this script now, and stop other scripts when they run too long C. continue this script, but prompt me the next time a (or this) script runs too long D. let the script run forever imo a good system should allow embedders to decide which of these steps to take without showing the options to the user and without hacking the c++ source files... bc suggests that if i'm forced to drop one of A or B i should drop A, his reasoning is that if a user is attacked by a DOS page then the user needs to be able to stop everything (B), whereas A won't help. [mpt suggested that I flip a coin to pick, so after mpt ducked the coin, I relied on bc] The patch I attached doesn't handle B or D because I haven't figured out how to leave a note reminding mozilla that the user made a decission for the script. [Part of this is that we keep running into serious architecture flaws in the prompt service. I'll try and get an improved flavor for 1.1 or 1.2 but that won't help us now] So anyways... shaver suggested that I set private data somewhere to remember what the user said about the script. It turns out that I can't set it on the script itself because the script isn't a jsobject. I might be able to ask the js context for the js object that owns the script, or I might have to store my data on the context. I'm looking for advice from jst or rginda or anyone who can offer it. Anyways a bit about this patch 1. it makes the errors localizable 2. it removes 'abort' from the message (mpt has a rant about abort somewhere) 3. it replaces OK/Cancel (which don't make sense as answers to a Yes/No question) with Continue/Stop (which also don't make sense, but which at least have some mapping -- this is a continuous debate between mpt and I) 4. it makes it easier for embedders to decide that error cases should stop scripts instead of continuing them forever [this point is negotiable, esp the wording] 5. it shows my thoughts about how to store the remember flag. As for the order of Continue/Stop, that's another issue w/ the prompt service api. The code I'm proposing minus all of the commented out/noop changes is a straight exchange. If we really want the other order, either we should fix the prompt service to handle order for us, or we will have to change the return !ret; line at the end of the code.
Attachment #82836 - Attachment description: stage 1 fix for the warning dialog → http://bugzilla.mozilla.org/show_bug.cgi?id=78089
Attachment #82836 - Attachment is obsolete: true
Whiteboard: [lm][wgate] → [lm][front-end: bug 78089]
I wrote a checkers program that may do a hell of a lot of iterations before stopping, depending on the depth. Random pop-ups warning the user that the the game is hung really sucks. This 'feature' is absolutely unnessary, and just causes huge headaches to people who want to write complex scripts. PLEASE DO NOT PUT THIS BUG IN!!!! IT SCREWS MY LEGITIMATE SCRIPTS!
There should not be a dialog for stopping scripts that run too long. The problem is not that scripts can run too long, but you can't close a window or hit the back button while a script is running. Adding a dialog doesn't really fix the problem, and as Stephen points out, a dialog can be annoying at pages that have slow scripts. Opera has a better solution, which is to keep the UI active while a script runs. Let's fix this correctly before advertisers figure out they can force you to look at an ad by looping for four seconds.
oh cut it out both of you. the dialog already exists, and if you haven't hit it that means you probably don't need to worry about it. i'm just trying to improve the text for the dialog. if anyone wants to rewrite gecko so that it's threadsafe and so that the ui can be drawn from multiple threads, then i think mozilla.org would gladly accept your work (in a few years, when you finish).
Grr. I do see the dialog box with my checkers script! Its nasty and there is nothing I can do it. Why are so many programmes like lemmings running off a cliff, repeating the mistakes of other ad infinitum. The dialog box is just plain stupid, all it does is get in the way of legitimate scripts.
well with my fix you'll be able to make the dialog go away at the risk of shooting yourself in your foot. you should take my patch as an improvement. the dialog also gets in the way of illegitimate scripts like the one in the url field. currently mozilla's architecture does not allow us to do what we need to do to avoid this dialog.
Responding to Timeless... >the dialog already exists, and if you haven't hit it that means you probably >don't need to worry about it. The dialog currently only appears for computationally intensive javascript that doesn't access the DOM. If I understand this bug (13350) correctly, fixing it would make the dialog also appear when the page uses the DOM. I don't mind if we use a dialog for now, but I'd like to know the bug number for eventually getting rid of the dialog. It might be bug 40848, "each docshell should run on its own thread", or bug 30942, "browser should remain responsive during infinite loops".
*shrug* as i said in comment 88 my patch really didn't belong here and really doesn't interact with this bug. i blame mpt for not knowing my bugs better than I do.
As Author pointed out, all you have to do is stick an alert in the box and you get around the protection. IMO, having a dialog popup is potentially annoying, and doesn't really deal with a DOS. And if you're agile enough you just hit Ctrl-Q after clicking on the alert ;-) It might help with a poorly written script. But so what, if I go to a site with such a thing, I'll go there once and I won't be back. I guess I'm in the camp that we need a good fix. If a window hangs, the user should be able to close/stop it. Maybe that's posting an event that would set a flag that could then be checked at the branch and things could exit/stop nicely from there. At that point the user could be informed a script is running and do they wish to terminate it. The "Alert" case would still be an issue.
*** Bug 138599 has been marked as a duplicate of this bug. ***
*** Bug 152180 has been marked as a duplicate of this bug. ***
*** Bug 150374 has been marked as a duplicate of this bug. ***
Does the bug I find in: http://146.164.2.21:4505/ALEPH/-/start/PERIOD belong here? The browser loops forever with the message: unknown DOM In konqueror and ie, it works all right.
No, that sounds like an unrelated evangelism bug, please file a separate bug on that site.
*** Bug 158372 has been marked as a duplicate of this bug. ***
From the duplicate bug 158372. Notice the increment of |z| inside the body of the loop: <script> z = 100; for (i=0; i<z; i++) { document.write(i); z++; } </script> The script can be found at http://www.times.co.il/test.html Actual Results: Mozilla stops responding. Expected Results: Mozilla should block the script.
Target Milestone: mozilla1.0 → mozilla1.1alpha
*** Bug 153108 has been marked as a duplicate of this bug. ***
*** Bug 157434 has been marked as a duplicate of this bug. ***
re comment 96 by dbradley (Alert()s would bypass protection), you could enforce a time delay between alerts, perhaps one that grows (within bounds) if the code continues to throw alerts with little or no delay. This may have downsides as well, but if you let it throw a number of alerts before starting to delay, it might work. Or you could add a "stop script" button to Alert()s after a script had thrown N Alert()s. On the other hand, worrying about that case may be not worthwhile; part of this issue is about code that accidently chews the system.
*** Bug 176723 has been marked as a duplicate of this bug. ***
Target Milestone: mozilla1.1alpha → ---
BuildID 2002110204, i386 Linux 2.4 Found "in the wild": http://members.cox.net/xoclipse/open.html Ctrl-Q fails to quit the browser for this page. (I had to remotely log in to the machine and kill -9 the process... you have been warned.)
Did i missed something? I run Attachment 10630 [details] and i Mozilla asked me if i want to run this script (it may hang Mozilla)... So is this bug about JS police? (WFM for me) or garbage collector? (I was sure JS shell has it, but if no - we need any testcase for this)
Mass-reassigning bugs to dom_bugs@netscape.com
Assignee: jst → dom_bugs
Status: ASSIGNED → NEW
Blocks: 203448
Keywords: testcase
Priority: P3 → --
*** Bug 211950 has been marked as a duplicate of this bug. ***
*** Bug 211951 has been marked as a duplicate of this bug. ***
*** Bug 220892 has been marked as a duplicate of this bug. ***
*** Bug 226622 has been marked as a duplicate of this bug. ***
*** Bug 228759 has been marked as a duplicate of this bug. ***
-> me I'm going to clean up jst's old patch which makes the infinite loop checking be time based. I think we can close this bug after getting that in. There are already bugs out there for changing the dialog text, making the dialog unnecessary, and solving other js resource hogging problems.
Assignee: general → nallen
Severity: major → normal
Priority: -- → P3
Target Milestone: --- → mozilla1.7alpha
I've done some benchmarking to get a feeling for callback frequency on various computers. I looked at three different kinds of loads: "heavy"- array sorting, expensive math operations that produce very few callbacks "normal"- arithmetic, string operations that the current script stopping is well tuned for "light"- tight loops that produce many callbacks The last patch had a time limit of 5 seconds and a callback limit of 0x400000. For normal loads, these two values are equal for a ~1.6 GHz processor. Heavy and light loads are more of a problem. On a 150 MHz machine, the heavy load generates 0x8000 callbacks per second. This would be a roughly 11 minute wait for the script dialog with the current system. On a 3.2 GHz machine, the light load generates 0xA00000 callbacks per second. The script will be blocked after about a third of a second. There is also considerable variability in callback rates across processors models that otherwise have comparable performance (Athlons having rather more for certain heavy loads, Celerons having rather less for many light and heavy loads). The conclusions I came to were: Cutting scripts off based on time rather than callback counts would be a noticeable improvement. The 5 second limit in the original patch is reasonable based on the historical callback cutoff. The interval between clock checks can safely be raised from every 0xFFF callbacks to every 0x7FFF callbacks with little change in responsiveness. Most users (500 MHz+, moderate system load) would have nearly all scripts cut off at most 10% (~0.5s) over the limit time. Nearly all uers would have nearly all scripts cut off at most 20% (~1.0s) over the limit time.
Attached patch Police JS scripts by run time (obsolete) — Splinter Review
Low impact patch to interrupt running JS scripts based on wall clock time rather than callback count.
Attachment #12932 - Attachment is obsolete: true
Attachment #14683 - Attachment is obsolete: true
Attachment #140212 - Flags: superreview?(peterv)
Attachment #140212 - Flags: review?(jst)
Comment on attachment 140212 [details] [diff] [review] Police JS scripts by run time >@@ -419,6 +427,12 @@ > if (!ctx) > return JS_TRUE; Question for jst and peterv: is this necessary? Which JSContexts have null private data but a non-null branchCallback? I say none should, so this null test is not necessary, and we can save a (predicted-not-taken, but still) branch here. >+ // If this is the first time we've made a callback, we set >+ // ctx->mBranchCallbackTime to PR_Now(). This will happen again when the >+ // user chooses not to interrupt a script. >+ if (!ctx->mBranchCallbackCount) >+ ctx->mBranchCallbackTime = PR_Now(); Why not wait, and do this below, where you call PR_Now again, making for only one PR_Now call, and only after MAYBE_GC and MAYBE_STOP rollover? >+ >+ PRTime duration; >+ LL_SUB(duration, PR_Now(), ctx->mBranchCallbackTime); Save this PR_Now in a local. >+ >+ // Check the amount of time this script has been running >+ if (LL_CMP(duration, <, MAX_SCRIPT_RUN_TIME)) Systems without long long will choke at compile time here. You must declare a PRTime maxtime; and LL_I2L(maxtime, MAX_SCRIPT_RUN_TIME); to set it. >+ return JS_TRUE; >+ >+ // Reset the branch callback count so that scripts can run this long again >+ // if the user chooses not to interrupt the script >+ ctx->mBranchCallbackCount = 0; Set it to the saved PR_Now return value. Thanks for patching this bug! /be
Comment on attachment 140212 [details] [diff] [review] Police JS scripts by run time This looks good n' all, but while you're at it, wanna make the code read the number of seconds before stopping from a pref in stead of hard coding it? I.e. read "dom.max_script_run_time" or somesuch (and make sure you don't let numbers wrap in the math when converting that into a number of usec)? IOW, read the pref on startup and store the number of us in a static PR_Time variable.
Attachment #140212 - Flags: review?(jst) → review-
Re: my last comment: obviously you need to avoid subtracting the initially-zero mBranchCallbackTime from PR_Now and declarin an iloop the first time. But you should be able to avoid the early if(zero)init test in the critical path, moving it down to the LL_SUB case. If mBranchCallbackTime hasn't been set yet there, set it and wait another turn. /be
jst: >IOW, read the >pref on startup and store the number of us in a static PR_Time variable. Did you mean program startup or startup for each script (I assume the former because you said static)? The reason I'm asking is because it would probably be useful if this value could be changed dynamically through about:config without restarting the browser by people debugging their scripts (with the possible risk of currently running scripts triggering the cutoff). I don't really know this part of the code that well, so I'm wondering if that's a reasonable thing to ask for.
I meant on app startup, or module startup, really. I'd put the code in nsJSEnvironment::Init(). I'm not sure it's worth the extra code to make this pref changeable at runtime, how often would that be neede, really? (especially given that the current code isn't changeable at all, and few complain, about that part.)
I agree its not something that is worth holding up the patch if its not going to be something that will be of high demand. I guess we should not complicate things, keep it simple and just wait and see what the result is.
Attached patch Patch w/ pref (obsolete) — Splinter Review
Nits picked, pref added. Saving PR_now for resetting later doesn't work. By the time you close the dialog it's likely already now + 5 secs so the script just gets blocked again on the next MAYBE_STOP. I'll leave the does ctx needs null checking question to jst.
Attachment #140212 - Attachment is obsolete: true
Attachment #140284 - Flags: superreview?(brendan)
Attachment #140284 - Flags: review?(jst)
Attachment #140212 - Flags: superreview?(peterv)
Comment on attachment 140284 [details] [diff] [review] Patch w/ pref Cool if you can't combine PR_Now calls, just get 'em off the critical path. Still, you can combine at least two PR_Now calls with a now local: >+ // If this is the first time we've made it this far, we start timing how >+ // long the script has run >+ if (LL_IS_ZERO(ctx->mBranchCallbackTime)) { >+ ctx->mBranchCallbackTime = PR_Now(); >+ return JS_TRUE; >+ } >+ >+ PRTime duration; >+ LL_SUB(duration, PR_Now(), ctx->mBranchCallbackTime); >+ Seems like a codesize win, although the optimizer should do it for you (but gcc probably won't -- go on, gcc, surprise me ;-). Note this #define and string constant: > #define JS_OPTIONS_DOT_STR "javascript.options." > > static const char js_options_dot_str[] = JS_OPTIONS_DOT_STR; You want a similar string constant (no need for the #define) for the new pref you're adding here: >@@ -1926,10 +1957,25 @@ > if (manager) { > PRBool started = PR_FALSE; > rv = manager->StartupLiveConnect(sRuntime, started); > } > #endif /* OJI */ >+ >+ // Initialize limit on script run time >+ PRInt32 maxtime = 5; >+ nsCOMPtr<nsIPrefBranch> prefs = do_GetService(NS_PREFSERVICE_CONTRACTID); >+ if (prefs) { >+ prefs->GetIntPref("dom.max_script_run_time", &maxtime); and pass that string constant here and earlier, in the pref observer jazz. /be
While you're debating this patch, I was wondering whether it would be possible to change the labels of the confirmation (probably by directly accessing the prompt service) as "OK" to abort the script and "Cancel" to continue the script are somewhat confusing... perhaps just swapping the optons around would make more sense i.e. "Do you want to allow the script to continue execution?"
While we are removing dialog boxes for other errors, do you think that a modal dialog box is the best choice? Could we stop the script, and show an icon in the status bar that will let them start the script again if they click on it and hit "OK"?
Let's make the buttons read "Let script continue" and "Abort script"
erm. let's *not* use the word abort. there's a rant *somewhere* about it. thank you.
Re: comment 128: sorry, without multi-threading Gecko and separating JS execution from the thread that services the window's event loop, we can't do anything like what you propose. Nor should we sweat it. This is a rare, hard case, and as they teach in law school, hard cases make bad law. Let's not overengineer the solution that we and others have used since I implemented it in Nav2.0 in 1995. It's not as if many users actually run into the branch limit. /be
I won't be able to get back to this for the next few days due to travel. An open question is whether we want an observer for this pref. Is there really enough of a demand to justify the extra code? I think there isn't from what I've seen.
nallen: good point, I misremembered the code (shoulda looked) and thought there was some way you could piggyback on an existing observer. It doesn't seem worth the extra code in light of the "hard case => bad law" argument, unless there's an unfoolish-consistency win here. /be
For the record, there are other bugs (bug 145523, bug 78089) that deal with what this dialog box says. Let's keep the comments about what this says out of this bug which simply deals with when this dialogue appears, not with what it says.
Comment on attachment 140284 [details] [diff] [review] Patch w/ pref + // Initialize limit on script run time + PRInt32 maxtime = 5; + nsCOMPtr<nsIPrefBranch> prefs = do_GetService(NS_PREFSERVICE_CONTRACTID); + if (prefs) { + prefs->GetIntPref("dom.max_script_run_time", &maxtime); + if (maxtime < 1) { + maxtime = 1; + } This'll possibly (depending on what GetIntPref() does when the pref doesn't exist, better to not assume it does, and will continue to, not set the out param on failure) cause maxtime to be 1 second if the dom.max_script_run_time pref is not set. Make the GetIntPref() call set a temporary local, and only set maxtime if the call succeeds, and then do the min check. Index: modules/libpref/src/init/all.js ... +pref("dom.max_script_run_time", 5); And if you do the above, you don't eve need to add this (though I know we do add prefs with defaults to all.js in general, even if I disagree with the need to do that :-)
Attachment #140284 - Flags: review?(jst) → review-
Remaining nits picked.
Attachment #140284 - Attachment is obsolete: true
Attachment #140971 - Flags: superreview?(brendan)
Attachment #140971 - Flags: review?(jst)
Comment on attachment 140971 [details] [diff] [review] Merge Now() calls, more defensive pref use Looks good to me. One more thought below, take it or leave it. > // Run the GC if we get this far. > JS_MaybeGC(cx); > >- // Filter out most of the calls to this callback that make it this far >+ // Filter out even more of the calls to this callback > if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK) > return JS_TRUE; > >+ PRTime now = PR_Now(); >+ >+ // If this is the first time we've made it this far, we start timing how >+ // long the script has run >+ if (LL_IS_ZERO(ctx->mBranchCallbackTime)) { >+ ctx->mBranchCallbackTime = now; >+ return JS_TRUE; >+ } >+ >+ PRTime duration; >+ LL_SUB(duration, now, ctx->mBranchCallbackTime); Avoid LL_SUB here by redefining mBranchCallbackTime to be a deadline, setting it to now + sMaxScriptRuntime (using LL_ADD, natch). One LL_ADD every deadline update instead of many LL_SUBS. Should be same code footprint. I'm sure this is not going to change things noticeably, but it seems like a slightly better way to code it. Thanks again for doing this (and I promise I'll get to the Array.prototype.sort merge-sort patch before tomorrow!). /be
Attachment #140971 - Flags: superreview?(brendan) → superreview+
Comment on attachment 140971 [details] [diff] [review] Merge Now() calls, more defensive pref use > JSBool JS_DLL_CALLBACK > nsJSContext::DOMBranchCallback(JSContext *cx, JSScript *script) > { > // Get the native context > nsJSContext *ctx = NS_STATIC_CAST(nsJSContext *, ::JS_GetContextPrivate(cx)); Someone asked if it's safe to remove the if (!ctx) check after this line, and I apparently forgot to reply to that question, so I'm replying now :-) It should indeed be safe to remove that, in no case should we get here on a context w/o its private data being an nsJSContext, so checking is a waste of cycles. AFAICT. And feel free to take Brendan's suggestion to flip the logic around here to store the deadline as a member. r=jst
Comment on attachment 140971 [details] [diff] [review] Merge Now() calls, more defensive pref use JSBool JS_DLL_CALLBACK nsJSContext::DOMBranchCallback(JSContext *cx, JSScript *script) { // Get the native context nsJSContext *ctx = NS_STATIC_CAST(nsJSContext *, ::JS_GetContextPrivate(cx)); Someone asked earlier if it was ok to remove the if (!ctx) check after this call, and I apparently forgot to answer. AFAICT it is indeed ok to remove that check, we should in no situations end up here with a JSContext whose private data is not a valid nsJSContext. So I'd say remove it. And feel free follow Brendan's suggestion and flip this logic around to make the context store the deadline. r=jst
Attachment #140971 - Flags: review?(jst) → review+
I tried removing the if (!ctx) check and saw no problems with normal use. However, there was no performance gain for real world scripts and little difference even with artificial "for(;;);" type scripts. Performance for one ADD vs many SUBs is subtle. The ADD would be on the first MAYBE_STOP while the SUBs are every MAYBE_STOP after the first. No one could notice the difference either way of course. I stored the time instead of a deadline only because it seemed to mesh better with the way the callback count was handled.
Comment on attachment 141103 [details] [diff] [review] Remove if (!ctx) check I think it'd be great to get this in for the alpha, given that this could in theory cause this dialogue to show up in annoying situations where it shouldn't. The more testing we get the better.
Attachment #141103 - Flags: superreview+
Attachment #141103 - Flags: review+
Attachment #141103 - Flags: approval1.7a?
Comment on attachment 141103 [details] [diff] [review] Remove if (!ctx) check a=chofmann for 1.7a
Attachment #141103 - Flags: approval1.7a? → approval1.7a+
Comment on attachment 140284 [details] [diff] [review] Patch w/ pref patch is obsolete
Attachment #140284 - Flags: superreview?(brendan)
Fix checked in, closing bug. Many thanks to Nick for pulling this together!
Status: NEW → RESOLVED
Closed: 25 years ago21 years ago
Resolution: --- → FIXED
*** Bug 178509 has been marked as a duplicate of this bug. ***
Please note that the fix of this bug causes a hardcoded warning to pop up quite often. This warning should be made localizable ASAP! See http://bugzilla.mozilla.org/show_bug.cgi?id=78089#c17 (and c18)
(In reply to comment #146) > Please note that the fix of this bug causes a hardcoded warning to pop up quite > often. This warning should be made localizable ASAP! > See http://bugzilla.mozilla.org/show_bug.cgi?id=78089#c17 (and c18) While I agree that this should be localized, I can't say I agree with your statement about this popping up quite often, as I've never seen this dialog myself, not once (except when triggering it intentionally). Got any webpages to point to that bring up this dialog?
OK, maybe I put it wrong: The probability of the warning popping up is now much higher. I personally only saw it on the JS performance test at http://www.world-direct.com/mozilla/dhtml/funo/jsTimeTest.htm so far. But with Firefox 0.9 out now and many people going to use it, also more people will run across pages that will trigger the event. And I dare to say that the average e.g. German or French computer user does *not* know what the message means.
Just did the JS performance test for - Avant Browser 9.02 build 031 (Average time: 5587ms) - Opera 7.51 build 3798 (Average time: 27172ms) - FireFox 0.7+ 20040608 I waited 5 minutes for something to happen, but nothing did. Not a single score and no popup message. These tests were done under Windows 2000 SP4 on a AMD K6-2 300 MHz with 124 MB RAM.
Verified Fixed (checked testcase) This bug is already quite long. Localization issues in bug http://bugzilla.mozilla.org/show_bug.cgi?id=78089#c17
Status: RESOLVED → VERIFIED
Depends on: 371469
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: