Closed Bug 13350 Opened 20 years ago Closed 16 years ago

DOM needs to police JS infinite loops, schedule garbage collection

Categories

(Core :: DOM: Core & HTML, defect, P3)

defect

Tracking

()

VERIFIED FIXED
mozilla1.7alpha

People

(Reporter: brendan, Assigned: nallen)

References

(Blocks 2 open bugs, )

Details

(Keywords: arch, hang, testcase, Whiteboard: [lm][front-end: bug 78089])

Attachments

(6 files, 6 obsolete files)

The DOM code should install a branch callback that does something to let the
user hit Stop if a script does "too many" backward branches; it shoud also run
the JS GC now and then (every 0 mod 4096 backward branches, the MozillaClassic
code would call JS_MaybeGC).

I wouldn't hold anything called Beta for this, but it's a must-have for Mozilla
5.0 and derivatives.

/be
*** Bug 4042 has been marked as a duplicate of this bug. ***
Target Milestone: M14
Target Milestone: M14 → M12
Since we're defining JS_THREADSAFE, we need to use JS_BeginRequest and
JS_EndRequest around all outermost calls into the JS engine from native code, in
order to keep the GC from racing with any thread except possibly the one whose
branch callback is running the GC from a safe point (a backward branch).

This is higher priority than I thought given other threads using JS (see alecf's
post at news://news.mozilla.org/38037DB8.9FE79EAF%40netscape.com for more).

/be
Target Milestone: M12 → M13
Target Milestone: M13 → M14
Not gonna make M13.

/be
Is this gonna make M14?  I fear we might be running GC too frequently now, as
evidenced by
news://news.mozilla.org/slrn88u5b3.pa.jlnance%40bessie.lisanance.org .
Patching to cope with performance problem: the DOM calls JS_GC for every 20 
scripts or event handlers.  Moving the mouse over XUL UI and HTML elements 
causes lots of events.  Do the math!

/be
Pav, can you try the patch out?

/be
I checked in the performance fix (still need qfy or jprof data on it).  Vidur, 
can you take this bug back?  My sabbatical is about to start for real.

/be
Back to vidur.

/be
Assignee: brendan → vidur
Status: ASSIGNED → NEW
Brendan's patch didn't make it into M14 - my bad. :(

I've discussed this one with Johnny Stenback who plans to work on this in M15. 
Brendan, feel free to take this one back if you'd like after checking with 
Johnny on status.
Status: NEW → ASSIGNED
Target Milestone: M14 → M15
*** Bug 30830 has been marked as a duplicate of this bug. ***
I see that brendan's GC optimizing patch is checked in but there's still the
branch callback issue, trying for M16...
Target Milestone: M15 → M16
Keywords: nsbeta2
Blocks: 40901
No longer blocks: 40901
*** Bug 40901 has been marked as a duplicate of this bug. ***
It sounds like the GC scheduling issue has not been fully resolved. 
Brendan/vidur: is this *absolutely* essential for beta2? I'd suggest pushing 
post beta to cope with other (more pressing) issues.
Whiteboard: [need info]
I'm having a hard time making this call. I don't think it's a nsbeta2 
show-stopper. On the other hand, I think the handling of the JS infinite loop 
case could be messy and I'd hate to see it pushed off yet again. Johnny has been 
thinking about this problem though and might be able to come up with something 
shortly. So how about nsbeta2 with a 6/15 deadline?
Setting to [nsbeta2+][6/15].
Whiteboard: [need info] → [nsbeta2+][6/15]
Taking this off Vidurs list.
Assignee: vidur → jst
Status: ASSIGNED → NEW
I have what I believe will fix this bug, I implemented a JS branch callback
that goes the GC:ing and opens a dialog that lets the user terminate looping
scripts if the script takes too long to finish.
I'm not quite ready to check it in tho, need to do more testing and possibly
some tuning of some parameters, could we move this to 6/22? (I'll attach the
patch)
Status: NEW → ASSIGNED
Target Milestone: M16 → M17
I thought the general consensus was that we shouldn't use the dialog route, but
rather let the user signal us through the stop button (or ESC keyboard analogue)
that they wanted to terminate script execution.  That's what we did in 4.x,
after using the dialog in 3.x.
Yes, we did discuss the stop button option but from emailing with brendan and
danm it quickly became apparent that doing what we did in 4.x is *far* more
complicated than opening the dialog. And, because of the current thread model
in mozilla there's simply *no* way I would get the stop button route to work for
nsbeta2, to do that we'd haveto pull all kinds of nasty black magic eventloop
tricks, and I don't have neither the knowledge nor time to do that XP for beta2
(it would according to danm not be pretty at all).

The beauty of using a modal dialog is that all the event loop fun already exists
and works XP in mozilla.

Also, we'd need to make this work not only in mozilla but also when Gecko is
embedded in other apps, in which case there might not even be a stop button.
We will, at some point, need to change how the embedded app is notified about a
looping script, currently I use nsIPrompt but we'd probably need a different
interface for this purpose only, but that's a later problem...
*** Bug 41586 has been marked as a duplicate of this bug. ***
Cleaning up status whiteboard by marking beta2 minus (6/15 has passed)

Sadly there is too much broad doomage with other beta2 bugs to get this into 
beta2, but clearly it should be in beta3.

When you are confident of a patch, nominate for beta3 and be sure to note fix in 
hand.
Whiteboard: [nsbeta2+][6/15] → [nsbeta2-]
I'm nominating this again because we need the tunable parameters for when to GC
and (more important) when to put up the "stop long-running script" dialog to get
wide exposure, and we need another beta to finalize these parameters. 

Jst's work restores the Nav2/Nav3 algorithm, modulo changes in the JS engine. 
To get Nav4 stop button behavior will require multithreading the layout engine,
or something almost as hard.  So we have the final plan for Mozilla 1.0 done (in
fact, jst made his deadline but was held up by a DOM WG trip, I heard).  What's
more, the risk of this change is extremely low, compared to the risk that we'll
add this for beta3, find the parameters need to be re-tuned, and do without
another beta to finalize them.

So I'm asking the PDT to give this a thumb's up, rather than make a valid fix
wait months.  This is the kind of patch I'd approve for checkin now, if it had
come from a non-netscape.com contributor.

/be
Whiteboard: [nsbeta2-]
Putting on [NEED INFO] radar. PDT needs to know impact to user and risk of fix 
to make a call on this bug. How much testing has been done since 6/15...and what 
are the risks of checking this in? 
Whiteboard: [NEED INFO]
As brendan says above the risk of this change is extremely low, the worst that
can happen as a result of this is that a mozilla opens a dialog saying there's
a script on the page that appears to be looping. For this dialog to open there
needs to be a script on the page that causes ~4 million branch callbacks, I've
been running for a while with some debugging code that shows the number of
branch callbacks done in every script execution, and I've never seen a script
that has caused more than 100 branch callbacks (except when I've intentionally
executed infinite loop scripts that result in the dialog popping up), there's
still a long way to 4 million.

I've had different versions of this fix in my tree for weeks, and I'm very
confident about this fix.

This fix is absolutely necessary for the final product and the risk in checking
this in now, is a lot smaller than the risk in checkin this in later in the
development cycle.
Putting on [nsbeta2+] radar for beta2 fix. Check it in.
Whiteboard: [NEED INFO] → [nsbeta2+]
I just checked in the fix for this, marking FIXED.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Windows 95, build ID 2000062420

Works (pops up a dialog asking to continue or abort the script) for:

  while (true) {}

but doesn't (freezes, need to ctrl-alt-del kill Moz) for:

  var i=0;
  while (true) { i++; }

I'll attach a testcase.

Reopening.
How long did you wait?  while(1); may take half as long to execute ~4M backward 
branches as while(1)i++; because the latter has at least one bytecode in its 
loop body.  Patience!  Or are you arguing for a lower bound on branches?  This 
is the kind of tuning feedback we need, but what's a more realistic infinitely 
looping program?  Both while(1); and var i=0;while(true)i++ are unlikely to be 
in real-world scripts, outside of denial of service attacks.

/be
Some crude speed measurements on a Pentium 233MHz, 64 MB:

while(true) {}     :   6 seconds
while(true) {i++;} : 600 seconds (yes, 10 minutes)

I guess there's no one size fits all lower bound for branch callbacks.

And you're right, you won't find while(true){} kind of scripts in pages which 
are nice, but "DOM needs to police JS infinite loops" is what this bug was 
about, the not so nice scripts, accidental (e.g. a missing return) or on 
purpose.

You shouldn't have to kill your browser just because you navigated to an 
ill-behaving page. Being able to stop executing the bad script (without having 
to wait 10 minutes before being able to ;) ) is more graceful.
As a comparison, with IE5 on the same machine:

while(true) {}     :   3 seconds
while(true) {i++;} :   6 seconds
Peter, try an optimized version of the JS shell for comparison (build it on 
Windows via cd js\src; nmake /f js.mak "CFG=jsshell - Win32 Release"; and run 
via Release\jsshell; on Linux, use Makefile.ref after setting BUILD_OPT in your 
environment).  I get times similar to the ones you report for IE, on my home 
machine (733MHz Pentium).

The JS engine has not been optimized for time much, so I don't expect that we 
beat IE, but we don't lose by as much as your test that took 10 minutes shows.  
That, I believe, must be due to some high overhead in the DOM code, involving 
GetWindowProperty, perhaps nsJSUtils::nsGetNativeThis (which need not be called 
for property gets such as the variable i in while(true){i++;} -- but which is 
currently called for all gets), or nsJSUtils::nsCallJSScriptObjectGetProperty(),
or both.

It would be good to run jprof or quantify and find the unduly high cost. Anyone?

/be
I should say that I hacked js.c to define a branch callback that stops scripts 
at 0x400000 backward branches or returns.  I get about 2 and 5 seconds, using my 
watch, on my 733MHz machine, for 'while(true);' and 'var i=0;while(true)i++;' 
respectively.

Attached next is my patch to js.c.

/be
Brendan, I was using one of the nightly builds on Windows, which are optimized 
builds as I understand things.

Just for "fun" I've tried these same tests on my Pentium 133 MHz 48 MB laptop 
running Linux, Moz build ID 2000062509:

while(true) {}     :     9 seconds
while(true) {i++;} : >3060 seconds (>51 minutes)

If you feel Mozilla, after optimisation, will adequately be policing all JS 
inifinite loops, then I guess we can close this bug report.

I think however this is a problem inherent to the current solution. After 
0x400000 branch callbacks, the amount of time passed depends on the time the 
instructions need, which depends on the speed of the machine they're executed 
on. This can of course be reduced by speeding up the JS engine, but there will 
always be instructions which take a long time to execute, which to a user will 
look like a frozen Mozilla.

I also highly doubt the magic number (0x400000) can be adjusted adequately to 
cover all possible cases reasonably.

I wish I could offer a solution instead of sounding like I'm nagging all the 
time (hey, who still _uses_ 133MHz machines ;-) )...
disttsc: I didn't say anything about optimization solving the 10 minues (or 51 
minutes on a slower machine) problem.  That's likely a bug in the form of high 
overhead in dom/src/base/nsJSWindow.cpp:GetWindowProperty (specifically, in the 
static methods it calls, unnecessarily, for all property accesses).  Oh, and of 
course SetWindowProperty does likewise, and is used for 'var i=0;while(1)i++;' 
in the loop body (i++ does a Get and then a Set).

My point was that the js shell (optimized, of course -- no point in measuring 
only debug, which is built by default) does not suffer the slowdown introduced 
by adding i++ to the loop body.  That exculpates the JS engine, I believe, and 
points the finger at the DOM glue code, specifically [GS]etWindowProperty.

Your point that other infinite loops with slower bodies could be constructed is 
well-taken.  Perhaps the DOM branch callback could dead-reckon by looking at 
time using PR_Now, every 0x40000 or so branches.  If "too much" wall clock time 
has passed, put up the dialog.  jst, what do you think?

In any event, we need to profile mozilla to see why while(1)i++ is so badly slow 
compared to the js shell testcase.

/be
Brendan, my apologies, I misread you there...
s/speeding up the JS engine/reducing the high overhead in the DOM code/
I'm glad you saw the point of my argument despite it.

Unfortunately I don't have the time to do builds here, but if you could send me 
something so I can do comparison tests under Windows 95 and Linux, I'd be happy 
to take the time.

Shall we open another bug report for "Mozilla slower than JS shell testcase for 
while(1)i++;" and not clutter this one?
*** Bug 31730 has been marked as a duplicate of this bug. ***
A script in this document has been running for more
than 30 seconds. This may be the result of an error
in the document.

Do you want to continue running this script?

[/] Ask me again if the script doesn't finish soon

                              ( Stop ) ((Continue))
Whiteboard: [nsbeta2+] → [nsbeta2+][ETA 7/14]
Keywords: nsbeta3
Whiteboard: [nsbeta2+][ETA 7/14] → [nsbeta2-][ETA 7/14]
We will worry about this in beta 3...  Marking beta2- because we would not pull 
beta 2 off the wire for this.
*** Bug 47348 has been marked as a duplicate of this bug. ***
Marking nsbeta3+...
Status: REOPENED → ASSIGNED
Whiteboard: [nsbeta2-][ETA 7/14] → [nsbeta2-][ETA 7/14][nsbeta3+]
The fix that Johnny is suggesting here is to show the dialog when the time 
between branch callbacks exceeds a certain upper limit.  Currently, the dialog 
is shown when the number of branch callbacks exceeds a certain upper limit.  For 
certain loops, it takes an unacceptably long time to reach that upper limit and 
the dialog does not show up "early" enough.

Thoughts?
Nisheeth: if the idea is to check PR_Now or equivalent every 0x40000 or similar 
branch callbacks, and put up the dialog if too much wall-clock time has past, I 
am all for it.  Jst, how about a patch?

/be
I have a patch in one of my trees hidden on a server somewhere back home in
finland, I'll dig it up as soon as I get to this bug...
This bug has been marked "future" because the original netscape engineer working 
on this is over-burdened. If you feel this is an error, that you or another
known resource will be working on this bug,or if it blocks your work in some way 
-- please attach your concern to the bug for reconsideration, but do not clear 
the nsbeta3- nomination.
Whiteboard: [nsbeta2-][ETA 7/14][nsbeta3+] → [nsbeta2-][nsbeta3-][lm]
Target Milestone: M17 → Future
Code review on the latest patch:

- MAYBE_STOP_BRANCH_TIME_MASK is not (power-of-two - 1), nor is it the same as
the MAYBE_GC mask as its comment claims:

#define MAYBE_STOP_BRANCH_TIME_MASK  0x0000dfff // Same as the above for now...

Was it tuned empirically?  Could be good, just needs a better comment if not a
different value.

- mFirstBranchCallback is initialized with ' = 0' but that won't work on all
platforms.  Use LL_ZERO.

- mFirstBranchCallback is reset to PR_Now when an "infinite" loop is declared. 
But that might be many seconds before another script starts on the context, eh? 
Oh wait -- ScriptEvaluated sets mBranchCallbackCount = 0 which will cause the
first subsequent branch callback to resample PR_Now.  I don't see the need for

  ctx->mFirstBranchCallback = PR_Now();

in the "infinite" case -- in fact PR_Now was just called a few instructions
earlier, in the LL_SUB macro's second argument.  Either cache that in a local
variable, or just strike the above assignment from the "infinite" loop case.

Looking to r,a=brendan@mozilla.org a new patch -- let's get this in.
/be
Attached patch Updated patch. (obsolete) — Splinter Review
Depends on: 52579
Brandan, I just attached an updated patch that should clean things up alot and
fix the issues you mentioned with the last patch. The patch is set up to let the
user press 'Cancel' in the dialog to let the script continue for as long as it
took to get the dialog in the first case, that's why I'm resetting
ctx->mFirstBranchCallback more often than you might think is necessary.

The parameters are empirically tuned and I'm guessing they could use some more
tuning now that Vidur made accessing global properties much faster, but I didn't
have time to do any tuning yet. OTOH, even if the patch goes in as is, the only
downside to having the parameters as they are right now is that we call PR_Now()
every 4096th branch callback, and that's probably not expencive enough to worry
about, is it? (scripts that actually executes 4096 branches seem to be pretty rare)

Comments? (btw, this patch depends on bug 52579 being fixed)
CC:ing jband.
Keywords: dom0
Keywords: dom0
Keywords: nsbeta2, nsbeta3arch, helpwanted
Whiteboard: [nsbeta2-][nsbeta3-][lm] → [lm]
Blocks: 30942
This bug has a patch that is waiting for approval if i'm not mistaken. As there
are lots of bugs depending on this in the depency tree - let's get this patch in.
Keywords: patch
hwaara: don't worry about the bugs "blocked" by bug 30942, those aren't real 
depenencies.
Looks pretty good, but one nit:

+  // Filter out even more of the calls to this callback
+  if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_TIME_MASK)
+    return JS_TRUE;
+
+  PRTime duration; 
+  LL_SUB(duration, PR_Now(), ctx->mFirstBranchCallback);
+
   // Filter out most of the calls to this callback that make it this far
-  if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK)
+  if ((ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK) &&
+      LL_CMP(duration, <, MAX_SCRIPT_RUN_TIME)) {
+    // If we get here we have not yet executed enough branches nor spent too
+    // much time running this script to notify the user so we simply return
+    // here...
     return JS_TRUE;
+  }

It confused me that you did the LL_SUB, then checked the COUNT_MASK next.  How
about:

+  // Filter out even more of the calls to this callback
+  if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_TIME_MASK)
+    return JS_TRUE;
+  if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK)
+    return JS_TRUE;
+
+  PRTime duration; 
+  LL_SUB(duration, PR_Now(), ctx->mFirstBranchCallback);
+
   // Filter out most of the calls to this callback that make it this far
-  if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK)
+  if (LL_CMP(duration, <, MAX_SCRIPT_RUN_TIME)) {
+    // If we get here we have not yet executed enough branches nor spent too
+    // much time running this script to notify the user so we simply return
+    // here...
     return JS_TRUE;
+  }

instead?

/be
*** Bug 83854 has been marked as a duplicate of this bug. ***
*** Bug 86321 has been marked as a duplicate of this bug. ***
Seeing that the last proposed patch for this bug was posted more than 3 months
ago, and yet it has not been fixed yet, I think the severity for this bug should
be at least <i>major</i>, the reason for this being that the more people
actually start using Mozilla for doing real work, the more of them will likely
stumble on this. It's quite annoying when you have a few browser windows open,
and then the browser crashes because of a typo in the code.









You are correct. My bug was a Major/Critical level one, and dupped against this,
thus this has to be equivalent. Marking Major and adding hang keyword. I'm also
nominating for 0.9.3/1.0 since I don't think we should ship with a browser that
can be hung by a web author's possibly malicious javascript (nscatfood implied,
and at a pinch dataloss, but I won't add that one myself).
Severity: normal → major
This is *so* not nsCatFood, this really doesn't prevent you from using the
product axcept if you end up on a malicious page that exploits this problem.
FYI, it's possible (but more complicated) to hang IE, and probably every other
browser out there too (well, NS4.x is pretty good about not hanging in
situations like these).

The reason the attached patch hasn't been checked in is that I haven't had the
time to make sure that it doesn't kick in in cases where it shouldn't, and the
attached patch is not perfect, so the problem won't go away completely even if
we check in the fix.

If someone wants to fix this, be my guest, but I don't have time to spare on
this any time soon (and I have more important bugs to fix), targetting for
mozilla1.0, but don't hold your breath.
Keywords: nsCatFood
Target Milestone: Future → mozilla1.0
*** Bug 99889 has been marked as a duplicate of this bug. ***
*** Bug 100504 has been marked as a duplicate of this bug. ***
*** Bug 101187 has been marked as a duplicate of this bug. ***
*** Bug 114007 has been marked as a duplicate of this bug. ***
*** Bug 114227 has been marked as a duplicate of this bug. ***
Any news here? Too many dups at last time.

Visit this link to see how lite it is to hang mozilla:
http://www.cyberarmy.com/crash.shtml
I think the resason this "bug" should be fixed is because it is just not one
browser window that gets frozen, it is the whole Mozilla program. That means
that all other browsers will also freeze. Email will freeze. Any mail you were
writting gets lost. Webpages you were making will not be saved. Calendar and IRC
will not work either.

It does not matter if IE handles this better or Opera or whatever. It should be
a required feature by the 1.0 release that we detect and thwart any attempts,
premeditated or not, at hanging the browser.
Don't expect this to be fully fixed for mozilla1.0, if ever. There are just too
many possible DOS attacks in mozilla, and all other browsers as well, to expect
to get those fixed any time soon. It's just not worth the enormous amount of
work that it takes to plug all holes, but I'll leave this bug at mozilla1.0 for
now since we might be able to improve some things w/o spending too much time on
it...
Posted on bugtraq today by Pavel Titov:

<script>
for(i=0;i<100000000;i++) {
  document.write("<img src=http://fakehost.com/"+i+".gif>");
}
</script>

Browser reaction 
Mozilla 0.9.6 (build 2001112009) @ Win2K - freezes, ~98% CPU utilization
and big memory leak (bigger than 1Mb per second). 
I'm not surprised, at all.
*** Bug 120554 has been marked as a duplicate of this bug. ***
Blocks: 51139
jst, should your time-based policing go in for 0.9.9?

/be
*** Bug 127247 has been marked as a duplicate of this bug. ***
Should't memory bloat caused by code inside a loop also be policed? For instance:

var a = "TESTTESTTESTTESTTESTTESTTEST";
while (1)
{
  a += "TESTTESTTESTTESTTESTTESTTEST";
}

Will quickly bloat the memory to an unreasonable amount, and then Mozilla will
become as sluggish as a sloth. The chances of the dialog coming up in that case
is very small if Mozilla doesn't catch something like that before it does a lot
of damage. Chances are, someone wouldn't in their right mind loop thousands of
times increasing the length of a string each time and therefore this isn't a
general document memory issue imho.
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+,
topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword.  Please send any
questions or feedback about this to adt@netscape.com.  You can search for
"Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
Not so fast, there's a patch in this bug.

/be
Keywords: mozilla1.0mozilla1.0+
Target Milestone: mozilla1.2 → mozilla1.0
*** Bug 129742 has been marked as a duplicate of this bug. ***
*** Bug 126444 has been marked as a duplicate of this bug. ***
*** Bug 131226 has been marked as a duplicate of this bug. ***
I tested with 2002.03.26 nightly build on Win2k.

Mozilla freezes on page:
http://www.cyberarmy.com/crash.shtml

but IE6 on WinXP does not freeze.

I think we must fix crash bugs (like this) before 1.0
*** Bug 136324 has been marked as a duplicate of this bug. ***
This testcase is from bug 136324. The policing code doesn't catch it because
the script is waiting for user input before the next step.
Arthur: the case where javascript is waiting for user input is bug 61098 or bug
59314.
Attachment #78308 - Attachment is obsolete: true
*** Bug 137702 has been marked as a duplicate of this bug. ***
Blocks: 77271
Whiteboard: [lm] → [lm][wgate]
Comment 41 suggests that we fix the dialog which comes up occasionally.
When we do catch a script running too long, we need to decide what to do, the 
options i'm thinking of are:
A. stop this script now (it's run too long)
B. stop this script now, and stop other scripts when they run too long
C. continue this script, but prompt me the next time a (or this) script runs 
too long
D. let the script run forever

imo a good system should allow embedders to decide which of these steps 
to take without showing the options to the user and without hacking the c++ 
source files...

bc suggests that if i'm forced to drop one of A or B i should drop A, his 
reasoning is that if a user is attacked by a DOS page then the user needs to 
be able to stop everything (B), whereas A won't help. [mpt suggested that I 
flip a coin to pick, so after mpt ducked the coin, I relied on bc]

The patch I attached doesn't handle B or D because I haven't figured out how 
to leave a note reminding mozilla that the user made a decission for the 
script.

[Part of this is that we keep running into serious architecture flaws in the 
prompt service. I'll try and get an improved flavor for 1.1 or 1.2 but that 
won't help us now]

So anyways... shaver suggested that I set private data somewhere to remember 
what the user said about the script.  It turns out that I can't set it on the 
script itself because the script isn't a jsobject.  I might be able to ask 
the js context for the js object that owns the script, or I might have to 
store my data on the context. I'm looking for advice from jst or rginda or 
anyone who can offer it.

Anyways a bit about this patch
1. it makes the errors localizable
2. it removes 'abort' from the message (mpt has a rant about abort somewhere)
3. it replaces OK/Cancel (which don't make sense as answers to a Yes/No 
question) with Continue/Stop (which also don't make sense, but which at least 
have some mapping -- this is a continuous debate between mpt and I)
4. it makes it easier for embedders to decide that error cases should stop 
scripts instead of continuing them forever [this point is negotiable, esp the 
wording]
5. it shows my thoughts about how to store the remember flag.

As for the order of Continue/Stop, that's another issue w/ the prompt service 
api. The code I'm proposing minus all of the commented out/noop changes is a 
straight exchange. If we really want the other order, either we should fix 
the prompt service to handle order for us, or we will have to change the 
return !ret; line at the end of the code.
Comment on attachment 82836 [details] [diff] [review]
http://bugzilla.mozilla.org/show_bug.cgi?id=78089

whoops my last few comments belonged in bug 78089
Attachment #82836 - Attachment description: stage 1 fix for the warning dialog → http://bugzilla.mozilla.org/show_bug.cgi?id=78089
Attachment #82836 - Attachment is obsolete: true
Whiteboard: [lm][wgate] → [lm][front-end: bug 78089]
I wrote a checkers program that may do a hell of a lot of iterations before
stopping, depending on the depth.  Random pop-ups warning the user that the the
game is hung really sucks.  This 'feature' is absolutely unnessary, and just
causes huge headaches to people who want to write complex scripts.

PLEASE DO NOT PUT THIS BUG IN!!!! IT SCREWS MY LEGITIMATE SCRIPTS!
There should not be a dialog for stopping scripts that run too long.  The
problem is not that scripts can run too long, but you can't close a window or
hit the back button while a script is running.  Adding a dialog doesn't really
fix the problem, and as Stephen points out, a dialog can be annoying at pages
that have slow scripts.

Opera has a better solution, which is to keep the UI active while a script runs.
 Let's fix this correctly before advertisers figure out they can force you to
look at an ad by looping for four seconds.
oh cut it out both of you.

the dialog already exists, and if you haven't hit it that means you probably 
don't need to worry about it. i'm just trying to improve the text for the 
dialog.

if anyone wants to rewrite gecko so that it's threadsafe and so that the ui can 
be drawn from multiple threads, then i think mozilla.org would gladly accept 
your work (in a few years, when you finish).
Grr. I do see the dialog box with my checkers script!  Its nasty and there is
nothing I can do it.  Why are so many programmes like lemmings running off a
cliff, repeating the mistakes of other ad infinitum.   The dialog box is just
plain stupid, all it does is get in the way of legitimate scripts.  
well with my fix you'll be able to make the dialog go away at the risk of 
shooting yourself in your foot. you should take my patch as an improvement.

the dialog also gets in the way of illegitimate scripts like the one in the url 
field. currently mozilla's architecture does not allow us to do what we need to 
do to avoid this dialog.
Responding to Timeless...
>the dialog already exists, and if you haven't hit it that means you probably 
>don't need to worry about it.

The dialog currently only appears for computationally intensive javascript that
doesn't access the DOM.  If I understand this bug (13350) correctly, fixing it
would make the dialog also appear when the page uses the DOM.

I don't mind if we use a dialog for now, but I'd like to know the bug number for
eventually getting rid of the dialog.  It might be bug 40848, "each docshell
should run on its own thread", or bug 30942, "browser should remain responsive
during infinite loops".
*shrug* as i said in comment 88 my patch really didn't belong here and really 
doesn't interact with this bug. i blame mpt for not knowing my bugs better than 
I do.
As Author pointed out, all you have to do is stick an alert in the box and you
get around the protection. IMO, having a dialog popup is potentially annoying,
and doesn't really deal with a DOS. And if you're agile enough you just hit
Ctrl-Q after clicking on the alert ;-) It might help with a poorly written
script. But so what, if I go to a site with such a thing, I'll go there once and
I won't be back.

I guess I'm in the camp that we need a good fix. If a window hangs, the user
should be able to close/stop it. Maybe that's posting an event that would set a
flag that could then be checked at the branch and things could exit/stop nicely
from there. At that point the user could be informed a script is running and do
they wish to terminate it. The "Alert" case would still be an issue.
*** Bug 138599 has been marked as a duplicate of this bug. ***
*** Bug 152180 has been marked as a duplicate of this bug. ***
*** Bug 150374 has been marked as a duplicate of this bug. ***
Does the bug I find in: http://146.164.2.21:4505/ALEPH/-/start/PERIOD
belong here? The browser loops forever with the message: unknown DOM
In konqueror and ie, it works all right.
No, that sounds like an unrelated evangelism bug, please file a separate bug on
that site.
*** Bug 158372 has been marked as a duplicate of this bug. ***
From the duplicate bug 158372. Notice the increment of |z| inside
the body of the loop:

<script>
z = 100;
for (i=0; i<z; i++)
{
  document.write(i);
  z++;
}
</script>


The script can be found at http://www.times.co.il/test.html

Actual Results:   Mozilla stops responding.
Expected Results: Mozilla should block the script.
Target Milestone: mozilla1.0 → mozilla1.1alpha
*** Bug 153108 has been marked as a duplicate of this bug. ***
*** Bug 157434 has been marked as a duplicate of this bug. ***
re comment 96 by dbradley (Alert()s would bypass protection), you could enforce
a time delay between alerts, perhaps one that grows (within bounds) if the code
continues to throw alerts with little or no delay.  This may have downsides as
well, but if you let it throw a number of alerts before starting to delay, it
might work.  Or you could add a "stop script" button to Alert()s after a script
had thrown N Alert()s.

On the other hand, worrying about that case may be not worthwhile; part of this
issue is about code that accidently chews the system.
*** Bug 176723 has been marked as a duplicate of this bug. ***
Target Milestone: mozilla1.1alpha → ---
BuildID 2002110204, i386 Linux 2.4

Found "in the wild":

http://members.cox.net/xoclipse/open.html

Ctrl-Q fails to quit the browser for this page.
(I had to remotely log in to the machine and kill -9 the
process... you have been warned.)
Did i missed something?
I run Attachment 10630 [details] and i Mozilla asked me if i want to run this script (it
may hang Mozilla)... So is this bug about JS police? (WFM for me) or garbage
collector? (I was sure JS shell has it, but if no - we need any testcase for this)
Mass-reassigning bugs to dom_bugs@netscape.com
Assignee: jst → dom_bugs
Status: ASSIGNED → NEW
Blocks: 203448
Keywords: testcase
Priority: P3 → --
*** Bug 211950 has been marked as a duplicate of this bug. ***
*** Bug 211951 has been marked as a duplicate of this bug. ***
*** Bug 220892 has been marked as a duplicate of this bug. ***
*** Bug 226622 has been marked as a duplicate of this bug. ***
*** Bug 228759 has been marked as a duplicate of this bug. ***
-> me

I'm going to clean up jst's old patch which makes the infinite loop checking be
time based.  I think we can close this bug after getting that in.  There are
already bugs out there for changing the dialog text, making the dialog
unnecessary, and solving other js resource hogging problems.
Assignee: general → nallen
Severity: major → normal
Priority: -- → P3
Target Milestone: --- → mozilla1.7alpha
I've done some benchmarking to get a feeling for callback frequency on various
computers.  I looked at three different kinds of loads:

"heavy"- array sorting, expensive math operations that produce very few callbacks

"normal"- arithmetic, string operations that the current script stopping is well
tuned for

"light"- tight loops that produce many callbacks

The last patch had a time limit of 5 seconds and a callback limit of 0x400000. 
For normal loads, these two values are equal for a ~1.6 GHz processor.

Heavy and light loads are more of a problem.

On a 150 MHz machine, the heavy load generates 0x8000 callbacks per second. 
This would be a roughly 11 minute wait for the script dialog with the current
system.

On a 3.2 GHz machine, the light load generates 0xA00000 callbacks per second. 
The script will be blocked after about a third of a second.

There is also considerable variability in callback rates across processors
models that otherwise have comparable performance (Athlons having rather more
for certain heavy loads, Celerons having rather less for many light and heavy
loads).

The conclusions I came to were:

Cutting scripts off based on time rather than callback counts would be a
noticeable improvement.

The 5 second limit in the original patch is reasonable based on the historical
callback cutoff.

The interval between clock checks can safely be raised from every 0xFFF
callbacks to every 0x7FFF callbacks with little change in responsiveness.  Most
users (500 MHz+, moderate system load) would have nearly all scripts cut off at
most 10% (~0.5s) over the limit time.  Nearly all uers would have nearly all
scripts cut off at most 20% (~1.0s) over the limit time.
Attached patch Police JS scripts by run time (obsolete) — Splinter Review
Low impact patch to interrupt running JS scripts based on wall clock time
rather than callback count.
Attachment #12932 - Attachment is obsolete: true
Attachment #14683 - Attachment is obsolete: true
Attachment #140212 - Flags: superreview?(peterv)
Attachment #140212 - Flags: review?(jst)
Comment on attachment 140212 [details] [diff] [review]
Police JS scripts by run time

>@@ -419,6 +427,12 @@
>   if (!ctx)
>     return JS_TRUE;

Question for jst and peterv: is this necessary?  Which JSContexts have null
private data but a non-null branchCallback?  I say none should, so this null
test is not necessary, and we can save a (predicted-not-taken, but still)
branch here.

>+  // If this is the first time we've made a callback, we set
>+  // ctx->mBranchCallbackTime to PR_Now().  This will happen again when the
>+  // user chooses not to interrupt a script.
>+  if (!ctx->mBranchCallbackCount)
>+    ctx->mBranchCallbackTime = PR_Now();

Why not wait, and do this below, where you call PR_Now again, making for only
one PR_Now call, and only after MAYBE_GC and MAYBE_STOP rollover?

>+
>+  PRTime duration;
>+  LL_SUB(duration, PR_Now(), ctx->mBranchCallbackTime);

Save this PR_Now in a local.

>+
>+  // Check the amount of time this script has been running
>+  if (LL_CMP(duration, <, MAX_SCRIPT_RUN_TIME))

Systems without long long will choke at compile time here.  You must declare a
PRTime maxtime; and LL_I2L(maxtime, MAX_SCRIPT_RUN_TIME); to set it.

>+    return JS_TRUE;
>+
>+  // Reset the branch callback count so that scripts can run this long again
>+  // if the user chooses not to interrupt the script
>+  ctx->mBranchCallbackCount = 0;

Set it to the saved PR_Now return value.

Thanks for patching this bug!

/be
Comment on attachment 140212 [details] [diff] [review]
Police JS scripts by run time

This looks good n' all, but while you're at it, wanna make the code read the
number of seconds before stopping from a pref in stead of hard coding it? I.e.
read "dom.max_script_run_time" or somesuch (and make sure you don't let numbers
wrap in the math when converting that into a number of usec)? IOW, read the
pref on startup and store the number of us in a static PR_Time variable.
Attachment #140212 - Flags: review?(jst) → review-
Re: my last comment: obviously you need to avoid subtracting the initially-zero
mBranchCallbackTime from PR_Now and declarin an iloop the first time.  But you
should be able to avoid the early if(zero)init test in the critical path, moving
it down to the LL_SUB case.  If mBranchCallbackTime hasn't been set yet there,
set it and wait another turn.

/be
jst: 

>IOW, read the
>pref on startup and store the number of us in a static PR_Time variable.

Did you mean program startup or startup for each script (I assume the former
because you said static)? The reason I'm asking is because it would probably be
useful if this value could be changed dynamically through about:config without
restarting the browser by people debugging their scripts (with the possible risk
of currently running scripts triggering the cutoff). I don't really know this
part of the code that well, so I'm wondering if that's a reasonable thing to ask
for.
I meant on app startup, or module startup, really. I'd put the code in
nsJSEnvironment::Init().

I'm not sure it's worth the extra code to make this pref changeable at runtime,
how often would that be neede, really? (especially given that the current code
isn't changeable at all, and few complain, about that part.)
I agree its not something that is worth holding up the patch if its not going to
be something that will be of high demand. I guess we should not complicate
things, keep it simple and just wait and see what the result is.
Attached patch Patch w/ pref (obsolete) — Splinter Review
Nits picked, pref added.

Saving PR_now for resetting later doesn't work.  By the time you close the
dialog it's likely already now + 5 secs so the script just gets blocked again
on the next MAYBE_STOP.

I'll leave the does ctx needs null checking question to jst.
Attachment #140212 - Attachment is obsolete: true
Attachment #140284 - Flags: superreview?(brendan)
Attachment #140284 - Flags: review?(jst)
Attachment #140212 - Flags: superreview?(peterv)
Comment on attachment 140284 [details] [diff] [review]
Patch w/ pref

Cool if you can't combine PR_Now calls, just get 'em off the critical path. 
Still, you can combine at least two PR_Now calls with a now local:

>+  // If this is the first time we've made it this far, we start timing how
>+  // long the script has run
>+  if (LL_IS_ZERO(ctx->mBranchCallbackTime)) {
>+    ctx->mBranchCallbackTime = PR_Now();
>+    return JS_TRUE;
>+  }
>+
>+  PRTime duration;
>+  LL_SUB(duration, PR_Now(), ctx->mBranchCallbackTime);
>+

Seems like a codesize win, although the optimizer should do it for you (but gcc
probably won't -- go on, gcc, surprise me ;-).

Note this #define and string constant:

> #define JS_OPTIONS_DOT_STR "javascript.options."
> 
> static const char js_options_dot_str[]   = JS_OPTIONS_DOT_STR;

You want a similar string constant (no need for the #define) for the new pref
you're adding here:

>@@ -1926,10 +1957,25 @@
>   if (manager) {
>     PRBool started = PR_FALSE;
>     rv = manager->StartupLiveConnect(sRuntime, started);
>   }
> #endif /* OJI */
>+
>+  // Initialize limit on script run time
>+  PRInt32 maxtime = 5;
>+  nsCOMPtr<nsIPrefBranch> prefs = do_GetService(NS_PREFSERVICE_CONTRACTID);
>+  if (prefs) {
>+    prefs->GetIntPref("dom.max_script_run_time", &maxtime); 

and pass that string constant here and earlier, in the pref observer jazz.

/be
While you're debating this patch, I was wondering whether it would be possible
to change the labels of the confirmation (probably by directly accessing the
prompt service) as "OK" to abort the script and "Cancel" to continue the script
are somewhat confusing... perhaps just swapping the optons around would make
more sense i.e. "Do you want to allow the script to continue execution?"
While we are removing dialog boxes for other errors, do you think that a modal
dialog box is the best choice? Could we stop the script, and show an icon in the
status bar that will let them start the script again if they click on it and hit
"OK"?
Let's make the buttons read
"Let script continue"
and
"Abort script"
erm. let's *not* use the word abort. there's a rant *somewhere* about it. thank you.
Re: comment 128: sorry, without multi-threading Gecko and separating JS
execution from the thread that services the window's event loop, we can't do
anything like what you propose.

Nor should we sweat it.  This is a rare, hard case, and as they teach in law
school, hard cases make bad law.  Let's not overengineer the solution that we
and others have used since I implemented it in Nav2.0 in 1995.  It's not as if
many users actually run into the branch limit.

/be
I won't be able to get back to this for the next few days due to travel.

An open question is whether we want an observer for this pref.  Is there really
enough of a demand to justify the extra code?  I think there isn't from what
I've seen.
nallen: good point, I misremembered the code (shoulda looked) and thought there
was some way you could piggyback on an existing observer.  It doesn't seem worth
the extra code in light of the "hard case => bad law" argument, unless there's
an unfoolish-consistency win here.

/be
For the record, there are other bugs (bug 145523, bug 78089) that deal with what
this dialog box says. Let's keep the comments about what this says out of this
bug which simply deals with when this dialogue appears, not with what it says.
Comment on attachment 140284 [details] [diff] [review]
Patch w/ pref

+  // Initialize limit on script run time
+  PRInt32 maxtime = 5;
+  nsCOMPtr<nsIPrefBranch> prefs = do_GetService(NS_PREFSERVICE_CONTRACTID);
+  if (prefs) {
+    prefs->GetIntPref("dom.max_script_run_time", &maxtime); 
+    if (maxtime < 1) {
+      maxtime = 1;
+    }

This'll possibly (depending on what GetIntPref() does when the pref doesn't
exist, better to not assume it does, and will continue to, not set the out
param on failure) cause maxtime to be 1 second if the dom.max_script_run_time
pref is not set. Make the GetIntPref() call set a temporary local, and only set
maxtime if the call succeeds, and then do the min check.

Index: modules/libpref/src/init/all.js
...
+pref("dom.max_script_run_time", 5);

And if you do the above, you don't eve need to add this (though I know we do
add prefs with defaults to all.js in general, even if I disagree with the need
to do that :-)
Attachment #140284 - Flags: review?(jst) → review-
Remaining nits picked.
Attachment #140284 - Attachment is obsolete: true
Attachment #140971 - Flags: superreview?(brendan)
Attachment #140971 - Flags: review?(jst)
Comment on attachment 140971 [details] [diff] [review]
Merge Now() calls, more defensive pref use

Looks good to me.  One more thought below, take it or leave it.

>   // Run the GC if we get this far.
>   JS_MaybeGC(cx);
> 
>-  // Filter out most of the calls to this callback that make it this far
>+  // Filter out even more of the calls to this callback
>   if (ctx->mBranchCallbackCount & MAYBE_STOP_BRANCH_COUNT_MASK)
>     return JS_TRUE;
> 
>+  PRTime now = PR_Now();
>+
>+  // If this is the first time we've made it this far, we start timing how
>+  // long the script has run
>+  if (LL_IS_ZERO(ctx->mBranchCallbackTime)) {
>+    ctx->mBranchCallbackTime = now;
>+    return JS_TRUE;
>+  }
>+
>+  PRTime duration;
>+  LL_SUB(duration, now, ctx->mBranchCallbackTime);

Avoid LL_SUB here by redefining mBranchCallbackTime to be a deadline, setting
it to now + sMaxScriptRuntime (using LL_ADD, natch).  One LL_ADD every deadline
update instead of many LL_SUBS.  Should be same code footprint.  I'm sure this
is not going to change things noticeably, but it seems like a slightly better
way to code it.

Thanks again for doing this (and I promise I'll get to the Array.prototype.sort
merge-sort patch before tomorrow!).

/be
Attachment #140971 - Flags: superreview?(brendan) → superreview+
Comment on attachment 140971 [details] [diff] [review]
Merge Now() calls, more defensive pref use

> JSBool JS_DLL_CALLBACK
> nsJSContext::DOMBranchCallback(JSContext *cx, JSScript *script)
> {
>   // Get the native context
>   nsJSContext *ctx = NS_STATIC_CAST(nsJSContext *, ::JS_GetContextPrivate(cx));

Someone asked if it's safe to remove the if (!ctx) check after this line, and I
apparently forgot to reply to that question, so I'm replying now :-) It should
indeed be safe to remove that, in no case should we get here on a context w/o
its private data being an nsJSContext, so checking is a waste of cycles.
AFAICT.

And feel free to take Brendan's suggestion to flip the logic around here to
store the deadline as a member.

r=jst
Comment on attachment 140971 [details] [diff] [review]
Merge Now() calls, more defensive pref use

 JSBool JS_DLL_CALLBACK
 nsJSContext::DOMBranchCallback(JSContext *cx, JSScript *script)
 {
   // Get the native context
   nsJSContext *ctx = NS_STATIC_CAST(nsJSContext *,
::JS_GetContextPrivate(cx));

Someone asked earlier if it was ok to remove the if (!ctx) check after this
call, and I apparently forgot to answer. AFAICT it is indeed ok to remove that
check, we should in no situations end up here with a JSContext whose private
data is not a valid nsJSContext. So I'd say remove it.

And feel free follow Brendan's suggestion and flip this logic around to make
the context store the deadline.

r=jst
Attachment #140971 - Flags: review?(jst) → review+
I tried removing the if (!ctx) check and saw no problems with normal use. 
However, there was no performance gain for real world scripts and little
difference even with artificial "for(;;);" type scripts.

Performance for one ADD vs many SUBs is subtle.  The ADD would be on the first
MAYBE_STOP while the SUBs are every MAYBE_STOP after the first.  No one could
notice the difference either way of course.  I stored the time instead of a
deadline only because it seemed to mesh better with the way the callback count
was handled.
Comment on attachment 141103 [details] [diff] [review]
Remove if (!ctx) check

I think it'd be great to get this in for the alpha, given that this could in
theory cause this dialogue to show up in annoying situations where it
shouldn't. The more testing we get the better.
Attachment #141103 - Flags: superreview+
Attachment #141103 - Flags: review+
Attachment #141103 - Flags: approval1.7a?
Comment on attachment 141103 [details] [diff] [review]
Remove if (!ctx) check

a=chofmann for 1.7a
Attachment #141103 - Flags: approval1.7a? → approval1.7a+
Comment on attachment 140284 [details] [diff] [review]
Patch w/ pref

patch is obsolete
Attachment #140284 - Flags: superreview?(brendan)
Fix checked in, closing bug. Many thanks to Nick for pulling this together!
Status: NEW → RESOLVED
Closed: 20 years ago16 years ago
Resolution: --- → FIXED
*** Bug 178509 has been marked as a duplicate of this bug. ***
Please note that the fix of this bug causes a hardcoded warning to pop up quite
often. This warning should be made localizable ASAP!
See http://bugzilla.mozilla.org/show_bug.cgi?id=78089#c17 (and c18)
(In reply to comment #146)
> Please note that the fix of this bug causes a hardcoded warning to pop up quite
> often. This warning should be made localizable ASAP!
> See http://bugzilla.mozilla.org/show_bug.cgi?id=78089#c17 (and c18)

While I agree that this should be localized, I can't say I agree with your
statement about this popping up quite often, as I've never seen this dialog
myself, not once (except when triggering it intentionally).

Got any webpages to point to that bring up this dialog?
OK, maybe I put it wrong: The probability of the warning popping up is now much
higher. I personally only saw it on the JS performance test at
http://www.world-direct.com/mozilla/dhtml/funo/jsTimeTest.htm so far.

But with Firefox 0.9 out now and many people going to use it, also more people
will run across pages that will trigger the event.

And I dare to say that the average e.g. German or French computer user does
*not* know what the message means.
Just did the JS performance test for 
- Avant Browser 9.02 build 031 (Average time: 5587ms)
- Opera 7.51 build 3798 (Average time: 27172ms)
- FireFox 0.7+ 20040608 I waited 5 minutes for something to happen, but nothing 
did. Not a single score and no popup message.

These tests were done under Windows 2000 SP4 on a AMD K6-2 300 MHz with 124 MB 
RAM.
Verified Fixed (checked testcase)

This bug is already quite long. Localization issues in bug
http://bugzilla.mozilla.org/show_bug.cgi?id=78089#c17
Status: RESOLVED → VERIFIED
Depends on: 371469
You need to log in before you can comment on or make changes to this bug.