Investigate ways to prevent scripts from bogging down the entire browser.

NEW
Assigned to

Status

()

3 years ago
6 months ago

People

(Reporter: mrrrgn, Assigned: mrrrgn)

Tracking

(Depends on: 1 bug, Blocks: 4 bugs)

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

3 years ago
A single script hogging all CPU usage grinds the entire browser to a halt. We should investigate ways of preventing this. One idea that comes to mind would be capping the CPU usage of scripts running in any tab not currently in focus to something like 40-50%.

Comment 1

3 years ago
As long as that does not affect the scripts running in workers attached to an in-focus tab, I'm fine with that :)

That said, given Firefox's run-to-completion model, won't a 50% cap just hang the browser for twice as long?  When will the scripts in the in-focus tab get any time on the scheduler if a script in another tab is running?  It's not like a runaway script uses more than one core, and most of us have lots of those.

Also, any tie-ins to e10s?
The current mechanism to interrupt running script is aweful because involves stacking the event loop, as we cannot discard JS+DOM+C++ frames from the stack.
What Morgan said about throttling CPU is just a random example of the sort of idea we might end up pursuing here. Another possibility is taking the Firefox for Android "zombify" code, which silently (!) kills off background tabs when there's memory pressure, and using that on desktop too. We could add code to track other resources (CPU, video hardware resources, whatever) to try and detect when background tabs are interfering with the foreground user experience.

Anyway, investigation comes first. The bottom line is, the current user experience stinks, at least for me on Linux. I think E10S has made it worse, contrary to what I expected, and we can do better.
(Assignee)

Comment 4

3 years ago
(In reply to Lars T Hansen [:lth] from comment #1)
> As long as that does not affect the scripts running in workers attached to
> an in-focus tab, I'm fine with that :)
> 
> That said, given Firefox's run-to-completion model, won't a 50% cap just
> hang the browser for twice as long?  When will the scripts in the in-focus
> tab get any time on the scheduler if a script in another tab is running? 
> It's not like a runaway script uses more than one core, and most of us have
> lots of those.
> 
> Also, any tie-ins to e10s?

The run to completion model would throw a wrench in that idea most likely. I'll try studying up on our current mechanism (that NBP mentioned). Jason's proposal seems like the most promising currently.
(Assignee)

Updated

3 years ago
Assignee: nobody → winter2718
(Assignee)

Updated

3 years ago
Depends on: 1270729
(Assignee)

Updated

3 years ago
Blocks: 1270732
No longer depends on: 1270729
(Assignee)

Updated

3 years ago
Blocks: 1270729
(Assignee)

Updated

3 years ago
No longer blocks: 1270729
(In reply to Nicolas B. Pierron [:nbp] from comment #2)
> The current mechanism to interrupt running script is aweful because involves
> stacking the event loop, as we cannot discard JS+DOM+C++ frames from the
> stack.

Debugger has the same pain. How awful is it really? We only check for interrupts "between bytecodes" (i.e. at safe points where the whole GC heap must be in a valid state).

The one definite problem I know of is that you've used up arbitrary amounts of the C stack. There might not be much left. Is that what you had in mind? (The fix would be to allocate a whole new C stack when this happens. Each window that's paused, by the debugger or the slow-script dialog, would have a suspended C stack floating around in memory -- a continuation -- which we'll return to when it's unpaused. Worth doing?)

(ni?nbp just to keep the conversation going - no obligation :-)
Flags: needinfo?(nicolas.b.pierron)
(Assignee)

Updated

3 years ago
No longer blocks: 1270732
Depends on: 1270729, 1270732
(In reply to Jason Orendorff [:jorendorff] from comment #5)
> The one definite problem I know of is that you've used up arbitrary amounts
> of the C stack. There might not be much left. Is that what you had in mind?

Yes, that's part of what I had in mind, and also the fact that the interrupt check does not prevent the rest of the event of the page to be scheduled, which can trigger JS code again.  Thus prevent us to even respect the run-to-completion semantic.

I do not recall the bug number, but there is a long standing issue about making a per-origin event loop, which would fix the run-to-completion issue by not scheduling events from the same origin, while we are in a nested event loop.

> (The fix would be to allocate a whole new C stack when this happens. Each
> window that's paused, by the debugger or the slow-script dialog, would have
> a suspended C stack floating around in memory -- a continuation -- which
> we'll return to when it's unpaused. Worth doing?)

Getting rid of the nested event loops and moving towards a yield & resume model will unfortunately break RAII, whatever RAII could meant when used with nested event loops.  The problem here also involves the caller.  I don't think Gecko is ready at the moment to move toward a yield & resume model (would MozPromise help us one that?).  On the other hand, this should not prevent us from doing it in SpiderMonkey.

Moving to a per-compartment JS-only stack would be a plus from my point of view.  This implies that our state would move into the heap, and that we can make our C-stack stateless.

This would:
 - Increase our recursion limit, as we are no longer sharing the JS stack with the C stack.
 - Remove the stacking behaviour of EnterJit / Interpreter / EnterJit.
 - Reduce the stack frame depth of calls and nested event loops. (only enter-compartment & leave-compartment frames & DOM calls)
 - Reduce effectiveness of stack overflow attack to chrome / chrome JS code. (smaller frame sizes)

Also, a stateless C-stack implies that we are *less* dependent on the thread on which we are running.  Still, interleaving JavaScript execution implies that we already have one event loop per-origin.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #6)
> I do not recall the bug number, but there is a long standing issue about
> making a per-origin event loop

Bug 715376.
(Assignee)

Updated

3 years ago
Depends on: 1272644
(Assignee)

Updated

3 years ago
Blocks: 1272644
No longer depends on: 1272644
(Assignee)

Updated

3 years ago
Blocks: 685828
(Assignee)

Updated

3 years ago
Blocks: 380223
Blocks: 1274399

Updated

2 years ago
Blocks: 1225900
You need to log in before you can comment on or make changes to this bug.