Open Bug 715378 Opened 12 years ago Updated 2 years ago

Do cost accounting of setTimeouts to punish cpu hog background tabs

Categories

(Core :: DOM: Core & HTML, defect, P5)

x86
Windows 7
defect

Tracking

()

People

(Reporter: taras.mozilla, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [snappy:p1])

Attachments

(1 file, 1 obsolete file)

Currently a bad webpage can suck up cpu time by busy-waiting via poorly chosen setTimeouts. This should make the browser more responsive when one of many tabs is misbehaving via setTimeout.

Something similar should be done for XMLHttprequest polling and websockets.
Depends on: 715380
Taras, please cc me on any XHR timeout bugs you create.  I'm implementing XHR2's timeout spec for Mozilla (bug 525816).
Depends on: 674779
Jan says he can look into this once he's done with some other work that he's in the middle of right now.
Assignee: nobody → Jan.Varga
Whiteboard: [snappy] → [snappy:p1]
IMHO another valuable feature for more advanced users would be to allow those users to see the real-time CPU performance hit per tab so that they could decide if they simply wanted to kill performance hogging tabs.
Sometimes my browser gets laggy after a few hours. When it does that I see a lot of timeouts firing during lag periods. So this is a real problem.
Taras, do you know whether those timeouts are actually DOM timeouts? The reason I asked is that I recently saw Firefox chewing up 100% cpu locally, and after digging in a bit it turned out to be an animated gif that went nuts and started animating constantly. Unfortunately that happened in an optimized build based on source that had since been updated, so debugging further was hard, and it didn't reproduce so that's all I know as of now. That too was based on timers, but not DOM timeouts. The work we'll do here won't catch those kinds of problems.
I should note that I only have correlation here, not causation. However I see a lot of lag correlating with DOM_TIMERS_FIRED_PER_NATIVE_TIMEOUT and DOM_TIMERS_RECENTLY_SET activity.
We throttle timers in background tabs, so I think there should not be a problem unless something is wrong with that throttling. Doing a simple test with

  function annoy() {
    setTimeout(annoy, 0);
    console.log(Math.pow(Math.random(), Math.log(Date.now()*127.2)));
  }
  setTimeout(annoy, 0);

shows 30% CPU as a foreground tab and ~0% CPU in a background tab.
We do, but we do not take time to execute handlers into consideration. Pages like zimbra/etherpad spend hundreds of milliseconds per period doing expensive operations in background janking foreground tabs. We do not supress background tabs while doing things like scrolling, etc either.
Here's a patch that penalizes windows that have background timers that are long-running enough to jank us. When such a timer occurs, we make background timers even slower for it, so such background tabs have less ability to hurt responsiveness.

Thoughts?
Attachment #647622 - Flags: feedback?(jst)
Comment on attachment 647622 [details] [diff] [review]
penalize windows with janky timeouts

Seems like we should make the penalty proportional to the timeout cost. 

For example we could allow 10ms/second of jank. Then we can make the minimum timeout be max(1000,timeToRunTimeout*100).
Sounds good to me.

Should we do this proportional to the *worst* single timeout cost (as in the patch), or the *sum* of timeouts over some period? The worst would focus on hanging the main thread, the sum more on CPU usage. Both seem important.
responsiveness is more important than CPU usage (even if both are important)
(In reply to Alon Zakai (:azakai) from comment #11)
> Should we do this proportional to the *worst* single timeout cost (as in the
> patch), or the *sum* of timeouts over some period? The worst would focus on
> hanging the main thread, the sum more on CPU usage. Both seem important.

I would go with worst single timeout. What happens once an inactive tab becomes active then inactive again. Would that reset the process?
Ok, looks like consensus on the worst case/responsiveness approach.

When inactive becomes active we fire timeouts (if necessary) and then they continue to fire normally until the tab becomes inactive again. When it becomes inactive, we penalize it based on recent jank, so if enough time was spent in active mode we may have forgotten about prior bad behavior and things are reset in that sense. But we would quickly learn about its bad behavior again once it repeats the offense, so this sounds ok to me.
Attached patch v2Splinter Review
Improved patch. Penalizes timeouts in background tabs by the amount of jank generated in the tab, using a weighted average that focuses on the worst offenders in order to maximize responsiveness.

The one concern I have with this approach is that we do not "speed up" timers when we become active. So if we decided to penalize a timeout by a lot, it will still wait for a while after returning to the tab. I added a maximum penalty because of that. The problem exists even with the current 1 second penalty I suppose, but it seems more dangerous when we go to bigger timeouts. ("Speeding up" timers when we return might help, but that has risks as well.)
Attachment #647622 - Attachment is obsolete: true
Attachment #647622 - Flags: feedback?(jst)
Attachment #647797 - Flags: feedback?(jst)
Comment on attachment 647797 [details] [diff] [review]
v2

Clearing feedback request until we figure out where we want to go here...
Attachment #647797 - Flags: feedback?(jst)
I don't know if this can be useful, but with bug 674779, we can now track down the jank caused by individual compartments.
Priority: -- → P5
Assignee: jvarga → nobody
Component: DOM → DOM: Core & HTML
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.