Closed Bug 942049 Opened 11 years ago Closed 1 year ago

need to do a better job of interrupting long-running reflow passes

Categories

(Core :: Layout, defect, P4)

RESOLVED INACTIVE

People

(Reporter: jtd, Unassigned)

Details

(Keywords: hang, perf)

Assuming I'm interpreting the telemetry data correctly, I think we need to work on doing a better job of interrupting long-running reflow passes. I pulled down recent histogram data for HTML_FOREGROUND_REFLOW_MS, which is logged at the end of PresShell::ProcessReflowCommands:

http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp#8163

Telemetry dashboard view:

http://telemetry.mozilla.org/#path=release/24/HTML_FOREGROUND_REFLOW_MS

Summing up the histogram buckets, here are the "long-tail" counts, which represent calls to ProcessReflowCommands that took more than 3000ms:

firefox 22 histogram of reflow times (ms) [3000 ==> inf] 33790
firefox 23 histogram of reflow times (ms) [3000 ==> inf] 413884
firefox 24 histogram of reflow times (ms) [3000 ==> inf] 29466533
firefox 25 histogram of reflow times (ms) [3000 ==> inf] 6235310

Note the Firefox 24 value: that's 29 *million* times reflow took >3secs!!

Not sure there's a simple solution here, but I think it would make sense to try and dig in and understand what causes these long-running reflows a bit better.
(In reply to John Daggett (:jtd) from comment #0)
> firefox 22 histogram of reflow times (ms) [3000 ==> inf] 33790
> firefox 23 histogram of reflow times (ms) [3000 ==> inf] 413884
> firefox 24 histogram of reflow times (ms) [3000 ==> inf] 29466533
> firefox 25 histogram of reflow times (ms) [3000 ==> inf] 6235310

I'm assuming this distribution is the result of the number of users on each release in the time interval examined rather than a regression. If that doesn't make sense given the time interval, could you make it clear that there's a regression we need to look into?
One other note: it would be good to differentiate interruptible reflows that took that long before interrupting and non-interruptible ones (i.e. layout flushes from script). That would tell us whether we need to make interruption work better or whether we just need to make reflow faster....
Flags: needinfo?(bzbarsky)
Priority: -- → P4
(In reply to David Baron [:dbaron] (needinfo? me) (UTC-8) from comment #1)
> (In reply to John Daggett (:jtd) from comment #0)
> > firefox 22 histogram of reflow times (ms) [3000 ==> inf] 33790
> > firefox 23 histogram of reflow times (ms) [3000 ==> inf] 413884
> > firefox 24 histogram of reflow times (ms) [3000 ==> inf] 29466533
> > firefox 25 histogram of reflow times (ms) [3000 ==> inf] 6235310
>
> I'm assuming this distribution is the result of the number of users on each
> release in the time interval examined rather than a regression. If that
> doesn't make sense given the time interval, could you make it clear that
> there's a regression we need to look into?

Right, this is simply a sample of the past couple months of telemetry data, so the distribution across releases simply reflects which releases are in use currently. The total sum would have been more appropriate -- content reflow has taken over 3secs more than 35 million times over the past couple months among users of telemetry.
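
For anyone wanting to re-check the "more than 35 million" total, here is a minimal Python sketch that sums the per-release long-tail counts from comment #0. The `tail_count` helper and its bucket layout (a mapping from bucket lower bound in ms to count) are assumptions made for illustration. They are not the telemetry dashboard's actual export format.

```python
# Long-tail counts (reflows that took more than 3000 ms) per release,
# copied from comment #0.
long_tail_counts = {
    "firefox 22": 33790,
    "firefox 23": 413884,
    "firefox 24": 29466533,
    "firefox 25": 6235310,
}


def tail_count(buckets, threshold_ms):
    """Sum the counts of all buckets whose lower bound is >= threshold_ms.

    `buckets` maps a bucket's lower bound (ms) to its count. This layout is
    hypothetical; adapt it to whatever form the histogram data is exported in.
    """
    return sum(count for lower, count in buckets.items() if lower >= threshold_ms)


# Total across the sampled releases.
total = sum(long_tail_counts.values())
print(total)
```

The sum comes to 36149517, which is consistent with the figure quoted above.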
QA Whiteboard: qa-not-actionable

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: critical → --

Closing as inactive, since this sat for 10 years after being filed (and the telemetry data that inspired it is now long-obsolete).

If we want to take action here, we should look at current data and start fresh with a new bug.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INACTIVE