grid devtools overlay causes extreme content-process jank (for`stringifyGridFragments` and resulting garbage collection), for testcase with many columns
Categories
(DevTools :: Inspector: Layout, defect, P3)
Tracking
(firefox111 wontfix, firefox112 verified, firefox113 verified)
People
(Reporter: hyperfekt, Assigned: bradwerth)
References
(Blocks 1 open bug)
Details
Attachments
(6 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0
Steps to reproduce:
- Open the attached file.
- Open DevTools.
- Toggle the Grid Overlay.
Actual results:
Firefox becomes unresponsive, then all its windows turn black.
Expected results:
A grid overlay is displayed or the display of the grid overlay fails gracefully.
Reporter | ||
Comment 1•1 year ago
|
||
Reporter | ||
Comment 2•1 year ago
|
||
Comment 3•1 year ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Layout: Grid' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 4•1 year ago
|
||
I can definitely reproduce issues with your testcase -- thanks for the bug report! For me, Firefox as-a-whole doesn't become unresponsive or blacked out, and I didn't get to a point with CanvasTranslator detected a device reset
(maybe I just needed to wait longer). But the content process for the tab in question does start consuming quite a bit of CPU and does become periodically unresponsive for ~seconds, and that in and of itself is quite bad. I'm narrowing the bug title to focus on that; I assume the other symptoms that you described were downstream from that.
ANALYSIS:
tl;dr: It looks like most of that is associated with generating strings for all of the line names, and then garbage-collecting those strings; and doing that same work (or incremental portions of that work?) in a repeated requestAnimationFrame
callback loop.
Here's a profile of me activating the grid overlay at around t=3s, and then leaving Firefox untouched from that point on for ~30 seconds (during which time the content process is pegged with periods of unresponsiveness):
https://share.firefox.dev/3losS8B
Here's a snippet of that profile, zoomed to a particularly-bad 3.1-second period of unresponsiveness, for a requestAnimationFrame callback:
https://share.firefox.dev/3x8Kt79
In that snippet, 95% of our time is spent doing JS garbage collection! If I ignore the garbage collection samples, it looks like nearly all our time is spent in stringifyGridFragments
, and that sounds like something that could believably be generating tons of garbage (for this 10000-column grid testcase), depending on what we do with the generated strings...
I'm CC'ing three folks who worked on this grid-inspector code in the past (Gabe, Micah, bradwerth). I'm not sure if any of them have been working on devtools-related stuff recently, but they might be able to offer insights to other devtools folks who have cycles to take a look here.
Updated•1 year ago
|
Assignee | ||
Comment 5•1 year ago
|
||
It does seem pretty trivial to add an early exit when hasMoved
is true but I don't yet know if that would activate in in this case. I'll put a patch together that does this, and try it out. In the more general case, the stringify-comparison is pretty hardcore. It would be good to find another way to do it, or to segment the stringification to add more opportunities for early exits. I'll attempt to do that also.
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 6•1 year ago
|
||
Segmenting the stringify-comparison is insufficent, since the common case is that there is no change, which would require stringifying all segments anyway. So the real solution might be to make the calculation of hasMoved
into an async calculation that updates state when complete.
Assignee | ||
Comment 7•1 year ago
•
|
||
Truly the problem may just be that hasMoved
is called continuously. I'll try to fix that.
Comment 8•1 year ago
|
||
Thanks, Brad!
Assignee | ||
Comment 9•1 year ago
|
||
I don't want to try to re-think the update loop of AutoRefreshHighlighter
, so instead I propose we change hasGridFragments
to take an optional bool parameter aReflow = true
. This will only do the recompute if true, and could then be used in CssGridHighlighter._hasMoved
to determine if the already-confirmed grid element has dumped its grid info data structure -- which is an indicator that it has been reflowed. I guess then the challenge is that any call to isGrid
would force it to recompute that structure, so we'd have fragility in the call order. I'll find the right way to do this and make the change in the definition and implementation of Element
.
Assignee | ||
Comment 10•1 year ago
|
||
Assignee | ||
Comment 11•1 year ago
|
||
The timestamp will be updated whenever the grid is reflowed, which is
equivalent to the old check of comparing the contents of the grid
structure. This is much much faster.
Depends on D169724
Assignee | ||
Comment 12•1 year ago
|
||
I think there is more to do here. Specifically, devtools should have a better strategy for drawing too-complex a highlighter. The proposed patches will prevent the memory bloat of the original report, but it won't do anything about the long delay in seeing the updated highlighter. I'll file a follow-up Bug.
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Assignee | ||
Comment 13•1 year ago
|
||
Comment 14•1 year ago
|
||
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7f50487ea10c Part 1: Make Grid dom object an idempotent property of its frame. r=emilio https://hg.mozilla.org/integration/autoland/rev/dd7288d15da0 Part 2: Make grid highlighter hasMoved directly compare the grid fragment objects. r=devtools-reviewers,jdescottes https://hg.mozilla.org/integration/autoland/rev/0380de95a301 Part 3: Add a test of grid fragment equality across reflows. r=jdescottes
Comment 15•1 year ago
•
|
||
Backed out for causing wrapper related devtools failures. Please check every failure log.
Failure logs:
https://treeherder.mozilla.org/logviewer?job_id=406337021&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=406336384&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=406336333&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=406336668&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/142a64a49c12c6dba29591c3650345634cb724c2
Assignee | ||
Comment 16•1 year ago
|
||
I think I understand what is going on. The failure is that the Grid
object no longer has valid member variables, specifically it no longer has its GridDimension
objects. This happens as DevTools reloads, the cycle collector destructs the GridDimension
objects but not the Grid
object itself. It must be that adding the Grid
object as a property of the frame is enough to keep it from being destructed but not enough to keep it from being marked as destroyable by the cycle collector, thus freeing its members for destruction.
Possible solutions:
- There may be some
NS_IMPL_CYCLE_COLLECTING
-style macro that could inform the correct behavior here. I don't understand the cycle collector well enough to identify this. - Manually addref'ing the
Grid
object when it is set as a property could work, as long as we ensured that a frame released the object when the frame was destructed or any other time the property was cleared. I don't know how to manipulate the refcount directly, safely, but it's probably possible.
Emilio, do you have any advice on how to fix this?
Comment 17•1 year ago
|
||
What's happening is that the new getter might return an unlinked object, which is not good. The easiest fix is to not only clear the property on the grid dtor but also on the Grid
cycle collection unlinking.
Comment 18•1 year ago
|
||
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2e8e5b4fcf3b Part 1: Make Grid dom object an idempotent property of its frame. r=emilio https://hg.mozilla.org/integration/autoland/rev/813ab8067abb Part 2: Make grid highlighter hasMoved directly compare the grid fragment objects. r=devtools-reviewers,jdescottes https://hg.mozilla.org/integration/autoland/rev/c64b326f3b9a Part 3: Add a test of grid fragment equality across reflows. r=jdescottes
Comment 19•1 year ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2e8e5b4fcf3b
https://hg.mozilla.org/mozilla-central/rev/813ab8067abb
https://hg.mozilla.org/mozilla-central/rev/c64b326f3b9a
Updated•1 year ago
|
Comment 20•1 year ago
•
|
||
Steps to reproduce:
- Load the attached file (https://bug1815140.bmoattachments.org/attachment.cgi?id=9316029)
- Open Dev Tools / Inspector
- Click the "grid" element inside the body.
- Open Task Manager / Processes and observe the Firefox CPU and Memory usage.
Actual: The CPU rises fast and the Memorry rises slowly to above 2000 MB; the browser usage might show signs of slowness or even freezes.
Expected: The resources are not spiking aggressively.
With these steps, I have reproduced the issue in Firefox Release v111.0.1 and verified the fix in Beta v112.0 (RC) and Nightly v113.0a1.
This appears to be windows specific as it did not reproduce on Ubuntu or MacOS.
Updated•1 year ago
|
Updated•1 year ago
|
Description
•