Closed Bug 1740702 Opened 3 years ago Closed 2 years ago

Render incorrectly when the developer tool is opened

Categories

(Core :: Layout: Grid, defect)

Firefox 96
defect

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox94 --- unaffected
firefox95 --- unaffected
firefox96 --- disabled
firefox97 --- disabled
firefox98 --- fixed

People

(Reporter: nayinain, Assigned: sefeng)

References

(Regression)

Details

(Keywords: regression)

Attachments

(8 files)

Attached file testcase.html

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0

Steps to reproduce:

  1. Open the testcase.html.
  2. Open Inspctor (Ctrl + Shift + I).
  3. Refresh the page.
  4. Scroll up and down the page.

Actual results:

Some boxes are not displayed.

Expected results:

The boxes should all be displayed.


Sorry for my bad English.

Attached video Capture.mp4

Regression window:

2021-11-11T16:45:13.029000: INFO : application_buildid: 20211110164131
2021-11-11T16:45:13.029000: INFO : application_changeset: fd38093162bce3f221a1a1ea003e300ffb9f237b
2021-11-11T16:45:13.029000: INFO : application_display_name: Firefox Nightly
2021-11-11T16:45:13.029000: INFO : application_id: {ec8030f7-c20a-464f-9b0e-13a3a9e97384}
2021-11-11T16:45:13.029000: INFO : application_name: Firefox
2021-11-11T16:45:13.029000: INFO : application_remotingname: firefox
2021-11-11T16:45:13.029000: INFO : application_repository: https://hg.mozilla.org/integration/autoland
2021-11-11T16:45:13.029000: INFO : application_vendor: Mozilla
2021-11-11T16:45:13.029000: INFO : application_version: 96.0a1
2021-11-11T16:45:13.029000: INFO : platform_buildid: 20211110164131
2021-11-11T16:45:13.029000: INFO : platform_changeset: fd38093162bce3f221a1a1ea003e300ffb9f237b
2021-11-11T16:45:13.029000: INFO : platform_repository: https://hg.mozilla.org/integration/autoland
2021-11-11T16:45:13.029000: INFO : platform_version: 96.0a1
2021-11-11T16:45:30.160000: INFO : Narrowed integration regression window from [492b4330, 93310707] (3 builds) to [fd380931, 93310707] (2 builds) (~1 steps left)
2021-11-11T16:45:30.170000: DEBUG : Starting merge handling...
2021-11-11T16:45:30.170000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=93310707d7c5a3b85040c950bc19e322541f6fbc&full=1
2021-11-11T16:45:30.170000: DEBUG : redo: attempt 1/3
2021-11-11T16:45:30.170000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=93310707d7c5a3b85040c950bc19e322541f6fbc&full=1',), kwargs: {}, attempt #1
2021-11-11T16:45:30.177000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2021-11-11T16:45:31.367000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=93310707d7c5a3b85040c950bc19e322541f6fbc&full=1 HTTP/1.1" 200 None
2021-11-11T16:45:31.387000: DEBUG : Found commit message:
Bug 1733978 - Don't overwrite the ISize of child frames when optimizing nested grid layout r=mats

The optimized code path for nested grid layout should only set the
BSize of the child frames, since this is what the optimization is
for.

Without the change, the ISize is always going to be set to 0 for
child frames which may break the layout. Sometimes, it is not
an issue because the ISize of the grid area gets updated, so
the cached measurement becomes invalid, and the ISize of the child
frame gets set to the correct one again.

Differential Revision: https://phabricator.services.mozilla.com/D130828

2021-11-11T16:45:31.388000: DEBUG : Did not find a branch, checking all integration branches
2021-11-11T16:45:31.390000: INFO : The bisection is done.
2021-11-11T16:45:31.401000: INFO : Stopped

Has Regression Range: --- → yes
Has STR: --- → yes
Keywords: regression
Regressed by: 1733978

Set release status flags based on info from the regressing bug 1733978

Thanks for filing this and finding the regression range!

Blocks: 1732082
Flags: needinfo?(sefeng)

Sean, any update on this?

Severity: -- → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true

Sorry, I still need some time on this, I'll try to make some progress this week.

It's a Nightly only feature, so I am updating the tracking flag accordingly.

The innermost divs here (inside the label) -- the ones that actually draw the black border -- have percent height and width, and they are ending up 0-sized in this case. You can see this in devtools if you drill down to one such element, after reproducing the bug.

I'm betting these elements take a slightly different codepath in their initial reflow (or maybe get skipped altogether, somehow?) if their outer grid container is computing the bonus metadata that devtools consumes. See nsGridContainerFrame::GetGridFrameWithComputedInfo and the codepath that's behind ShouldGenerateComputedInfo.

Maybe we're doing some sort of "lean" traversal when doing the reflow to populate devtools, and then that inadvertently makes us think we don't need to do a more thorough reflow later on, via the caching mechanism in play here?

See Also: → 1746228
Attached file testcase 2 (reduced)

Here's a substantially-reduced testcase.

EXPECTED RESULTS:
At 500ms, "PASS" should appear, and the black-bordered square should change to green.

ACTUAL RESULTS:
The black-bordered square disappears.

If I load the attached testcase in Nighlty with devtools inspector open, then I get ACTUAL RESULTS.

I captured this bug in rr and submitted it to pernosco -- I used a modified version of testcase 3 (with a longer setTimeout, and with a logging line added to go(): dump("****dholbert MAKING THE TESTCASE'S DYNAMIC CHANGE\n");). I was running in a debug build with a bit of NS_WARNING logging (with the most-relevant logging being in PresShell::DoReflow, to see when we kick off reflows).

(In the session, I loaded the testcase once with devtools closed, and then I opened devtools, and then I reloaded and reproduced the bug. So the testcase loads twice in the trace, and the second one is the interesting one.)

tl;dr: we never actually kick off a reflow (we never call PresShell::DoReflow) after we make the dynamic modification. That's the high-level problem here.

Here's the pernosco trace, seeked to the PresShell::FrameNeedsReflow call right after the dynamic change (triggering invalidation for the frames that have been inserted via the display tweak):
https://pernos.co/debug/KWusV4nZmiStoGjRnAxvpQ/index.html#f{m[DoRu,AWuDHw_,t[tA,ASYz_,f{e[DoRu,AWuDEw_,s{af1dp1QAA,bDZ4,uD5g/xA,oD6fHhA___/

In this FrameNeedsReflow call, **we set wasDirty=true, because the frame already has the NS_FRAME_IS_DIRTY bit set!! It's bad that we set wasDirty=true, because that means we decline to add the frame's subtree to the dirty-root list here (i.e. since wasDirty is true, we decline to queue up this chunk of the frame tree for a reflow, because we assume it's already pending a reflow since we're already marked as dirty. But in fact it is not pending a reflow.)

So, the next question: were did our NS_FRAME_IS_DIRTY bit get set? It was set inside of our call to nsGridContainerFrame::GetGridFrameWithComputedInfo -- our presShell->FrameNeedsReflow call there doesn't only set the dirty bit on the grid frame itself, but also on its descendants. This is slightly-unnecessary but should still be fine.

[speculating from this point on -- I haven't yet verified this in pernosco, but I'm fairly confident]
The problem is when we actually perform that reflow, and we discover (unsuprisingly) that our grid item does have cached measurements, and we skip reflowing it because the cache matches our current circumstances, because our content/styling hasn't actually been modified since our last reflow. So we skip reflowing the grid item, and we inadvertently leave its descendants with NS_FRAME_IS_DIRTY flag, which is then a landmine for us to trip over later when we flag those descendants as legitimately-needing-a-reflow and we incorrectly think they are already queued up.

I think the issue here is that we're neglecting to check whether the grid item's subtree is dirty when we reflow it. When we decide whether the grid item's cached size is usable, we need to be calling IsSubtreeDirty() on the grid item's frame (like we do when deciding to use the analogous flexbox cache) -- and if IsSubtreeDirty() returns true, then we should reject the cached value. I don't see any such dirty check in our grid logic (grepping through nsGridContainerFrame.cpp for "dirty"), so I think that's what we're missing here.

Sean, could you try adding a check like that and see if it helps here?

In particular, I think we need to add && !aChild->IsSubtreeDirty() after found in this condition:

    if (found && cachedMeasurement.IsValidFor(aChild, aCBSize)) {

https://searchfox.org/mozilla-central/rev/311094ffb4500152544230378f6711d3598f1cb6/layout/generic/nsGridContainerFrame.cpp#5030

That seems to fix the issue for me (and makes our grid caching logic closer to the flexbox logic linked in my previous comment here).

The idea is: if the cache matches but the subtree is dirty, then we still need to go ahead and reflow to clear the dirty flag at a minimum; and to potentially update our measurements if the dirty flag is actually indicating that something interesting has happened (e.g. some content change) that invalidates our cache.

(I'm optimistic that this suggestion fixes bug 1746228 as well.)

Assignee: nobody → sefeng
Status: NEW → ASSIGNED

Here's a variant of testcase 3 that uses getComputedStyle to trigger the relevant frame-dirtying code (rather than devtools), which does indeed work to trigger the bug, as sefeng noted in phabricator.

Here's another testcase, which makes a stylistic tweak to trigger the right set of invalidations (in this case to padding-bottom, which is sufficient to get the frame's subtree marked as dirty (without being destructive enough to trigger MarkIntrinsicISizesDirty() - that's important because that would clear the cached values and work around the bug).

Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cde61df8539e
Add some grid performance logging r=dholbert
https://hg.mozilla.org/integration/autoland/rev/a8e8c3da8e2c
Ensure grid's cached measurement won't prevent reflow from happending when its subtree is dirty r=dholbert
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/32352 for changes under testing/web-platform/tests
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

Upstream PR merged by moz-wptsync-bot
Flags: needinfo?(sefeng) → in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: