Closed Bug 1466224 Opened 3 years ago Closed 10 months ago

Assertion failure: GetApzc()->GetParent() == aParent, at /buildssrc/gfx/layers/apz/src/HitTestingTreeNode.cpp:406

Categories

(Core :: Layout: Scrolling and Overflow, defect, P3)

defect

Tracking

()

RESOLVED FIXED
86 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox84 --- wontfix
firefox85 --- wontfix
firefox86 --- fixed

People

(Reporter: tsmith, Assigned: botond)

References

(Blocks 2 open bugs)

Details

(Keywords: assertion, testcase, Whiteboard: [gfx-noted])

Attachments

(3 files, 1 obsolete file)

Attached file testcase.html
Reduced with m-c:
BuildID=20180531100449
SourceStamp=763f30c3421233a45ef9e67a695c5c241a2c8a3a

Assertion failure: GetApzc()->GetParent() == aParent, at /buildssrc/gfx/layers/apz/src/HitTestingTreeNode.cpp:406

#0 mozilla::layers::HitTestingTreeNode::SetApzcParent(mozilla::layers::AsyncPanZoomController*) /buildssrc/gfx/layers/apz/src/HitTestingTreeNode.cpp:406:5
#1 mozilla::layers::HitTestingTreeNode* mozilla::layers::APZCTreeManager::PrepareNodeForLayer<mozilla::layers::LayerMetricsWrapper>(mozilla::RecursiveMutexAutoLock const&, mozilla::layers::LayerMetricsWrapper const&, mozilla::layers::FrameMetrics const&, mozilla::layers::LayersId, mozilla::layers::AncestorTransform const&, mozilla::layers::HitTestingTreeNode*, mozilla::layers::HitTestingTreeNode*, mozilla::layers::APZCTreeManager::TreeBuildingState&) /buildssrc/gfx/layers/apz/src/APZCTreeManager.cpp:1050:5
#2 void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::LayerMetricsWrapper>(mozilla::layers::LayersId, mozilla::layers::LayerMetricsWrapper const&, bool, mozilla::layers::LayersId, unsigned int)::{lambda(mozilla::layers::LayerMetricsWrapper)#1}::operator()(mozilla::layers::LayerMetricsWrapper) const /buildssrc/gfx/layers/apz/src/APZCTreeManager.cpp:410:38
#3 _ZN7mozilla6layersL11ForEachNodeINS0_15ReverseIteratorENS0_19LayerMetricsWrapperEZNS0_15APZCTreeManager24UpdateHitTestingTreeImplIS3_EEvNS0_8LayersIdERKT_bS6_jEUlS3_E_ZNS5_IS3_EEvS6_S9_bS6_jEUlS3_E0_EENS_8EnableIfIXaasr6IsSameIDTclfp0_fp_EEvEE5valuesr6IsSameIDTclfp1_fp_EEvEE5valueEvE4TypeET0_RKT1_RKT2_ /buildssrc/gfx/layers/TreeTraversal.h:137:3
#4 _ZN7mozilla6layersL11ForEachNodeINS0_15ReverseIteratorENS0_19LayerMetricsWrapperEZNS0_15APZCTreeManager24UpdateHitTestingTreeImplIS3_EEvNS0_8LayersIdERKT_bS6_jEUlS3_E_ZNS5_IS3_EEvS6_S9_bS6_jEUlS3_E0_EENS_8EnableIfIXaasr6IsSameIDTclfp0_fp_EEvEE5valuesr6IsSameIDTclfp1_fp_EEvEE5valueEvE4TypeET0_RKT1_RKT2_ /buildssrc/gfx/layers/TreeTraversal.h:142:5
#5 _ZN7mozilla6layersL11ForEachNodeINS0_15ReverseIteratorENS0_19LayerMetricsWrapperEZNS0_15APZCTreeManager24UpdateHitTestingTreeImplIS3_EEvNS0_8LayersIdERKT_bS6_jEUlS3_E_ZNS5_IS3_EEvS6_S9_bS6_jEUlS3_E0_EENS_8EnableIfIXaasr6IsSameIDTclfp0_fp_EEvEE5valuesr6IsSameIDTclfp1_fp_EEvEE5valueEvE4TypeET0_RKT1_RKT2_ /buildssrc/gfx/layers/TreeTraversal.h:142:5
#6 _ZN7mozilla6layersL11ForEachNodeINS0_15ReverseIteratorENS0_19LayerMetricsWrapperEZNS0_15APZCTreeManager24UpdateHitTestingTreeImplIS3_EEvNS0_8LayersIdERKT_bS6_jEUlS3_E_ZNS5_IS3_EEvS6_S9_bS6_jEUlS3_E0_EENS_8EnableIfIXaasr6IsSameIDTclfp0_fp_EEvEE5valuesr6IsSameIDTclfp1_fp_EEvEE5valueEvE4TypeET0_RKT1_RKT2_ /buildssrc/gfx/layers/TreeTraversal.h:142:5
#7 _ZN7mozilla6layersL11ForEachNodeINS0_15ReverseIteratorENS0_19LayerMetricsWrapperEZNS0_15APZCTreeManager24UpdateHitTestingTreeImplIS3_EEvNS0_8LayersIdERKT_bS6_jEUlS3_E_ZNS5_IS3_EEvS6_S9_bS6_jEUlS3_E0_EENS_8EnableIfIXaasr6IsSameIDTclfp0_fp_EEvEE5valuesr6IsSameIDTclfp1_fp_EEvEE5valueEvE4TypeET0_RKT1_RKT2_ /buildssrc/gfx/layers/TreeTraversal.h:142:5
#8 _ZN7mozilla6layersL11ForEachNodeINS0_15ReverseIteratorENS0_19LayerMetricsWrapperEZNS0_15APZCTreeManager24UpdateHitTestingTreeImplIS3_EEvNS0_8LayersIdERKT_bS6_jEUlS3_E_ZNS5_IS3_EEvS6_S9_bS6_jEUlS3_E0_EENS_8EnableIfIXaasr6IsSameIDTclfp0_fp_EEvEE5valuesr6IsSameIDTclfp1_fp_EEvEE5valueEvE4TypeET0_RKT1_RKT2_ /buildssrc/gfx/layers/TreeTraversal.h:142:5
#9 void mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::LayerMetricsWrapper>(mozilla::layers::LayersId, mozilla::layers::LayerMetricsWrapper const&, bool, mozilla::layers::LayersId, unsigned int) /buildssrc/gfx/layers/apz/src/APZCTreeManager.cpp:405:5
#10 mozilla::layers::APZCTreeManager::UpdateHitTestingTree(mozilla::layers::LayersId, mozilla::layers::Layer*, bool, mozilla::layers::LayersId, unsigned int) /buildssrc/gfx/layers/apz/src/APZCTreeManager.cpp:576:3
#11 mozilla::layers::CompositorBridgeParent::NotifyShadowTreeTransaction(mozilla::layers::LayersId, bool, mozilla::layers::FocusTarget const&, bool, unsigned int, bool, bool) /buildssrc/gfx/layers/ipc/CompositorBridgeParent.cpp:905:22
#12 mozilla::layers::CrossProcessCompositorBridgeParent::ShadowLayersUpdated(mozilla::layers::LayerTransactionParent*, mozilla::layers::TransactionInfo const&, bool) /buildssrc/gfx/layers/ipc/CrossProcessCompositorBridgeParent.cpp:354:19
#13 mozilla::layers::LayerTransactionParent::RecvUpdate(mozilla::layers::TransactionInfo const&) /buildssrc/gfx/layers/ipc/LayerTransactionParent.cpp:458:22
#14 mozilla::layers::PLayerTransactionParent::OnMessageReceived(IPC::Message const&) /buildssrc/obj-firefox/ipc/ipdl/PLayerTransactionParent.cpp:112:20
#15 mozilla::layers::PCompositorManagerParent::OnMessageReceived(IPC::Message const&) /buildssrc/obj-firefox/ipc/ipdl/PCompositorManagerParent.cpp:110:28
#16 mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) /buildssrc/ipc/glue/MessageChannel.cpp:2136:25
#17 mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) /buildssrc/ipc/glue/MessageChannel.cpp:2066:17
#18 mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) /buildssrc/ipc/glue/MessageChannel.cpp:1912:5
#19 mozilla::ipc::MessageChannel::MessageTask::Run() /buildssrc/ipc/glue/MessageChannel.cpp:1945:15
#20 MessageLoop::RunTask(already_AddRefed<nsIRunnable>) /buildssrc/ipc/chromium/src/base/message_loop.cc:452:9
#21 MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) /buildssrc/ipc/chromium/src/base/message_loop.cc:460:5
#22 MessageLoop::DoWork() /buildssrc/ipc/chromium/src/base/message_loop.cc:535:13
#23 base::MessagePumpDefault::Run(base::MessagePump::Delegate*) /buildssrc/ipc/chromium/src/base/message_pump_default.cc:36:31
#24 MessageLoop::RunInternal() /buildssrc/ipc/chromium/src/base/message_loop.cc:326:10
#25 MessageLoop::Run() /buildssrc/ipc/chromium/src/base/message_loop.cc:299:3
#26 base::Thread::ThreadMain() /buildssrc/ipc/chromium/src/base/thread.cc:181:16
#27 ThreadFunc(void*) /buildssrc/ipc/chromium/src/base/platform_thread_posix.cc:38:13
#28 start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
#29 clone /build/glibc-Cl5G7W/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Flags: in-testsuite?
This appears to be coming down the HitTestingTreeNode::MakeRoot / APZCTreeManager::AttachNodeToTree path:

https://searchfox.org/mozilla-central/rev/55da592d85c2baf8d8818010c41d9738c97013d2/gfx/layers/apz/src/APZCTreeManager.cpp#1056

Looks like the assert was added to this path in bug 1109873.
Depends on: 1109873
Flags: needinfo?(botond)
Priority: -- → P3
Whiteboard: [gfx-noted]
This is likely a dupe of bug 1458145. Getting a layer tree dump will confirm.
Blocks: 1109873
No longer depends on: 1109873
Flags: needinfo?(botond)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) (parental leave) from comment #2)
> This is likely a dupe of bug 1458145. Getting a layer tree dump will confirm.

It's similar to bug 1458145 in that Layout is sending us a malformed layer tree, but the way in which the layer tree is malformed is different.

In bug 1458145, a layer and its descendant both had scroll metadata with the same scroll id.

Here, a layer (0x7f3b7f02d800) has three scroll metadata, with the second one having scrollId=5, and a sibling layer (0x7f3b7f02dc00) has one scroll metadata with scrollId=5. As a result, conflicting scroll parents are computed for scrollId=5.
Blocks: APZLayout
Component: Graphics: Layers → Layout: Scrolling and Overflow
Duplicate of this bug: 1674894

A Pernosco session is available here: https://pernos.co/debug/GTLHN-tvOPNXYaPFl9fZag/index.html

A Pernosco session is available: https://pernos.co/debug/OEK6hzTUCrHUK9zvAckkyg/index.html

This has layout.display-list.dump-content=true as requested.

Flags: needinfo?(botond)

Thanks!

Assignee: nobody → botond
Flags: needinfo?(botond)

Got a fresh display list dump from the Pernosco recording.

Analysis so far, based on IDs from this dump:

  • We have a container layer 0x55d747051e80 which has three scroll metadatas, for scroll ids 7, 6, and 5.
  • We have a container layer 0x55d746c40420 which has one scroll metadata, for scroll id 6.
  • These are siblings, their parent is a container layer with scroll metadata for scroll id 4.
  • The first layer's branch in the scroll node tree yields the scroll parent chain 7 -> 6 -> 5 -> 4, the second layer's branch yields the chain 6 -> 4. We get an assertion because 6 has conflicting scroll parents.
  • Both container layers come from nsDisplayScrollInfoLayer display items. For such items, the bottom-most metadata comes from the item's mScrollFrame, and the remaining ones are based on the item's ASR chain.
  • Looking at the two items' ASR chains, they are indeed (6, 5, 4) and (4), respectively. This seems wrong; the second item's ASR chain should be (5, 4).
Attachment #9005704 - Attachment is obsolete: true

From talking to Botond it sounds like hitting this code

https://searchfox.org/mozilla-central/rev/bc9e328cce65d10f648130229c3955660d300067/layout/generic/nsGfxScrollFrame.cpp#3968

leads to this situation. Since it can cause other things to go wrong maybe we should be recording some data, maybe with telemetry (or gfx critical note maybe if its rare/bad enough), about how often this happens.

To summarize what happens here (continuing to use the IDs from comment 9):

  • When display list building enters (5) (by which I mean, the scroll frame with scroll id 5), that scroll frame is not active, but its descendant (6) is.
  • In such situations, items scrolled by (5) get assigned the enclosing ASR, which in this case is (4).
  • Items scrolled by descendant ASRs, such as (6), get assigned that ASR. If you were to query the ASR chain of such an item at this time, you'd get (6, 4) (note that (5) missing). Note that the item only has a pointer to its own ASR; that ASR then points to its parent, and so on, to form the chain.
  • When display list building leaves (5), we realize (5) actually needs to be active (because it has an active descendant), and we take the InsertScrollFrame code path that Timothy linked to. This tries to "fix up" ASR chains, by e.g. "reparenting" (6) such that its parent is now (5), not (4). As a result, when the ASR chain of an item scrolled by (6) is queried, it is now the correct (6, 5, 4). However, nothing fixes up items which should have gotten (5) as their ASR but actually got (4); those remain with an incorrect ASR.

I can't think of a simple way to fix InsertScrollFrame to address situations like this. (Markus, if you have ideas, please let me know.)

However, having to activate a parent scroll frame because it has an active descendant is a fairly common scenario. So why does this bug not occur more often? The answer is that we have another mechanism to activate the ancestors of a scroll frame when we activate the scroll frame: by setting a zero-margin displayport on the ancestors. This happens between paints (usually in response to events), such that by the next paint we know the relevant scroll frames are active by the time display list building enters them.

Why does that mechanism fail in this test case? Because the method our displayport-setting code uses to determine a scroll frame's ancestor is not 100% accurate, i.e. it does not always match what the actual scroll parent will be during display list building. In this test case in particular, the descendant in question is an out-of-flow frame. The actual scroll parent ends up being an ancestor of its placeholder frame, while our displayport-setting code just looks at direct ancestors in the frame tree.

I have a candidate fix that improves our displayport-setting code to navigate to placeholder frames.

One thing that's not clear to me is along the lines of what Timothy pointed out in comment 10: is the InsertScrollFrame codepath ever really necessary, or is it just a fallback for cases where the displayport-based activation mechanism is buggy or fails for some reason?

This ensures that the notion of a scroll frame's scrollable ancestor used in
SetZeroMarginDisplayPortOnAsyncScrollableAncestor() to activate ancestors
when activating a scroll frame, matches what the actual ancestor is
according to display list building logic.

This avoids us taking the buggy "activate a scroll parent after the fact"
codepath (AutoCurrentActiveScrolledRootSetter::InsertScrollFrame()), which
can result in a display list with incorrect ASRs, in at least some cases.

Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b7973553b723
Follow out-of-flow frames to their placeholders in GetAsyncScrollableAncestorFrame(). r=tnikkel
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch

(In reply to Botond Ballo [:botond] [away until Jan 4] from comment #11)

Why does that mechanism fail in this test case? Because the method our displayport-setting code uses to determine a scroll frame's ancestor is not 100% accurate, i.e. it does not always match what the actual scroll parent will be during display list building. In this test case in particular, the descendant in question is an out-of-flow frame. The actual scroll parent ends up being an ancestor of its placeholder frame, while our displayport-setting code just looks at direct ancestors in the frame tree.

Hmm, I actually find this a bit surprising - I would have thought that ignoring placeholder frames would give the correct ASR chain. For example:

<scrollbox id="A">
  <div style="position:absolute">
    <scrollbox id="B"></scrollbox>
  </div>
</scrollbox>

Here, the ASR tree should be

  • root
    • A
    • B

i.e. A and B should be siblings. Activating B does not need to activate A, because B hands off to the root scroll frame.

Blocks: 1683842

The test here makes us trip a fatal assertion in test-verify mode, apparently. See bug 1683842.

(In reply to Timothy Nikkel (:tnikkel) from comment #10)

From talking to Botond it sounds like hitting this code

https://searchfox.org/mozilla-central/rev/bc9e328cce65d10f648130229c3955660d300067/layout/generic/nsGfxScrollFrame.cpp#3968

leads to this situation. Since it can cause other things to go wrong maybe we should be recording some data, maybe with telemetry (or gfx critical note maybe if its rare/bad enough), about how often this happens.

Based on this Try push, it at least doesn't happen in our test suite. We could explore adding telemetry or a gfx critical note to learn more about occurrences in the wild if you think that's worth it.

ni?myself as a reminder to look into comment 15

Flags: needinfo?(botond)

(In reply to Botond Ballo [:botond] from comment #17)

(In reply to Timothy Nikkel (:tnikkel) from comment #10)

From talking to Botond it sounds like hitting this code

https://searchfox.org/mozilla-central/rev/bc9e328cce65d10f648130229c3955660d300067/layout/generic/nsGfxScrollFrame.cpp#3968

leads to this situation. Since it can cause other things to go wrong maybe we should be recording some data, maybe with telemetry (or gfx critical note maybe if its rare/bad enough), about how often this happens.

Based on this Try push, it at least doesn't happen in our test suite. We could explore adding telemetry or a gfx critical note to learn more about occurrences in the wild if you think that's worth it.

It seems worth it since it can lead to bad situations. I filed bug 1685468.

(In reply to Markus Stange [:mstange] from comment #15)

(In reply to Botond Ballo [:botond] [away until Jan 4] from comment #11)

Why does that mechanism fail in this test case? Because the method our displayport-setting code uses to determine a scroll frame's ancestor is not 100% accurate, i.e. it does not always match what the actual scroll parent will be during display list building. In this test case in particular, the descendant in question is an out-of-flow frame. The actual scroll parent ends up being an ancestor of its placeholder frame, while our displayport-setting code just looks at direct ancestors in the frame tree.

Hmm, I actually find this a bit surprising - I would have thought that ignoring placeholder frames would give the correct ASR chain. For example:

<scrollbox id="A">
  <div style="position:absolute">
    <scrollbox id="B"></scrollbox>
  </div>
</scrollbox>

Here, the ASR tree should be

  • root
    • A
    • B

i.e. A and B should be siblings. Activating B does not need to activate A, because B hands off to the root scroll frame.

This comment is correct. The patch here made SetZeroMarginDisplayPortOnAsyncScrollableAncestor match mScrollParentId on the ScrollMetadata that we send to the compositor. So the ASR tree and mScrollParentId on the ScrollMetadata are different.

It seems like we might be able to make these match so I filed bug 1693451.

Summarizing some discussions around this on chat:

There are at least three places where we compute some notion of a "scrollable ancestor":

  1. Parent links in the ASR chain.
  2. mScrollParentId in the scroll metadata.
  3. GetAsyncScrollableAncestorFrame (with the important call site being SetZeroMarginDisplayPortOnAsyncScrollableAncestors)

With that in mind:

  • There was (and still is) a pre-existing discrepancy between #1 (ignores placeholder frames) and #2 (follows placeholder frames).
  • #3 used to be consistent with #1 (i.e. ignore placeholder frames), and in this bug we changed it to be consistent with #2 (i.e. follow placeholder frames) instead.
  • It's important for #3 to be consistent with #2, to avoid relying on the buggy InsertScrollFrame codepath which can cause the assertion in the bug title.
  • The actual behaviour we want is #1, i.e. we'd like to resolve the pre-existing discrepancy by changing #2 (and #3 along with it) to behave like #1. Bug 1693451 tracks this.
  • (Mentioning for completeness since this confused me: in this testcase, prior to the fix, we did see an ASR chain that followed placeholder frames. The reason for this was that we were getting into the buggy InsertScrollFrame codepath which "fixes up" ASR chains. The entry into this codepath is governed by scroll parent logic, so we ended up with an ASR chain that follows a placeholder.)

Conclusion: there is nothing further to do here besides fixing the pre-existing issue in bug 1693451.

Flags: needinfo?(botond)
You need to log in before you can comment on or make changes to this bug.