Bug 1569131 Comment 14 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Ok so writing out what I'm now debugging.

-------------

STR: 

* start up a (debug) build with webrender and split-render-roots enabled
* load up a bunch of youtube videos in different tabs
* toggle through them over and over until you hit a GPU crash (window flickers black/white)

In release builds I'm still getting GPU process crashes, just somewhere else. I'm pretty sure the debug crash is more instructive since it's catching the problem earlier. 

--------------

Problem description:

Something somewhere is causing APZ's and WR's understanding of the page to fall out of sync. Specifically, I am seeing a crash in APZCTreeManager::GetAPZCAtPointWR when processing ReceiveMouseInput. The call to WebRenderAPI::HitTest appears to succeed and returns a pipelineId/scrollId, but GetTargetNode fails to resolve them, and we crash on an assert that claims this should only happen if scrollId==0 (it was 2 in my last crash).
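
To make the failure mode concrete, here is a minimal toy sketch of it. This is not the actual Gecko code: `ToyApzTree`, `HitResult`, and `LookupIsConsistent` are hypothetical stand-ins, and the only thing taken from the description above is the invariant that a failed lookup should imply scrollId==0.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

// Hypothetical stand-in for what a WR hit test hands back to APZ.
struct HitResult {
  uint64_t pipelineId;
  uint64_t scrollId;
};

// Hypothetical stand-in for APZ's view of the page: a map from
// (pipelineId, scrollId) to some node handle.
class ToyApzTree {
 public:
  void AddNode(uint64_t pipelineId, uint64_t scrollId, int nodeHandle) {
    mNodes[{pipelineId, scrollId}] = nodeHandle;
  }

  // Simulates APZ's tree being rebuilt without the node WR still knows about.
  void Clear() { mNodes.clear(); }

  // Returns the node for these ids, or nullopt if APZ does not know them.
  std::optional<int> GetTargetNode(const HitResult& hit) const {
    auto it = mNodes.find({hit.pipelineId, hit.scrollId});
    if (it == mNodes.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  std::map<std::pair<uint64_t, uint64_t>, int> mNodes;
};

// The invariant the assert encodes: a lookup miss is only acceptable
// when the hit test reported scrollId == 0.
bool LookupIsConsistent(const ToyApzTree& tree, const HitResult& hit) {
  return tree.GetTargetNode(hit).has_value() || hit.scrollId == 0;
}
```

In this toy model, a hit result of (pipelineId, scrollId=2) that was valid when WR built its scene becomes inconsistent as soon as the APZ side drops or replaces that node, which is the shape of the crash described above.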

This isn't a particularly surprising bug since document splitting makes it significantly more complex to keep APZ/WR in sync, but it is a difficult one for me to debug. I'm not really sure where to begin, or how to try to isolate the issue. Something's getting out of sync... somehow.

Working under the theory that this was *also* caused by [this patch](https://phabricator.services.mozilla.com/D37078) probably helps isolate the problem a bit. Certainly deferring WRBP messages needs to properly cooperate with epoch management. I'm just very new to the epoch management code, so I'm not sure what to look for.
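
One way to start hunting might be a debug-only check that flags updates applied out of epoch order. The sketch below is only an illustration of that idea under my own assumptions: `ToyUpdate` and `EpochOrderChecker` are hypothetical names, and real epoch handling in the WRBP/APZ code is certainly more involved than a single counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a message that carries an epoch.
struct ToyUpdate {
  uint32_t epoch;
};

// Hypothetical debugging aid: remembers the newest epoch applied so far
// and counts every update that arrives with an older epoch.
class EpochOrderChecker {
 public:
  // Returns true if the update is in order, false if it is stale.
  bool Apply(const ToyUpdate& update) {
    if (mHasLast && update.epoch < mLastEpoch) {
      ++mViolations;
      return false;
    }
    mLastEpoch = update.epoch;
    mHasLast = true;
    return true;
  }

  int Violations() const { return mViolations; }

 private:
  uint32_t mLastEpoch = 0;
  bool mHasLast = false;
  int mViolations = 0;
};

// Usage sketch: an epoch-1 update that was deferred and then applied
// after an epoch-2 update is flagged as out of order.
inline void DemoDeferredUpdate() {
  EpochOrderChecker checker;
  const bool first = checker.Apply(ToyUpdate{2});
  const bool deferred = checker.Apply(ToyUpdate{1});
  assert(first);
  assert(!deferred);
  assert(checker.Violations() == 1);
}
```

A check like this would not fix anything, but if deferral really is reordering messages relative to epochs, it should trip well before the hit-test assert does.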

While walking through some of the APZ update queue code, I ran across [this comment](https://searchfox.org/mozilla-central/source/gfx/layers/apz/src/APZUpdater.cpp#400-402) that was a bit troubling: it specifically calls out this approach as something which could mess up update order, but asserts it's fine since we only have two RenderRoots. Except we have 3 now. That said, I wouldn't *expect* this to be an issue, because the new third RenderRoot is just the "you're fullscreen now" overlay, which I never trigger in my STR.

ni?sotaro -- you reviewed a lot of the work here, any ideas for how to proceed on the hunt for this bug?
