Closed Bug 1302236 Opened 8 years ago Closed 8 years ago

APZ assert when compositor restarts

Categories

(Core :: Panning and Zooming, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: dvander, Assigned: dvander)

References

(Blocks 1 open bug)

Details

(Whiteboard: [gfx-noted])

Attachments

(1 file, 1 obsolete file)

Attached file apzlog.txt (obsolete) —
This is kind of hard to reproduce at the moment since there's lots of patches that haven't landed yet. But I don't want to lose this so I'm filing ahead of time.

STR:
 1. Enable layers.gpu-process.dev.enabled
 2. Start Firefox, navigate to a simple webpage (no flash, or video)
 3. kill -s 9 the plugin-container for the GPU
 4. Restore focus to Firefox.

This assert in APZCTreeManager.cpp then fires: http://searchfox.org/mozilla-central/rev/591354c5e5521cf845bf6b6ef44e8b3b0aeda17d/gfx/layers/apz/src/APZCTreeManager.cpp#1732

Attached is some APZ logging from immediately after the compositor shutdown. Can attach the debugger or provide more logging if needed.
Flags: needinfo?(bugmail)
So looking at the log, the layer dump at the end has the full layer tree (including content process), so I don't the problem is that the content process layers are missing (as we had hypothesized on IRC yesterday).

Instead, it looks like the APZ code didn't build an APZC for the layer 0x7f76c8860000 in the content layer tree - in the corresponding hit-testing tree dump just above the HitTestingTreeNode corresponding to that layer is 0x7f76c19fab40 but it has "APZC ((nil))" instead of a non-null APZC which is what I would expect. So that definitely looks wrong, there might be a bug in our APZC building code that is exposed by this scenario.

Are the STR reliable? If so, can you put your local patches somewhere (github branch is fine) so I can try to repro and debug? Alternatively if it's easy for you to do I'd be interested to know if we end up hitting the line at [1] because looking at the code it seems like the most likely explanation - the GeckoContentController from the child process needs to get re-registered before we'll build APZCs for that process, and in the GPU restart scenario that might not have happened before the layer tree is updated and input events start hitting those layers.

[1] http://searchfox.org/mozilla-central/rev/bcc9ea6947878ca6378e5b7d6f08c1c090ed9eb7/gfx/layers/apz/src/APZCTreeManager.cpp#362
Component: Graphics: Layers → Panning and Zooming
Flags: needinfo?(bugmail)
Priority: -- → P3
Whiteboard: [gfx-noted]
Version: unspecified → Trunk
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)
> So looking at the log, the layer dump at the end has the full layer tree
> (including content process), so I don't the problem is that the content
> process layers are missing (as we had hypothesized on IRC yesterday).
> 
> Instead, it looks like the APZ code didn't build an APZC for the layer
> 0x7f76c8860000 in the content layer tree - in the corresponding hit-testing
> tree dump just above the HitTestingTreeNode corresponding to that layer is
> 0x7f76c19fab40 but it has "APZC ((nil))" instead of a non-null APZC which is
> what I would expect. So that definitely looks wrong, there might be a bug in
> our APZC building code that is exposed by this scenario.
> 
> Are the STR reliable? If so, can you put your local patches somewhere
> (github branch is fine) so I can try to repro and debug? Alternatively if
> it's easy for you to do I'd be interested to know if we end up hitting the
> line at [1] because looking at the code it seems like the most likely
> explanation - the GeckoContentController from the child process needs to get
> re-registered before we'll build APZCs for that process, and in the GPU
> restart scenario that might not have happened before the layer tree is
> updated and input events start hitting those layers.
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> bcc9ea6947878ca6378e5b7d6f08c1c090ed9eb7/gfx/layers/apz/src/APZCTreeManager.
> cpp#362

I can try that later today, but there is a GitHub branch: https://github.com/dvander/gecko-dev/tree/gpu-wip
Yup, you're right, we hit that return false. That makes sense.
Ok, cool. Let me know if you want me to look into fixing this. Otherwise I'm assuming you'll get to it eventually. I never got around to building your WIP branch.
Attached patch fixSplinter Review
On Windows, APZ doesn't assert but also doesn't scroll. I think it's probably the same bug. This patch recreates the controller and seems to work. We should be guaranteed that this message arrives before any content layers do.

(Unrelated, it's a little confusing that ContentChild has an APZCTreeManager?)
Assignee: nobody → dvander
Attachment #8790453 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8796438 - Flags: review?(bugmail)
Comment on attachment 8796438 [details] [diff] [review]
fix

Review of attachment 8796438 [details] [diff] [review]:
-----------------------------------------------------------------

This looks ok, but while I poking around I discovered that TabChild::mAPZChild is never set to anything that's not null. That seems to have been broken somewhere along the way... once we fix that we might want to change this patch to assign to mAPZChild instead of a local |apz| variable.
Attachment #8796438 - Flags: review?(bugmail) → review+
Looks like the call to mBrowser->SetAPZChild was removed at [1]. The code shifted around but the new code at [2] doesn't call SetAPZChild, possibly because at that point we don't have a reference to an APZChild. I'll file a followup bug for this.

[1] https://hg.mozilla.org/mozilla-central/rev/b0d9956e1251#l7.227
[2] https://hg.mozilla.org/mozilla-central/rev/b0d9956e1251#l5.119
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c24ac66f5bb8
Recreate RemoteContentControllers for tabs after the GPU process restarts. (bug 1302236, r=kats)
https://hg.mozilla.org/mozilla-central/rev/c24ac66f5bb8
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: