Closed Bug 1533673 (opened 6 years ago, closed 6 years ago)

Make APZ tell chrome main thread the transforms for chrome to content coordinate spaces when GPU process is used

Tracking

Status:

RESOLVED FIXED

Milestone:

mozilla68

Project Flags:

Fission Milestone

Tracking Flags:

Tracking

Status

firefox68

---

fixed

People

(Reporter: hsivonen, Assigned: kats)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fission-event-m2])

Attachments

(2 files)

Bug 1533673 - Prevent the GPU RemoteContentController from sending messages to a dead actor. r?rhunt 6 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 47 bytes, text/x-phabricator-request
Bug 1533673 - Allow APZ to send fission matrices with the GPU process. r?hsivonen 6 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 47 bytes, text/x-phabricator-request

Henri Sivonen (:hsivonen)

Reporter

Description

•

6 years ago

Follow-up for bug 1530661:

The functionality added in bug 1530661 is disabled by default when the GPU process is in use. The IPC problem should be tracked down and fixed so that the pref can be removed and the code can run unconditionally in the GPU process case.

Henri Sivonen (:hsivonen)

Reporter

Comment 1

•

6 years ago

Marking [fission-event-m2] on the assumption that for [fission-event-m1] it's more important to have feature breadth on some platforms (Linux and Mac) than platform coverage (Windows).

Whiteboard: [fission-event-m2]

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

6 years ago

Priority: -- → P3

Henri Sivonen (:hsivonen)

Reporter

Comment 2

•

6 years ago

Let's see if it just works now, considering that in the original bug the mutex scope got narrowed after blocking Windows by pref:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c3dc02b3f5934a4d103db5017fee75642d9fef5

Henri Sivonen (:hsivonen)

Reporter

Comment 3

•

6 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #2)

> Let's see if it just works now, considering that in the original bug the mutex scope got narrowed after blocking Windows by pref:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c3dc02b3f5934a4d103db5017fee75642d9fef5

No, still fails, so the mutex scope change didn't fix this.

Jessie [:jbonisteel] pls NI

Updated

•

6 years ago

Blocks: gfx-fission

Ryan Hunt [:rhunt]

Updated

•

6 years ago

Blocks: rendering-fission
No longer blocks: gfx-fission

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

6 years ago

Assignee: nobody → kats

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 4

•

6 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=f7014f86fc6986f47e8e636e369fb1beae1734ed

Still failing. RemoteContentController::SendLayerTransforms is failing with a "Route error: message sent to unknown actor ID" error. This is quite bizarre because the code is (a) running on the compositor thread, which is the right thread, and (b) guarded by an mCanSend check, which should be false if the ActorDestroy method was called.

This is clearly some sort of race condition, since a bunch of tests run fine before it fails, but it's not clear why it would happen. I'll try reproducing locally on Linux with the GPU process enabled and use rr if I can, as that will be easier to debug than spamming tryserver.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 5

•

6 years ago

Best theory so far is that the main thread is running RemoteCompositorSession::Shutdown at the same time that the GPU process is doing its RemoteContentController::SendLayerTransforms, and so when the message arrives at the main thread the actor is gone. But if that's the case I'm kind of surprised we haven't run into this crash before, because all the RemoteContentController methods should be subject to the same problem.
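
To make the suspected interleaving concrete, here is a minimal single-threaded sketch of it. This is not Gecko code: UiProcess, GpuSender, the actor ID 7 and the std::deque standing in for the IPC channel are all hypothetical names invented for illustration. It only models the ordering in the theory above, where the sender's mCanSend-style flag is still true when the message is queued, but the receiving actor is gone by the time the message is routed.

```cpp
#include <cassert>
#include <deque>
#include <set>

// Hypothetical stand-in for the UI process side: a set of actor IDs that
// incoming messages can still be routed to.
struct UiProcess {
  std::set<int> liveActors;
  int delivered = 0;
  int routeErrors = 0;

  // Route one message; an unknown actor ID counts as a routing error.
  void Receive(int actorId) {
    if (liveActors.count(actorId) != 0) {
      ++delivered;
    } else {
      ++routeErrors;
    }
  }
};

// Hypothetical stand-in for the GPU process side. mCanSend mirrors the kind
// of guard mentioned in comment 4; it only flips once the sender learns that
// the actor was destroyed.
class GpuSender {
 public:
  GpuSender(int actorId, std::deque<int>* channel)
      : mActorId(actorId), mChannel(channel), mCanSend(true) {}

  void SendLayerTransforms() {
    if (mCanSend) {
      mChannel->push_back(mActorId);
    }
  }

  void ActorDestroy() { mCanSend = false; }

 private:
  int mActorId;
  std::deque<int>* mChannel;
  bool mCanSend;
};

// Replays the suspected ordering and returns the number of routing errors.
int SimulateShutdownRace() {
  std::deque<int> channel;
  UiProcess ui;
  ui.liveActors.insert(7);
  GpuSender gpu(7, &channel);

  gpu.SendLayerTransforms();    // guard passes: sender thinks the actor is alive
  assert(channel.size() == 1);  // the message is now in flight
  ui.liveActors.erase(7);       // UI-side shutdown destroys the actor
  gpu.ActorDestroy();           // the sender finds out only afterwards

  while (!channel.empty()) {    // the in-flight message arrives too late
    ui.Receive(channel.front());
    channel.pop_front();
  }
  return ui.routeErrors;
}
```

In this model the guard cannot help, because it is checked before the destruction is visible to the sender.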

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 6

•

6 years ago

The theory seems to be supported by a try push with additional logging. The crash always seems to occur immediately after RemoteCompositorSession::Shutdown runs.

Looking at the RemoteContentController methods, NotifyLayerTransforms is the only one that would get called from the updater thread (which with WR enabled would be the scene builder thread) so maybe that's related, although I can't see how that would matter, since it redispatches itself to the compositor thread anyway.

Anyway, now that I think I know what's going on, I'll try to come up with a fix.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 7

•

6 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=07e43ea284c87ef2de61f197245f6c09538ac00b

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 8

•

6 years ago

Attached file: Bug 1533673 - Prevent the GPU RemoteContentController from sending messages to a dead actor. r?rhunt

Currently it's possible for the RemoteContentController in the GPU process to
be sending a message to the UI process while the UI process is running
RemoteCompositorSession::Shutdown. This means that by the time that message
is processed, the UI process actor is destroyed and the message produces
a routing error.

This patch ensures that before RemoteCompositorSession::Shutdown completes,
it notifies the RemoteContentController in the GPU process that it's about
to destroy the APZChild actor. This eliminates the race because it ensures
the RemoteContentController is synchronously notified of the impending
actor destruction before it tries to send the message.
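
The ordering this patch description relies on can be sketched as a small single-threaded model. This is not the patch itself: Sender, Receiver, NotifyActorAboutToBeDestroyed and the actor ID 7 are hypothetical names, and the step that drains already-queued messages before the actor goes away is an assumption of the model, not something stated above.

```cpp
#include <cassert>
#include <deque>
#include <set>

// Hypothetical sender: queues a message only while it believes the
// receiving actor is alive.
class Sender {
 public:
  explicit Sender(int actorId) : mActorId(actorId), mCanSend(true) {}

  bool Send(std::deque<int>& channel) {
    if (!mCanSend) {
      return false;  // dropped instead of being sent to a dead actor
    }
    channel.push_back(mActorId);
    return true;
  }

  // Called synchronously by the receiver before it destroys its actor.
  void NotifyActorAboutToBeDestroyed() { mCanSend = false; }

 private:
  int mActorId;
  bool mCanSend;
};

// Hypothetical receiver: counts messages that arrive for unknown actors.
class Receiver {
 public:
  void AddActor(int id) { mLive.insert(id); }

  void Drain(std::deque<int>& channel) {
    while (!channel.empty()) {
      if (mLive.count(channel.front()) == 0) {
        ++mRouteErrors;
      }
      channel.pop_front();
    }
  }

  void Shutdown(int id, Sender& sender, std::deque<int>& channel) {
    sender.NotifyActorAboutToBeDestroyed();  // 1. stop any further sends
    Drain(channel);                          // 2. assumption: flush queued messages
    mLive.erase(id);                         // 3. only now destroy the actor
  }

  int RouteErrors() const { return mRouteErrors; }

 private:
  std::set<int> mLive;
  int mRouteErrors = 0;
};

// Returns the number of routing errors seen with the notification in place.
int SimulateShutdownWithNotification() {
  std::deque<int> channel;
  Sender sender(7);
  Receiver receiver;
  receiver.AddActor(7);

  bool queued = sender.Send(channel);  // in flight when shutdown starts
  assert(queued);
  receiver.Shutdown(7, sender, channel);
  bool late = sender.Send(channel);    // attempted after the notification
  assert(!late);
  receiver.Drain(channel);             // nothing is left to misroute
  return receiver.RouteErrors();
}
```

The point of the model is only that the sender learns about the destruction before the actor disappears, so no message can be queued for an actor that is already gone.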

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 9

•

6 years ago

Attached file: Bug 1533673 - Allow APZ to send fission matrices with the GPU process. r?hsivonen

With the IPC fix in the previous patch this seems to work now.

Depends on D29941

Neha Kochar [:neha]

Updated

•

6 years ago

Fission Milestone: --- → M3

Pulsebot

Comment 10

•

6 years ago

Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c737fc642ddc
Prevent the GPU RemoteContentController from sending messages to a dead actor. r=rhunt
https://hg.mozilla.org/integration/autoland/rev/4abfc5c2a4ee
Allow APZ to send fission matrices with the GPU process. r=hsivonen

Oana Pop-Rus

Comment 11

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c737fc642ddc
https://hg.mozilla.org/mozilla-central/rev/4abfc5c2a4ee

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox68: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla68

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

5 years ago

Regressions: 1561570

Alice0775 White

Updated

•

5 years ago

Regressions: 1573795