Closed Bug 1533673 Opened 1 year ago Closed 1 year ago

Make APZ tell chrome main thread the transforms for chrome to content coordinate spaces when GPU process is used


(Core :: Panning and Zooming, enhancement, P3)




Fission Milestone M3
Tracking Status
firefox68 --- fixed


(Reporter: hsivonen, Assigned: kats)


(Blocks 1 open bug)


(Whiteboard: [fission-event-m2])


(2 files)

Follow-up for bug 1530661:

Bug 1530661 is disabled by default when the GPU process is in use. The IPC problem should be tracked down and fixed so that the pref can be removed and the code run unconditionally in the GPU process case.

Marking [fission-event-m2] on the assumption that for [fission-event-m1] it's more important to have feature breadth on some platforms (Linux and Mac) than platform coverage (Windows).

Whiteboard: [fission-event-m2]
Priority: -- → P3

Let's see if it just works now, considering that in the original bug the mutex scope got narrowed after blocking Windows by pref:

(In reply to Henri Sivonen (:hsivonen) from comment #2)

Let's see if it just works now, considering that in the original bug the mutex scope got narrowed after blocking Windows by pref:

No, still fails, so the mutex scope change didn't fix this.

Blocks: rendering-fission
No longer blocks: gfx-fission
Assignee: nobody → kats

Still failing. The RemoteContentController::SendLayerTransforms is blowing up with a Route error: message sent to unknown actor ID error. This is quite bizarre because the code is (a) running on the compositor thread, which is the right thread, and (b) guarded by an mCanSend check, which should be false if the ActorDestroy method was called.

Clearly some sort of race condition since there's a bunch of tests running fine before it fails but it's not clear why this would happen. I'll try reproducing locally on Linux with the GPU process enabled and use rr if I can, as that will be easier to debug than spamming tryserver.

Best theory so far is that the main thread is running RemoteCompositorSession::Shutdown at the same time that the GPU process is doing its RemoteContentController::SendLayerTransforms, and so when the message arrives to the main thread the actor is gone. But if that's the case I'm kind of surprised we haven't run into this crash before because all the RemoteContentController methods should be subject to the same problem.

The theory seems to be supported by a try push with additional logging here. The crash always seems to occur immediately after RemoteCompositorSession::Shutdown runs.

Looking at the RemoteContentController methods, NotifyLayerTransforms is the only one that would get called from the updater thread (which with WR enabled would be the scene builder thread) so maybe that's related, although I can't see how that would matter, since it redispatches itself to the compositor thread anyway.

Anyway, now that I think I know what's going I'll try to come up with a fix.

Currently it's possible for the RemoteContentController in the GPU process to
be sending a message to the UI process while the UI process is running
RemoteCompositorSession::Shutdown. This means that by the time that message
is processed, the UI process actor is destroyed and the message produces
a routing error.

This patch ensures that before RemoteCompositorSession::Shutdown completes,
it notifies the RemoteContentController in the GPU process that it's about
to destroy the APZChild actor. This eliminates the race because it ensures
the RemoteContentController is synchronously notified of the impending
actor destruction before it tries to send the message.

With the IPC fix in the previous patch this seems to work now.

Depends on D29941

Fission Milestone: --- → M3
Pushed by
Prevent the GPU RemoteContentController from sending messages to a dead actor. r=rhunt
Allow APZ to send fission matrices with the GPU process. r=hsivonen
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Regressions: 1561570
Regressions: 1573795
You need to log in before you can comment on or make changes to this bug.