Make sure APZ works with nested oop iframe

RESOLVED FIXED in Firefox 47

Status

()

Core
Panning and Zooming
RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: kanru, Assigned: peterv)

Tracking

(Blocks: 1 bug)

Trunk
mozilla47
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(tracking-b2g:backlog, firefox47 fixed)

Details

(Whiteboard: [nested-oop])

Attachments

(2 attachments, 10 obsolete attachments)

(Reporter)

Description

4 years ago
RenderFrameParent::GetApzcTreeManager assumes it could access CompositiorParent which is chrome process only.

Need to fix rendering of nested oop iframe in order to make this one work.
Component: Graphics: Layers → Panning and Zooming
Is there an class in the chrome process that serves as the communication bridge to the nested child process?
Flags: needinfo?(kchen)
(Reporter)

Comment 2

4 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)
> Is there an class in the chrome process that serves as the communication
> bridge to the nested child process?

They are still connected by PContent. Note that RenderFrameParent::GetApzcTreeManager is called from a child process that embed another child process.
Flags: needinfo?(kchen)
feature-b2g: --- → 2.2
(Reporter)

Updated

4 years ago
Whiteboard: [nested-oop]
blocking-b2g: --- → backlog
feature-b2g: 2.2 → ---
Trying to get some more info here to make this bug actionable.

Kan-Ru, is there a way for me to create a nested oop iframe so I can exercise the code that is being run here?

smaug, can you (or somebody else) point to docs or briefly explain the relationship between TabParent/RenderFrameParent/ContentParent? Specifically I'm interested in which of these are used for what kinds of communication when dealing with this nested oop iframe case (i.e. a process depth > 2). It was never really clear to me what the distinction between these was originally meant to be.
Flags: needinfo?(kchen)
Flags: needinfo?(bugs)

Comment 4

4 years ago
That is really a question to kanru, the nested oop part of it.
(And nested oop is actually sibling oop, AFAIK.)
Flags: needinfo?(bugs)
Based on the documentation Kan-Ru pointed me to at [1], I suspect that we need to move the APZ operations from PBrowser (TabParent/TabChild) to PCompositor (CompositorParent/CompositorChild), or to a new protocol managed by PCompositor. Stuff in TabChild would use the CompositorChild (or APZChild or whatever) to send messages to the CompositorParent (or APZParent) where the implementation of the GeckoContentController interface would live. That implementation would talk to the APZCTreeManager as necessary, similar to how RemoteContentController currently works in RenderFrameParent.cpp.

The GeckoContentController implementations for APZ root-process, Metro, and Fennec probably won't need to change.

[1] https://wiki.mozilla.org/Nested_Content_Processes
Flags: needinfo?(kchen)
Duplicate of this bug: 845686
Also based on what happened in bug 1041425, the input event routing will probably still need to stay in PBrowser. Events will come into (root) TabParent, get untransformed by the APZ, get sent to the (child) TabChild, get shunted to the (child) TabParent, which will send it further to the (grandchild) TabChild. That can probably stay as-is until we get around to fixing bug 920036 at which point it might need to change.
(Reporter)

Comment 8

4 years ago
I've put the basic nested gfx rendering doc on the wiki[1]. https://github.com/rexboy7/gaia/commit/4fa867a6ae69bb05a14c453e61209294f83fcba4 is simple test app for nested oop.
Assignee: nobody → bugmail.mozilla
blocking-b2g: backlog → 2.2?
blocking-b2g: 2.2? → ---
feature-b2g: --- → 2.2?
feature-b2g: 2.2? → 2.2+
Unassigning for now since I'm not actively working on this.
Assignee: bugmail.mozilla → nobody
This got kicked out of 2.2 with the date changes.
feature-b2g: 2.2+ → ---
blocking-b2g: --- → backlog
(Reporter)

Updated

3 years ago
Blocks: 1114872
Duplicate of this bug: 1124087
blocking-b2g: backlog → ---
tracking-b2g: --- → backlog
(Assignee)

Updated

3 years ago
Blocks: 558184
(Assignee)

Updated

3 years ago
Assignee: nobody → peterv
Status: NEW → ASSIGNED
Any updates here, Peter?
Flags: needinfo?(peterv)
(Assignee)

Comment 13

3 years ago
Yeah, I'm pretty close to having a final patch I think. Some final details to nail down.
Flags: needinfo?(peterv)
Thanks. Please feel free to post any WIPs if you would like feedback. I'm hoping the patches will fix bug 1206904 which will allow us to enable APZ on Linux desktop as well.
Blocks: 1206904
(Assignee)

Comment 16

3 years ago
Created attachment 8680192 [details] [diff] [review]
Move RemoteContentController

This just moves the code for RemoteContentController, to make the patch slightly easier to review.
(Assignee)

Comment 17

3 years ago
Created attachment 8680195 [details] [diff] [review]
v1 (for review)

This is on top of attachment 8680192 [details] [diff] [review] for easier reviewing, but we should just combine them again before landing.
(Assignee)

Comment 18

3 years ago
Created attachment 8682699 [details] [diff] [review]
v2 (for review)

This is much simpler, I used to open PAPZ through PCompositor but that complicated things because some of PCompositor runs off the main thread and because of Nuwa's handling of PCompositor, and it also caused leaks (CompositorChild leaks already, and now we'd leak APZChild too). Opening through PContent avoids these issues and simplifies things quite a bit.

I moved the GeckoContentController and APZ related IPDL methods from PBrowser to a new protocol named PAPZ. PAPZ is opened through PContent in ContentParent::InitInternal whenever we need it (gfxPlatform::AsyncPanZoomEnabled() is true). APZParent keeps a reference to itself until its Actordestroy is called (because we call PAPZChild::Close in ContentChild::ActorDestroy). TabChild::InitRenderingState uses PAPZ's UpdateControllerForLayerTree to create a RemoteContentController in the main process and store it in sIndirectLayerTrees. A RemoteContentController holds a reference to the APZParent and to the layer's id. It also holds a reference to the top-level TabParent, but that's only used in RemoteContentController::SendAsyncScrollDOMEvent, TabParent::AsyncScrollDOMEvent forwards the call through its IPC link to the right TabParent in a child if the layer id is for a grandchild.
EraseLayerState calls Destroy on the RemoteContentController so that we release the strong references that it holds, to avoid keeping them alive for too long (GeckoContentControllers are refcounted and sometimes stick around for a while).
Let me know if I need to explain anything else.
Attachment #8679617 - Attachment is obsolete: true
Attachment #8680195 - Attachment is obsolete: true
Attachment #8682699 - Flags: review?(bugmail.mozilla)
Comment on attachment 8682699 [details] [diff] [review]
v2 (for review)

Review of attachment 8682699 [details] [diff] [review]:
-----------------------------------------------------------------

I did a first pass through this patch to wrap my head around it. For the most part it seems pretty good. I have some comments below. I'd like to get somebody more familiar with the IPC setup to also look at those bits (I would have requested :bent but I'm not sure how active he still is, or who else would be appropriate - maybe :khuey?). :BenWa should probably also look a quick look at the changes on the CompositorParent/Child side to make sure there's no obvious problems there. I'll do another pass for stylistic nits and other less important issues.

::: dom/ipc/TabChild.cpp
@@ +2660,5 @@
>      }
>  
> +    if (mAsyncPanZoomEnabled) {
> +      APZChild* apz = APZChild::Get();
> +      apz->SendUpdateControllerForLayerTree(mLayersId,

Sending the layersId from the child process to the parent process is a security risk on B2G. In the case where the child process gets pwned, it can send the layersid for a different layer tree or process and can spy on notifications intended for that layer tree / process. (see bug 898563; that's why we do things like https://hg.mozilla.org/mozilla-central/file/61dcc13d0848/layout/ipc/RenderFrameParent.cpp#l541). Is there any way we can avoid doing this here? It might be ok to have this message go over the PBrowser interface, so that the parent side already knows what the layers id is, and then forward it to the CompositorParent from there.

@@ +2763,5 @@
>  {
> +  mRemoteFrame->SendUpdateHitRegion(aRegion);
> +  APZChild* apz = APZChild::Get();
> +  if (apz) {
> +    apz->SendUpdateHitRegion(mLayersId, aRegion);

Ditto here with respect to sending the layersid from the child to the parent. This should be avoided.

::: gfx/layers/ipc/CompositorParent.cpp
@@ +622,1 @@
>    virtual void SendAsyncScrollDOMEvent(bool aIsRootContent,

FYI I'm ripping out all this AsyncScrollDOMEvent stuff in bug 898075, it's not needed any more. (This patch reminded me to do that, thanks!)

@@ +1791,5 @@
> +CompositorParent::UpdateHitRegion(const uint64_t& aChild, const nsRegion& aRegion)
> +{
> +  MonitorAutoLock lock(*sIndirectLayerTreesLock);
> +  if (sIndirectLayerTrees[aChild].mController) {
> +    sIndirectLayerTrees[aChild].mController->SetTouchSensitiveRegion(aRegion);

I'm not happy with putting the SetTouchSensitiveRegion function on the GeckoContentController interface, since the interface is intended as messages that the APZC sends to content.

As per my previous comment, we might want to have the UpdateHitRegion message go over the PBrowser interface to avoid relying on the untrusted layers id. If we do that, perhaps we can save the touch sensitive region in the RenderFrameParent or TabParent instead?
Attachment #8682699 - Flags: review?(bugmail.mozilla) → feedback+
(Assignee)

Comment 20

3 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #19)
> Is there any way we can avoid doing this here?
> It might be ok to have this message go over the PBrowser interface, so that
> the parent side already knows what the layers id is

You might still be in a child process if you go over the PBrowser interface (coming from a grandchild), so in that case we still run into the same issue because you have to forward to its parent process, no?

I might be able to check that the layers id is registered in the correct ContentParent with NestedBrowserLayerIds.
(Assignee)

Comment 21

3 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #19)
> As per my previous comment, we might want to have the UpdateHitRegion
> message go over the PBrowser interface to avoid relying on the untrusted
> layers id. If we do that, perhaps we can save the touch sensitive region in
> the RenderFrameParent or TabParent instead?

I don't really understand how you imagine we do this. The RenderFrameParent or the TabParent can live in a child (if the layer id is for a grandchild), so we'd then need to go over IPC to *get* the touch sensitive region. AFAICT we need the value synchronously in ComputeClipRegion, no? We can't do synchronous IPC from parent to child.
Flags: needinfo?(bugmail.mozilla)
Yeah you're right. Not sure what I was thinking... nonetheless we need to find a way to avoid the parent relying on a layersid value coming from the child/grandchild. The only way I can think of doing that is by using a channel that's already per-layers-id, so that the parent side knows what the layers id is without having to be told.
Flags: needinfo?(bugmail.mozilla)
(Assignee)

Comment 23

3 years ago
Created attachment 8690252 [details] [diff] [review]
Guard against bad data

How about this as an approach to guard against a hijacked child process? Basically checks if a layers id is for a PBrowser belonging to the child process that we get the message from.
It doesn't allow me to remove the SetTouchSensitiveRegion function, there's not really another place to store the touch sensitive region in the main process.
Attachment #8690252 - Flags: feedback?(bugmail.mozilla)
Comment on attachment 8690252 [details] [diff] [review]
Guard against bad data

Review of attachment 8690252 [details] [diff] [review]:
-----------------------------------------------------------------

If I'm reading it right, it still allows one grandchild to spoof another grandchild (if they have the same parent) because LayersIdIsForNestedBrowser will still return true in that case. I'm not sure if that sort of attack is part of our threat model here, but if it is not, then this seems fine to me. I'll needinfo sicking to get clarity on this.
Attachment #8690252 - Flags: feedback?(bugmail.mozilla) → feedback+
So that you don't have to read the whole bug, the question here is an extension of https://bugzilla.mozilla.org/show_bug.cgi?id=898563#c18 - with nested content processes, if one grandchild process can read information from another grandchild process (which shares the same parent), is that a security issue?
Flags: needinfo?(jonas)
Yes, that is a security issue. Generally we will only create new nested child processes because we want to keep them separate from any of the existing nested child processes.
Flags: needinfo?(jonas)
Thanks Jonas!

Kan-Ru, does this problem affect the RecvDeallocateLayerTreeId function [1] as well? i.e. one grandchild which gets hijacked could send a deallocate message with a layers id for a different grandchild, and cause some sort of badness? It seems like the same sort of problem that we're facing here with PAPZ, but it's in code that already landed.

[1] https://dxr.mozilla.org/mozilla-central/rev/d3d286102ba7f8801e9dfe12d534f49554ba50c0/dom/ipc/ContentParent.cpp#1986
Flags: needinfo?(kchen)
(Reporter)

Comment 28

2 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Thanks Jonas!
> 
> Kan-Ru, does this problem affect the RecvDeallocateLayerTreeId function [1]
> as well? i.e. one grandchild which gets hijacked could send a deallocate
> message with a layers id for a different grandchild, and cause some sort of
> badness? It seems like the same sort of problem that we're facing here with
> PAPZ, but it's in code that already landed.
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/rev/
> d3d286102ba7f8801e9dfe12d534f49554ba50c0/dom/ipc/ContentParent.cpp#1986

No, the LayerTreeId and ContentParent has a 1-on-1 mapping relationship. The LayerTreeId is allocated by the grandchild directly here [2]. If the grandchild tries to deallocate others LayerTreeId it will be KillHard("DeallocateLayerTreeId") [1]

[2] https://dxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp#316
Flags: needinfo?(kchen)
(Assignee)

Comment 29

2 years ago
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #28)
> No, the LayerTreeId and ContentParent has a 1-on-1 mapping relationship. The
> LayerTreeId is allocated by the grandchild directly here [2].

Given that the RenderFrameParent is running in the child (the parent of the grandchild), I don't see the 1-on-1 mapping? Isn't that ContentChild::GetSingleton() shared between all grandchild processes that have a common parent?
Flags: needinfo?(kchen)
(Assignee)

Comment 30

2 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Kan-Ru, does this problem affect the RecvDeallocateLayerTreeId function [1]
> as well? i.e. one grandchild which gets hijacked could send a deallocate
> message with a layers id for a different grandchild, and cause some sort of
> badness?

The grandchild can't send a deallocate message, it has to rely on its parent to do that in order to avoid the KillHard. The grandchild can't really force its parent to send a deallocate message with a layers id that it controls. In my case the grandchild has a direct connection to the main process, so it can send a message with forged arguments directly.
Flags: needinfo?(kchen)
(Reporter)

Comment 31

2 years ago
(In reply to Peter Van der Beken [:peterv] from comment #29)
> (In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #28)
> > No, the LayerTreeId and ContentParent has a 1-on-1 mapping relationship. The
> > LayerTreeId is allocated by the grandchild directly here [2].
> 
> Given that the RenderFrameParent is running in the child (the parent of the
> grandchild), I don't see the 1-on-1 mapping? Isn't that
> ContentChild::GetSingleton() shared between all grandchild processes that
> have a common parent?

Ah, you're right. Still, the grandchild can't send a deallocate message as you said in comment 30.
(Assignee)

Comment 32

2 years ago
Created attachment 8698127 [details] [diff] [review]
v3 (for review)

Third try. This has APZ actors per layer id, and we create the parent when allocating the layer id. There's some trickiness where we end up creating the APZChild before the TabChild (because the IPDL messages come from different processes for grandchildren, APZChild constructor is sent from the main process but the TabChild constructor from the child process).
Attachment #8682699 - Attachment is obsolete: true
Attachment #8690252 - Attachment is obsolete: true
Attachment #8698127 - Flags: review?(bugmail.mozilla)
Sorry for the delay here, the review is going to take a dedicated chunk of time and I've been pretty busy. Let me know if you're blocked on this and I can try to get to it sooner.
No longer blocks: 1206904
Comment on attachment 8698127 [details] [diff] [review]
v3 (for review)

Review of attachment 8698127 [details] [diff] [review]:
-----------------------------------------------------------------

This looks much better, thanks! I couldn't see any architectural issues with this version, although there's a couple of issues I want to clear up before giving it an r+. One is the threading (see comment on HandleDoubleTap below) and the other is the RecvAdoptChild code (I might have some suggestions on tweaking that once I understand what's going on there). All the rest of the comments are relatively minor.

::: dom/ipc/ContentChild.cpp
@@ +1242,5 @@
>  
> +PAPZChild*
> +ContentChild::AllocPAPZChild(const TabId& aTabId)
> +{
> +    return APZChild::Create(aTabId); 

nit: trailing ws

::: dom/ipc/TabChild.cpp
@@ +1693,5 @@
> +  TABC_LOG("Handling double tap at %s with %p %p\n",
> +    Stringify(aPoint).c_str(), mGlobal.get(), mTabChildGlobal.get());
> +
> +  if (!mGlobal || !mTabChildGlobal) {
> +      return;

fix indent

@@ +1710,5 @@
> +  ViewID viewId;
> +  if (APZCCallbackHelper::GetOrCreateScrollIdentifiers(
> +      document->GetDocumentElement(), &presShellId, &viewId)) {
> +    RefPtr<TabChild> self = this;
> +    NS_DispatchToMainThread(NS_NewRunnableFunction([self, presShellId, viewId, zoomToRect] () -> void {

Why does this need to be dispatched to the main thread? Is this function running on the main thread already? If not, it should be made to run on the main thread, because it does things like access the nsIDocument which are main-thread-only.

In fact, if APZChild's functions are getting invoked on non-main-thread, then they need to redispatch themselves to the main thread, much like the RemoteContentController implementation in m-c right now does.

@@ +1727,3 @@
>  {
> +  if (aCallTakeFocusForClickFromTap && mRemoteFrame) {
> +    mRemoteFrame->SendTakeFocusForClickFromTap();

There might be a race here between the focus-taking (which happens in the parent process from this one) and whatever the ProcessSingleTap below does.

There are two possibilities: (a) the TakeFocusForClickFromTap call in the parent will set some state in the parent or (b) the call in the parent will generate an IPC message back to the child to set some state. (Or both (a) and (b)).

Scenario (a) doesn't have any problems, because anything that ProcessSingleTap does that relies on the state must involve sending messages to the parent, and those messages will arrive later than the TakeFocus message. Scenario (b) however may have a problem, because the return IPC message may arrive after ProcessSingleTap reads the focus state in the child.

In the code without this patch, the focus-taking always happened in the parent before the single-tap message was sent to the child, so any parent->child message would have been guaranteed to arrive first.

I'm not sure if this actually a problem but I'm fine with this for now, but if we run into problems we might have to re-evaluate this bit. The simplest thing might be to make the TakeFocusForClickFromTap message sync instead of async.

::: dom/ipc/TabChild.h
@@ +648,5 @@
>      bool mDidSetRealShowInfo;
>  
>      nsAutoTArray<bool, NUMBER_OF_AUDIO_CHANNELS> mAudioChannelsActive;
>  
> +    layers::APZChild* mAPZChild;

Add a comment that the APZChild will clear this raw ptr on destruction so it should never be a dangling ptr.

::: gfx/layers/apz/public/GeckoContentController.h
@@ +21,5 @@
>  
>  class GeckoContentController
>  {
>  public:
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING_WITH_MAIN_THREAD_DESTRUCTION(GeckoContentController)

It's not clear to me why this is needed - can you explain? If possible I'd like to avoid this change.

::: gfx/layers/ipc/APZChild.cpp
@@ +31,5 @@
> +private:
> +  virtual ~TabChildCreatedObserver()
> +  {}
> +
> +  APZChild* mAPZChild;

Add a comment that the APZChild owns this TabChildCreatedObserver and that it's lifetime is longer, so the raw ptr is ok.

::: gfx/layers/ipc/APZChild.h
@@ +21,5 @@
> +
> +class APZChild final : public PAPZChild
> +{
> +public:
> +  static APZChild* Create(const dom::TabId& aTabId);

Maybe declare a private constructor as well, so that random other pieces of code can't inadvertently create an APZChild?

::: gfx/layers/ipc/CompositorParent.cpp
@@ +516,5 @@
> +  Unused << PAPZParent::Send__delete__(aPAPZ);
> +}
> +
> +class RemoteContentController : public GeckoContentController
> +                              , public PAPZParent

Can we actually move this RemoteContentController to a separate file? I don't see the value of putting it in CompositorParent.cpp

@@ +644,5 @@
>        return;
>      }
> +    if (CanSend()) {
> +      Unused << SendHandleLongTap(aPoint, aModifiers, aGuid,
> +                                        aInputBlockId);

nit: line up |aInputBlockId| with |aPoint|

@@ +714,2 @@
>    }
> +  

nit: remove trailing whitespace between functions, here and more instances below

@@ +1893,5 @@
>    sIndirectLayerTrees[aChild].mLayerManager = mLayerManager;
>  }
>  
> +/* static */
> +bool

nit: put bool on the same line as /* static */

@@ +1915,5 @@
>  CompositorParent::RecvAdoptChild(const uint64_t& child)
>  {
>    MonitorAutoLock lock(*sIndirectLayerTreesLock);
>    if (mApzcTreeManager) {
>      mApzcTreeManager->AdoptLayersId(child, sIndirectLayerTrees[child].mParent->mApzcTreeManager.get());

The code in this function doesn't match what's in m-c. Does this need rebasing, or do you have some other local patches applied?
Attachment #8698127 - Flags: review?(bugmail.mozilla) → feedback+
(Assignee)

Comment 35

2 years ago
Created attachment 8709420 [details] [diff] [review]
1020199.patch

Sorry for the long delay, there was an Android-only test failure on try in gfx/layers/apz/test/reftest/async-scrollbar-1-h.html that looked to be caused by my patch, so I tried to figure it out. I haven't been able to reproduce it on try or in a local build though.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #34)
> @@ +1710,5 @@
> > +  ViewID viewId;
> > +  if (APZCCallbackHelper::GetOrCreateScrollIdentifiers(
> > +      document->GetDocumentElement(), &presShellId, &viewId)) {
> > +    RefPtr<TabChild> self = this;
> > +    NS_DispatchToMainThread(NS_NewRunnableFunction([self, presShellId, viewId, zoomToRect] () -> void {
> 
> Why does this need to be dispatched to the main thread? Is this function
> running on the main thread already? If not, it should be made to run on the
> main thread, because it does things like access the nsIDocument which are
> main-thread-only.

Dispatching on the main thread doesn't mean this code isn't running on the main thread already ;-). IIRC there was a situation were this would deadlock (sending an IPC message while in a sync Recv* function). I think I can just remove the dispatch now.

> I'm not sure if this actually a problem but I'm fine with this for now, but
> if we run into problems we might have to re-evaluate this bit. The simplest
> thing might be to make the TakeFocusForClickFromTap message sync instead of
> async.

I just made it sync.

> ::: gfx/layers/apz/public/GeckoContentController.h
> @@ +21,5 @@
> >  
> >  class GeckoContentController
> >  {
> >  public:
> > +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING_WITH_MAIN_THREAD_DESTRUCTION(GeckoContentController)
> 
> It's not clear to me why this is needed - can you explain? If possible I'd
> like to avoid this change.

The RemoteContentController destructor has to send an IPC message on the main thread. There really isn't an alternative to guarantee that the destructor is called on the main thread in a refcounted class.

> The code in this function doesn't match what's in m-c. Does this need
> rebasing, or do you have some other local patches applied?

The code was changed a week after I posted the patch, it matched what was in the tree at the time (I just rebased again).
Attachment #8680192 - Attachment is obsolete: true
Attachment #8698127 - Attachment is obsolete: true
Attachment #8709420 - Flags: review?(bugmail.mozilla)
Comment on attachment 8709420 [details] [diff] [review]
1020199.patch

Review of attachment 8709420 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. A few nits below, r+ with those addressed. However as it's a complex patch with lots of IPC stuff, I'd like dvander to also take a quick look, particularly at the protocol hierarchy changes and lifetimes to make sure there's no issues.

Also a heads-up that depending on who lands first, you may have to rebase around part 3 in bug 1231517 which modifies the signature of the ZoomToRect call in PBrowser.ipdl. Should be relatively straightforward I think.

::: dom/ipc/ContentParent.cpp
@@ +2062,5 @@
> +                                       const TabId& aTabId, uint64_t* aId)
> +{
> +  RefPtr<ContentParent> contentParent;
> +  RefPtr<TabParent> browserParent;
> +  if (gfxPlatform::AsyncPanZoomEnabled()) {

Can we run the code inside this unconditionally? It feels like the stricter checking would be good even without APZ enabled. Or, if this is just an optimization, add a comment that indicates that.

::: dom/ipc/TabChild.cpp
@@ +721,5 @@
> +                                    uint64_t aInputBlockId,
> +                                    bool aPreventDefault) const
> +{
> +  if (!mAPZChild) {
> +    return false;

I don't think we ever use the return value from this function (or SetTargetAPZC, or SetAllowedTouchBehavior) so these could return void. We can defer it to a follow-up bug, since this patch is already changing enough things :)

@@ +1889,3 @@
>  {
> +  if (aLayersId != mLayersId) {
> +    RefPtr<TabParent> browser = TabParent::GetTabParentFromLayersId(aLayersId);

Might be good to guard against a null browser, in case the layers id is bad. I don't know if there are race conditions in which it might happen, if the grandchild is destroyed while this message is inflight.

::: gfx/layers/ipc/APZChild.cpp
@@ +167,5 @@
> +{
> +  MOZ_ASSERT(!mBrowser);
> +  if (mObserver) {
> +    nsCOMPtr<nsIObserverService> os = services::GetObserverService();
> +    os->RemoveObserver(mObserver, "tab-child-created");

This should also set mObserver to nullptr otherwise it will be double-removed on destruction.

::: gfx/layers/ipc/RemoteContentController.cpp
@@ +5,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "mozilla/layers/RemoteContentController.h"
> +#include "base/message_loop.h"

style nit: add a blank line after the first include

@@ +171,5 @@
> +bool
> +RemoteContentController::GetTouchSensitiveRegion(CSSRect* aOutRegion)
> +{
> +  if (mTouchSensitiveRegion.IsEmpty())
> +    return false;

braces

::: layout/ipc/RenderFrameParent.cpp
@@ +223,5 @@
>        ContentChild::GetSingleton()->SendDeallocateLayerTreeId(mLayersId);
>      } else {
>        CompositorParent::DeallocateLayerTreeId(mLayersId);
>      }
> +    // TODO: notify the compositor?

you can drop the TODO

::: layout/ipc/RenderFrameParent.h
@@ -96,3 @@
>    bool HitTest(const nsRect& aRect);
>  
>    void StartScrollbarDrag(const AsyncDragMetrics& aDragMetrics);

This StartScrollbarDrag declaration needs to be removed
Attachment #8709420 - Flags: review?(bugmail.mozilla)
Attachment #8709420 - Flags: review+
Attachment #8709420 - Flags: feedback?(dvander)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #36)
> Comment on attachment 8709420 [details] [diff] [review]
> 1020199.patch
> 
> Review of attachment 8709420 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Also a heads-up that depending on who lands first, you may have to rebase
> around part 3 in bug 1231517 which modifies the signature of the ZoomToRect
> call in PBrowser.ipdl. Should be relatively straightforward I think.
> 

I pushed bug 1231517 to inbound, assuming it sticks, patch will need to be rebased.
(Assignee)

Comment 38

2 years ago
Created attachment 8713286 [details] [diff] [review]
1020199.patch

Rebased and took care of the review comments. Carrying over r+.
Attachment #8709420 - Attachment is obsolete: true
Attachment #8709420 - Flags: feedback?(dvander)
Attachment #8713286 - Flags: review+
Attachment #8713286 - Flags: feedback?(dvander)
Comment on attachment 8713286 [details] [diff] [review]
1020199.patch

Review of attachment 8713286 [details] [diff] [review]:
-----------------------------------------------------------------

The concept behind this looks fine, but I'd like to see more comments about the general design somewhere (preferably in one of the files implementing the protocol, or in ipdl). It's also hard to tell which thread some of these functions are running on, especially in the controller.

For example, CompositorParent receives messages on the compositor thread, but CompositorParent::UpdateRemoteContentController access the ContentParent's channel which is only safe on the main thread. Functions like this should have a comment or assertion that they're being called from the correct thread.

I'm also a little worried about this:

>   CompositorParent::RecvAdoptChild(const uint64_t& child)
>   {
>     MonitorAutoLock lock(*sIndirectLayerTreesLock);
>  +  if (mApzcTreeManager && sIndirectLayerTrees[child].mController) {
>  +    sIndirectLayerTrees[child].mController->ChildAdopted();
>  +  }

ChildAdopted will be called from the compositor thread, and it looks like this:

>  +void
>  +RemoteContentController::ChildAdopted()
>  +{
>  +  // Clear the cached APZCTreeManager.
>  +  mApzcTreeManager = nullptr;
>  +}

RemoteContentController::GetApzcTreeManager both reads and writes to mApzcTreeManager on the main thread, so this is not safe. Even though we have thread-safe refcounting, RefPtr<> itself is not atomic, so races could double-free, return dangling pointers, or leak. (Just holding the compositor lock for all of GetApzcTreeManager might be good enough.)
Attachment #8713286 - Flags: feedback?(dvander) → feedback-
(Assignee)

Comment 40

2 years ago
Created attachment 8717938 [details] [diff] [review]
1020199.patch

(In reply to David Anderson [:dvander] from comment #39)
> The concept behind this looks fine, but I'd like to see more comments about
> the general design somewhere (preferably in one of the files implementing
> the protocol, or in ipdl). It's also hard to tell which thread some of these
> functions are running on, especially in the controller.

I've added some comments and assertions, let me know if there's anything more that you want.

> For example, CompositorParent receives messages on the compositor thread,
> but CompositorParent::UpdateRemoteContentController access the
> ContentParent's channel which is only safe on the main thread. Functions
> like this should have a comment or assertion that they're being called from
> the correct thread.

I've added an assertion to UpdateRemoteContentController, although both the RemoteContentController and SendPAPZConstructor would have asserted already (and that's really where the main-thread-only stuff happens).

> RemoteContentController::GetApzcTreeManager both reads and writes to
> mApzcTreeManager on the main thread, so this is not safe. Even though we
> have thread-safe refcounting, RefPtr<> itself is not atomic, so races could
> double-free, return dangling pointers, or leak. (Just holding the compositor
> lock for all of GetApzcTreeManager might be good enough.)

I made callers of GetApzcTreeManager hold the lock. Not too happy about exposing the lock outside CompositorParent, but the alternative seems to be to pass the cache variable into GetApzcTreeManager (which would then just return the cached value in most cases) and I don't really like that either. Let me know what you think.
Attachment #8713286 - Attachment is obsolete: true
Attachment #8717938 - Flags: review?(dvander)
(Assignee)

Comment 41

2 years ago
Created attachment 8717940 [details] [diff] [review]
Interdiff

This is a diff of just the comment and assertion changes.
(Assignee)

Comment 42

2 years ago
While going over the patch again, I noticed that I didn't add back the guarding for bad layers ids in RecvContentReceivedInputBlock and RecvSetTargetAPZC. We probably need those even in the new setup, right?
Flags: needinfo?(bugmail.mozilla)
(In reply to Peter Van der Beken [:peterv] from comment #40)
> > RemoteContentController::GetApzcTreeManager both reads and writes to
> > mApzcTreeManager on the main thread, so this is not safe. Even though we
> > have thread-safe refcounting, RefPtr<> itself is not atomic, so races could
> > double-free, return dangling pointers, or leak. (Just holding the compositor
> > lock for all of GetApzcTreeManager might be good enough.)
> 
> I made callers of GetApzcTreeManager hold the lock. Not too happy about
> exposing the lock outside CompositorParent, but the alternative seems to be
> to pass the cache variable into GetApzcTreeManager (which would then just
> return the cached value in most cases) and I don't really like that either.
> Let me know what you think.

I don't really like this - we should be able to just have a lock as a field in RemoteContentController, that protects both RemoteContentController::GetApzcTreeManager and RemoteContentController::ChildAdopted. Would that work?


(In reply to Peter Van der Beken [:peterv] from comment #42)
> While going over the patch again, I noticed that I didn't add back the
> guarding for bad layers ids in RecvContentReceivedInputBlock and
> RecvSetTargetAPZC. We probably need those even in the new setup, right?

Yeah, it would be good to check that to detect corrupted/malicious child processes. It's not so important in the RecvContentReceivedInputBlock case since we don't actually use the guid, but in the RecvSetTargetAPZC case we definitely should.
Flags: needinfo?(bugmail.mozilla)
(Assignee)

Comment 44

2 years ago
Comment on attachment 8717938 [details] [diff] [review]
1020199.patch

Review of attachment 8717938 [details] [diff] [review]:
-----------------------------------------------------------------

New patch coming up.
Attachment #8717938 - Flags: review?(dvander)
(Assignee)

Comment 45

2 years ago
Turns out we need to protect mTouchSensitiveRegion with a lock too, since SetTouchSensitiveRegion and GetTouchSensitiveRegion are called form different threads (this is an existing bug).

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #43)
> I don't really like this - we should be able to just have a lock as a field
> in RemoteContentController, that protects both
> RemoteContentController::GetApzcTreeManager and
> RemoteContentController::ChildAdopted. Would that work?

I tried to have a lock in RemoteContentController to protect both mTouchSensitiveRegion and mApzcTreeManager. The issue is that RemoteContentController::GetApzcTreeManager acquires the RemoteContentController lock before calling CompositorParent::GetAPZCTreeManager, which acquires the sIndirectLayerTreesLock lock. And CompositorParent::RecvAdoptChild acquires the sIndirectLayerTreesLock lock before calling the RemoteContentController::ChildAdopted, which acquires the RemoteContentController lock. So we have a potential deadlock situation :-(.
We could probably let go of the sIndirectLayerTreesLock once we have the reference to the RemoteContentController, but before we call ChildAdopted on it. As long as you keep the RemoteContentController reference in a RefPtr<> it should be fine.
(Assignee)

Comment 47

2 years ago
Created attachment 8718804 [details] [diff] [review]
1020199.patch
Attachment #8717938 - Attachment is obsolete: true
Attachment #8717940 - Attachment is obsolete: true
Attachment #8718804 - Flags: review?(bugmail.mozilla)
(Assignee)

Comment 48

2 years ago
Created attachment 8718805 [details] [diff] [review]
Changes for review comments

These are just the locking, comment and thread assertion changes.
(Assignee)

Updated

2 years ago
Attachment #8718804 - Flags: review?(dvander)
Comment on attachment 8718804 [details] [diff] [review]
1020199.patch

Review of attachment 8718804 [details] [diff] [review]:
-----------------------------------------------------------------

I just reviewed the new bits, looks good.

::: gfx/layers/ipc/CompositorParent.h
@@ +369,5 @@
>    /**
>     * This returns a reference to the APZCTreeManager to which
> +   * pan/zoom-related events can be sent. The callers of this function need to
> +   * hold the sIndirectLayerTreesLock lock (which they can get by calling
> +   * IndirectLayerTreesLock()).

The IndirectLayerTreesLock function doesn't exist any more, comment needs updating

::: gfx/layers/ipc/RemoteContentController.h
@@ +116,5 @@
> +  uint64_t mLayersId;
> +  RefPtr<dom::TabParent> mBrowserParent;
> +
> +  // Monitor protecting members below accessed from multiple threads.
> +  mozilla::Monitor mMonitor;

You can use a mozilla::Mutex instead here, and s/MonitorAutoLock/MutexAutoLock/. We made that change in some APZ code recently as well (bug 1241991) since Mutex takes up less space.
Attachment #8718804 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8718804 [details] [diff] [review]
1020199.patch

Review of attachment 8718804 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/RemoteContentController.cpp
@@ +375,5 @@
> +  MonitorAutoLock lock(mMonitor);
> +  if (!mApzcTreeManager) {
> +    mApzcTreeManager = CompositorParent::GetAPZCTreeManager(mLayersId);
> +  }
> +  return mApzcTreeManager;

There's still a subtle race here: when the lock is released on the main thread, but before the return value is assigned in its caller, the refcount of mApzcTreeManager could be 1. The compositor thread could jump in, set the pointer to null, and the reference count would drop to 0. GetApzcTreeManager() has now returns a dead pointer.

Just changing the return value to RefPtr<APZCTreeManager> should eliminate the risk, so r=me with that.

Thanks for adding comments!
Attachment #8718804 - Flags: review?(dvander) → review+

Comment 52

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9e71a38057d1
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox47: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Depends on: 1249280
You need to log in before you can comment on or make changes to this bug.