Closed Bug 879475 Opened 11 years ago Closed 10 years ago

Allow nested oop <iframe mozbrowser> without nested content processes

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: justin.lebar+bug, Assigned: kanru)

References

(Blocks 1 open bug)

Details

Attachments

(12 files, 44 obsolete files)

17.62 KB, patch
justin.lebar+bug
: review+
dzbarsky
: checkin+
Details | Diff | Splinter Review
7.41 KB, patch
justin.lebar+bug
: review+
dzbarsky
: checkin+
Details | Diff | Splinter Review
3.31 KB, patch
justin.lebar+bug
: review+
dzbarsky
: checkin+
Details | Diff | Splinter Review
19.08 KB, patch
justin.lebar+bug
: review+
dzbarsky
: checkin+
Details | Diff | Splinter Review
9.06 KB, patch
justin.lebar+bug
: review+
dzbarsky
: checkin+
Details | Diff | Splinter Review
60.22 KB, patch
Details | Diff | Splinter Review
38.13 KB, patch
Details | Diff | Splinter Review
21.93 KB, patch
Details | Diff | Splinter Review
23.44 KB, patch
Details | Diff | Splinter Review
67.16 KB, patch
Details | Diff | Splinter Review
55.76 KB, patch
Details | Diff | Splinter Review
6.95 KB, patch
Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #761935 +++

B2G is multi-process, but the hierarchy of processes is depth two: Only the root process can have child processes.

This creates a problem for us on B2G with the browser app.  We'd like the browser app to run outside the root process, but we'd also like the browser app's tabs to run outside the browser app process (so that when a tab crashes e.g. due to OOM, it doesn't take down the whole browser).

Right now we have to compromise.  We can either run the browser app in the root process and run the browser tabs in their own processes, or we can run the browser app outside the root process and run the browser tabs inside the same process as the browser app.  We chose the former.

Bug 761935 is about one way to fix this problem: We could allow child processes to have their own children.

But I think this would be a lot of work.  One problem is that because an arbitrary-depth hierarchy of processes doesn't exist right now, I expect that most of our APIs won't work properly if we have nested content processes.  Many of them explicitly don't even try to work correctly under these circumstances.  We'd have to fix up all of these APIs and then get the graphics stack to work with nested content processes (I'm not at all familiar with the graphics stack, so I don't know how easy or hard this would be).

I want to propose in this bug an alternative and perhaps simpler way of accomplishing our goal of moving the browser app out of the main process.  Hopefully you can tell me whether this is a bad idea or not.

It turns out that a child-of-a-child doesn't need to talk to its parent very often.  Most of what it needs (e.g. network access, permission checks, hardware access) comes from the main process.  So what if we made our browser tab inside the browser process a child of the main process (*)?  We'd then fix up the few cases when the process wants to talk to its now-sibling.

We'd have to fix up mozbrowser, but I don't think this would be particularly complicated.  We'd just funnel communication between the two sibling processes through the main process.

We'd also have to fix up some stuff in graphics so that the stack would know where to draw the inner mozbrowser.  I don't know how complicated this would be, but it doesn't strike me as out of the question.

What do you guys think of this plan?  Have I misunderestimated something here, or does this sound like a good approach to try?

dzbarsky has said he might be interested in doing this work, but we need to figure out if it's of appropriate scope.

(*) This isn't an original idea, but I don't remember where I heard it.  Sorry to whomever I'm plagiarizing!
Flags: needinfo?(bent.mozilla)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(jonas)
Flags: needinfo?(bugs)
Flags: needinfo?(nical.bugzilla)
I like this idea :)
Flags: needinfo?(bent.mozilla)
Seems reasonable to me, modulo the graphics concerns.
Flags: needinfo?(bzbarsky)
Yes! We should absolutely do this! Like you say, in many cases we don't have a need to go through the intermediary process but it's better to talk to the parent process directly.

There are some cases in the browser API where we need to talk between the child and the grandchild, but I suspect that that communication is low-bandwidth enough that going through the parent process is fine. We can always set up a direct IPC connection between the child and the grandchild later if needed.
I like.
Communication between siblings needs to be asynchronous, or there must be one master sibling
to which other siblings can send synchronous messages, but the master can send only async messages.
Or master send sync and async, and others only async.
Flags: needinfo?(bugs)
Sounds fine from a necko perspective. (I assume we'd still have TabParent/Child to do security checks, or something equivalent).  Only thing that is likely to need changes would be popup windows for auth.

One benefit of this would be that it would allow us to start enforcing IsInBrowserElement (right now we have to trust child process's description for that, which potentially allows rogue tabs access to browser's cookies/cache/appcache/etc).
Flags: needinfo?(jduell.mcbugs)
FWIW I believe it's also fine for all the {cache|cookie|etc|}jar code if the architecture here is to have a different appID for the tab process than the browser app.
(In reply to Jason Duell (:jduell) from comment #6)
> FWIW I believe it's also fine for all the {cache|cookie|etc|}jar code if the
> architecture here is to have a different appID for the tab process than the
> browser app.

I think we'd have the same app-id + different is-browser flag for the browser processes, just like we have now.
OK, good.  I realized one case where we do rely on app id being same for browser and tab: deletion notifications, where IIRC we do get a single appId then do two deletes: (appId, notInBrowser), (appId, inbrowser).
So what we do currently, is the RenderFrameParent (in the main proces) builds a 'RefLayer' as a leaf layer in the layer tree.

This serves as an attachment point for the layer sub-tree from the child process, and also defines the positioning.

The child process builds its own layer (relative to itself), and both processes separately shadow these layer trees to the compositor.

The compositor then attaches the child process layer trees to the RefLayer attachment points in the main layer tree.

So I think this should still work. The browser process would still need a RenderFrameParent (or equivalent) to build RefLayers to attach the layers for the tab process. We'd need to modify the RefLayer attaching code to also walk through sub-trees looking for nested RefLayers. I don't think layer positioning should be a problem.
sorry for the delay,
I think it's a good idea and should not be too difficult from a layers point of view as Matt said. We already do something somewhat similar to attach the async-video stuff to transfer textures that are going through a different IPDL protocol than layer transactions.
Flags: needinfo?(nical.bugzilla)
Depends on: 885653
I'm not really happy with this patch, but the idea is to fix 

            case PContentBridgeMsgStart:
                {
                    Transport* t;
                    if ((!(t = mozilla::ipc::OpenDescriptor(td, mozilla::ipc::Transport::MODE_SERVER)))) {
                        return MsgValueError;
                    }
                    if ((!(AllocPContentBridgeParent(t, pid)))) {
                        return MsgProcessingError;
                    }
                    break;
                }
            case PContentBridgeMsgStart:
                {
                    Transport* t;
                    if ((!(t = mozilla::ipc::OpenDescriptor(td, mozilla::ipc::Transport::MODE_CLIENT)))) {
                        return MsgValueError;
                    }
                    if ((!(AllocPContentBridgeChild(t, pid)))) {
                        return MsgProcessingError;
                    }
                    break;
                }
Attachment #766385 - Flags: review?(justin.lebar+bug)
We need this because PBrowser can be managed by either PContent or PContentBridge, so we can't rely on IPDL to generate the right thing for us.
Attachment #770015 - Flags: review?(justin.lebar+bug)
This will be shared by ContentParent and ContentBridgeParent.
Attachment #770521 - Flags: review?(justin.lebar+bug)
Attached patch Part 6: Add nsIContentChild api (obsolete) — Splinter Review
Attachment #770522 - Flags: review?(justin.lebar+bug)
Attachment #770553 - Flags: review?(bent.mozilla)
Don't notice this bug before!  I hope it would not be late.  I think we can make two content processes talk to each other directly without a bridge at the b2g process by passing a socketpair to parent and child content processes.
(In reply to Thinker Li [:sinker] from comment #19)
> Don't notice this bug before!  I hope it would not be late.  I think we can
> make two content processes talk to each other directly without a bridge at
> the b2g process by passing a socketpair to parent and child content
> processes.

That's what I'm doing.  The main process sets up the bridge when it creates the child and then the content processes talk directly to each other.
(In reply to David Zbarsky (:dzbarsky) from comment #20)
> That's what I'm doing.  The main process sets up the bridge when it creates
> the child and then the content processes talk directly to each other.
This is cool!
Comment on attachment 766385 [details] [diff] [review]
Part 1: Rename the MsgStart messages to MsgStartChild for child protocols

Review of attachment 766385 [details] [diff] [review]:
-----------------------------------------------------------------

Could you explain why we need this changes? As I know, Bridge() will send a channel opened message to both processes, and both them known each other.  They can tell parent or child actor that should be created.  You can also change ChannelOpened() to indicate which side of actors that message receiver should create.
Comment on attachment 770016 [details] [diff] [review]
Part 4: Make PBlob manually keep track of its manager

Review of attachment 770016 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentChild.cpp
@@ +696,3 @@
>    NS_ENSURE_TRUE(actor, nullptr);
>  
>    if (!SendPBlobConstructor(actor, params)) {

I think this function will set the value of mManager for you.  So, you don't need to pass the pointer of mManager around.
Attachment #770928 - Flags: feedback?(justin.lebar+bug)
Attachment #770928 - Flags: feedback?(tlee)
(In reply to Thinker Li [:sinker] from comment #22)
> Comment on attachment 766385 [details] [diff] [review]
> Part 1: Rename the MsgStart messages to MsgStartChild for child protocols
> 
> Review of attachment 766385 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Could you explain why we need this changes? As I know, Bridge() will send a
> channel opened message to both processes, and both them known each other. 
> They can tell parent or child actor that should be created.  You can also
> change ChannelOpened() to indicate which side of actors that message
> receiver should create.

Both sides of the bridge are opened by ContentChild, so it needs to be able to distinguish the case statements.  See comment 12.(In reply to Thinker Li [:sinker] from comment #23)

> Comment on attachment 770016 [details] [diff] [review]
> Part 4: Make PBlob manually keep track of its manager
> 
> Review of attachment 770016 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/ipc/ContentChild.cpp
> @@ +696,3 @@
> >    NS_ENSURE_TRUE(actor, nullptr);
> >  
> >    if (!SendPBlobConstructor(actor, params)) {
> 
> I think this function will set the value of mManager for you.  So, you don't
> need to pass the pointer of mManager around.

It will, but I need to keep track of mManager so that Manager() will continue to work when Blobs can be managed by either PContent or PContentBridge.  Hopefully the WIP Part (n) makes things clearer.
We were already doing this on the parent side
Attachment #771043 - Flags: review?(bent.mozilla)
(In reply to David Zbarsky (:dzbarsky) from comment #25)
> Both sides of the bridge are opened by ContentChild, so it needs to be able
> to distinguish the case statements.
The channel opened message would be passed with the PID of the peer.  Is the PID enough for distinguish?
Comment on attachment 771132 [details] [diff] [review]
Part 10: Allow nested remote mozbrowsers to push layer transactions to the compositor

Review of attachment 771132 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/composite/AsyncCompositionManager.cpp
@@ +67,5 @@
>            }
>          } else {
>            ref->DetachReferentLayer(referent);
>            referent->SetAsyncPanZoomController(nullptr);
> +          WalkTheTree<OP>(referent, aReady, aTargetConfig);

Don't we need to do this for OP == Resolve too?
Attachment #771132 - Flags: review?(matt.woodrow) → review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #30)
> Comment on attachment 771132 [details] [diff] [review]
> Part 10: Allow nested remote mozbrowsers to push layer transactions to the
> compositor
> 
> Review of attachment 771132 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/composite/AsyncCompositionManager.cpp
> @@ +67,5 @@
> >            }
> >          } else {
> >            ref->DetachReferentLayer(referent);
> >            referent->SetAsyncPanZoomController(nullptr);
> > +          WalkTheTree<OP>(referent, aReady, aTargetConfig);
> 
> Don't we need to do this for OP == Resolve too?

When you call ConnectReferentLayer, you set the subtree as the child of the RefLayer.  So when you walk the children at the end of the function, you walk the subtree.  We only need to call into the subtree in the detach case because we remove it as a child.
Depends on: 890570
Attachment #766336 - Flags: review?(justin.lebar+bug) → review+
Attachment #769842 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 770015 [details] [diff] [review]
Part 3: Make PBrowser manually keep track of its manager

I was concerned that we're just hiding the existing PTab{Child,Parent}::Manager() functions, but dzbarsky said that we're going to get rid of these in a later patch.

This is nice; it cleans up the code quite a bit now that we don't have these static_cast's.
Attachment #770015 - Flags: review?(justin.lebar+bug) → review+
Attachment #772868 - Flags: feedback?(bugmail.mozilla)
Comment on attachment 770016 [details] [diff] [review]
Part 4: Make PBlob manually keep track of its manager

r=me, with the same caveat as the first paragraph of comment 32.
Attachment #770016 - Flags: review?(justin.lebar+bug) → review+
Attachment #774244 - Flags: review?(justin.lebar+bug)
Attached patch Part 6: Add nsIContentChild api (obsolete) — Splinter Review
Attachment #770521 - Attachment is obsolete: true
Attachment #770522 - Attachment is obsolete: true
Attachment #770521 - Flags: review?(justin.lebar+bug)
Attachment #770522 - Flags: review?(justin.lebar+bug)
Attachment #774246 - Flags: review?(justin.lebar+bug)
Comment on attachment 772868 [details] [diff] [review]
Rollup patch based off changeset 137652:bb4f5079e009

Review of attachment 772868 [details] [diff] [review]:
-----------------------------------------------------------------

I tried this out on a device and it worked as expected. After thinking about it some and discussing with smaug I don't believe that this will make things any worse than they already are in terms of input events, so I have no objections to this landing. We'll still need to clean up how input events work at some point.
Attachment #772868 - Flags: feedback?(bugmail.mozilla) → feedback+
Attached patch Part 11: Implement ContentBridge (obsolete) — Splinter Review
Attachment #776707 - Flags: review?(justin.lebar+bug)
Attachment #770928 - Attachment is obsolete: true
Attachment #770928 - Flags: feedback?(tlee)
Attachment #770928 - Flags: feedback?(justin.lebar+bug)
Attachment #776777 - Flags: review?(justin.lebar+bug)
Some general comments:

* It would be helpful if you added some documentation to the new classes you're
  adding indicating what they do.  It doesn't have to be a lot, but a few
  sentences in a header goes a long way.

* Maybe we should think a bit about the names we're using here.

  nsIContentParent probably shouldn't have the "ns" prefix, because it's in a
  C++ namespace.

  But is "IContentParent", "ContentParent", and "ContentBridgeParent" the best
  we can do?  It's not obvious from the name that IContentParent is a common
  interface for ContentParent and ContentBridgeParent, and it's not obvious
  from the names that ContentBridgeParent is for "sibling-to-sibling"
  communication.

  Supposing you could choose any names and not have to deal with the annoyance
  of renaming everything in the tree, what would you pick?
Attachment #776777 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 776710 [details] [diff] [review]
Part 12: Fix the places where we cast nsIContentParent to ContentParent

>diff --git a/dom/ipc/TabParent.cpp b/dom/ipc/TabParent.cpp
>index 8fb4235..aff1f4b 100644
>--- a/dom/ipc/TabParent.cpp
>+++ b/dom/ipc/TabParent.cpp
>@@ -273,26 +273,28 @@ TabParent::Destroy()
>     static_cast<IndexedDBParent*>(idbParents[i])->Disconnect();
>   }
> 
>   if (RenderFrameParent* frame = GetRenderFrame()) {
>     frame->Destroy();
>   }
>   mIsDestroyed = true;
> 
>-  // XXXdz what if manager is not ContentParent
>-  static_cast<ContentParent*>(Manager())->NotifyTabDestroying(this);
>+  if (XRE_GetProcessType() == GeckoProcessType_Default) {
>+    static_cast<ContentParent*>(Manager())->NotifyTabDestroying(this);
>+  }
>   mMarkedDestroying = true;
> }
> 
> bool
> TabParent::Recv__delete__()
> {
>-  // XXXdz what if manager is not ContentParent
>-  static_cast<ContentParent*>(Manager())->NotifyTabDestroyed(this, mMarkedDestroying);
>+  if (XRE_GetProcessType() == GeckoProcessType_Default) {
>+    static_cast<ContentParent*>(Manager())->NotifyTabDestroyed(this, mMarkedDestroying);
>+  }
>   return true;
> }
>
Would it be worthwhile to have a getter in TabParent that encapsulates this?

Or even better, what if we had a method on nsIContentParent to tell us whether
it's a ContentParent or a ContentParentBridge, and another one to cast down to
ContentParent* or ContentBridgeParent*, as appropriate?  You're not going to
screw up these static_cast's, but I bet someone else will.
Attachment #776710 - Flags: review?(justin.lebar+bug)
Comment on attachment 777469 [details] [diff] [review]
Add an nsIContentParent api r=jlebar

These are the patches that are in my branch, I just hadn't updated the attachments after merging to the CPOW stuff.  Hopefully this doesn't mess up reviewing too much for you.
Attachment #777469 - Flags: review?(justin.lebar+bug)
Attachment #777470 - Flags: review?(justin.lebar+bug)
Attachment #777471 - Flags: review?(justin.lebar+bug)
Attachment #774244 - Attachment is obsolete: true
Attachment #774244 - Flags: review?(justin.lebar+bug)
Attachment #774246 - Attachment is obsolete: true
Attachment #774246 - Flags: review?(justin.lebar+bug)
Attachment #776707 - Attachment is obsolete: true
Attachment #776707 - Flags: review?(justin.lebar+bug)
> Hopefully this doesn't mess up reviewing too much for you.

No, I'm just going off what's in your branch; that's easier for me, since I can search between patches and so on.

In case anyone else is interested, the branch in question is https://github.com/dzbarsky/mozilla-central/tree/nestedprocesses.
(In reply to Justin Lebar [:jlebar] from comment #46)
> Comment on attachment 776710 [details] [diff] [review]
> Part 12: Fix the places where we cast nsIContentParent to ContentParent
> 
> 
> Or even better, what if we had a method on nsIContentParent to tell us
> whether
> it's a ContentParent or a ContentParentBridge, and another one to cast down
> to
> ContentParent* or ContentBridgeParent*, as appropriate?  You're not going to
> screw up these static_cast's, but I bet someone else will.

I can add AsContentParent().  That's the only one we need.  In general though, you shouldn't need to downcast nsIContentParent.(In reply to Justin Lebar [:jlebar] from comment #45)

> * It would be helpful if you added some documentation to the new classes
> you're
>   adding indicating what they do.  It doesn't have to be a lot, but a few
>   sentences in a header goes a long way.
> 

Sure, I'll add some explanations.

> * Maybe we should think a bit about the names we're using here.
> 

I haven't come up with anything great yet.  I don't think we want things like GrandchildProcessParent.
(In reply to David Zbarsky (:dzbarsky) from comment #52)
> (In reply to Justin Lebar [:jlebar] from comment #46)

> 
> I haven't come up with anything great yet.  I don't think we want things
> like GrandchildProcessParent.


How about:
ContentParent -> ProcessParent
ContentBridgeParent -> SiblingBridgeParent
nsIContentParent -> ProcessOpenerParent or something
Some of these names are already taken, but they're taken by unimportant things that we could rename.

I think maybe

  ContentParent -> ProcessParent
  ContentBridgeParent -> SiblingProcessParent
  nsIContentParent -> IProcessParent

might work, but the proof will be doing the replace and looking at it in context.
Drive-by nit: the "*Content*" names were chosen because there are other subprocess things in gecko.  The things here are the "content" (agree that's not a great name) subprocess things.  Something overly general like "ProcessParent" is a step backwards IMHO.
Comment on attachment 777471 [details] [diff] [review]
Implement ContentBridge r=jlebar

I'm not done reviewing this, but I wanted to give you some feedback now, particularly since I've seen enough to give r-.

When you revise this patch, would you mind doing it as a new commit, so your
branch has an interdiff built in?  The easiest thing for me would be if the new
commit was placed immediately after this commit.  And then that way, when you
go to push, you can squash all of the fixup commits.

>diff --git a/dom/ipc/PContentBridge.ipdl b/dom/ipc/PContentBridge.ipdl
>new file mode 100644
>index 0000000..1ee3098
>--- /dev/null
>+++ b/dom/ipc/PContentBridge.ipdl

>+rpc protocol PContentBridge

Please add a (brief) comment explaining what this thing is.

>+{
>+    bridges PContent, PContent;
>+
>+    manages PBlob;
>+    manages PBrowser;
>+    manages PJavaScript;
>+
>+parent:
>+    sync SyncMessage(nsString aMessage, ClonedMessageData aData, CpowEntry[] aCpows)
>+      returns (nsString[] retval);
>+both:
>+    // Depending on exactly how the new browser is being created, it might be
>+    // created from either the child or parent process!
>+    //
>+    // The child creates the PBrowser as part of
>+    // TabChild::BrowserFrameProvideWindow (which happens when the child's
>+    // content calls window.open()), and the parent creates the PBrowser as part
>+    // of ContentParent::CreateTab.
>+    //
>+    // When the parent constructs a PBrowser, the child trusts the app token it
>+    // receives from the parent.  In that case, context can be any of the
>+    // IPCTabContext subtypes.
>+    //
>+    // When the child constructs a PBrowser, the parent doesn't trust the app
>+    // token it receives from the child.  In this case, context must have type
>+    // PopupIPCTabContext.  The browser created using a PopupIPCTabContext has
>+    // the opener PBrowser's app-id and containing-app-id.  The parent checks
>+    // that if the opener is a browser element, the context is also for a
>+    // browser element.
>+    //
>+    // This allows the parent to prevent a malicious child from escalating its
>+    // privileges by requesting a PBrowser corresponding to a highly-privileged
>+    // app; the child can only request privileges for an app which the child has
>+    // access to (in the form of a TabChild).
>+    async PBrowser(IPCTabContext context, uint32_t chromeFlags);

It would probably be better to put a pointer to the comment on
PContent::PBrowser(), as opposed to copy-pasting it.

>diff --git a/content/base/src/nsFrameLoader.cpp b/content/base/src/nsFrameLoader.cpp
>index cca3f85..7bc4e1c 100644
>--- a/content/base/src/nsFrameLoader.cpp
>+++ b/content/base/src/nsFrameLoader.cpp

>   nsCOMPtr<nsIDOMElement> ownerElement = do_QueryInterface(mOwnerContent);
>-  mRemoteBrowser = ContentParent::CreateBrowserOrApp(context, ownerElement);
>+  if (XRE_GetProcessType() == GeckoProcessType_Default) {
>+    mRemoteBrowser = ContentParent::CreateBrowserOrApp(context, ownerElement);
>+  } else {
>+    mRemoteBrowser = ContentParent::CreateTab(context, ownerElement);
>+  }

We need to figure out the nomenclature here; it's not at all clear what you
mean by "CreateTab", especially since the comment in ContentParent is identical
to the comment for CreateBrowserOrApp.  :)

Maybe it would be better to move the GetProcessType() check into ContentParent,
so there's one less footgun in the public ContentParent API.

>diff --git a/dom/ipc/ContentParent.h b/dom/ipc/ContentParent.h
>index 031bd15..871db2f 100644
>--- a/dom/ipc/ContentParent.h
>+++ b/dom/ipc/ContentParent.h

>     /**
>      * Get or create a content process for the given TabContext.  aFrameElement
>      * should be the frame/iframe element with which this process will
>      * associated.
>      */
>     static TabParent*
>     CreateBrowserOrApp(const TabContext& aContext,
>                        nsIDOMElement* aFrameElement);
> 
>+    /**
>+     * Get or create a content process for the given TabContext.  aFrameElement
>+     * should be the frame/iframe element with which this process will
>+     * associated.
>+     */
>+    static TabParent*
>+    CreateTab(const TabContext& aContext,
>+              nsIDOMElement* aFrameElement);

This either needs a different comment, or it needs to be merged with
CreateBrowserOrApp.  (The latter seems more sensible to me, offhand.)

>diff --git a/dom/ipc/ContentParent.cpp b/dom/ipc/ContentParent.cpp
>index 019d148..6b8bf03 100644
>--- a/dom/ipc/ContentParent.cpp
>+++ b/dom/ipc/ContentParent.cpp

>@@ -324,34 +326,37 @@ ContentParent::JoinAllSubprocesses()
> /*static*/ already_AddRefed<ContentParent>
>-ContentParent::GetNewOrUsed(bool aForBrowserElement)
>+ContentParent::GetNewOrUsed(bool aForBrowserElement, ContentParent* aOpener)
> {
>     if (!sNonAppContentParents)
>         sNonAppContentParents = new nsTArray<ContentParent*>();
> 
>     int32_t maxContentProcesses = Preferences::GetInt("dom.ipc.processCount", 1);
>     if (maxContentProcesses < 1)
>         maxContentProcesses = 1;
> 
>     if (sNonAppContentParents->Length() >= uint32_t(maxContentProcesses)) {
>         uint32_t idx = rand() % sNonAppContentParents->Length();
>         nsRefPtr<ContentParent> p = (*sNonAppContentParents)[idx];
>         NS_ASSERTION(p->IsAlive(), "Non-alive contentparent in sNonAppContentParents?");
>-        return p.forget();
>+        if (p->mOpener == aOpener) {
>+            return p.forget();
>+        }
>     }
> 
>     nsRefPtr<ContentParent> p =
>         new ContentParent(/* app = */ nullptr,
>+                          aOpener,
>                           aForBrowserElement,
>                           /* isForPreallocated = */ false,
>                           base::PRIVILEGES_DEFAULT,
>                           PROCESS_PRIORITY_FOREGROUND);
>     p->Init();
>     sNonAppContentParents->AppendElement(p);
>     return p.forget();
> }

This doesn't seem quite right.

The purpose of this code is to put a cap on the number of non-app content
processes.  If we already are at the limit of such processes and we try to
create a new non-app content process, we randomly pick one of the existing
processes.

I agree that we need some check like what you're added here, but as written
this lets us exceed the cap.

>@@ -484,16 +489,17 @@ ContentParent::CreateBrowserOrApp(const TabContext& aContext,
> 
>     if (!p) {
>         ChildPrivileges privs = PrivilegesForApp(ownApp);
>         p = MaybeTakePreallocatedAppProcess(manifestURL, privs,
>                                             initialPriority);
>         if (!p) {
>             NS_WARNING("Unable to use pre-allocated app process");
>             p = new ContentParent(ownApp,
>+                                  /* aOpener = */ nullptr,
>                                   /* isForBrowserElement = */ false,
>                                   /* isForPreallocated = */ false,
>                                   privs,
>                                   initialPriority);
>             p->Init();
>         }
>         sAppContentParents->Put(manifestURL, p);
>     }

>@@ -505,16 +511,91 @@ ContentParent::CreateBrowserOrApp(const TabContext& aContext,
>         aContext.AsIPCTabContext(),
>         /* chromeFlags */ 0);
> 
>     p->MaybeTakeCPUWakeLock(aFrameElement);
> 
>     return static_cast<TabParent*>(browser);
> }
> 
>+typedef std::map<ContentParent*, std::set<ContentParent*> > GrandchildMap;
>+static GrandchildMap sGrandchildProcessMap;

You need to remove |this| from the map (as both a key and a value!) in
MarkAsDead().

I have no idea what our rule is on using std::map in Gecko.  Can you ask
around, unless you're sure it's OK?  I don't have a problem with it on
principal.

>+bool
>+ContentParent::RecvCreateChildProcess(uint64_t* id)
>+{
>+    nsRefPtr<ContentParent> cp = GetNewOrUsed(/* isBrowserElement = */ true, this);
>+    *id = reinterpret_cast<uint64_t>(cp.get());

Using a pointer as an ID is bad, bad, bad, because malloc can reuse pointers.

We already have an ID on ContentParent; can we use that?

>+    GrandchildMap::iterator iter = sGrandchildProcessMap.find(this);
>+    if (iter == sGrandchildProcessMap.end()) {
>+        std::set<ContentParent*> children;
>+        children.insert(cp.get());
>+        sGrandchildProcessMap.insert(std::pair<ContentParent*, std::set<ContentParent*> >(this, children));

I think it'd be cleaner to write

          sGrandchildProcessMap[this] = children;

>+    } else {
>+        iter->second.insert(cp.get());
>+    }
>+    return true;
>+}

Do we need all of these |.get()|'s on cp?  I'd have thought we could rely on
the implicit conversion to ContentParent*.

>+bool
>+ContentParent::AnswerBridgeToChildProcess(const uint64_t& id)
>+{
>+    ContentParent* cp = reinterpret_cast<ContentParent*>(id);

What happens if a malicious child process sends us a bogus ID?  :)

>+    GrandchildMap::iterator iter = sGrandchildProcessMap.find(this);
>+    if (iter != sGrandchildProcessMap.end() &&
>+        iter->second.find(cp) != iter->second.end()) {
>+        return PContentBridge::Bridge(this, cp);
>+    } else {
>+        // You can't bridge to a process you didn't open!
>+        KillHard();
>+        return false;
>+    }
>+}

>@@ -1003,16 +1084,28 @@ ContentParent::ActorDestroy(ActorDestroyReason why)
>     // IPDL rules require actors to live on past ActorDestroy, but it
>     // may be that the kungFuDeathGrip above is the last reference to
>     // |this|.  If so, when we go out of scope here, we're deleted and
>     // all hell breaks loose.
>     //
>     // This runnable ensures that a reference to |this| lives on at
>     // least until after the current task finishes running.
>     NS_DispatchToCurrentThread(new DelayedDeleteContentParentTask(this));
>+
>+    // Destroy any processes created by this ContentParent
>+    GrandchildMap::iterator iter = sGrandchildProcessMap.find(this);
>+    if (iter != sGrandchildProcessMap.end()) {
>+        for(std::set<ContentParent*>::iterator child = iter->second.begin();
>+            child != iter->second.end();
>+            child++) {
>+            MessageLoop::current()->PostTask(
>+              FROM_HERE,
>+              NewRunnableMethod(*child, &ContentParent::ShutDownProcess, true));

Please comment what true is.  Also, it maybe should be false.  I think,
probably.  :)

>+        }
>+    }
> }
Attachment #777471 - Flags: review?(justin.lebar+bug) → review-
(In reply to Justin Lebar [:jlebar] from comment #56)
> Comment on attachment 777471 [details] [diff] [review]
> Implement ContentBridge r=jlebar
> 

> 
> >     /**
> >      * Get or create a content process for the given TabContext.  aFrameElement
> >      * should be the frame/iframe element with which this process will
> >      * associated.
> >      */
> >     static TabParent*
> >     CreateBrowserOrApp(const TabContext& aContext,
> >                        nsIDOMElement* aFrameElement);
> > 
> >+    /**
> >+     * Get or create a content process for the given TabContext.  aFrameElement
> >+     * should be the frame/iframe element with which this process will
> >+     * associated.
> >+     */
> >+    static TabParent*
> >+    CreateTab(const TabContext& aContext,
> >+              nsIDOMElement* aFrameElement);
> 
> This either needs a different comment, or it needs to be merged with
> CreateBrowserOrApp.  (The latter seems more sensible to me, offhand.)
> 

I merged these together.  

> 
> This doesn't seem quite right.
> 
> The purpose of this code is to put a cap on the number of non-app content
> processes.  If we already are at the limit of such processes and we try to
> create a new non-app content process, we randomly pick one of the existing
> processes.
> 
> I agree that we need some check like what you're added here, but as written
> this lets us exceed the cap.
> 

We can pick a random process and if the opener doesn't match, iterate through the array in order, using that process as the starting point and wrapping around.  I think it's possible that none of the processes will match, in which case it may be ok to exceed the cap anyway. (For example, if there are two browser apps and one is using all the processes, we should open one process for the second one.  Any subsequently loaded tabs will share that process so we only exceed the limit by 1.)

> 
> I have no idea what our rule is on using std::map in Gecko.  Can you ask
> around, unless you're sure it's OK?  I don't have a problem with it on
> principal.
> 

We already use std::map in this file for CompositorParent::sIndirectLayerTrees.  I think the policy is it's ok to use STL containers that are exception-safe.

> >+bool
> >+ContentParent::RecvCreateChildProcess(uint64_t* id)
> >+{
> >+    nsRefPtr<ContentParent> cp = GetNewOrUsed(/* isBrowserElement = */ true, this);
> >+    *id = reinterpret_cast<uint64_t>(cp.get());
> 
> Using a pointer as an ID is bad, bad, bad, because malloc can reuse pointers.
> 
> We already have an ID on ContentParent; can we use that?

We could, but then we would need another map between ID and ContentParent.


> Do we need all of these |.get()|'s on cp?  I'd have thought we could rely on
> the implicit conversion to ContentParent*.
> 

I think that didn't work for some reason, but I'll try again.

> >+bool
> >+ContentParent::AnswerBridgeToChildProcess(const uint64_t& id)
> >+{
> >+    ContentParent* cp = reinterpret_cast<ContentParent*>(id);
> 
> What happens if a malicious child process sends us a bogus ID?  :)
> 

That should be fine, we won't find the ContentParent in the map and kill the process.  We check that before we dereference the pointer.

> >@@ -1003,16 +1084,28 @@ ContentParent::ActorDestroy(ActorDestroyReason why)
> >     // IPDL rules require actors to live on past ActorDestroy, but it
> >     // may be that the kungFuDeathGrip above is the last reference to
> >     // |this|.  If so, when we go out of scope here, we're deleted and
> >     // all hell breaks loose.
> >     //
> >     // This runnable ensures that a reference to |this| lives on at
> >     // least until after the current task finishes running.
> >     NS_DispatchToCurrentThread(new DelayedDeleteContentParentTask(this));
> >+
> >+    // Destroy any processes created by this ContentParent
> >+    GrandchildMap::iterator iter = sGrandchildProcessMap.find(this);
> >+    if (iter != sGrandchildProcessMap.end()) {
> >+        for(std::set<ContentParent*>::iterator child = iter->second.begin();
> >+            child != iter->second.end();
> >+            child++) {
> >+            MessageLoop::current()->PostTask(
> >+              FROM_HERE,
> >+              NewRunnableMethod(*child, &ContentParent::ShutDownProcess, true));
> 
> Please comment what true is.  Also, it maybe should be false.  I think,
> probably.  :)
> 
> >+        }
> >+    }
> > }

When we call ShutDownProcess right above, we pass true to ensure that we close our channel, so perhaps we must do the same here.  Although, I think you understand process shutdown the best at this point ;)
> We can pick a random process and if the opener doesn't match, iterate through the array in order, 
> using that process as the starting point and wrapping around.  I think it's possible that none of 
> the processes will match, in which case it may be ok to exceed the cap anyway.

Yes, that sounds totally right.

> We could, but then we would need another map between ID and ContentParent.

Yes.

> That should be fine, we won't find the ContentParent in the map and kill the process.  We check 
> that before we dereference the pointer.

Mmm.  That is still very scary, but you are correct.
Except this still suffers from the "ABA problem", where we allocate a ContentParent at address X, then free it, then allocate a /different/ ContentParent at address X.
> When we call ShutDownProcess right above, we pass true to ensure that we close our channel, so 
> perhaps we must do the same here.  Although, I think you understand process shutdown the best at 
> this point ;)

It's different -- above, we're calling ShutDownProcess as a last-ditch attempt to shut ourselves down.  We pass |true| so that we call CloseWithError on the channel (which, like you say, ensures that the channel is closed).

But here, I think we want to try a soft shutdown of the grandchild process.  It's not time to shut it down as an error...yet.  I'm not sure when is that time.  Judging by the other bugs we're fighting against, we'll probably have to circle back around after you land this and figure out how to ensure that grandchildren are always killed.
(In reply to Justin Lebar [:jlebar] from comment #59)
> Except this still suffers from the "ABA problem", where we allocate a
> ContentParent at address X, then free it, then allocate a /different/
> ContentParent at address X.

Really?  Keep in mind that the reinterpret_cast will go away because we'll be using the ContentParent's mId.  Those should be unique, and the rest of the function ensures we have permission to muck with the ContentParent identified by the id.
If we use the ContentParent's mID, which is non-repeating, and map from that to ContentParent*, there's no problem.
Comment on attachment 777471 [details] [diff] [review]
Implement ContentBridge r=jlebar

Hooray, finished with this one.

>diff --git a/dom/ipc/PContentBridge.ipdl b/dom/ipc/PContentBridge.ipdl
>new file mode 100644
>index 0000000..1ee3098
>--- /dev/null
>+++ b/dom/ipc/PContentBridge.ipdl

>+rpc protocol PContentBridge

I think |rpc protocol| is only necessary if this protocol has rpc messages.
Since they're all sync or async, can this be a |sync protocol|?  Or does it
inherit its rpc-ness from PContent?

>+{
>+    bridges PContent, PContent;
>+
>+    manages PBlob;
>+    manages PBrowser;
>+    manages PJavaScript;
>+
>+parent:
>+    sync SyncMessage(nsString aMessage, ClonedMessageData aData, CpowEntry[] aCpows)
>+      returns (nsString[] retval);
>+both:
>+    async PBrowser(IPCTabContext context, uint32_t chromeFlags);
>+
>+    async PBlob(BlobConstructorParams params);
>+
>+    async PJavaScript();
>+
>+    AsyncMessage(nsString aMessage, ClonedMessageData aData, CpowEntry[] aCpows);
>+};
>+
>+}
>+}

>diff --git a/dom/ipc/ContentBridgeChild.cpp b/dom/ipc/ContentBridgeChild.cpp
>new file mode 100644
>index 0000000..832f87d
>--- /dev/null
>+++ b/dom/ipc/ContentBridgeChild.cpp
>+namespace mozilla {
>+namespace dom {
>+
>+NS_IMPL_ISUPPORTS1(ContentBridgeChild, nsIContentChild)
>+
>+ContentBridgeChild::ContentBridgeChild(Transport* aTransport)
>+  : mTransport(aTransport)
>+{}
>+
>+ContentBridgeChild::~ContentBridgeChild()
>+{
>+  XRE_GetIOMessageLoop()->PostTask(FROM_HERE, new DeleteTask<Transport>(mTransport));
>+}

Can you add a comment in the header that mTransport is an owning reference?
Also, how do you know that's true?

>+/*static*/ PContentBridgeChild*
>+ContentBridgeChild::Create(Transport* aTransport, ProcessId aOtherProcess)

Why not have this return ContentBridgeChild*?

>+{
>+  nsRefPtr<ContentBridgeChild> bridge =
>+    new ContentBridgeChild(aTransport);
>+  ProcessHandle handle;
>+  if (!base::OpenProcessHandle(aOtherProcess, &handle)) {
>+    // XXX need to kill |aOtherProcess|, it's boned
>+    return nullptr;
>+  }

It's OK if you want to do this in another bug, but this is an important XXX to
fix.

>+  bridge->mSelfRef = bridge;

When do you drop this ref?

>+  DebugOnly<bool> ok = bridge->Open(aTransport, handle, XRE_GetIOMessageLoop());
>+  MOZ_ASSERT(ok);

>+  // The return value is just compared to null for success checking,
>+  // we're not sharing a ref.
>+  return bridge.get();

I don't understand this comment; if this method only needs to return a bool,
then can we make it return a bool?  If we can't, we clearly are using the
pointer value for something...

>+}

>+jsipc::JavaScriptChild *
>+ContentBridgeChild::GetCPOWManager()
>+{
>+  if (ManagedPJavaScriptChild().Length()) {
>+    return static_cast<JavaScriptChild*>(ManagedPJavaScriptChild()[0]);
>+  }
>+  JavaScriptChild* actor = static_cast<JavaScriptChild*>(SendPJavaScriptConstructor());
>+  return actor;
>+}

Can you add a comment explaining why this can't live on nsIContentChild?

>diff --git a/dom/ipc/ContentBridgeParent.cpp b/dom/ipc/ContentBridgeParent.cpp
>new file mode 100644
>index 0000000..36be212
>--- /dev/null
>+++ b/dom/ipc/ContentBridgeParent.cpp

>+ContentBridgeParent::~ContentBridgeParent()
>+{
>+  XRE_GetIOMessageLoop()->PostTask(FROM_HERE, new DeleteTask<Transport>(mTransport));

Similarly about this mTransport.

>+}

>+/*static*/ PContentBridgeParent*
>+ContentBridgeParent::Create(Transport* aTransport, ProcessId aOtherProcess)
>+{
>+  nsRefPtr<ContentBridgeParent> bridge =
>+    new ContentBridgeParent(aTransport);
>+  ProcessHandle handle;
>+  if (!base::OpenProcessHandle(aOtherProcess, &handle)) {
>+    // XXX need to kill |aOtherProcess|, it's boned

Similarly about this XXX.

>+    return nullptr;
>+  }
>+  bridge->mSelfRef = bridge;

Similarly about breaking this ref.

>+  DebugOnly<bool> ok = bridge->Open(aTransport, handle, XRE_GetIOMessageLoop());
>+  MOZ_ASSERT(ok);
>+  // The return value is just compared to null for success checking,
>+  // we're not sharing a ref.
>+  return bridge.get();

I similarly don't get this comment.

>+}

>+jsipc::JavaScriptParent*
>+ContentBridgeParent::GetCPOWManager()
>+{
>+  if (ManagedPJavaScriptParent().Length()) {
>+    return static_cast<JavaScriptParent*>(ManagedPJavaScriptParent()[0]);
>+  }
>+  JavaScriptParent* actor = static_cast<JavaScriptParent*>(SendPJavaScriptConstructor());
>+  return actor;
>+}

Can you add a comment explaining why this can't live on nsIContentParent?

>diff --git a/dom/ipc/ContentChild.h b/dom/ipc/ContentChild.h
>index a5d68a8..1d37333 100644
>--- a/dom/ipc/ContentChild.h
>+++ b/dom/ipc/ContentChild.h

>+    ContentBridgeParent* GetLastBridge() { return mLastBridge; }
>+    ContentBridgeParent* mLastBridge;

Initialize mLastBridge to nullptr in the ContentChild constructor?  Also it's
not clear why we have a public getter and also a public member.

Maybe the method should be called GetLastBridgeParent()?

>diff --git a/dom/ipc/ContentParent.h b/dom/ipc/ContentParent.h
>index 031bd15..871db2f 100644
>--- a/dom/ipc/ContentParent.h
>+++ b/dom/ipc/ContentParent.h

>+    virtual bool RecvCreateChildProcess(uint64_t* id) MOZ_OVERRIDE;
>+    virtual bool AnswerBridgeToChildProcess(const uint64_t& id) MOZ_OVERRIDE;

>@@ -433,16 +446,17 @@ private:
>+    ContentParent* mOpener;

This is a bit subtle; I'd appreciate a comment explaining what this is.

Also, I don't think you have guaranteed that |this| is always destructed before
|mOpener|.  The easy thing to do would be to make this a weak ref.

>diff --git a/dom/ipc/ContentParent.cpp b/dom/ipc/ContentParent.cpp
>index 019d148..6b8bf03 100644
>--- a/dom/ipc/ContentParent.cpp
>+++ b/dom/ipc/ContentParent.cpp

>+/*static*/ TabParent*
>+ContentParent::CreateTab(const TabContext& aContext,
>+                         nsIDOMElement* aFrameElement)
>+{
>+    MOZ_ASSERT(aContext.IsBrowserElement());
>+    ContentChild* child = ContentChild::GetSingleton();
>+    uint64_t id;
>+    if (!child->SendCreateChildProcess(&id)) {
>+        return nullptr;
>+    }
>+    if (!child->CallBridgeToChildProcess(id)) {
>+        return nullptr;
>+    }

I'm nervous about the RPC usage here, but I haven't looked through those
patches yet, so I guess this gets a pass for now.  :)

>+    nsRefPtr<TabParent> tp(new TabParent(child->GetLastBridge(), aContext));

Using GetLastBridge() like this is a hack.  I'm not familiar enough with IPDL
to know what's the right way to do this, but there has got to be a better way.
Maybe bent can help.

>+    tp->SetOwnerElement(aFrameElement);
>+    uint32_t chromeFlags = 0;
>+
>+    // Propagate the private-browsing status of the element's parent
>+    // docshell to the remote docshell, via the chrome flags.
>+    nsCOMPtr<Element> frameElement = do_QueryInterface(aFrameElement);
>+    MOZ_ASSERT(frameElement);
>+    nsIDocShell* docShell =
>+        frameElement->OwnerDoc()->GetWindow()->GetDocShell();

I don't personally like MOZ_ASSERTs where, in the absence of the assert, you're
going to crash with a null-pointer exception on the next line.

Note also that we're going to crash if frameElement has no window.  I think a
busted child process could cause us to do that.

>+    MOZ_ASSERT(docShell);
>+    nsCOMPtr<nsILoadContext> loadContext = do_QueryInterface(docShell);

It's not clear why docShell must be non-null here, for the same reason as
window might be null.
(In reply to Justin Lebar [:jlebar] from comment #63)
> Comment on attachment 777471 [details] [diff] [review]
> Implement ContentBridge r=jlebar
> 
> Hooray, finished with this one.

> 
> I think |rpc protocol| is only necessary if this protocol has rpc messages.
> Since they're all sync or async, can this be a |sync protocol|?  Or does it
> inherit its rpc-ness from PContent?
> 

It must be rpc because PBrowser is rpc.


> >+ContentBridgeChild::~ContentBridgeChild()
> >+{
> >+  XRE_GetIOMessageLoop()->PostTask(FROM_HERE, new DeleteTask<Transport>(mTransport));
> >+}
> 
> Can you add a comment in the header that mTransport is an owning reference?
> Also, how do you know that's true?
> 

This is what CompositorParent and ImageBridgeParent do.

> >+{
> >+  nsRefPtr<ContentBridgeChild> bridge =
> >+    new ContentBridgeChild(aTransport);
> >+  ProcessHandle handle;
> >+  if (!base::OpenProcessHandle(aOtherProcess, &handle)) {
> >+    // XXX need to kill |aOtherProcess|, it's boned
> >+    return nullptr;
> >+  }
> 
> It's OK if you want to do this in another bug, but this is an important XXX
> to
> fix.

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#804 =(

> 
> >+  bridge->mSelfRef = bridge;
> 
> When do you drop this ref?
> 

Oops!

> >+  DebugOnly<bool> ok = bridge->Open(aTransport, handle, XRE_GetIOMessageLoop());
> >+  MOZ_ASSERT(ok);
> 
> >+  // The return value is just compared to null for success checking,
> >+  // we're not sharing a ref.
> >+  return bridge.get();
> 
> I don't understand this comment; if this method only needs to return a bool,
> then can we make it return a bool?  If we can't, we clearly are using the
> pointer value for something...
> 

Need to look into this.


> >@@ -433,16 +446,17 @@ private:
> >+    ContentParent* mOpener;
> 
> This is a bit subtle; I'd appreciate a comment explaining what this is.
> 
> Also, I don't think you have guaranteed that |this| is always destructed
> before
> |mOpener|.  The easy thing to do would be to make this a weak ref.
> 

Are you concerned about a race between when the app shuts down and tells its children to shut down and when the destructors actually run?


> >diff --git a/dom/ipc/ContentParent.cpp b/dom/ipc/ContentParent.cpp
> >index 019d148..6b8bf03 100644
> >--- a/dom/ipc/ContentParent.cpp
> >+++ b/dom/ipc/ContentParent.cpp
> 
> >+/*static*/ TabParent*
> >+ContentParent::CreateTab(const TabContext& aContext,
> >+                         nsIDOMElement* aFrameElement)
> >+{
> >+    MOZ_ASSERT(aContext.IsBrowserElement());
> >+    ContentChild* child = ContentChild::GetSingleton();
> >+    uint64_t id;
> >+    if (!child->SendCreateChildProcess(&id)) {
> >+        return nullptr;
> >+    }
> >+    if (!child->CallBridgeToChildProcess(id)) {
> >+        return nullptr;
> >+    }
> 
> I'm nervous about the RPC usage here, but I haven't looked through those
> patches yet, so I guess this gets a pass for now.  :)
> 
> >+    nsRefPtr<TabParent> tp(new TabParent(child->GetLastBridge(), aContext));
> 
> Using GetLastBridge() like this is a hack.  I'm not familiar enough with IPDL
> to know what's the right way to do this, but there has got to be a better
> way.
> Maybe bent can help.
> 

It's a hack, but I think it should be safe.

> >+    tp->SetOwnerElement(aFrameElement);
> >+    uint32_t chromeFlags = 0;
> >+
> >+    // Propagate the private-browsing status of the element's parent
> >+    // docshell to the remote docshell, via the chrome flags.
> >+    nsCOMPtr<Element> frameElement = do_QueryInterface(aFrameElement);
> >+    MOZ_ASSERT(frameElement);
> >+    nsIDocShell* docShell =
> >+        frameElement->OwnerDoc()->GetWindow()->GetDocShell();
> 
> I don't personally like MOZ_ASSERTs where, in the absence of the assert,
> you're
> going to crash with a null-pointer exception on the next line.
> 

This is preexisting code.  the assert is there mostly for sanity - it's impossible for the the owner of a process to not be an Element.  I can add a null-check for window.
 
> Note also that we're going to crash if frameElement has no window.  I think a
> busted child process could cause us to do that.
> 
> >+    MOZ_ASSERT(docShell);
> >+    nsCOMPtr<nsILoadContext> loadContext = do_QueryInterface(docShell);
> 
> It's not clear why docShell must be non-null here, for the same reason as
> window might be null.

Again, this is preexisting but I can add a check so we don't kill the parent process.
(In reply to Justin Lebar [:jlebar] from comment #63)
> Comment on attachment 777471 [details] [diff] [review]
> Implement ContentBridge r=jlebar
> 


> 
> I don't understand this comment; if this method only needs to return a bool,
> then can we make it return a bool?  If we can't, we clearly are using the
> pointer value for something...
> 

We don't use the pointer value, so I can change this to return bool.  It needs an IPDL compiler patch though, so I'll do it as another patch.
Attachment #780212 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 766385 [details] [diff] [review]
Part 1: Rename the MsgStart messages to MsgStartChild for child protocols

r=me, but I think it would be cleaner to have _messageStartName() take a
keyword arg (e.g., _messageStartName(p.decl.type, child=True)).

>diff --git a/ipc/ipdl/ipdl/lower.py b/ipc/ipdl/ipdl/lower.py
>--- a/ipc/ipdl/ipdl/lower.py
>+++ b/ipc/ipdl/ipdl/lower.py

>@@ -1532,17 +1532,17 @@ class _GenerateProtocolCode(ipdl.ast.Vis
>         # spit out message type enum and classes
>         msgenum = TypeEnum('MessageType')
>         msgstart = _messageStartName(self.protocol.decl.type) +' << 16'
>         msgenum.addId(self.protocol.name + 'Start', msgstart)
>-        msgenum.addId(self.protocol.name +'PreStart', '('+ msgstart +') - 1')
>+        #msgenum.addId(self.protocol.name +'PreStart', '('+ msgstart +') - 1')

I guess this is an unrelated change?  I'd prefer if we just deleted the line if
it's not necessary.
Attachment #766385 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 779983 [details] [diff] [review]
Part 14: Allocing an actor for a bridged or opened protocol should return bool

Yay.
Attachment #779983 - Flags: review?(justin.lebar+bug) → review+
(In reply to Justin Lebar [:jlebar] from comment #68)
> Comment on attachment 766385 [details] [diff] [review]
> Part 1: Rename the MsgStart messages to MsgStartChild for child protocols
> 

> 
> >@@ -1532,17 +1532,17 @@ class _GenerateProtocolCode(ipdl.ast.Vis
> >         # spit out message type enum and classes
> >         msgenum = TypeEnum('MessageType')
> >         msgstart = _messageStartName(self.protocol.decl.type) +' << 16'
> >         msgenum.addId(self.protocol.name + 'Start', msgstart)
> >-        msgenum.addId(self.protocol.name +'PreStart', '('+ msgstart +') - 1')
> >+        #msgenum.addId(self.protocol.name +'PreStart', '('+ msgstart +') - 1')
> 
> I guess this is an unrelated change?  I'd prefer if we just deleted the line
> if
> it's not necessary.

Yeah, we were generating an enum value that's never used.  Not a big deal.
Comment on attachment 777470 [details] [diff] [review]
Add an nsIContentChild api r=jlebar

>diff --git a/dom/ipc/ContentChild.cpp b/dom/ipc/ContentChild.cpp
>index 0464ddf..4d5d356 100644
>--- a/dom/ipc/ContentChild.cpp
>+++ b/dom/ipc/ContentChild.cpp

>@@ -958,19 +850,19 @@ ContentChild::ActorDestroy(ActorDestroyReason why)
> 
>-    if (sFirstIdleTask) {
>+    /*if (sFirstIdleTask) {
>         sFirstIdleTask->Cancel();
>-    }
>+    }*/

What's going on here?
Attachment #777470 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 777469 [details] [diff] [review]
Add an nsIContentParent api r=jlebar

>diff --git a/content/base/src/nsFrameLoader.h b/content/base/src/nsFrameLoader.h
>index 315a8b0..8178733 100644
>--- a/content/base/src/nsFrameLoader.h
>+++ b/content/base/src/nsFrameLoader.h

>@@ -431,17 +425,17 @@ private:
>   bool mObservingOwnerContent : 1;
> 
>   // Backs nsIFrameLoader::{Get,Set}Visible.  Visibility state here relates to
>   // whether this frameloader's <iframe mozbrowser> is setVisible(true)'ed, and
>   // doesn't necessarily correlate with docshell/document visibility.
>   bool mVisible : 1;
> 
>   // XXX leaking
>-  nsCOMPtr<nsIObserver> mChildHost;
>+  nsCOMPtr<mozilla::dom::nsIContentParent> mChildHost;

I've bitrotted you a bit here, sorry.

The reason for the bitrot is that it was helpful, when debugging ContentParent
leaks, for nsFrameLoader.h to maintain a pointer of type X, while our memory
reporter reported the addresses of ContentParent*'s also cast to type X.  It
would be helpful if you could maintain that invariant (perhaps by changing the
memory reporter in ContentParent.cpp to report nsIContentParent's, or
something).

>diff --git a/content/base/src/nsFrameLoader.cpp b/content/base/src/nsFrameLoader.cpp
>index e23249f..cca3f85 100644
>--- a/content/base/src/nsFrameLoader.cpp
>+++ b/content/base/src/nsFrameLoader.cpp

>@@ -2062,17 +2071,17 @@ nsFrameLoader::TryRemoteBrowser()
>     nsCOMPtr<nsIDOMWindow> rootWin = do_GetInterface(rootItem);
>     nsCOMPtr<nsIDOMChromeWindow> rootChromeWin = do_QueryInterface(rootWin);
>     NS_ABORT_IF_FALSE(rootChromeWin, "How did we not get a chrome window here?");
> 
>     nsCOMPtr<nsIBrowserDOMWindow> browserDOMWin;
>     rootChromeWin->GetBrowserDOMWindow(getter_AddRefs(browserDOMWin));
>     mRemoteBrowser->SetBrowserDOMWindow(browserDOMWin);
> 
>-    mChildHost = mRemoteBrowser->Manager();
>+    mChildHost = do_QueryInterface(mRemoteBrowser->Manager());
>   }
>   return true;
> }

It doesn't look like nsIContent{Child,Parent} is the sort of interface you can
QI to.

Also I don't think you need to QI this, because TabParent::Manager() returns an
nsIContentParent*.

>diff --git a/dom/ipc/nsIContentParent.cpp b/dom/ipc/nsIContentParent.cpp
>new file mode 100644
>index 0000000..27d841e
>--- /dev/null
>+++ b/dom/ipc/nsIContentParent.cpp

We need to be careful with this one.  We're doing security checks here, but
those security checks must be done (or redone) in the root process.

It's not clear to me that we're meeting this burden here if the
nsIContentParent is a ContentBridgeParent.  Am I missing something?
Attachment #777469 - Flags: review?(justin.lebar+bug) → review+
(In reply to Justin Lebar [:jlebar] from comment #71)
> Comment on attachment 777470 [details] [diff] [review]
> Add an nsIContentChild api r=jlebar
> 
> >diff --git a/dom/ipc/ContentChild.cpp b/dom/ipc/ContentChild.cpp
> >index 0464ddf..4d5d356 100644
> >--- a/dom/ipc/ContentChild.cpp
> >+++ b/dom/ipc/ContentChild.cpp
> 
> >@@ -958,19 +850,19 @@ ContentChild::ActorDestroy(ActorDestroyReason why)
> > 
> >-    if (sFirstIdleTask) {
> >+    /*if (sFirstIdleTask) {
> >         sFirstIdleTask->Cancel();
> >-    }
> >+    }*/
> 
> What's going on here?

Ah, I left that because I wasn't sure what to do about it; good thing you caught it!  So the issue here is that we only create the sFirstIdleTask when the ContentChild allocates PBrowserChild.  The easiest solution here is probably to have ContentBridgeChild call ContentChild::GetSingleton()->RecvPBrowserConstructor.
(In reply to Justin Lebar [:jlebar] from comment #72)
> Comment on attachment 777469 [details] [diff] [review]
> Add an nsIContentParent api r=jlebar
> 

> I've bitrotted you a bit here, sorry.
> 
> The reason for the bitrot is that it was helpful, when debugging
> ContentParent
> leaks, for nsFrameLoader.h to maintain a pointer of type X, while our memory
> reporter reported the addresses of ContentParent*'s also cast to type X.  It
> would be helpful if you could maintain that invariant (perhaps by changing
> the
> memory reporter in ContentParent.cpp to report nsIContentParent's, or
> something).
> 

Hmm, are you talking about 
nsPrintfCString path("queued-ipc-messages/content-parent"               
                              "(%s, pid=%d, %s, 0x%p, refcnt=%d)",               
                              NS_ConvertUTF16toUTF8(friendlyName).get(),         
                             cp->Pid(), channelStr, cp, refcnt);

That doesn't seem to cast the pointer to nsIObserver.  nsIObserver isn't the first interface ContentParent inherits from, so aren't you getting the pointer to the PContentParent*? Am I missing something here?

> 
> It doesn't look like nsIContent{Child,Parent} is the sort of interface you
> can
> QI to.
> 

We can add an IID to it later which would allow QI.  But you're right, this doesn't need a QI, and I actually took it out at some point.
> That doesn't seem to cast the pointer to nsIObserver.  nsIObserver isn't the first interface 
> ContentParent inherits from, so aren't you getting the pointer to the PContentParent*?

You're understanding this perfectly.  :)

It's a result of the fact that this prints a PContentParent* that I changed the pointer in nsFrameLoader.h from nsCOMPtr<nsIObserver> to nsRefPtr<ContentParent>.
Ben, review ping?
Flags: needinfo?(bent.mozilla)
Attachment #770553 - Attachment is obsolete: true
Attachment #770553 - Flags: review?(bent.mozilla)
Attachment #806124 - Flags: review?(bent.mozilla)
Attachment #771000 - Attachment is obsolete: true
Attachment #771000 - Flags: review?(bent.mozilla)
Attachment #806125 - Flags: review?(bent.mozilla)
Attachment #771043 - Attachment is obsolete: true
Attachment #771043 - Flags: review?(bent.mozilla)
Attachment #806126 - Flags: review?(bent.mozilla)
Comment on attachment 806124 [details] [diff] [review]
Use nsIContentParent in indexeddb r=bent

Review of attachment 806124 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, sorry for the delay.
Attachment #806124 - Flags: review?(bent.mozilla) → review+
Comment on attachment 806125 [details] [diff] [review]
Fix IndexedDBObjectStore to not assume that PContent connects the same processes as PBrowser r=bent

Review of attachment 806125 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, just nits here really.

::: dom/indexedDB/IDBObjectStore.cpp
@@ +13,5 @@
>  
>  #include "jsfriendapi.h"
>  #include "mozilla/dom/ContentChild.h"
>  #include "mozilla/dom/nsIContentParent.h"
> +#include "mozilla/dom/TabChild.h"

Nit: Please put in alphabetic order

@@ +2845,5 @@
> +  // Our "parent" process may be either the root process or another content
> +  // process if this indexedDB is managed by a PBrowser that is managed by a
> +  // PContentBridge.  We need to find which one it is so that we can create
> +  // PBlobs that are managed by the right nsIContentChild.
> +  IndexedDBChild* indexedDB =

Nit: Call this 'rootActor' maybe?

@@ +2846,5 @@
> +  // process if this indexedDB is managed by a PBrowser that is managed by a
> +  // PContentBridge.  We need to find which one it is so that we can create
> +  // PBlobs that are managed by the right nsIContentChild.
> +  IndexedDBChild* indexedDB =
> +    static_cast<IndexedDBChild*>(objectStoreActor->Manager()->Manager()->Manager());

Nit: wrap to 80 chars

@@ +3108,5 @@
>  }
>  
>  nsresult
> +AddHelper::PackArgumentsForParentProcess(ObjectStoreRequestParams& aParams,
> +                                         nsIContentChild* aBlobCreator)

In all of these please assert a non-null aBlobCreator once at the beginning of the function (after any others that may be there already)

::: dom/indexedDB/ipc/IndexedDBChild.cpp
@@ +162,5 @@
> +                               const nsCString& aASCIIOrigin)
> +: mFactory(nullptr)
> +, mManagerContent(aContentChild)
> +, mManagerTab(nullptr)
> +, mASCIIOrigin(aASCIIOrigin)

Nit: Please stick with existing style and fit as many items on a line as possible under 80 chars.
Attachment #806125 - Flags: review?(bent.mozilla) → review+
Comment on attachment 806126 [details] [diff] [review]
nsIContentChild::GetOrCreateActorForBlob should ensure the blob has the correct manager r=bent

Review of attachment 806126 [details] [diff] [review]:
-----------------------------------------------------------------

r=me!
Attachment #806126 - Flags: review?(bent.mozilla) → review+
These patches no longer pass try: https://tbpl.mozilla.org/?tree=Try&rev=43f712a3e13e

I'm not sure when I'll have time to look into this.
Flags: needinfo?(bent.mozilla)
This patch breaks my build (linux clang x64). the parent cset builds.

0:10.33 nsMemoryInfoDumper.o
 0:11.79 In file included from ../../../http2/xpcom/base/nsMemoryInfoDumper.cpp:1:
 0:11.79 In file included from ../../../http2/xpcom/base/nsMemoryInfoDumper.cpp:13:
 0:11.79 ../../dist/include/mozilla/dom/ContentParent.h:254:5: error: virtual function 'AllocPCompositorParent' has a different return type ('bool') than the function it overrides (which has return type 'PCompositorParent *' (aka 'mozilla::layers::PCompositorParent *'))
 0:11.79     AllocPCompositorParent(mozilla::ipc::Transport* aTransport,
 0:11.79     ^
 0:11.80 ../../ipc/ipdl/_ipdlheaders/mozilla/dom/PContentParent.h:707:5: note: overridden virtual function is here
 0:11.80     AllocPCompositorParent(
 0:11.80     ^
 0:11.80 In file included from ../../../http2/xpcom/base/nsMemoryInfoDumper.cpp:1:
 0:11.80 In file included from ../../../http2/xpcom/base/nsMemoryInfoDumper.cpp:13:
 0:11.80 ../../dist/include/mozilla/dom/ContentParent.h:257:5: error: virtual function 'AllocPImageBridgeParent' has a different return type ('bool') than the function it overrides (which has return type 'PImageBridgeParent *' (aka 'mozilla::layers::PImageBridgeParent *'))
 0:11.80     AllocPImageBridgeParent(mozilla::ipc::Transport* aTransport,
 0:11.80     ^
 0:11.80 ../../ipc/ipdl/_ipdlheaders/mozilla/dom/PContentParent.h:711:5: note: overridden virtual function is here
 0:11.80     AllocPImageBridgeParent(
 0:11.80     ^
 0:11.80 In file included from ../../../http2/xpcom/base/nsMemoryInfoDumper.cpp:1:
 0:11.80 In file included from ../../../http2/xpcom/base/nsMemoryInfoDumper.cpp:14:
 0:11.80 ../../dist/include/mozilla/dom/ContentChild.h:81:5: error: virtual function 'AllocPCompositorChild' has a different return type ('bool') than the function it overrides (which has return type 'PCompositorChild *' (aka 'mozilla::layers::PCompositorChild *'))
 0:11.80     AllocPCompositorChild(mozilla::ipc::Transport* aTransport,
 0:11.80     ^
 0:11.80 ../../ipc/ipdl/_ipdlheaders/mozilla/dom/PContentChild.h:602:5: note: overridden virtual function is here
 0:11.80     AllocPCompositorChild(
 0:11.80     ^
 0:11.80 In file included from ../../../http2/xpcom/base/nsMemoryInfoDumper.cpp:1:
 0:11.80 In file included from ../../../http2/xpcom/base/nsMemoryInfoDumper.cpp:14:
 0:11.80 ../../dist/include/mozilla/dom/ContentChild.h:84:5: error: virtual function 'AllocPImageBridgeChild' has a different return type ('bool') than the function it overrides (which has return type 'PImageBridgeChild *' (aka 'mozilla::layers::PImageBridgeChild *'))
 0:11.80     AllocPImageBridgeChild(mozilla::ipc::Transport* aTransport,
 0:11.80     ^
 0:11.80 ../../ipc/ipdl/_ipdlheaders/mozilla/dom/PContentChild.h:606:5: note: overridden virtual function is here
 0:11.80     AllocPImageBridgeChild(
 0:11.80     ^
 0:11.80 4 errors generated.
 0:11.80 distcc[15228] ERROR: compile /home/mcmanus/src/mozilla2/wd/http2/xpcom/base/nsMemoryInfoDumper.cpp on localhost failed
 0:11.81 
 0:11.81 In the directory  /home/mcmanus/src/mozilla2/wd/obj-debug-http2/xpcom/base
 0:11.81 The following command failed to execute properly:
 0:11.81 distcc clang++ -o nsMemoryInfoDumper.o -c -I../../dist/stl_wrappers -I../../dist/system_wrappers -include /home/mcmanus/src/mozilla2/wd/http2/config/gcc_hidden.h -DMOZ_GLUE_IN_PROGRAM -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -DNO_NSPR_10_SUPPORT -DOS_POSIX=1 -DOS_LINUX=1 -I/home/mcmanus/src/mozilla2/wd/http2/ipc/chromium/src -I/home/mcmanus/src/mozilla2/wd/http2/ipc/glue -I../../ipc/ipdl/_ipdlheaders -I/home/mcmanus/src/mozilla2/wd/http2/xpcom/base/../build -I/home/mcmanus/src/mozilla2/wd/http2/xpcom/ds -I/home/mcmanus/src/mozilla2/wd/http2/xpcom/base -I. -I../../dist/include -I/home/mcmanus/src/mozilla2/wd/obj-debug-http2/dist/include/nspr -I/home/mcmanus/src/mozilla2/wd/obj-debug-http2/dist/include/nss -fPIC -Qunused-arguments -DMOZILLA_CLIENT -include ../../mozilla-config.h -MD -MP -MF .deps/nsMemoryInfoDumper.o.pp -Qunused-arguments -Qunused-arguments -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Wsign-compare -Wno-invalid-offsetof -Wno-c++0x-extensions -Wno-extended-offsetof -Wno-unknown-warning-option -Wno-return-type-c-linkage -Wno-mismatched-tags -fno-exceptions -fno-strict-aliasing -fno-rtti -fno-exceptions -fno-math-errno -std=gnu++0x -I/home/mcmanus/src/mozilla2/wd/http2/build/unix/headers -pthread -pipe -DDEBUG -D_DEBUG -DTRACING -g -Os -freorder-blocks -fno-omit-frame-pointer -I/home/mcmanus/src/mozilla2/wd/http2/widget/gtk/compat -pthread -I/usr/include/gtk-2.0 -I/usr/lib/x86_64-linux-gnu/gtk-2.0/include -I/usr/include/atk-1.0 -I/usr/include/cairo -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/pango-1.0 -I/usr/include/gio-unix-2.0/ -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng12 -I/usr/include/harfbuzz -I/usr/include/gtk-unix-print-2.0 /home/mcmanus/src/mozilla2/wd/http2/xpcom/base/nsMemoryInfoDumper.cpp
 0:11.81 make[5]: *** [nsMemoryInfoDumper.o] Error 1
(In reply to Patrick McManus [:mcmanus] from comment #88)
> This patch breaks my build (linux clang x64). the parent cset builds.
> 

I actually think it was just missing a CLOBBER update (or had a bad interaction with ehsahn's #include work). my build has gotten farther now.
Attachment #809495 - Flags: review?(evilpies)
Sorry, I don't really understand this code, can you please ask somebody else for review?
Attachment #809495 - Flags: review?(evilpies)
As a meta-comment on this bug, there are UX designs for the future of B2G that basically do away with the "browser as a separate app" concept. The new UX design makes the browser more or less an integral part of the root process, and so splitting it out into a separate app is no longer necessary. Given that plan, I was wondering if this bug is still useful at all. Jonas says that although the built-in browser app may not use it, the work itself is still useful:

On 13-10-11 14:18 , Jonas Sicking wrote:
> We should definitely finish the work in bug 879475. It's something
> that will be useful for many other things than the built-in browser
> app. For example I'd like to force all certified (and eventually
> privileged) apps to always use out-of-process iframes for all
> not-from-app-package content. We're already seeing partners creating
> certified apps which use <iframe>s to render web content, and the
> problem is going to get much more prevalent over time. Fixing that
> will require bug 879475 to be finished.
> 
> So absolutely full steam ahead!
> 
> Only thing I would say have changed is that it's even more important
> that we land good tests given that we won't get automatic testing from
> the browser app.
I think this is still useful if we want to implement a replaceable homescreen in the future. The homescreen itself could be a separate process and has many widgets (like the metro UI).
Agreed. I don't think we'll have a shortage of features that will need OOP <iframe>s.
Just to catch up - are we happy with the patches we have, is there more stuff to come?  Based on the last few comments, sounds like we do want this, regardless of the future of "browser as a separate app".  Knowing what we'll do here and how we'll do it will inform the dynamic toolbar work.  Not a fire, just checking to see where we are.
Flags: needinfo?(dzbarsky)
(In reply to Milan Sreckovic [:milan] from comment #95)
> Just to catch up - are we happy with the patches we have, is there more
> stuff to come?  Based on the last few comments, sounds like we do want this,
> regardless of the future of "browser as a separate app".  Knowing what we'll
> do here and how we'll do it will inform the dynamic toolbar work.  Not a
> fire, just checking to see where we are.

I've fixed up and rebased these patches, they're in my git repo.  I do think we want this for e10s, someone just needs to fix it up a little and land it.  There are currently a few windows+linux xpcshell and mochitest failures.  Unfortunately I don't have access to a linux or windows machine and won't have time to work on this stuff for a while.
Flags: needinfo?(dzbarsky)
Botond, do you have a linux or windows machine where you can setup the build with the patches and see if we can reproduce the test failures?  Just to give us a head start.
Flags: needinfo?(botond)
Sorry, put the second comment in without the first one - David, thanks for the update, and thanks for getting this patch almost to the end.  I don't know how far we can take it without you, but we can at least see if we can reproduce the test failures locally and take it from there.  Thus the comment 97.  Thanks again.
(In reply to Milan Sreckovic [:milan] from comment #97)
> Botond, do you have a linux or windows machine where you can setup the build
> with the patches and see if we can reproduce the test failures?  Just to
> give us a head start.

I have a Linux machine and a Windows 8 machine, I can try to reproduce on them. Is there a try run that shows the failing tests?
Flags: needinfo?(botond)
(In reply to David Zbarsky (:dzbarsky) from comment #96)
> I've fixed up and rebased these patches, they're in my git repo.

Is it this one: https://github.com/dzbarsky/mozilla-central/tree/nestedprocesses?
Yep, it's this one(In reply to Botond Ballo [:botond] from comment #100)
> (In reply to David Zbarsky (:dzbarsky) from comment #96)
> > I've fixed up and rebased these patches, they're in my git repo.
> 
> Is it this one:
> https://github.com/dzbarsky/mozilla-central/tree/nestedprocesses?

Yep, it's this one.  I can't find a try run that show the failing tests, but IIRC it was dom/browser-element/mochitest/priority/test_Audio.html and some xpcshell tests under netwerk/ipc.
Any news? I'd like to take this if nobody is working on it. It's a bit hard to find what had been landed and what not though..
David, can you update Kan-Ru on the status of this?
Flags: needinfo?(dzbarsky)
Attachment #766336 - Flags: checkin+
Attachment #766385 - Flags: checkin+
Attachment #769842 - Flags: checkin+
Attachment #770015 - Flags: checkin+
Attachment #770016 - Flags: checkin+
Attachment #772868 - Attachment is obsolete: true
Attachment #779983 - Attachment is obsolete: true
Kan-Ru,

I've marked the patches as obsolete/checked in.  However, some of the patches attached to this bug have had changes applied to them to address review comments.  You can see the patches that remain to be landed and their ordering at https://github.com/dzbarsky/mozilla-central/commits/nestedprocesses

These patches failed a few tests on tryserver.  I can't find a try run that show the failing tests, but IIRC it was dom/browser-element/mochitest/priority/test_Audio.html on linux and some xpcshell tests under netwerk/ipc.

Thanks for volunteering to pick this up, and let me know if there's anything else I can do to help you!
Flags: needinfo?(dzbarsky)
Assignee: dzbarsky → kchen
Got distracted by other blocking issues. I'll back to this soon.
While un-bitrotten the patches, I found I wanted to try another approach. Instead of making these two very different class (ContentParent, ContentBridgeParent) share one common base (IContentParent), I created a new top level protocol PContentContent that bridges PContent and PContent. PContentBridge is then used to manage protocols that are required for communicating between Contents, no matter it's B2G<->Content or Content<->Content.

The patches are recorded here

https://github.com/kanru/gecko-dev/commits/nestedoop
Blocks: 961689
Is work here still progressing?
Blocks: 980935
(In reply to Kan-Ru Chen [:kanru] from comment #106)
> While un-bitrotten the patches, I found I wanted to try another approach.
> Instead of making these two very different class (ContentParent,
> ContentBridgeParent) share one common base (IContentParent), I created a new
> top level protocol PContentContent that bridges PContent and PContent.
> PContentBridge is then used to manage protocols that are required for
> communicating between Contents, no matter it's B2G<->Content or
> Content<->Content.
> 
> The patches are recorded here
> 
> https://github.com/kanru/gecko-dev/commits/nestedoop

Hey Kan-Ru, do you have some time to try to land those patches as if by just fixing the test failures and land any changes as a followup ? I understand that you want to experiment and made more changes, but fixing this bug in its current state will unlock some deadlines for the Gaia team :)

Basically this prevent us to iterate over 'widgets', and to identify which messages needs to be bridge between the child and the grand-child, versus grandchild-parent.
Flags: needinfo?(kchen)
(In reply to Vivien Nicolas (:vingtetun) (:21) (NOT reading bugmails, needinfo? please) from comment #108)
> (In reply to Kan-Ru Chen [:kanru] from comment #106)
> > While un-bitrotten the patches, I found I wanted to try another approach.
> > Instead of making these two very different class (ContentParent,
> > ContentBridgeParent) share one common base (IContentParent), I created a new
> > top level protocol PContentContent that bridges PContent and PContent.
> > PContentBridge is then used to manage protocols that are required for
> > communicating between Contents, no matter it's B2G<->Content or
> > Content<->Content.
> > 
> > The patches are recorded here
> > 
> > https://github.com/kanru/gecko-dev/commits/nestedoop
> 
> Hey Kan-Ru, do you have some time to try to land those patches as if by just
> fixing the test failures and land any changes as a followup ? I understand
> that you want to experiment and made more changes, but fixing this bug in
> its current state will unlock some deadlines for the Gaia team :)
> 
> Basically this prevent us to iterate over 'widgets', and to identify which
> messages needs to be bridge between the child and the grand-child, versus
> grandchild-parent.

I agree that land the original patches maybe the fastest way to unlock Gaia development. I'll put priority on this.
Flags: needinfo?(kchen)
To use TabParent in a Content process there are some remaining issues:

* RecvSyncMessage
* RecvRpcMessage
* RecvAsyncMessage

  Call AssertAppPrincipal which needs ContentParent::KillHard

* AllocPContentPermissionRequestParent

  Uses nsIContentPermissionPrompt which is B2G process only

* AllocPOfflineCacheUpdateParent
* RecvPOfflineCacheUpdateConstructor
* DeallocPOfflineCacheUpdateParent

  Uses nsOfflineCacheUpdateService, SecurityManager and PermissionManager

* RecvSetOfflinePermission

  Uses nsOfflineCacheUpdateService

* RecvPIndexedDBConstructor

  if (!IndexedDatabaseManager::IsMainProcess()) {
    NS_RUNTIMEABORT("Not supported yet!");
  }

* RecvZoomToRect
* RecvContentReceivedTouch
* RecvUpdateZoomConstraints

  APZC assums ContentParent
I'm currently unbitrotting the original patchset. After that me and Gina will fix the remaining issues mentioned above (comment 110). I think if all goes well we will need two or three weeks to get a rather stable patchset that could be reviewed and tested. If more issues were spotted during the process, we will need more time to triage and fix bugs.
Let me know if I can help answer any questions about the original patches!
Blocks: 1013810
Attached patch Refresh patch in one file (obsolete) — Splinter Review
Refreshed patch.

Currently it couldn't pass the Necko IPC security check because ContentParent doesn't have any PBrowsers.

Nested remote layers doesn't render because the remote layer subtree is not registered to the compositor.
It's great if the PContent for each "nested" child process only holds the PBrowsers that actually run in that child process. I.e. it'd be good if the PBrowser for the innermost iframe only lives in the PContent for the innermost PContent, and not for the "middle" PContent. That should give us the security properties that we want.
(In reply to Jonas Sicking (:sicking) from comment #115)
> It's great if the PContent for each "nested" child process only holds the
> PBrowsers that actually run in that child process. I.e. it'd be good if the
> PBrowser for the innermost iframe only lives in the PContent for the
> innermost PContent, and not for the "middle" PContent. That should give us
> the security properties that we want.

Right. Currently the PBrowsers are managed either by a PContent or a PContentBridge. In the future I'd like to move all PBrowsers back to PContent and use PContentBridge purely for communication.
Attachment #777469 - Attachment is obsolete: true
Attachment #780212 - Attachment is obsolete: true
Attachment #8428659 - Attachment is obsolete: true
Attachment #8430704 - Attachment description: Add an nsIContentParent api. r=jlebar → Part 001. Add an nsIContentParent api. r=jlebar
Attachment #777470 - Attachment is obsolete: true
Attachment #806124 - Attachment is obsolete: true
Attachment #806126 - Attachment is obsolete: true
Attachment #776710 - Attachment is obsolete: true
Attachment #776777 - Attachment is obsolete: true
Attachment #777471 - Attachment is obsolete: true
Attachment #809495 - Attachment is obsolete: true
Hi Jonas,

I'd like to land these patches first and fix the remaining issues in followup bugs. These patches tend to bitrot quickly. Land them first then we could identify remaining issues and fix them in parallel. It also makes developers aware that they should consider nested case when writing IPC code. How do you think?

Hi Bent, Honza and Matt, you reviewed the original patches. Are you OK with landing these patches pref'ed off?
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jonas)
Flags: needinfo?(honzab.moz)
Flags: needinfo?(bent.mozilla)
Can you please provide the patch(es) with a 7 line context as we all do?  I'd like to compare it with https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=890570&attachment=782832.  Thanks.
(In reply to Honza Bambas (:mayhemer) from comment #126)
> Can you please provide the patch(es) with a 7 line context as we all do? 
> I'd like to compare it with
> https://bugzilla.mozilla.org/page.cgi?id=splinter.
> html&bug=890570&attachment=782832.  Thanks.

My bad. Here are the patches with 7 line context.
Attachment #8430704 - Attachment is obsolete: true
Attachment #8430705 - Attachment is obsolete: true
Attachment #8430708 - Attachment is obsolete: true
Attachment #8430918 - Attachment description: art 003. Use nsIContentParent in indexeddb. r=bent → Part 003. Use nsIContentParent in indexeddb. r=bent
Attachment #8430713 - Attachment is obsolete: true
I'm totally happy to have patches landed but turned off. In fact, for big complex features like this I'd recommend it. As long as the patches get reviews first of course. I.e. you still need review before landing.

I'd even say it's ok to land turned on as long as it doesn't regress any existing functionality/performance/stability. But we should be careful not to enable 3rd party apps that could use <iframe mozbrowser remote> to cause exploitable crashes.
Flags: needinfo?(jonas)
Yes, fine with me too.
Flags: needinfo?(matt.woodrow)
Comment on attachment 8430924 [details] [diff] [review]
Part 006. Bug 890570: Fix http auth prompts for nested content processes r=honzab

Review of attachment 8430924 [details] [diff] [review]:
-----------------------------------------------------------------

The patch now looks much much better!  Thanks.

Few comments you should consider fixing tho.

And then I'm not against landing this.

::: dom/ipc/TabChild.cpp
@@ +612,5 @@
>      }
>  };
>  
>  StaticRefPtr<TabChild> sPreallocatedTab;
> +std::map<uint64_t, nsRefPtr<TabChild> > TabChild::sTabChildMap;

according [1] this is a static initializer.  please do something like this:

// public static
std::map<uint64_t, nsRefPtr<TabChild> >& TabChild::TabChildMap()
{
  static std::map<uint64_t, nsRefPtr<TabChild> > sTabChildMap;
  return sTabChildMap;
}

And instead of a direct access to TabChild::sTabChildMap just use TabChild::TabChildMap().  Note that the initialization is not thread safe..


[1] http://www.cplusplus.com/reference/map/map/map/

::: dom/ipc/TabChild.h
@@ +258,5 @@
>      virtual ~TabChild();
>  
>      bool IsRootContentDocument();
>  
> +    uint64_t Id() {

should be const

@@ +272,5 @@
> +        static uint64_t sId = 0;
> +        sId++;
> +        aTabChild->mUniqueId = sId;
> +        sTabChildMap[sId] = aTabChild;
> +        return sId;

on what thread(s) is this called?  because this code is far from being thread-safe you should at least do MOZ_ASSERT(NS_IsMainThread()) here.

::: netwerk/ipc/NeckoParent.cpp
@@ +697,5 @@
>    }
>    return actor.forget();
>  }
>  
> +std::map<uint64_t, nsCOMPtr<nsIAuthPromptCallback> > sCallbackMap;

static init as well.

@@ +705,5 @@
> +NeckoParent::NestedFrameAuthPrompt::NestedFrameAuthPrompt(PNeckoParent* aParent,
> +                                                          uint64_t aNestedFrameId)
> +  : mNeckoParent(aParent)
> +  , mNestedFrameId(aNestedFrameId)
> +{}

{
}

@@ +731,5 @@
> +  if (mNeckoParent->SendAsyncAuthPromptForNestedFrame(mNestedFrameId,
> +                                                      spec,
> +                                                      realm,
> +                                                      callbackId)) {
> +    sCallbackMap[callbackId] = nsCOMPtr<nsIAuthPromptCallback>(callback);

Nit: this does addref/addref/release.  Try to static_cast |callback| to nsISupports* and assign directly, it should work (if not, leave like this).  Then the code will do just a single addref.
Flags: needinfo?(honzab.moz)
Attachment #8430915 - Attachment is obsolete: true
Attachment #8430916 - Attachment is obsolete: true
Attachment #8430918 - Attachment is obsolete: true
Attachment #8430922 - Attachment is obsolete: true
Attachment #8430920 - Attachment is obsolete: true
Fix static initializer
Attachment #8430924 - Attachment is obsolete: true
Blocks: nested-oop
Attachment #8433915 - Attachment is obsolete: true
There were also failures on the b2g emulator across multiple suites:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=c5f9a189da8b&jobname=b2g_emulator
(jobs only just completed)

Btw part 6 had the wrong bug number - can you fix before landing?
Flags: needinfo?(bent.mozilla)
This bug is a bit difficult to follow. Does the patch enable content process to content process communication?
(In reply to Malini Das [:mdas] from comment #160)
> This bug is a bit difficult to follow. Does the patch enable content process
> to content process communication?

It does. Content and Content are communicated through the PContentBridge protocol.
We'll probably want message manager communication for sibling processes.
(In reply to Olli Pettay [:smaug] from comment #162)
> We'll probably want message manager communication for sibling processes.

+1!

In the meantime, is there an example of using PContentBridge in js (if it is possible)?
(In reply to Malini Das [:mdas] from comment #163)
> (In reply to Olli Pettay [:smaug] from comment #162)
> > We'll probably want message manager communication for sibling processes.
> 
> +1!
> 
> In the meantime, is there an example of using PContentBridge in js (if it is
> possible)?

This is not possible yet. In the future if we want we could create a new type of message manager that connects sibling processes.

Currently PContentBridge only connects sibling processes that are logically parent/children. They could communicate using the familiar browser message managers.
(In reply to Kan-Ru Chen [:kanru] from comment #164)
> (In reply to Malini Das [:mdas] from comment #163)
> > (In reply to Olli Pettay [:smaug] from comment #162)
> > > We'll probably want message manager communication for sibling processes.
> > 
> > +1!
> > 
> > In the meantime, is there an example of using PContentBridge in js (if it is
> > possible)?
> 
> This is not possible yet. In the future if we want we could create a new
> type of message manager that connects sibling processes.
> 
> Currently PContentBridge only connects sibling processes that are logically
> parent/children. They could communicate using the familiar browser message
> managers.

Coo! Will you raise a bug about this? I'm not sure which Product/Compartment to put it in. Thanks!

It would really help b2g automation if we could have access to content-to-content message passing. Right now, we're hacking a way around basic information passing by storing data in the marionette server running in the shared b2g root process, and retrieving it from the correct child process, but this is indirect, makes us store extra data and send extra messages around, and if our needs get any more complicated, it will be difficult to figure out what the 'correct' child process will be without direct message passing.
I've filed Bug 1026021 about content-to-content communication
Depends on: 1024470
Flags: in-testsuite?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: