Closed Bug 1053634 Opened 7 years ago Closed 4 years ago

Make all app processes forked from Nuwa

Categories

(Core :: IPC, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: kk1fff, Assigned: kk1fff)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 21 obsolete files)

12.42 KB, patch
khuey
: review+
Details | Diff | Splinter Review
5.50 KB, patch
khuey
: review+
Details | Diff | Splinter Review
17.86 KB, patch
smaug
: review+
Details | Diff | Splinter Review
14.04 KB, patch
kanru
: review-
Details | Diff | Splinter Review
2.41 KB, patch
Details | Diff | Splinter Review
10.45 KB, patch
Details | Diff | Splinter Review
Currently we allow fork from b2g when there's not already a preallocated process there. I think we can make the nsFrameLoader be able to launch process without blocking the DOM operation, so we can wait for a new preallocated process.
Kyle, do you think it is possible to make frame loader load process asynchronously (like what we tried to do in bug 938470)? What will probably be the problem that we need to address if we make it work that way?
Flags: needinfo?(khuey)
That's really a question for smaug.
Flags: needinfo?(khuey) → needinfo?(bugs)
FrameLoader could use create mRemoteBrowser asynchronously (and that would lead creating the ContentPArent async). Need to just be careful that we don't try to recreate them while the
asynchronous creation is in process. 

One problem is that currently after a remote iframe is created, one could start using message manager
to send message to the child side. That wouldn't really work if the child side isn't there.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #3)
> One problem is that currently after a remote iframe is created, one could
> start using message manager
> to send message to the child side. That wouldn't really work if the child
> side isn't there.

To fix this, I think we can queue the messages until child has been created.
I can send patch for this.
Assignee: nobody → kk1fff
Attachment #8476444 - Attachment description: WIP Patch 2: Try next runnable when calling to a callback after new preallocated process doesn't really take the preallocated process.. → WIP Patch 2: Try next runnable when calling to a callback after a new preallocated process created doesn't really take the preallocated process.
See Also: → 1108624
See Also: → 1114854
blocking-b2g: --- → 2.2?
See Also: → 1002894
Whiteboard: [CR 786372]
Whiteboard: [CR 786372] → [caf priority: p2][CR 786372]
blocking-b2g: 2.2? → 2.2+
Patrick, what more's needed to land this for 2.2? Who should review these patches?
Flags: needinfo?(kk1fff)
I am testing my patches on try and I guess we are pretty much there. I'll request review from khuey and/or smaug once they passed tests on try.
Flags: needinfo?(kk1fff)
Attachment #8476442 - Attachment is obsolete: true
Attachment #8476444 - Attachment is obsolete: true
Attachment #8568988 - Flags: review?(khuey)
Comment on attachment 8568989 [details] [diff] [review]
Patch Part 2: Queue frame message up when content process havne't been created

>+struct nsFrameLoader::QueuedFrameMessage {
{ goes to the next line.

>+  nsString message;
>+  ClonedMessageData data;
Why is this safe? ClonedMessageData has PBlob[]

>+  InfallibleTArray<mozilla::jsipc::CpowEntry> cpowEntries;
It is not clear to me if cpowEntries can be kept alive this way. Please ask billm.
I would have assumed you should store JS::Heap<JSObject*> mCpows and root that object.
Is there any reason to store CpowEntry objects?

>+  bool nullPrincipal;
>+  nsCString principalString;
>+  QueuedFrameMessage()
>+    : nullPrincipal(false) {
>+  }
>+};
Please use 'm' prefix for member variables.


>+            nsCOMPtr<nsISupports> iSupport = nullptr;
>+            rv = NS_DeserializeObject(msg.principalString, getter_AddRefs(iSupport));
>+            NS_ENSURE_SUCCESS(rv, NS_ERROR_FAILURE);
>+            principal = do_QueryInterface(iSupport);
>+            NS_ENSURE_TRUE(principal, NS_ERROR_FAILURE);
...
>+      } else {
>+        nsCOMPtr<nsISerializable> serializalbePrincipal = do_QueryInterface(aPrincipal);
>+        nsresult rv = NS_SerializeToString(serializalbePrincipal, msg->principalString);
>+        if (NS_FAILED(rv)) {
I don't understand this stuff
Why don't you just store nsCOMPtr<nsIPrincipal>?


Why mozilla::UniquePtr?
Why not just 
nsTArray<QueuedFrameMessage> mQueuedMessage;



So this is for frame message manager. How does the process message manager setup work?
Attachment #8568989 - Flags: review?(bugs) → review-
So please test sending blobs and cpows, and test also process message manager.
(In reply to Olli Pettay [:smaug] from comment #14)
> >+  nsString message;
> >+  ClonedMessageData data;
> Why is this safe? ClonedMessageData has PBlob[]

What is the proper way to store PBlob so I can send it later?
Hi Bill,

I'm writing a patch that tries to keep the frame messages and process messages in a buffer when content process is not ready. So I want to store cpows and PBlob when queuing the messages up and resend them later. What is the proper way to preserve the data structure so I can queue them up and send later?

Thanks!
Flags: needinfo?(wmccloskey)
What you're doing with CpowEntry should be fine. I don't know about PBlob though. You could ask bent about that.
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #18)
> What you're doing with CpowEntry should be fine. I don't know about PBlob
> though. You could ask bent about that.

Thanks, Bill!
Hi Ben,

I am trying to queue frame messages up and resend them later after content process becomes ready. One of the data structure I need to preserve is ClonedMessageData, which contains an array of PBlob. What is the right way to preserve the data structure so it can be sent later?

Thanks!
Flags: needinfo?(bent.mozilla)
Pinged @bent by email today requesting a response here.
@bent is on PTO until Monday, March 9th.
Whiteboard: [caf priority: p2][CR 786372] → [caf priority: p1][CR 786372]
Any update here? It seems to be blocking multiple issues so appreciate quick fix.
No longer blocks: CAF-v3.0-FL-metabug
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #20)
> Hi Ben,
> 
> I am trying to queue frame messages up and resend them later after content
> process becomes ready. One of the data structure I need to preserve is
> ClonedMessageData, which contains an array of PBlob. What is the right way
> to preserve the data structure so it can be sent later?
> 
> Thanks!

So I've thought about this a bit and I think it's generally ok.  The Blob actor is destroyed when the DOM blob object on the receiving side is destroyed.  If the child isn't running yet it's hard for it to be GCd.

If this does turn out to be a problem for some reason, we can always store a strong reference the underlying FileImpl* and recreate the Blob (via BlobParent::GetOrCreate) to send the message.  But we should start with just storing the PBlob.
Flags: needinfo?(bent.mozilla)
Thank you, Kyle!
update with review comments. only process message manager part is still working in progress.
Attachment #8568989 - Attachment is obsolete: true
(In reply to Olli Pettay [:smaug] from comment #14)
> I would have assumed you should store JS::Heap<JSObject*> mCpows and root
> that object.

I just realized that I can't call any function that wraps objects for cross process operation, since no cross process protocol, including ContentParent, has been created at that point. Olli, could you elaborate more about what you think I should store? Another question is that JSContext doesn't seem be able to be stored, does it sound okay to use a AutoContext when the message is sent later? Thanks!
Flags: needinfo?(bugs)
So you'd just store JSObject* and then create actual cpows out of it later.

For the context, use AutoJSAPI and initialize it with the JSObject and then
AutoJSAPI.cx() should give you a sane JSContext.
Flags: needinfo?(bugs)
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #27)
> (In reply to Olli Pettay [:smaug] from comment #14)
> > I would have assumed you should store JS::Heap<JSObject*> mCpows and root
> > that object.
> 
> I just realized that I can't call any function that wraps objects for cross
> process operation, since no cross process protocol, including ContentParent,
> has been created at that point. Olli, could you elaborate more about what
> you think I should store? Another question is that JSContext doesn't seem be
> able to be stored, does it sound okay to use a AutoContext when the message
> is sent later? Thanks!

Unless I'm missing something, there's no reason to do this. It's perfectly fine (and a lot safer) to store the CPOWs as numbers as you were doing.
Never mind, I didn't understand comment 27. I guess smaug's solution will be fine. If that turns out to be too complicated, maybe we could just not handle CPOWs. They shouldn't be used on b2g.
(In reply to Olli Pettay [:smaug] from comment #28)
> So you'd just store JSObject* and then create actual cpows out of it later.
> 
> For the context, use AutoJSAPI and initialize it with the JSObject and then
> AutoJSAPI.cx() should give you a sane JSContext.

Thanks, Olli!

(In reply to Bill McCloskey (:billm) from comment #29)
> Unless I'm missing something, there's no reason to do this. It's perfectly
> fine (and a lot safer) to store the CPOWs as numbers as you were doing.

The problem I met in comment 27 is that forming a cpow entry needs Wrap() method of a CPowManager, which is actually an instance of PJavaScriptParent actor. But at the time we try to queue the messages up, content process hasn't formed yet, so there's no Javascript protocol that we can use to wrap the object.
Attachment #8568988 - Attachment is obsolete: true
Attachment #8568988 - Flags: review?(khuey)
Attachment #8578554 - Flags: review?(khuey)
Michael, you can try this version.
Flags: needinfo?(mvines)
Attachment #8578555 - Flags: review?(bugs)
Hi Patrick, these patches don't apply on v2.2.  Can you please rebase them?
Flags: needinfo?(mvines) → needinfo?(kk1fff)
Comment on attachment 8578555 [details] [diff] [review]
Patch Part 2: Queue frame message up when content process havne't been created v.2

>+struct PendingFrameMessageQueue::QueuedFrameMessage
>+{
>+  struct ConstructParameter {
So this is a stack only struct? Could you use MOZ_STACK_CLASS


>+    const nsAString& mMessage;
>+    const StructuredCloneData& mData;
>+    JSContext* const mCx;
>+    JS::Handle<JSObject*>& mCpows;
>+    nsIPrincipal* const mPrincipal;
>+  };
>+
>+  ~QueuedFrameMessage()
>+  {
>+    if (mData.mData) {
>+      delete [] mData.mData;
>+    }
>+  }
>+
>+  QueuedFrameMessage(const ConstructParameter aParam)
>+    : mMessage(aParam.mMessage)
>+    , mCpows(aParam.mCx, aParam.mCpows)
>+    , mPrincipal(aParam.mPrincipal)
>+  {
>+    if (aParam.mData.mData) {
>+      mData.mData = new uint64_t [aParam.mData.mDataLength];
>+      memcpy(mData.mData, aParam.mData.mData, sizeof(uint64_t) * aParam.mData.mDataLength);
>+      mData.mDataLength = aParam.mData.mDataLength;
>+    }
>+    mData.mClosure = aParam.mData.mClosure;
>+  }
>+
>+  nsString mMessage;
>+  StructuredCloneData mData;
Could you perhaps store the actual data using nsAutoPtr or UniquePtr so that manual delete wouldn't be needed.

>+  JS::Rooted<JSObject*> mCpows;
Rooted is supposed to be used on stack only. and looks like QueuedFrameMessage is in heap. So you want PersistentRooted
>+void PendingFrameMessageQueue::SendQueuedMessage(MessageManagerCallback* aTarget)
>+{
>+  MOZ_ASSERT(aTarget, "Invalid argument");
>+  AutoJSAPI jsapi;
You must always initialize AutoJSAPI using Init(...)


>+
>+        // Before preallocated process is created, we setup a queue to keep all asynchronous
>+        // messages.
>+        if (!mPendingProcessMessage) {
>+          mPendingProcessMessage = MakeUnique<PendingFrameMessageQueue>(
>+            true /* add to process message manager */);
>+        }
I'm lost with the naming. mPendingProcessMessage sounds like some single thing, but
PendingFrameMessageQueue is a queue of messages.
> class nsFrameLoader MOZ_FINAL : public nsIFrameLoader,
>                                 public nsStubMutationObserver,
>                                 public mozilla::dom::ipc::MessageManagerCallback
> {
>   friend class AutoResetInShow;
>   typedef mozilla::dom::PBrowserParent PBrowserParent;
>   typedef mozilla::dom::TabParent TabParent;
>   typedef mozilla::layout::RenderFrameParent RenderFrameParent;
>@@ -368,11 +370,29 @@ private:
>   TabParent* mRemoteBrowser;
>   uint64_t mChildID;
> 
>   // See nsIFrameLoader.idl. EVENT_MODE_NORMAL_DISPATCH automatically
>   // forwards some input events to out-of-process content.
>   uint32_t mEventMode;
> 
>   nsCOMPtr<nsIRunnable> mDelayedStartLoadingRunnable;
>+  mozilla::UniquePtr<PendingFrameMessageQueue> mPendingFrameMessage;
>+  mozilla::UniquePtr<PendingFrameMessageQueue> mPendingProcessMessage;
I don't understand how a frameloader can keep the messages for a process alive.
There can be several frameloaders for one child process.
Attachment #8578555 - Flags: review?(bugs) → review-
(In reply to Michael Vines [:m1] [:evilmachines] from comment #35)
> Hi Patrick, these patches don't apply on v2.2.  Can you please rebase them?

Sure, I will put a 2.2-based patch in next version.
Flags: needinfo?(kk1fff)
(In reply to Olli Pettay [:smaug] from comment #36)
> >+  mozilla::UniquePtr<PendingFrameMessageQueue> mPendingProcessMessage;
> I don't understand how a frameloader can keep the messages for a process
> alive.
> There can be several frameloaders for one child process.

That's true. It should be put under a structure which represents a process. I put it under a frameloader because that ContentParent doesn't not exist then, and the frameloader is that triggers loading a new process, it should be the only frameloader which is connecting to the process.
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #38)
> it should be the only frameloader which is connecting to the process.

Hmm.. if there are two frameloaders trying to open two apps that are backed by the same process. the message can be duplicated.
Attached patch patch on v.2.2 (obsolete) — Splinter Review
Attachment #8579845 - Flags: feedback?(mvines)
Comment on attachment 8579868 [details] [diff] [review]
Patch Part 2: Queue frame message up when content process havne't been created v.3

Changes in this version:
* Use nsRefPtr to control lifespan of PendingMessageQueue. If there were two (or more) nsFrameLoaders that have the same manifest URL try to create remote frames and find that preallocated process is not ready, all of them shares the same PendingMessageQueue for the process, and each of them has its own PendingMessageQueue for its frame.
* After the content process is created, one of the nsFrameLoaders got notified and send the process message it queued to the target process, and then the PendingMessageQueue for that process is made invalid, frame loaders of the same process won't send twice to the content process.
* Fix naming, use UniquePtr to hold the data buffer and use PersistentRooted<JSObject*> to hold the js structure.
Attachment #8579868 - Flags: review?(bugs)
Comment on attachment 8579845 [details] [diff] [review]
patch on v.2.2

Looks good!
Attachment #8579845 - Flags: feedback?(mvines) → feedback+
Comment on attachment 8579868 [details] [diff] [review]
Patch Part 2: Queue frame message up when content process havne't been created v.3

>+void PendingMessageQueue::SendQueuedMessages(MessageManagerCallback* aTarget)
>+{
>+  MOZ_ASSERT(aTarget, "Invalid argument");
>+  if (mInvalidated) {
>+    return;
>+  }
>+
>+  AutoJSAPI jsapi;
>+  jsapi.Init();
You want to initialize jsapi with some object, like mCpows and _always_ check the return value of the init.

>         mDelayedStartLoadingRunnable = new DelayedStartLoadingRunnable(this);
>         ContentParent::AddRunAfterPreallocatedProcessReady(
>           mDelayedStartLoadingRunnable.get());
>+
>+
extra newline

>       TryRemoteBrowser();
>       if (!mRemoteBrowser) {
>         NS_WARNING("Couldn't create child process for iframe.");
>         return NS_ERROR_FAILURE;
>       }

>+
>+      // If we queued messages before process ready, it's our chance to
>+      // resend.
>+      if (mPendingFrameMessages) {
>+        mPendingFrameMessages->SendQueuedMessages(this);
>+        mPendingFrameMessages = nullptr;
>+      }
>+      if (mPendingProcessMessages) {
>+        ContentParent* cp = static_cast<ContentParent*>(mRemoteBrowser->Manager());
>+        mPendingProcessMessages->SendQueuedMessages(cp);
>+        mPendingProcessMessages = nullptr;
>+      }
Hmm, we probably need to keep the messages in the same order they were sent, whether or not frame or process mm was used.
Otherwise the child side will get the messages in unexpected order. So, I think you should merge the queues to one, and then
just store the "target" or the message in the message itself.


Took a bit time to figure out how this all is supposed to work, but after reading part 1 it came clearer.
But anyhow, I think we really need to just have one queue.
And think about failure cases - what if some process or frame is never initiated properly. Do we clear the message queue?
Attachment #8579868 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] from comment #44)
> And think about failure cases - what if some process or frame is never
> initiated properly. Do we clear the message queue?

In current patch, when the frame loader is killed, the queue of frame messages is also killed, and when all frame loaders of a process are killed, the process message queue is killed. But someone has to deleted the frame loader, I guess frame loader is deleted when the frame is removed from the DOM tree?
Frameloader is indeed destroyed when removed from DOM. Perhaps we should explicitly clear the relevant
entries in the queue when FrameLoader is "Destroyed". Otherwise we'd need to traverse/unlink/trace the cpows etc.
Hmm... found a bug when writing summary.. so it's still a wip.
Attachment #8581570 - Attachment is obsolete: true
Comment on attachment 8578554 [details] [diff] [review]
Patch Part 1: wait for nuwa to fork a new app process

Review of attachment 8578554 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsFrameLoader.cpp
@@ +360,5 @@
>    }
>  
>    if (mRemoteFrame) {
>      if (!mRemoteBrowser) {
> +      if (Preferences::GetBool("dom.ipc.processPrelaunch.enabled", false) &&

This can be a followup, but can we just cache this somewhere (on the PreallocatedProcessManager, probably) and just check that boolean?

@@ +365,5 @@
> +          !ContentParent::PreallocatedProcessReady()) {
> +        // We failed to get the content parent from preallocated process. ContentProcess
> +        // will create one for us.
> +        if (mDelayedStartLoadingRunnable) {
> +          ContentParent::RemoveRunAfterPreallocatedProcessReady(mDelayedStartLoadingRunnable.get());

the .get() shouldn't be necessary

@@ +371,5 @@
> +        }
> +
> +        mDelayedStartLoadingRunnable = new DelayedStartLoadingRunnable(this);
> +        ContentParent::AddRunAfterPreallocatedProcessReady(
> +          mDelayedStartLoadingRunnable.get());

and here

::: dom/ipc/ContentParent.cpp
@@ +684,5 @@
>  
> +#ifdef MOZ_NUWA_PROCESS
> +    if (Preferences::GetBool("dom.ipc.processPrelaunch.enabled", false)) {
> +        return nullptr;
> +    }

Why would we do this?

@@ +887,5 @@
> +ContentParent::AddRunAfterPreallocatedProcessReady(nsIRunnable* aRequest)
> +{
> +#ifdef MOZ_NUWA_PROCESS
> +    PreallocatedProcessManager::AddRunAfterPreallocatedProcessReady(aRequest);
> +#endif

Given that these runnables are never going to run if !MOZ_NUWA_PROCESS, it seems like perhaps the functions themselves should be #ifdefed?

::: dom/ipc/PreallocatedProcessManager.cpp
@@ +57,5 @@
> +    RunAfterPreallocatedProcessReadyRunnable(nsIRunnable* wrappedRunnable)
> +      : mWrappedRunnable(wrappedRunnable) { }
> +
> +    NS_IMETHODIMP
> +    Run() {

NS_IMETHOD, and override
Attachment #8578554 - Flags: review?(khuey)
Comment on attachment 8579845 [details] [diff] [review]
patch on v.2.2

Review of attachment 8579845 [details] [diff] [review]:
-----------------------------------------------------------------

Patrick, we applied this patch locally in our builds and we are seeing really bad boot performance with this patch. With gaia's reference-workload-x-heavy, its taking 160+ seconds for the HomeScreen to come up compared to ~20 seconds without this patch which is an 8 fold increase! Please take a look into what's causing delays in HS to come up.
Attachment #8579845 - Flags: feedback-
(In reply to Inder from comment #50)
> Patrick, we applied this patch locally in our builds and we are seeing
> really bad boot performance with this patch. With gaia's
> reference-workload-x-heavy, its taking 160+ seconds for the HomeScreen to
> come up compared to ~20 seconds without this patch which is an 8 fold
> increase! Please take a look into what's causing delays in HS to come up.

I can reproduce, looks like it occurs only in the first boot after pushing reference load into phone. Figuring out the root cause.
I saw IndexDB thread in B2G process is pretty busy, I guess it's because we pushed so many data thus b2g needs to build index for them. Nuwa can freeze only when B2G process is stabilized, that means no app process can be created before B2G done accessing the data we pushed.

A solution for this case is that if a process needw to be created when Nuwa hasn't been ready, we full back to forking from B2G. This can incur more memory usage in the first boot after a user pushed a large amount of data into the phone (I suppose this rarely happens, though), but after reboot, b2g process won't be so busy like this, so process can fork from Nuwa. I suggest that we fix this in another bug, since: 1. I believe a user rarely push a large amount of data at once. 2. The solution can be affected by some timing condition, e.g., in a usual case, is homescreen created before or after Nuwa being ready? If this condition isn't handled properly, there could always be some processes forked from B2G, and that is what we are trying to prevent here.
Patrick -- In more recent testing we discovered that it takes around 7 mins in first boot with high load which is really bad. It's interfering with our testing also. We would like it to be addressed with this fix.
Attached patch Patch 1 v2 WIP (obsolete) — Splinter Review
The previous one doesn't init browser API if content process isn't created. This cause home screen initialize twice in first boot.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #49)
> Comment on attachment 8578554 [details] [diff] [review]
> Patch Part 1: wait for nuwa to fork a new app process
> 
> Review of attachment 8578554 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/base/nsFrameLoader.cpp
> @@ +360,5 @@
> >    }
> >  
> >    if (mRemoteFrame) {
> >      if (!mRemoteBrowser) {
> > +      if (Preferences::GetBool("dom.ipc.processPrelaunch.enabled", false) &&
> 
> This can be a followup, but can we just cache this somewhere (on the
> PreallocatedProcessManager, probably) and just check that boolean?
> 
It would need also add an pref observer. I'll file a follow up.

> ::: dom/ipc/ContentParent.cpp
> @@ +684,5 @@
> >  
> > +#ifdef MOZ_NUWA_PROCESS
> > +    if (Preferences::GetBool("dom.ipc.processPrelaunch.enabled", false)) {
> > +        return nullptr;
> > +    }
> 
> Why would we do this?

If prelaunch is off, we will reach this line since we can't get a content process from preallocated process. But in that case, we should go ahead and fork a process from B2G.
(In reply to Inder from comment #53)
> Patrick -- In more recent testing we discovered that it takes around 7 mins
> in first boot with high load which is really bad. It's interfering with our
> testing also. We would like it to be addressed with this fix.

A quick fix is that we can force B2G to fork if Nuwa is not able to get ready after a while. However, under the condition in which Nuwa cannot get ready, the app launch time and memory will equal to (or worse than, since main process is busy) that in non-Nuwa case.
Fix according to review comments. Add InitializeBrowser() call in ReallyStartLoadingInternal.
Attachment #8578554 - Attachment is obsolete: true
Attachment #8584460 - Flags: review?(khuey)
Attachment #8584460 - Attachment is obsolete: true
Attachment #8584460 - Flags: review?(khuey)
Attachment #8584463 - Flags: review?(khuey)
Using only one queue per process. When a frame message is being put into the queue, PendingMessageQueue records the target frame. If it's a process message is being put to the queue, the target field is left null. When a remote frame is created, the corresponding frame loader ask PMQ to send the message in order. PMQ checks if target frame loader is ready before sending a message through it. If the frame loader wasn't ready yet, PMQ stops sending. After the frame loader gets ready, it calls PMQ so it can continue.
Attachment #8582356 - Attachment is obsolete: true
Attachment #8583826 - Attachment is obsolete: true
Attachment #8584466 - Flags: review?(bugs)
If B2G is busy, nuwa might not be able to become ready since it becomes ready only when B2G is stable. This patch ask B2G to fork directly if Nuwa can get ready after 30 sec.
Attachment #8584470 - Flags: review?(khuey)
Comment on attachment 8584463 [details] [diff] [review]
Patch Part 1: wait for nuwa to fork a new app process v.2

Review of attachment 8584463 [details] [diff] [review]:
-----------------------------------------------------------------

How good do you think our test coverage is?  I'm a little worried about the potential for regressions.
Attachment #8584463 - Flags: review?(khuey) → review+
Comment on attachment 8584470 [details] [diff] [review]
Patch Part 3: force b2g fork if nuwa can't get ready after a while

Review of attachment 8584470 [details] [diff] [review]:
-----------------------------------------------------------------

Why are we tracking mBackupTask at all?  We never read that variable.

30 seconds seems awfully long ...

::: dom/ipc/PreallocatedProcessManager.cpp
@@ +24,5 @@
>  #define DEFAULT_ALLOCATE_DELAY 1000
>  #define NUWA_FORK_WAIT_DURATION_MS 2000 // 2 seconds.
>  
> +// If Nuwa isn't ready after DIRECT_FORK_THRESHOLD, a preallocated
> +// process will be created directly from B2G process.

the preallocated process ... from the B2G process

@@ +26,5 @@
>  
> +// If Nuwa isn't ready after DIRECT_FORK_THRESHOLD, a preallocated
> +// process will be created directly from B2G process.
> +#define DIRECT_FORK_THRESHOLD 30000 // 30 secs
> +#define DIRECT_FORK_INTERVAL 5000 // 10 secs

5000 ms != 10 sec ;)
Attachment #8584470 - Flags: review?(khuey) → review+
Comment on attachment 8584466 [details] [diff] [review]
Patch Part 2: Queue frame message up when content process havne't been created v.4

I don't see anyone calling AddPendingMessageQueue ever.
PendingMessageQueue::AddToProcessMessageManager() calls RemovePendingMessageQueue(this); and then sets mAddedToProcessMessageManager = true;

RemoveFromProcessMessageManager also calls RemovePendingMessageQueue(this);

Something seems wrong here.
Attachment #8584466 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] from comment #63)
> Comment on attachment 8584466 [details] [diff] [review]
> Patch Part 2: Queue frame message up when content process havne't been
> created v.4
> 
> I don't see anyone calling AddPendingMessageQueue ever.
> PendingMessageQueue::AddToProcessMessageManager() calls
> RemovePendingMessageQueue(this); and then sets mAddedToProcessMessageManager
> = true;
> 
> RemoveFromProcessMessageManager also calls RemovePendingMessageQueue(this);
> 
> Something seems wrong here.

I missed that line after copying the function...
Actually I keep the pointer to the task to cancel the directly-fork task after nuwa ready. It's a pretty simple change but is with some modification to the logic, so requesting review again.
Attachment #8584470 - Attachment is obsolete: true
Attachment #8586067 - Flags: review?(khuey)
Attached file interdiff for patch part 3. v.1->v.2 (obsolete) —
interdiff for part 3.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) (PTO starting 4/1, back 4/8) from comment #62)
> Comment on attachment 8584470 [details] [diff] [review]
> Patch Part 3: force b2g fork if nuwa can't get ready after a while
> 
> Review of attachment 8584470 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Why are we tracking mBackupTask at all?  We never read that variable.
> 
> 30 seconds seems awfully long ...
Yeah... I like to make sure that we are under the condition that Nuwa is not able to ready after boot, e.g., b2g is too busy reading the imported data. If the value is too short, some process might be able to be forked from b2g even in a normal case.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) (PTO starting 4/1, back 4/8) from comment #61)
> Comment on attachment 8584463 [details] [diff] [review]
> Patch Part 1: wait for nuwa to fork a new app process v.2
> 
> Review of attachment 8584463 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> How good do you think our test coverage is?  I'm a little worried about the
> potential for regressions.

Well... we have tests case to make sure we can produce a new process, but I don't think there's any test case about timing issue these patches might create.
Comment on attachment 8586067 [details] [diff] [review]
Patch Part 3: force b2g fork if nuwa can't get ready after a while v.2

Review of attachment 8586067 [details] [diff] [review]:
-----------------------------------------------------------------

Please apply the fixes to the comments I mentioned earlier.

::: dom/ipc/PreallocatedProcessManager.cpp
@@ +455,5 @@
>      mozilla::unused << ppmm->BroadcastAsyncMessage(
>        NS_LITERAL_STRING("TEST-ONLY:nuwa-ready"),
>        JS::NullHandleValue, JS::NullHandleValue, cx, 1);
>    }
> +  if (mBackupProcessTask) {

nit: \n above and below the if
Attachment #8586067 - Flags: review?(khuey) → review+
Comment on attachment 8586196 [details] [diff] [review]
Patch Part 2: Queue frame message up when content process havne't been created v.4

>+void PendingMessageQueue::SendQueuedMessages(ContentParent* aContentParent)
>+{
>+  MOZ_ASSERT(aContentParent, "Invalid argument");
>+  AutoJSAPI jsapi;
>+  jsapi.Init();
Please fix this as I said in comment 44


This all is so complicated that this definitely needs tests.
You may want to add some pref to postpone child process creation to test this all.
So, this should not land without tests.
Attachment #8586196 - Flags: review?(bugs) → review+
Patrick,

Can you create the tests and make the changes requested by Olli by Friday, 6pm PDT. CAF's India-based testing teams needs to have this change ready by then to test on Sunday.

Thanks,
Mike

(In reply to Olli Pettay [:smaug] (review queue is getting too long. New requests after Easter, please) from comment #72)
>
> ... Please fix this as I said in comment 44
> 
> This all is so complicated that this definitely needs tests.
> You may want to add some pref to postpone child process creation to test
> this all.
> So, this should not land without tests.
Flags: needinfo?(kk1fff)
(In reply to Mike Lee [:mlee] from comment #73)
> Patrick,
> 
> Can you create the tests and make the changes requested by Olli by Friday,
> 6pm PDT. CAF's India-based testing teams needs to have this change ready by
> then to test on Sunday.
> 
> Thanks,
> Mike

I'm working on it.
Flags: needinfo?(kk1fff)
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #74)
> (In reply to Mike Lee [:mlee] from comment #73)
> > Patrick,
> > 
> > Can you create the tests and make the changes requested by Olli by Friday,
> > 6pm PDT. CAF's India-based testing teams needs to have this change ready by
> > then to test on Sunday.
> > 
> > Thanks,
> > Mike
> 
> I'm working on it.

Hi Mike:
 I spoke to Patrick already, he is working on the test cases now, but focusing on m-c first.

Due to review process, and rebase/uplift effort, can we make this ready by next Thur(4/9)?
(TPE will a 4-days long weekend starting from 4/3) 
If CAF has any "must have by this Friday 6pm PDT" reason, please kindly let us know.
Flags: needinfo?(mlee)
(In reply to shawn ku [:sku] from comment #75)
> ...
> Due to review process, and rebase/uplift effort, can we make this ready by
> next Thur(4/9)?
> (TPE will a 4-days long weekend starting from 4/3) 
> If CAF has any "must have by this Friday 6pm PDT" reason, please kindly let
> us know.

Thanks Shawn. I've replied to you, Patrick, Olli and CAF by email as the justification contains information that CAF is unable to share publicly.
Flags: needinfo?(mlee)
Attachment #8586070 - Attachment is obsolete: true
Attached patch WIP: test cases (obsolete) — Splinter Review
I haven't figure out why message listener attached to cpmm can be deleted.
Attached patch 2.2 patch (obsolete) — Splinter Review
2.2 based patch
Attachment #8579845 - Attachment is obsolete: true
Any update?
Flags: needinfo?(kk1fff)
I am still fixing test failure.
Flags: needinfo?(kk1fff)
I'll update bug once I have any result or patch update.
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #78)
> Created attachment 8587962 [details] [diff] [review]
> WIP: test cases
> 
> I haven't figure out why message listener attached to cpmm can be deleted.

The document (nsDocument) isn't destroyed before cpmm called back, continue finding the reason why the object is deleted.
found the root cause: cpmm listener of previous test isn't removed (previous test did call removeMessageListener, but it still in the listener list of frame message manager). thus frame message manager got error when calling the old listener and returned.
Hi Patrick, what's the ETA for fixing this bug?
Flags: needinfo?(kk1fff)
(In reply to Michael Vines [:m1] [:evilmachines] from comment #85)
> Hi Patrick, what's the ETA for fixing this bug?

Probably take another week, I guess, since several try test cases still fail. I'll try to concentrate on this this week, hopefully make it landed sooner.
Flags: needinfo?(kk1fff)
The test needs nested oop.
Depends on: 1138793
Attachment #8587962 - Attachment is obsolete: true
Attachment #8588093 - Attachment is obsolete: true
Attachment #8593262 - Flags: review?(kanru)
Attachment #8587961 - Attachment is obsolete: true
Attachment #8593263 - Flags: review?(khuey)
Attachment #8593262 - Attachment description: Patch Part 3: change for nested oop → Patch Part 4: change for nested oop
Attachment #8593264 - Flags: review?(khuey)
Duplicate of this bug: 948021
Comment on attachment 8593262 [details] [diff] [review]
Patch Part 4: change for nested oop

Review of attachment 8593262 [details] [diff] [review]:
-----------------------------------------------------------------

I discussed with Patrick offline, the PendingMessageQueue will not work properly with nested-content-process, we need to fix that.

::: dom/base/nsFrameLoader.cpp
@@ +481,5 @@
>  };
>  
> +#ifdef MOZ_NUWA_PROCESS
> +bool
> +nsFrameLoader::WaitForPreallocatedProcess()

Rename this to IsWaitingPreallocatedProcess()

@@ +519,5 @@
> +          if (XRE_GetProcessType() == GeckoProcessType_Default) {
> +            ContentParent::RemoveRunAfterPreallocatedProcessReady(mDelayedStartLoadingRunnable);
> +          } else {
> +            ContentChild::GetSingleton()->CancelRequestingChildProcess(mDelayedStartLoadingRunnable);
> +          }

I think this is very odd. Why would we run mDelayedStartLoadingRunnable without preallocated process available? I also think ContentParent should remove the runnable by itself.

::: dom/ipc/PContent.ipdl
@@ +603,5 @@
>  
> +    /**
> +     * A preallocated process is ready
> +     */
> +    async ChildProcessAvailable();

PreallocatedProcessAvailable()

@@ +986,5 @@
>  
> +    /**
> +     * Ask parent to notify us when a preallocated process is ready
> +     */
> +    async RequestChildProcess();

RequestPreallocatedProcses()
Attachment #8593262 - Flags: review?(kanru) → review-
I discussed with Kanru. As the change is pretty large to make loading an frame asynchronous, and the complexity doesn't seen converge now, the risk of taking patches of 1053634 to 2.2 has increased. It have been already quite hard to just rebase the patches onto 2.2, and it could incur unexpected regression. I propose that we simply wait for nuwa forking a process when there isn't a preallocated process for now, and go back to this to make loading a frame asynchronous later. Bug 1155547 is opened for the simpler fix.
blocking-b2g: 2.2+ → 2.2?
Attachment #8593263 - Flags: review?(khuey)
Attachment #8593264 - Flags: review?(khuey)
No longer blocks: CAF-v2.2-metabug
Whiteboard: [caf priority: p1][CR 786372]
remove nom based on Comment 93
blocking-b2g: 2.2? → ---
See Also: → 1155547
Nuwa is gone after bug 1284674.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.