Closed Bug 771765 (Nuwa) Opened 12 years ago Closed 11 years ago

Investigate implementing Gecko "template" content process

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27
blocking-b2g 1.3+
blocking-basecamp -
Tracking Status
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- unaffected

People

(Reporter: cjones, Assigned: cyu)

References

Details

(Keywords: perf, Whiteboard: [MemShrink:P1] [c= ])

Attachments

(15 files, 59 obsolete files)

163.77 KB, application/x-gzip
Details
107.39 KB, application/x-gzip
Details
62.25 KB, text/plain
Details
229.83 KB, text/plain
Details
9.91 KB, text/plain
Details
22.41 KB, text/plain
Details
21.11 KB, patch
cyu
: review+
Details | Diff | Splinter Review
2.74 KB, patch
cyu
: review+
Details | Diff | Splinter Review
7.46 KB, patch
cyu
: review+
Details | Diff | Splinter Review
61.84 KB, patch
cyu
: review+
Details | Diff | Splinter Review
26.14 KB, patch
cyu
: review+
Details | Diff | Splinter Review
33.64 KB, patch
cyu
: review+
Details | Diff | Splinter Review
36.42 KB, patch
cyu
: review+
Details | Diff | Splinter Review
17.03 KB, patch
cyu
: review+
Details | Diff | Splinter Review
21.05 KB, patch
Details | Diff | Splinter Review
Our current IPC system fork()s+exec()s each content process that's launched and then inits XPCOM.  This works fine when we only have one content process, as we do currently.  However, in the deep dark past when we initialized XPCOM in multiple plugin processes, it was observed that XPCOM init added something like a 10x overhead to process startup.  (Or maybe it was 50x.  High, at any rate.)

XPCOM startup has since been optimized quite a bit, and for content processes particularly as well.

B2G should have had process-per-app last month, and it'll be enabled any day now.  When that happens, launching an "app" will fork()+exec()+initalize XPCOM every time.  fork()+exec() (including linking) is not free, but relatively cheap and well understood.  XPCOM init is less understood by me.

If this overhead prevents us from meeting app-startup delay targets, we'll need to do something cleverer.  The best idea circulated is the old UNIX-beard trick of pre-initializing a helper process and then fork()ing it for each new worker.  (Android does this through its "zygote" process.)

First step here is to quantify fork()+exec()+XPCOM init overhead, preferably for b2g on low-end HW.  <10ms is pretty OK, <100ms may be sufficient for now.  >100ms is probably too high.

If we need to be clever with a pre-initialized template process (we need a good name for this!), then the next decision is where in startup to leave the template frozen.  The deeper, the smaller the startup cost, but the more engineering effort required.  I think the reasonable options are to freeze
 1. somehwere between main(() and XPCOM init
 2. after some step in XPCOM init (before threads can be created)

In current gecko, threads are created even during (1) so it's definitely easiest to impl freezing *just* after dlopen()ing libxul.  Whether that's good enough will depend on where the content-process-creation cost is going.
chris, this is not blocking for v.1.  if you disagree, please re-nom.
blocking-basecamp: --- → -
Blocks: slim-fast
As mentioned in bug 830641, we can do it by running current plugin-container to a snapshot point as a template, and waiting for requests for forking from parent process.  Once parent process requests for a new content process, the template process forks itself, but do nothing in template itself.  In new process, it recreate all threads and sockets found in the template process.  AFAIK, we only need to recreate sockets and threads to run new content process.  Other fds can be inherited from the template and used directly, although IPC needs some extra-works.
Attached patch WIP for comment 3 (obsolete) — — Splinter Review
This patch fork content processes from a template process, I call it Nuwa process.  The Nuwa process would freeze itself at some point (1.5s for now) by blocking all threads except main thread and IO thread.  Once parent process ask for a new process, the Nuwa process forks itself.  Since all threads but main thread do not survive crossing fork, this patch will remember status of all threads at frozen point to recreate these threads at new processes and restore their states.

This patch also recreate epoll fd and IPC protocol actors.  For IPC protocol actors, we have to do some minor change for parent side to create a new pipe and an actor for each protocol for a new process.  For epoll fd, we need to remember status of each epoll fd, and rebuild these fds for new processes.

There are more resources that should be rebuilt for new processes.  Cervantes shall follow up.
Thinker, do you expect this approach to win anything over the current "preallocated process" hack?
(In reply to Chris Jones [:cjones] [:warhammer] from comment #5)
> Thinker, do you expect this approach to win anything over the current
> "preallocated process" hack?

I think so if it has properly implemented.
The only negative impaction is the overhead of wrappers.  I have no numbers, but I guess wrappers are not intensively touched because most wrappers are for resource allocations.  For example, we don't allocate socket pairs or epoll fds time to time.  This approach at least reduces overhead of creation of content processes in memory by sharing data, and in CPU by skipping initialization and dynamic linking.  We can do even more by loading frequently used JS modules in preparation phase to improve start time more of applications.

Maybe, maintenance is another issue.  The implementation is not easy to be sound.  It can lost some stats for some kind of resources that was never used before.  So, I have added lines of precondition checking to make sure all threads are aware of and ready for freezing and forking.  In fortunate, we don't need to handle all cases for content processes doing only limited side effects for limitations of process capabilities.  And, we don't do complicated works during preparation phase of the template process.

Not do it now, but I have some more ideas for applying this hacky mechanism for other improvements.  For example, freeze and save states of dialer, home screen, and other important apps for rebuilding processes for theses app from the saved states.  We may do some GC hack to share more data for frequently used JS code.
Attached patch WIP v2 (obsolete) — — Splinter Review
Updates:
- Bug fixes
- Added replication of signal fds created by pipe() or socketpair() calls.
Attached patch WIP, v3 (obsolete) — — Splinter Review
There is still some bugs, but can be a demo.  It save a lot of memory.  With this patch, unagi can open 6~10 apps while 4~6 apps before applying the patch.  The best part of this patch is preallocated processes cost very few memory since most memory are shared with others. And it takes little CPU time to create a new content process, it just fork from the template and recreate threads and sockets.  Since preallocated process cost little memory, we don't need to kill it for OOM.  It means we can get homescreen back faster.
Attachment #716995 - Attachment is obsolete: true
Attachment #721638 - Attachment is obsolete: true
Assignee: nobody → tlee
I believe this is something worth more people to look at and provide their feedback. Jonas, Andreas, who can be the guy to help Thinker in case cjones is not available?
Flags: needinfo?(jonas)
Flags: needinfo?(gal)
I may be the next best thing after cjones.  The benefits thinker has here sound good to me.  Like cjones I'm not entirely convinced that we can do this safely, but I'm prepared to be impressed!
Flags: needinfo?(jonas)
Flags: needinfo?(gal)
I think that the significant savings comes from having the libxul.so text pages being shared between processes rather than being re-loaded and re-linked.

It may make sense to consider a hybrid approach where we fork just after loading libxul.so to create the "template" and then continue execution as previously done for the preallocated process.

The .text section in libxul.so is right around 13.5 Mb, so being able to share those pages seems like a significant win. The .rodata is another almost 3Mb.

If we fork before creating any other threads or other data structures which need to be "fixed" then it would dramatically reduce the maintenance aspects and still give us most of the win.
I don't know enough on this topic to have a strong opinion. But being able to share the Gecko code pages sounds very valuable.

The only concern that I would have is that it makes the ASLR weaker since all child processes have the same memory layout. But that sounds ok as long as the parent process has a different layout.
We can share most of libxul's .text pages today, right?  That's just basic shared library infrastructure.  Relocations are another story, of course.
The memory saving comes from two major parts, GOT(reloacation) and data generated at initialization stage and never changed later.  The later one is comprised by various parts, include our typelibrary and some tables for optimization.  This patch freeze the template process after loading preloaded JS (in TabChild).  So. some of memory saving comes from JS data (source file and some others), but do not include objects generated by the JS for changes during run time.
Attached patch WIP, v4 (obsolete) — — Splinter Review
Fix some bugs, it seems nearly ready in functionality, but still need some clean up.
Attachment #727751 - Attachment is obsolete: true
(In reply to Thinker Li [:sinker] from comment #16)
> https://wiki.mozilla.org/NuwaTemplateProcess explain the patch

Thinker, this is awesome.
Alias: Nuwa
One question I have about this is: Do you ever unfreeze the Nuwa process?

That is, do we run the Nuwa process to some point, freeze it, and then fork every parasite off the Nuwa process without thawing it?  Or do we freeze the Nuwa process every time we want to fork a new parasite and then thaw the Nuwa afterwards?
Never thaw it!  All threads of Nuwa process are frozen except the IO thread and the main thread for running IPC.
Attached patch WIP, v5 (obsolete) — — Splinter Review
Fix a dangling pointer issue.  This issue had struggled us for a few weeks by making odd behavior of cycle collector.
Attachment #728610 - Attachment is obsolete: true
(In reply to Thinker Li [:sinker] from comment #20)
> Created attachment 741410 [details] [diff] [review]
> WIP, v5
This patch had been tested on unagi with changeset 0a10eca0c521 of m-c.
Attached patch WIP v6 (obsolete) — — Splinter Review
- Fix zombie and a race condition.
- Some cleanup.
Attachment #741410 - Attachment is obsolete: true
Attached file memory report (obsolete) —
About memory from the current patch, with a hack to prevent the template and preallocated processes from re-fork()ing when killed (that's why all app processes are reparented to init).
I can't open this memory report; after gunzip'ing it, it still contains binary data.
Looks like this file is double-gzip'ed for some reason.  It worked after I gunzip'ed it twice.
It's hard for me to interpret this memory report; it would be helpful if you could include one with nuwa disabled, so we can diff them.  (In a recently nightly, you can diff straight from about:memory.)
Attached file Memory report with the patch —
Updated memory report.
Attachment #745826 - Attachment is obsolete: true
The tests are made using the following steps:
- kill the pre-opened apps (Homescreen and Usage). This is to maximize the memory sharing with the patch. This step doesn't make much difference without this patch.
- Open the following apps in order:
  * Communications
  * Messages
  * Settings
  * Template
  * Calendar
(With the patch the following apps can be opened without the above apps killed by the lowmem killer)
  * Clock
  * Email
  * Music
  * Video
The attachments are double gzip'd again. Looks like done by bugzilla during upload.
> (With the patch the following apps can be opened without the above apps killed by the lowmem killer)
>  * Clock
>  * Email
>  * Music
>  * Video

Wow.  That's impressive.
Attached file Memory report without the patch (obsolete) —
Let's see if I can make this work.
> The attachments are double gzip'd again. Looks like done by bugzilla during upload.

It may be Firefox (??).  wget doesn't double-gzip.
Attachment #746214 - Attachment is obsolete: true
Anywho, the main thing to look at here is the change in USS (reported as "resident-unique").  It is

Communications: -4.7mb
Settings: -4.7mb
Messages: -4.3mb
Template: -4.9mb
Homescreen: -7.2mb (but 1.9mb of this is an image that's in the second report but not the first; i.e. I think the screen was off, or maybe a different app was showing)

So this looks like a ~4.5mb win per child process, which is /fantastic/.  The Nuwa process itself has a USS of 4.5mb, which means that Nuwa is a win as soon as you have two apps open, and it may be neutral with even one app.

We can see where the savings are coming from by looking at the proportional set size of one of the apps.  Here's how the PSS of the template app changed after Nuwa.

-5.36 MB (100.0%) -- pss
├──-3.14 MB (58.59%) -- shared-libraries
│  ├──-2.39 MB (44.60%) -- shared-libraries-mozilla
│  │  ├──-2.30 MB (42.98%) -- libxul.so
│  │  │  ├──-1.74 MB (32.48%) ── [rw-p]
│  │  │  └──-0.56 MB (10.50%) ── [r-xp] [2]
│  │  ├──-0.08 MB (01.51%) ++ libnss3.so
│  │  └──-0.01 MB (00.11%) ++ (2 tiny)
│  └──-0.75 MB (13.99%) -- shared-libraries-other
│     ├──-0.55 MB (10.21%) ++ (43 tiny)
│     ├──-0.07 MB (01.39%) -- libmedia.so
│     │  ├──-0.06 MB (01.08%) ── [rw-p]
│     │  └──-0.02 MB (00.31%) ── [r-xp]
│     ├──-0.07 MB (01.28%) -- libcrypto.so
│     │  ├──-0.06 MB (01.08%) ── [rw-p]
│     │  └──-0.01 MB (00.20%) ── [r-xp]
│     └──-0.06 MB (01.11%) -- libstagefright.so
│        ├──-0.03 MB (00.62%) ── [rw-p]
│        └──-0.03 MB (00.49%) ── [r-xp]
├──-2.14 MB (39.88%) -- anonymous
│  ├──-2.13 MB (39.81%) ── anonymous, outside brk()/[rw-p] [27]
│  └──-0.00 MB (00.07%) ── anonymous, within brk()/[rw-p]
├──-0.08 MB (01.48%) -- other-files
│  ├──-0.07 MB (01.39%) ── omni.ja/[r--p]
│  └──-0.00 MB (00.09%) ++ (2 tiny)
└──-0.00 MB (00.05%) ── main thread's stack/[rw-p]

Notice that 3/5 of the improvement comes from shared libraries; I presume this is savings from relocations.  The other 2/5 of the improvement comes from the heap.

This is really great stuff, guys.  Let me know if you'd like me to do any other analysis on it.
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P1]
Some update: we rebased the patch to the latest m-c and found regressions: new threads need freezing, new protocols need cloning, etc. We are fixing the regressions and prepare for review once the regressions are fixed. In the meantime, we will update the wiki for a more detailed description of how it works and more importantly, the gotchas and how to fix them.
This needs an automated test at landing that runs in our automation and shows up red on checkin in our waterfall, otherwise this will regress instantly. Can you enable this for Linux dynamically and write a test that exercises it in our Linux test automation?
If this replaces our "preallocated app process" (I'm not sure if it does), we already have some tests for that in dom/browser-element/priority, and it's trivial to add more.
(In reply to Justin Lebar [:jlebar] from comment #37)
> If this replaces our "preallocated app process" (I'm not sure if it does),
> we already have some tests for that in dom/browser-element/priority, and
> it's trivial to add more.

It has its preallocated app process (forked from the template). I think we should be able to use the existing test cases and add new ones for the template.
Is there a reason that we can't merge the nuwa process and the preallocated process?  That is, we'd do all of the work to create a preallocated process inside the nuwa process.
(In reply to Justin Lebar [:jlebar] from comment #39)
> Is there a reason that we can't merge the nuwa process and the preallocated
> process?  That is, we'd do all of the work to create a preallocated process
> inside the nuwa process.

To clarify, from ContentParent's view, the preallocated process, with or without the template, is the same. When launching a new app, we check if there is any ContentParent instance that represents the preallocated process. If yes, then we use the instance with the underlying child process states ready. The difference is only in the code path of creating/finding spare ContentParent instances.
> To clarify, from ContentParent's view, the preallocated process, with or without the 
> template, is the same.

I understand.  I'm asking why it has to be that way.  Is it possible for us to load all the stuff that we load into the preallocated process into the nuwa process instead?
I think Justin is talking about homescreen and usage they are loaded after booting up with preallocated process, not with Nuwa. Right?!  These two preloaded apps are launched before the Nuwa process being ready, so they are launched by running moz_plugin directly.  I think we can postpone the launching of homescreen and usage app until the Nuwa process is ready.  I am not sure if it causes a longer booting time.  Maybe it is not longer, or maybe we can fix the problem.
What you're talking about is a problem, but not the one I'm talking about.  :)

What I mean is, the nuwa process starts up and does some gecko initialization.  My question is, could we do all of the preallocated process initialization?  Then the nuwa process would /be/ the preallocated process.
Do you mean to call PreloadSlowThings()?
Sure!  But, I am not sure how many protocols would be involved in these stuffs, it takes time to check these protocols to make sure they are working properly after preloading.  So, I suggest to have another patch for that.
> So, I suggest to have another patch for that.

Yeah, I think that's the right thing to do.  But if we can do this, it might be another large win.
Attachment #753228 - Flags: review?(justin.lebar+bug)
In the patch the feature is not enabled by default. Defining MOZ_NUWA_PROCESS before building enables the feature.

We will add test cases and update wiki for a more detailed description in parallel while the patch is under review.

For PreloadSlowThings() and integrating with PreallocatedProcessManager, we will track them in separate bugs.
Comment on attachment 753228 [details] [diff] [review]
Support content process forked from a template process on b2g

Well, this is going to be a fun review.  :)

Thinker and Cervantes, I presume that you've both already reviewed and are happy with this whole patch?

Is it possible for you to split this up into smaller pieces that still make sense?  Looking over the diffstat, this patch touches stuff all over the place: gfx, netwerk, js, indexeddb, imagelib...  At the very least, it would be helpful to have the patch split along these lines so different people can review those bits.
> Thinker and Cervantes, I presume that you've both already reviewed and are
> happy with this whole patch?
> 
Yes, we are happy with the patch. Since some part of it is really hacky (especially the dark art part), we think it'd be good to disable it by default initially and let people interested in it play with it and provide feedback/comment/review. 

> Is it possible for you to split this up into smaller pieces that still make
> sense?  Looking over the diffstat, this patch touches stuff all over the
> place: gfx, netwerk, js, indexeddb, imagelib...  At the very least, it would
> be helpful to have the patch split along these lines so different people can
> review those bits.
The touched components mostly are due to the need to freeze the thread created in these modules in the template process. It's good to receive the comments from whom work on these modules though the changes are simple in these modules. We will provide the patch as separate parts.
Attached patch Part 1: the Nuwa API and low-level wrappers. (obsolete) — — Splinter Review
Attachment #743546 - Attachment is obsolete: true
Attachment #753228 - Attachment is obsolete: true
Attachment #753228 - Flags: review?(justin.lebar+bug)
Attachment #755319 - Flags: review?(justin.lebar+bug)
Attached patch Part 2: IPC and glue changes. (obsolete) — — Splinter Review
Attachment #755320 - Flags: review?(justin.lebar+bug)
Attachment #755321 - Flags: review?(justin.lebar+bug)
Attached patch Part 5: PContent protocol changes (obsolete) — — Splinter Review
Attachment #755324 - Flags: review?(justin.lebar+bug)
Attachment #755326 - Flags: review?(justin.lebar+bug)
Attachment #755327 - Flags: review?(justin.lebar+bug)
Attached patch Part 8: process initialization flow changes (obsolete) — — Splinter Review
Attachment #755328 - Flags: review?(justin.lebar+bug)
Hi Justin,

I request you as the reviewer for all the parts. If you prefer others for reviewing some parts please help include other reviewers. Thanks.
Update part 6: take out the parts that fixes the crash in cycle collector.

The crash happens with or without the Nuwa process patch, but with Nuwa it's more easily reproduced. Open another bug for the crash.
Attachment #755326 - Attachment is obsolete: true
Attachment #755326 - Flags: review?(justin.lebar+bug)
Attachment #755840 - Flags: review?(justin.lebar+bug)
Sorry, correction to comment #59: the removed part is not a fix to cycle collector but only debug code for a crash. It's for tracking a problem of calling virtual method from base constructor. So no new bug for that.
Attached patch Part 1: the Nuwa API and low-level wrappers. (obsolete) — — Splinter Review
Update build flags to make MOZ_NUWA_PROCESS flag enabled only for gonk.
Assignee: tlee → cyu
Attachment #755319 - Attachment is obsolete: true
Attachment #755319 - Flags: review?(justin.lebar+bug)
Attachment #757834 - Flags: review?(justin.lebar+bug)
Attached patch Part 2: IPC and glue changes. (obsolete) — — Splinter Review
Fix IPC code build error for OS_WIN.
Attachment #755320 - Attachment is obsolete: true
Attachment #755320 - Flags: review?(justin.lebar+bug)
Attachment #757835 - Flags: review?(justin.lebar+bug)
Fix build error for OS_WIN: disable specific protocol cloning.
Attachment #755322 - Attachment is obsolete: true
Attachment #755322 - Flags: review?(justin.lebar+bug)
Attachment #757837 - Flags: review?(justin.lebar+bug)
Attached patch Part 5: PContent protocol changes (obsolete) — — Splinter Review
Fix compiler warnings.
Attachment #755324 - Attachment is obsolete: true
Attachment #755324 - Flags: review?(justin.lebar+bug)
Attachment #757838 - Flags: review?(justin.lebar+bug)
Rebase and resolve conflict.
Attachment #755840 - Attachment is obsolete: true
Attachment #755840 - Flags: review?(justin.lebar+bug)
Attachment #757841 - Flags: review?(justin.lebar+bug)
Comment on attachment 755328 [details] [diff] [review]
Part 8: process initialization flow changes

We wait one second before freezing the nuwa process.  What happens if, when the second is up, the nuwa process is not finished doing whatever it normally does?

Also, these casts of function pointers are really scary.  It's not clear to me why we need them at all, but in any case can we write functions with the right signatures so we can get rid of the casts?
Comment on attachment 755321 [details] [diff] [review]
Part 3: IPC glue and IPDL codegen to support protocol cloning.

bent, this one is pretty deep IPC stuff.  I'm happy to do the review if you'd like, but you probably know this code better than anyone.  Would you like to review this?
Attachment #755321 - Flags: review?
Flags: needinfo?(bent.mozilla)
Two general comments that I did not make because they appear all over the place:

The use of the word 'parasite' in this patch is confusing.  I think sometimes when you use it you're referring to the thread that was stopped (for example, parasitedThreadID) while other times you're referring to the act of restarting the thread after fork()ing (for example, paraFunction).  I suspect that for the first case the word you're looking for is 'paralyzed'.  I'm not sure that there's a better word for the second case than 'restart' or 'recreate'.

I'm hardly a pthreads expert but aren't comparisons of thread ids supposed to be done with pthread_equal() instead of operator==?
Comment on attachment 757834 [details] [diff] [review]
Part 1: the Nuwa API and low-level wrappers.

Review of attachment 757834 [details] [diff] [review]:
-----------------------------------------------------------------

::: configure.in
@@ +8815,5 @@
>  AC_SUBST(MOZ_PNG_CFLAGS)
>  AC_SUBST(MOZ_PNG_LIBS)
>  
> +if test "$MOZ_WIDGET_TOOLKIT" = gonk -a -n "$MOZ_NUWA_PROCESS"; then
> +    export MOZ_NUWA_PROCESS

Do you really need this export?  I would expect the AC_DEFINE/AC_SUBST to be enough.

@@ +8817,5 @@
>  
> +if test "$MOZ_WIDGET_TOOLKIT" = gonk -a -n "$MOZ_NUWA_PROCESS"; then
> +    export MOZ_NUWA_PROCESS
> +    AC_DEFINE(MOZ_NUWA_PROCESS)
> +    AC_SUBST(MOZ_NUWA_PROCESS)

You should AC_SUBST unconditionally (outside of the if loop).

::: mozglue/build/Makefile.in
@@ +84,5 @@
>  endif
>  endif
>  
>  ifeq (Android,$(OS_TARGET))
> +ifeq (1, $(MOZ_NUWA_PROCESS))

ifdef MOZ_NUWA_PROCESS should work fine.

@@ +118,5 @@
>  ifeq (Android, $(OS_TARGET))
>  WRAP_LDFLAGS := $(filter -Wl%,$(WRAP_LDFLAGS))
>  endif
>  
> +ifeq (1, $(MOZ_NUWA_PROCESS))

Same here.

::: mozglue/build/Nuwa.cpp
@@ +9,5 @@
> +/* Hacking pthread to implement Zygote mode preforking. */
> +
> +/* Nuwa is a goddess who create mankind according her figure in the
> + * ancient Chinese mythology.
> + */

This comment should be in the header.

Also it should be "... created mankind according to ..."

@@ +77,5 @@
> +static bool sNuwaReady = false;  // Nuwa process is ready.
> +static bool sNuwaPendingSpawn = false; // Are there any pending spawn requests?
> +static bool sNuwaForking = false;
> +
> +// Information of fds of transports of top level protocols.

"fds of transports of top level protocols" is sufficient, I think.

@@ +85,5 @@
> +typedef std::vector<std::pair<pthread_key_t, void *>> TLSInfoList;
> +
> +#ifndef NUWA_STACK_SIZE
> +#define NUWA_STACK_SIZE (1024 * 32)
> +#endif

Is there any reason that 32k was chosen?  If there is it would be good to document it in a comment.

@@ +121,5 @@
> +static void
> +RunCustomizedParasite() {
> +  thread_info_t *tinfo = sCurrentParasitingThread;
> +  if (tinfo->paraFunc != NULL)
> +    tinfo->paraFunc(tinfo->paraArg);

Super nit: use braces even on one line ifs.

@@ +128,5 @@
> +/**
> + * Every thread should be marked as ethier TINFO_FLAG_NUWA_SUPPORT or
> + * TINFO_FLAG_NUWA_SKIP, or it means a potential error.  We force
> + * Gecko code to mark every single thread going to be recreated by
> + * Nuwa to make sure no unknown accident.

s/ethier/either/

to mark every single thread to make sure there are no accidents when recreating threads with Nuwa.

@@ +161,5 @@
> +/**
> + * This mutex protects the access to thread info:
> + * sAllThreads, sThreadCount, sThreadFreezeCount, sParasiteVIPCount.
> + */
> +static pthread_mutex_t sThreadCountLock = PTHREAD_MUTEX_INITIALIZER;

I think it would be useful if the order in which sThreadFreezeLock, sThreadCountLock, sForkLock, and sSpecKeyLock must be acquired is documented in a comment.

@@ +176,5 @@
> +static pthread_mutex_t sForkLock = PTHREAD_MUTEX_INITIALIZER;
> +static pthread_cond_t sForkWaitCond = PTHREAD_COND_INITIALIZER;
> +
> +
> +static pthread_mutex_t sSpecKeyLock = PTHREAD_MUTEX_INITIALIZER;

A comment explaining what this lock is for wouldn't hurt either.

@@ +208,5 @@
> +/**
> + * Get thread info of the current thread.
> + *
> + * @return thread_info_t for the current thread.
> + */

Move this comment inside the #if !defined.

@@ +238,5 @@
> +  pthread_mutex_unlock(&sThreadCountLock);
> +  return NULL;
> +}
> +#define CUR_THREAD_INFO GetCurThreadInfo()
> +#define SET_THREAD_INFO(x)

Is defining this to nothing intentional?  Might be worth a comment (e.g. #define SET_THREAD_INFO(x) /* nothing to do */)

@@ +267,5 @@
> +    }
> +
> +    void AddEvents(int aFd, Events &aEvents) {
> +      if (mEvents.find(aFd) != mEvents.end())
> +        abort();

Perhaps this check should be DEBUG only?

@@ +281,5 @@
> +    void ModifyEvents(int aFd, Events &aEvents) {
> +      const_iterator it = mEvents.find(aFd);
> +
> +      if (it == mEvents.end())
> +        abort();

Likewise.

@@ +289,5 @@
> +
> +    const Events &FindEvents(int aFd) const {
> +      const_iterator it = mEvents.find(aFd);
> +      if (it == mEvents.end())
> +        abort();

And maybe here too?

@@ +356,5 @@
> +  EpollManager() {}
> +};
> +
> +static thread_info_t *
> +pthread_info_new(void) {

I'm not sure I like prefixing our own functions with pthread_.  It's easy to confuse them for real pthreads functions when reading code.

@@ +389,5 @@
> +
> +static void
> +pthread_info_cleanup(void *arg) {
> +  if (!sIsNuwaProcess || sNuwaForking)
> +    return;

If this function is called when sNuwaForking is true, does the thread_info_t ever get cleaned up?

@@ +576,5 @@
> +}
> +
> +static pthread_mutex_t sParasiteWaitLock = PTHREAD_MUTEX_INITIALIZER;
> +static pthread_mutex_t sParasiteGateLock = PTHREAD_MUTEX_INITIALIZER;
> +static pthread_mutex_t sParasiteVIPGateLock = PTHREAD_MUTEX_INITIALIZER;

Documentation for these locks would be nice too.  Especially for the VIP lock.

@@ +622,5 @@
> + * Thread freeze points. Note that the freeze points are implemented as macros
> + * so as not to garble the content of the stack after setjmp().
> + *
> + * In the nuwa process, when the threads supporting nuwa calls the wrapper
> + * functions, freeze point 1 setjmp() to save the state. We only allow the

"calls setjmp()" or "setjmp()s"

@@ +625,5 @@
> + * In the nuwa process, when the threads supporting nuwa calls the wrapper
> + * functions, freeze point 1 setjmp() to save the state. We only allow the
> + * threads to be frozen in the wrapper functions. If thread freezing is not
> + * enabled yet, the wrapper functions act like their wrapped counterparts,
> + * except the extra actions in the freeze points. If thread freezing is enabled,

except for the extra actions

@@ +626,5 @@
> + * functions, freeze point 1 setjmp() to save the state. We only allow the
> + * threads to be frozen in the wrapper functions. If thread freezing is not
> + * enabled yet, the wrapper functions act like their wrapped counterparts,
> + * except the extra actions in the freeze points. If thread freezing is enabled,
> + * The threads will be frozen by calling one of the wrapper functions. The

'the threads' (lowercase)

@@ +627,5 @@
> + * threads to be frozen in the wrapper functions. If thread freezing is not
> + * enabled yet, the wrapper functions act like their wrapped counterparts,
> + * except the extra actions in the freeze points. If thread freezing is enabled,
> + * The threads will be frozen by calling one of the wrapper functions. The
> + * threads can be frozen in one of the following points:

in any of the following points

@@ +639,5 @@
> + * 3) Freeze point 2: blocks the current thread by acquiring sThreadFreezeLock.
> + *    If freezing is not enabled then revert the counter change in freeze
> + *    point 1.
> + */
> +#define THREAD_FREEZE_POINT1()                    \

Try to line up all the backslashes in the macros.

@@ +658,5 @@
> +      pthread_cond_signal(&sThreadChangeCond);     \
> +                                                   \
> +      if (sIsFreezing) {                           \
> +        REAL(pthread_mutex_lock)(&sThreadFreezeLock); \
> +        abort();                                  \

We never return from the pthread_mutex_lock call, right?  Can you add a comment?

@@ +690,5 @@
> +                                                  \
> +      if (sIsFreezing) {                          \
> +        freezePoint1 = true;                     \
> +        REAL(pthread_mutex_lock)(&sThreadFreezeLock); \
> +        abort();                                  \

Same here.

@@ +705,5 @@
> +    if (sNuwaReady && sIsNuwaProcess) {           \
> +      pthread_mutex_unlock(&sThreadCountLock);    \
> +      freezePoint2 = true;                       \
> +      REAL(pthread_mutex_lock)(&sThreadFreezeLock); \
> +      abort();                                    \

And here.

@@ +719,5 @@
> +    if (sNuwaReady && sIsNuwaProcess) {           \
> +      pthread_mutex_unlock(&sThreadCountLock);    \
> +      freezePoint2 = true;                       \
> +      REAL(pthread_mutex_lock)(&sThreadFreezeLock); \
> +      abort();                                    \

And here.

@@ +733,5 @@
> + * pthread_cond_wait() and pthread_cond_timedwait():
> + *
> + * These functions are wrapped by the above freeze point macros. By the time a
> + * new process is forked, the parasited thread will block in one of the wrapper
> + * functions. While in recreating the thread, we longjmp() to

When recreating the thread ...

@@ +1044,5 @@
> +   * pthread_create_startup().  And pthread_create_startup() will
> +   * return immediately after returning from real start routine, so
> +   * all collapsed values does not affect the result.
> +   *
> +   * All outter frames of pthread_create_startup() and

*outer

@@ +1333,5 @@
> +PrepareNuwaProcess() {
> +  sIsNuwaProcess = true;
> +  // Explicitly ignore SIGCHLD so we don't have to call watpid() to reap
> +  // dead child processes.
> +  signal(SIGCHLD, SIG_IGN);

Hmm, do we need to undo this somewhere?

@@ +1337,5 @@
> +  signal(SIGCHLD, SIG_IGN);
> +
> +  // Make marked threads block in one freeze point.
> +  REAL(pthread_mutex_lock)(&sThreadFreezeLock);
> +  // sleep(20);

Remove commented out code when we're done here.

@@ +1438,5 @@
> +  }
> +}
> +
> +/**
> + * The caller is at a line wished to recovery and return after parasiting.

"The caller is at the line it wishes to return to after the thread is recreated."

@@ +1440,5 @@
> +
> +/**
> + * The caller is at a line wished to recovery and return after parasiting.
> + *
> + * We wish the parasited thread restart at the calling line.  This

We wish to restart the paralyzed thread at the calling line.

@@ +1441,5 @@
> +/**
> + * The caller is at a line wished to recovery and return after parasiting.
> + *
> + * We wish the parasited thread restart at the calling line.  This
> + * function return true for returning at Nuwa process, and return

returns true in the Nuwa process, returns false in the forked process.

::: mozglue/build/Nuwa.h
@@ +22,5 @@
> +
> +#define NUWA_NEWFD_PARENT 0
> +#define NUWA_NEWFD_CHILD 1
> +
> +extern "C" {

I think it would make more sense to name these functions NuwaXXXX.  Right now whether "Nuwa" comes in the middle somewhere or at the end, and there doesn't seem to be any consistency.

@@ +26,5 @@
> +extern "C" {
> +
> +/**
> + * Block further requests to spawn a new process.
> + */

I also think it would be useful to note what thread these functions can be called on, and in the case of functions that take callbacks, what thread the callbacks are invoked on.

@@ +53,5 @@
> +
> +/**
> + * Marks the current thread as not supporting Nuwa.
> + */
> +void MarkThreadNuwaSkip() __attribute__((weak));

What does it mean for a thread to not support Nuwa, and how is that different from a thread that calls neither MarkThreadNuwa nor MarkThreadNuwaSkip?
Comment on attachment 757835 [details] [diff] [review]
Part 2: IPC and glue changes.

Review of attachment 757835 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/GeckoChildProcessHost.cpp
@@ +40,5 @@
> +// #ifdef MOZ_NUWA_PROCESS
> +#include "nsTArray.h"
> +#include "nsClassHashtable.h"
> +#include "nsHashKeys.h"
> +// #endif

We generally don't conditionally include headers that aren't platform specific, even if we only use them in platform specific code.  So remove the commented out #ifdef/#endif please.
Comment on attachment 755321 [details] [diff] [review]
Part 3: IPC glue and IPDL codegen to support protocol cloning.

Review of attachment 755321 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/ProtocolUtils.h
@@ +165,5 @@
> +                  ProtocolCloneContext *aCtx);
> +};
> +
> +/**
> + * All top-level protocols should inherits this class.

... should inherit this ...
Comment on attachment 757838 [details] [diff] [review]
Part 5: PContent protocol changes

Review of attachment 757838 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentChild.cpp
@@ +1291,5 @@
> +class CallSpawnNuwa : public nsRunnable {
> +public:
> +    NS_IMETHOD Run() {
> +        SpawnNuwa();
> +        if (!IsNuwaProcess()) {

How about just

if (IsNuwaProcess()) {
  return NS_OK;
}

// In new process
...

::: dom/ipc/ContentParent.cpp
@@ +211,5 @@
> +// This number is fairly arbitrary ... the intention is to put off
> +// launching another app process until the last one has finished
> +// loading its content, to reduce CPU/memory/IO contention.
> +static int sPreallocateDelayMs;
> +// XXX Removed in the latest revision. Temporarily add them back.

I don't know what to do with code that's been removed in later versions ;-)

@@ +226,5 @@
>  // The first content child has ID 1, so the chrome process can have ID 0.
>  static uint64_t gContentChildID = 1;
>  
> +
> +// The sNuwaProcess point to Nuwa process for create new process later.

sNuwaProcess points to the Nuwa process which is used for forking new processes later.

::: dom/ipc/PContent.ipdl
@@ +356,5 @@
>  
>      FileSystemUpdate(nsString fsName, nsString mountPoint, int32_t fsState,
>                       int32_t mountGeneration);
>  
> +    // Notify Nuwa process to create one more process.

Ask the Nuwa process to create a new child process.
Comment on attachment 757841 [details] [diff] [review]
Part 6: support thread parasite of threads created in the template process.

Review of attachment 757841 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/chrome/common/child_thread.cc
@@ +51,5 @@
> +    MarkThreadNuwa(nullptr, nullptr);
> +    if (!ParasitePointThreadNuwa()) {
> +	NS_RUNTIMEABORT("Should not be here!");
> +    }
> +}

This file seems to use 2 space indenting, not 4.
Comment on attachment 755328 [details] [diff] [review]
Part 8: process initialization flow changes

Review of attachment 755328 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/gonk/nsAppShell.cpp
@@ +648,5 @@
>          obsServ->AddObserver(this, "browser-ui-startup-complete", false);
>      }
>  
> +#ifdef MOZ_NUWA_PROCESS
> +    // Make sure main thread was waked up after Nuwa fork

was woken up
(In reply to Justin Lebar [:jlebar] from comment #66)
> We wait one second before freezing the nuwa process.  What happens if, when
> the second is up, the nuwa process is not finished doing whatever it
> normally does?

I discussed this with Cervantes and Thinker in .tw and I think we agreed that we need a more deterministic point to freeze the Nuwa process at before we can ship this.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #68)
> 
> The use of the word 'parasite' in this patch is confusing.  I think
> sometimes when you use it you're referring to the thread that was stopped
> (for example, parasitedThreadID) while other times you're referring to the
> act of restarting the thread after fork()ing (for example, paraFunction).  I
> suspect that for the first case the word you're looking for is 'paralyzed'. 
> I'm not sure that there's a better word for the second case than 'restart'
> or 'recreate'.
> 
We need a clearer clarification of the terms used:
Parasite: actually means the action of recreating "some resource". For example, after fork() from the Nuwa process, we create a new thread. Then instead of growing the stack from the thread entry function, we replace the stack directly, like a parasite occupying its host.
parasitedTheadID refers to the thread in the forked() process. I will rename 'threadID' to nuwaTheadID to make it clear that which refers to the thread ID in the Nuwa process and which one refers to the one in the forked process.
For the thread that was stopped, we refer it to being frozen.
The largest confusion may come from ContentChild.cpp that it call DoParasitePoint() in the Nuwa process. It's like taking snapshot and continue running. We'll use the term 'Checkpoint' or 'Snapshot' to make it clear. 

> I'm hardly a pthreads expert but aren't comparisons of thread ids supposed
> to be done with pthread_equal() instead of operator==?
Will change that. We are too used to the fact that the underlying implementation is of integral type :).
> Parasite: actually means the action of recreating "some resource". For example, after 
> fork() from the Nuwa process, we create a new thread. Then instead of growing the stack 
> from the thread entry function, we replace the stack directly, like a parasite occupying 
> its host.

Okay, but "parasite" still isn't the right word.  In general usage, "parasite" is not a verb.  But also, when a parasite infects a host, the host lives on; the parasite is merely attached to the host.  When we fork the Nuwa process and stick the old thread into the new thread, the old thread no longer exists; we can't kill the parasite without killing the host, because they're now the same thing.

We could say that the fork()'ed process "adopts" the Nuwa thread's stack.
Adoption still feels strange, especially for other resources like file descriptors. I think a plain and simple word, "re-creation", should work.
Generally, we'll update the patch for more documentation and naming issues such as disambiguation of 'parasite' and the confusing use of 'pthread' in the method names.

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #69)
> Comment on attachment 757834 [details] [diff] [review]
> Part 1: the Nuwa API and low-level wrappers.
> 
> Review of attachment 757834 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: configure.in
> @@ +8815,5 @@
> >  AC_SUBST(MOZ_PNG_CFLAGS)
> >  AC_SUBST(MOZ_PNG_LIBS)
> >  
> > +if test "$MOZ_WIDGET_TOOLKIT" = gonk -a -n "$MOZ_NUWA_PROCESS"; then
> > +    export MOZ_NUWA_PROCESS
> 
> Do you really need this export?  I would expect the AC_DEFINE/AC_SUBST to be
> enough.
> 
Yes, it needs to be propagated to js/src/configure.in. 

> @@ +85,5 @@
> > +typedef std::vector<std::pair<pthread_key_t, void *>> TLSInfoList;
> > +
> > +#ifndef NUWA_STACK_SIZE
> > +#define NUWA_STACK_SIZE (1024 * 32)
> > +#endif
> 
> Is there any reason that 32k was chosen?  If there is it would be good to
> document it in a comment.
> 
I think it is a daring assumption from the observation in the Nuwa process that the threads that we intend to take snapshots and then freeze will not exceed this stack size. It's set to this size to reduce the memory used by the Nuwa process.

> @@ +238,5 @@
> > +  pthread_mutex_unlock(&sThreadCountLock);
> > +  return NULL;
> > +}
> > +#define CUR_THREAD_INFO GetCurThreadInfo()
> > +#define SET_THREAD_INFO(x)
> 
> Is defining this to nothing intentional?  Might be worth a comment (e.g.
> #define SET_THREAD_INFO(x) /* nothing to do */)
Yes, the macro is used in _pthread_create_startup(). It's used if the __thread specifier is supported. I will add a comment.

> 
> @@ +267,5 @@
> > +    }
> > +
> > +    void AddEvents(int aFd, Events &aEvents) {
> > +      if (mEvents.find(aFd) != mEvents.end())
> > +        abort();
> 
> Perhaps this check should be DEBUG only?
It's intentional. It can be seen from the patches that many assertions are made not only in the debug build but also in the release build. We follow the 'die early principal' and want it to break very obviously when it breaks. Another example is in protocol cloning. We place an abort in the release build to make it obvious when more IPC protocol objects need to be cloned.

> 
> I'm not sure I like prefixing our own functions with pthread_.  It's easy to
> confuse them for real pthreads functions when reading code.
> 
> @@ +389,5 @@
> > +
> > +static void
> > +pthread_info_cleanup(void *arg) {
> > +  if (!sIsNuwaProcess || sNuwaForking)
> > +    return;
> 
> If this function is called when sNuwaForking is true, does the thread_info_t
> ever get cleaned up?
> 
This looks like to be a bug. We'll check that.

> 
> @@ +1333,5 @@
> > +PrepareNuwaProcess() {
> > +  sIsNuwaProcess = true;
> > +  // Explicitly ignore SIGCHLD so we don't have to call watpid() to reap
> > +  // dead child processes.
> > +  signal(SIGCHLD, SIG_IGN);
> 
> Hmm, do we need to undo this somewhere?
I can't think of any situation where we need to catch SIGCHLD and wait() to get the exit status of the content process. The chrome process should know the death of the content process via the breakage of the IPC channel.

> 
> @@ +53,5 @@
> > +
> > +/**
> > + * Marks the current thread as not supporting Nuwa.
> > + */
> > +void MarkThreadNuwaSkip() __attribute__((weak));
> 
> What does it mean for a thread to not support Nuwa, and how is that
> different from a thread that calls neither MarkThreadNuwa nor
> MarkThreadNuwaSkip?
I will update the comment to make it clearer. For thread snapshot/freezing and re-creation, we need to mark every thread either as needing re-creation or not. We count every thread created in the Nuwa process using the pthread_create() wrapper. The Nuwa process is considered ready if the number of frozen thread + the number of skipped threads = the number of created threads. So if a thread is created but not marked, the Nuwa process will never be ready and this is a bug we need to fix. This happened in rebasing the patch and we fix it by marking the threads in the image decoding thread pool.
(In reply to Justin Lebar [:jlebar] from comment #66)
> Comment on attachment 755328 [details] [diff] [review]
> Part 8: process initialization flow changes
> 
> We wait one second before freezing the nuwa process.  What happens if, when
> the second is up, the nuwa process is not finished doing whatever it
> normally does?

One second is for the rest of the Nuwa process to do preloading or initialization as much as possible.  So, it don't hurt any if the second is up and something is still running, becuase the Nuwa always wait for all threads until all are blocked/frozen.

We still need to have a deterministic way for avoiding wasting time on waiting while we have already preloaded and initialized all we want.  But, I don't think it is a prerequisite for committing or shipping code.
While updating the patch, I feel it difficult in choosing the right words to describe the ideas in the work. Your comments are very much appreciated:

* In the Nuwa process, the threads are [checked | tracked | marked] as supporting Nuwa.
* The action of using wrappers to keep track of thread stack or file descriptors in the Nuwa process is [infection | implantation].
* The action of bring the frozen thread back to life in the fork()ed process is [recovery | re-creation | revitalization | re-activation | reincarnation].
* In the fork()ed process, the action of replacing the underlying resource (like file descriptor, thread id) without upper layer code noticing the change is [re-creation | replacement] of these resources.
khuey knows the code best right now, so maybe he can help with comment 81.
Flags: needinfo?(khuey)
(In reply to Cervantes Yu from comment #81)
> While updating the patch, I feel it difficult in choosing the right words to
> describe the ideas in the work. Your comments are very much appreciated:
> 
> * In the Nuwa process, the threads are [checked | tracked | marked] as
> supporting Nuwa.

Marked.

> * The action of using wrappers to keep track of thread stack or file
> descriptors in the Nuwa process is [infection | implantation].

I don't think either of those fit.  How about "interception"?  The wrappers intercept the calls to poll/pipe/pthread_whatever so that we can run the code we need to make Nuwa work.

> * The action of bring the frozen thread back to life in the fork()ed process
> is [recovery | re-creation | revitalization | re-activation | reincarnation].

I would say the threads are re-created.

> * In the fork()ed process, the action of replacing the underlying resource
> (like file descriptor, thread id) without upper layer code noticing the
> change is [re-creation | replacement] of these resources.

Replacement.
Flags: needinfo?(khuey)
Attached patch Part 1: the Nuwa API and low-level wrappers. (obsolete) — — Splinter Review
Attachment #757834 - Attachment is obsolete: true
Attachment #757834 - Flags: review?(justin.lebar+bug)
Attachment #760849 - Flags: review?(khuey)
Attachment #760849 - Flags: review?(justin.lebar+bug)
Attached patch Part 1: the Nuwa API and low-level wrappers. (obsolete) — — Splinter Review
Attachment #757838 - Attachment is obsolete: true
Attachment #757838 - Flags: review?(justin.lebar+bug)
Attachment #760850 - Flags: review?(khuey)
Attachment #760850 - Flags: review?(justin.lebar+bug)
Attachment #757841 - Attachment is obsolete: true
Attachment #757841 - Flags: review?(justin.lebar+bug)
Attachment #760851 - Flags: review?(khuey)
Attachment #760851 - Flags: review?(justin.lebar+bug)
Attached patch Part 8: process initialization flow changes (obsolete) — — Splinter Review
Attachment #755328 - Attachment is obsolete: true
Attachment #755328 - Flags: review?(justin.lebar+bug)
Attachment #760852 - Flags: review?(khuey)
Attachment #760852 - Flags: review?(justin.lebar+bug)
Some notes about the updates: p1, p5, p6 and p8 are updated per comment #69. Other parts will be updated later. The main updates are renames and more documentation for p1.
> @@ +161,5 @@
> > +/**
> > + * This mutex protects the access to thread info:
> > + * sAllThreads, sThreadCount, sThreadFreezeCount, sParasiteVIPCount.
> > + */
> > +static pthread_mutex_t sThreadCountLock = PTHREAD_MUTEX_INITIALIZER;
> 
> I think it would be useful if the order in which sThreadFreezeLock,
> sThreadCountLock, sForkLock, and sSpecKeyLock must be acquired is documented
> in a comment.
> 
There are no specific order for acquiring these locks. Their purposes are more important than the order in which they are acquired. So I didn't add comment for this. I think the sequence of how the things work could better be updated in wiki.
Attached patch Part 1: the Nuwa API and low-level wrappers. (obsolete) — — Splinter Review
Attachment #761968 - Flags: review?(justin.lebar+bug)
Attachment #761968 - Flags: review?(khuey)
Attached patch Part 2: IPC and glue changes. (obsolete) — — Splinter Review
Attachment #757835 - Attachment is obsolete: true
Attachment #760849 - Attachment is obsolete: true
Attachment #757835 - Flags: review?(justin.lebar+bug)
Attachment #760849 - Flags: review?(khuey)
Attachment #760849 - Flags: review?(justin.lebar+bug)
Attachment #761969 - Flags: review?(khuey)
Attachment #761969 - Flags: review?(justin.lebar+bug)
Attachment #755321 - Attachment is obsolete: true
Attachment #755321 - Flags: review?(justin.lebar+bug)
Attachment #761970 - Flags: review?(khuey)
Attachment #761970 - Flags: review?(justin.lebar+bug)
Attached patch Part 5: PContent protocol changes (obsolete) — — Splinter Review
Attachment #760850 - Attachment is obsolete: true
Attachment #760850 - Flags: review?(khuey)
Attachment #760850 - Flags: review?(justin.lebar+bug)
Attachment #761971 - Flags: review?(khuey)
Attachment #761971 - Flags: review?(justin.lebar+bug)
Attachment #760851 - Attachment is obsolete: true
Attachment #760851 - Flags: review?(khuey)
Attachment #760851 - Flags: review?(justin.lebar+bug)
Attachment #761972 - Flags: review?(khuey)
Attachment #761972 - Flags: review?(justin.lebar+bug)
Attached patch Part 8: process initialization flow changes (obsolete) — — Splinter Review
Attachment #760852 - Attachment is obsolete: true
Attachment #760852 - Flags: review?(khuey)
Attachment #760852 - Flags: review?(justin.lebar+bug)
Attachment #761973 - Flags: review?(khuey)
Attachment #761973 - Flags: review?(justin.lebar+bug)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #72)
> Comment on attachment 757838 [details] [diff] [review]
> 
> ::: dom/ipc/ContentParent.cpp
> @@ +211,5 @@
> > +// This number is fairly arbitrary ... the intention is to put off
> > +// launching another app process until the last one has finished
> > +// loading its content, to reduce CPU/memory/IO contention.
> > +static int sPreallocateDelayMs;
> > +// XXX Removed in the latest revision. Temporarily add them back.
> 
> I don't know what to do with code that's been removed in later versions ;-)

They are moved to PreallocatedProcessManager class for handling process preallocation. We plan to integrate Nuwa preallocation into PreallocatedProcessManager in a followup bug.
(In reply to Justin Lebar [:jlebar] from comment #67)
> Would you like to review this?

Yeah, once khuey is satisfied you can give it to me.
Flags: needinfo?(bent.mozilla)
Comment on attachment 761968 [details] [diff] [review]
Part 1: the Nuwa API and low-level wrappers.

Review of attachment 761968 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good.  I cleared the review flag rather than giving r+ because I would like to see it again.

One thing to consider that hadn't occurred to me before is how we want to handle cleaning up the Nuwa data/structs/etc in the forked process.  Freeing that stuff is the natural thing to do but it means writing to pages and possibly defeating copy-on-write.  If we leave things alone then we're technically using more memory but because of the copy-on-write sharing it may not hurt us.  Maybe we should try to allocate all of our memory for Nuwa on separate pages from the regular heap and then just drop it all after we fork()?  We can deal with this in a followup bug though.

::: mozglue/build/Makefile.in
@@ +121,5 @@
>  
> +ifdef MOZ_NUWA_PROCESS
> +EXPORTS_NAMESPACES += ipc
> +EXPORTS_ipc = Nuwa.h
> +endif

You should convert this to the new moz.build format before landing.

::: mozglue/build/Nuwa.cpp
@@ +91,5 @@
> +#define NUWA_STACK_SIZE (1024 * 32)
> +#endif
> +
> +typedef struct thread_info {
> +  struct thread_info *next, **prev;

Can we change this struct to use mozilla/LinkedList.h, and then avoid implementing a linked list ourselves here?

@@ +101,5 @@
> +
> +  int flags;
> +
> +  void *(*startup)(void *arg);
> +  void *arg;

Maybe this should be named startupFunc and startupArg (to match recrFunc/recrArg below)?

@@ +150,5 @@
> +  void *arg;
> +} nuwa_construct_t;
> +
> +static nuwa_construct_t *sConstructors = NULL;
> +static nuwa_construct_t *sFinalConstructors = NULL;

Maybe these should just be std::vector<nuwa_construct_t>?  Then you don't have to worry about the leak I point out later.

@@ +163,5 @@
> + * calling thread.
> + */
> +static pthread_mutex_t sThreadFreezeLock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static thread_info_t *sAllThreads = NULL;

I pointed this out somewhere else, but this list is never cleaned up.

@@ +197,5 @@
> + *
> + * @return thread_info_t which has threadID == specified threadID
> + */
> +static thread_info_t *
> +GetThreadInfo(pthread_t threadID) {

I think this function (and several others with complexing locking paths) would be better written as:

static thread_info_t *
GetThreadInfoInner(pthread_t threadID) {
  // Search the list
}

static thread_info_t *
GetThreadInfo(pthread_t threadID) {
  if (sIsNuwaProcess) {
    // Lock
  }
  thread_info_t *info = GetThreadInfoInner(threadID);
  if (sIsNuwaProcess) {
    // Unlock
  }
  return info;
}

@@ +204,5 @@
> +  }
> +  thread_info_t *tinfo = sAllThreads;
> +
> +  while (tinfo) {
> +    if (tinfo->origThreadID == threadID) {

pthread_equal?

@@ +219,5 @@
> +
> +  return tinfo;
> +}
> +
> +#if !defined(HAVE_THREAD_TLS_KEYWORD)

Does B2G HAVE_THREAD_TLS_KEYWORD or not?  Why are we implementing whichever one is not B2G?  For testing on desktop linux?

@@ +226,5 @@
> + *
> + * @return thread_info_t for the current thread.
> + */
> +static thread_info_t *
> +GetCurThreadInfo() {

This function would be much cleaner with only one locking path and one loop (using pointer-to-member).  I *think* pointer-to-member works on struct members too.  Something like

GetCurThreadInfo() {
  REAL(pthread_mutex_loc)(&sThreadCountLock);
  thread_info_t *tinfo = sAllThreads;
  pthread_t selfThreadID = REAL(pthread_self)();

  pthread_t* thread_info_t::*threadIDptr = sIsNuwaProcess ? &thread_info_t::origThreadID : &thread_info_t::recreatedThreadID;

  while (tinfo) {
    if (pthread_equal(tinfo.*threadIDptr, threadID)) {
      break;
    }
  }

  pthread_mutex_unlock(&sThreadCountLock);
  return tinfo;
}

@@ +284,5 @@
> +    void AddEvents(int aFd, Events &aEvents) {
> +      if (mEvents.find(aFd) != mEvents.end()) {
> +        abort();
> +      }
> +      mEvents[aFd] = aEvents;

std::pair<iterator, bool> pair = mEvents.insert(std::makepair(aFd, aEvents));
if (!pair.second) {
  abort();
}

avoid searching twice.  Or something like that anyways.

@@ +291,5 @@
> +    void RemoveEvents(int aFd) {
> +      int n = mEvents.erase(aFd);
> +      if (n == 0) {
> +        abort();
> +      }

You could also just do if (!mEvents.erase(aFd))

@@ +301,5 @@
> +      if (it == mEvents.end()) {
> +        abort();
> +      }
> +
> +      mEvents[aFd] = aEvents;

iterator it = mEvents.find(aFd);

if (it == mEvents.end()) {
  abort();
}

it->second = aEvents;

That avoids searching the map twice for aFd.

@@ +309,5 @@
> +      const_iterator it = mEvents.find(aFd);
> +      if (it == mEvents.end()) {
> +        abort();
> +      }
> +      return it->second;

I don't really care whether you have newlines between the const_iterator declaration and the if statement, or between the '}' and the return statement, but try to keep it consistent between the different functions in this class.

@@ +339,5 @@
> +      abort();
> +    }
> +
> +    mEpollFdsInfo[aEpollFd] = EpollInfo(aBackSize);
> +  }

The same comments I made above apply to this function too.

@@ +345,5 @@
> +  EpollInfo *FindEpollInfo(int aEpollFd) {
> +    iterator it = mEpollFdsInfo.find(aEpollFd);
> +    if (it == mEpollFdsInfo.end()) {
> +      return NULL;
> +    }

Does this branch ever happen?  Most of the functions that call FindEpollInfo abort() if they get NULL.  Some of the other functions might be cleaner if this returned an EpollInfo& or maybe even an iterator&.

@@ +354,5 @@
> +  void RemoveEpollInfo(int aEpollFd) {
> +    int n = mEpollFdsInfo.erase(aEpollFd);
> +    if (n == 0) {
> +      abort();
> +    }

And this one.

@@ +367,5 @@
> +  static EpollManager *Singleton() {
> +    static EpollManager *man = NULL;
> +    if (man == NULL) {
> +      man = new EpollManager();
> +    }

Do we ever shut down and delete the EpollManager?  I guess not.

@@ +403,5 @@
> +  *prev = tinfo;
> +
> +  sThreadCount++;
> +  pthread_mutex_unlock(&sThreadCountLock);
> +  pthread_cond_signal(&sThreadChangeCond);

Hmm, shouldn't we signal and then unlock?  I know it doesn't really matter but it feels more natural.  There are a bunch of places we do this so if we decide to change it we should change all of them.

@@ +489,5 @@
> +  }
> +
> +  thread_info_t *tinfo = CUR_THREAD_INFO;
> +  if (!sIsNuwaProcess) {
> +    longjmp(tinfo->retEnv, 1);

Since longjmp never returns maybe add an abort() afterwards for sanity checking?

@@ +559,5 @@
> +       it++) {
> +    pthread_key_t key = it->first;
> +    const void *value = it->second;
> +    rv = pthread_setspecific(key, value);
> +    if (rv != 0) {

just if (!pthread_setspecific....)?

@@ +619,5 @@
> +/**
> + * The following are used to synchronize between the main thread and the
> + * thread being recreated. The main thread will wait until the thread is woken
> + * up from the freeze points or the blocking intercepted functions and then
> + * proceed to recreated the next frozen thread.

recreate, not recreated.

@@ +633,5 @@
> + * (like pthread_cond_wait()).
> + */
> +
> +// Used to synchronize the main thread and the recreated thread to only allow
> +// one thread being recreated.

Used to synchronize the main thread and the thread being recreated so that only one thread is allowed to be recreated at a time.

@@ +635,5 @@
> +
> +// Used to synchronize the main thread and the recreated thread to only allow
> +// one thread being recreated.
> +static pthread_mutex_t sRecreateWaitLock = PTHREAD_MUTEX_INITIALIZER;
> +// Used to let block recreated threads until the main thread "opens the gate".

Used to block ...  no need for 'let' here.

@@ +638,5 @@
> +static pthread_mutex_t sRecreateWaitLock = PTHREAD_MUTEX_INITIALIZER;
> +// Used to let block recreated threads until the main thread "opens the gate".
> +static pthread_mutex_t sRecreateGateLock = PTHREAD_MUTEX_INITIALIZER;
> +// Used to let the main thread to wait all VIP threads recreated before opening
> +// the gate.

Used to block the main thread from "opening the gate" until all VIP threads have been recreated

@@ +654,5 @@
> + * 1. RECREATE_START() is first called in the beginning of thread
> + *    recreation to set sRecreateWaitLock and sRecreateGateLock in locked
> + *    state.
> + * 2. For each frozen thread:
> + *    2.1. RECREATE_BEFORE() to set current thread in recreation.

to set the thread being recreated

@@ +670,5 @@
> + */
> +#define RECREATE_START()                          \
> +  do {                                            \
> +    REAL(pthread_mutex_lock)(&sRecreateWaitLock); \
> +    REAL(pthread_mutex_lock)(&sRecreateGateLock); \

Switch these (acquire the gate lock then the wait lock) to maintain a lock hierarchy.

@@ +712,5 @@
> +/**
> + * Thread freeze points. Note that the freeze points are implemented as macros
> + * so as not to garble the content of the stack after setjmp().
> + *
> + * In the nuwa process, when the threads supporting nuwa calls the wrapper

when a thread supporting nuwa calls a wrapper function

@@ +714,5 @@
> + * so as not to garble the content of the stack after setjmp().
> + *
> + * In the nuwa process, when the threads supporting nuwa calls the wrapper
> + * functions, freeze point 1 setjmp()s to save the state. We only allow the
> + * threads to be frozen in the wrapper functions. If thread freezing is not

thread

@@ +717,5 @@
> + * functions, freeze point 1 setjmp()s to save the state. We only allow the
> + * threads to be frozen in the wrapper functions. If thread freezing is not
> + * enabled yet, the wrapper functions act like their wrapped counterparts,
> + * except for the extra actions in the freeze points. If thread freezing is
> + * enabled, the threads will be frozen by calling one of the wrapper functions.

thread

@@ +828,5 @@
> + * pthread_cond_wait() and pthread_cond_timedwait():
> + *
> + * These functions are wrapped by the above freeze point macros. By the time a
> + * new process is forked, the recreated thread will block in one of the wrapper
> + * functions. When recreating the thread, we longjmp() to

Once the a new process is forked, the recreated thread will be blocked in one of the wrapper functions.

@@ +870,5 @@
> +  }
> +  if (recreated && mtx) {
> +    if (!freezePoint1 && pthread_mutex_trylock(mtx)) {
> +      // The thread was frozen in pthread_cond_wait() in the Nuwa process and
> +      // released mtx. The following pthread_cond_wait() call will release mtx

s/and released mtx/after releasing mtx/.  After the first sentence, add a sentence about how mtx is now locked by another thread.

@@ +873,5 @@
> +      // The thread was frozen in pthread_cond_wait() in the Nuwa process and
> +      // released mtx. The following pthread_cond_wait() call will release mtx
> +      // and we need the main thread's help to reacquire mtx to make it in a
> +      // valid state.
> +      tinfo->reacquireMutex = mtx;

It took me and Justin some time to realize how this actually works, and I have to say I'm a little scared.  This is pretty clever, but it relies on a lot of behavior that's technically undefined. I'm willing to go along with it if it works in b2g ...

Have you tried this in a debug build? I wouldn't be surprised if pthreads has assertions about unlocking mutexes on threads that didn't lock them, etc.

@@ +1215,5 @@
> +  nuwa_construct_t *ctr = sConstructors;
> +  while (ctr) {
> +    ctr->construct(ctr->arg);
> +    ctr = ctr->next;
> +  }

You need to free(ctr) after you're done with it to avoid leaking, no?

@@ +1248,5 @@
> +
> +  ctr = sFinalConstructors;
> +  while (ctr) {
> +    ctr->construct(ctr->arg);
> +    ctr = ctr->next;

Same here.

@@ +1447,5 @@
> + * Let IPC thread wait until fork action on the main thread has completed.
> + */
> +void NS_EXPORT
> +NuwaSpawnWait() {
> +  REAL(pthread_cond_wait)(&sForkWaitCond, &sForkLock);

This needs to check some sort of condition, no?  Otherwise pthread_cond_wait could get a spurious wakeup and we might proceed too early.

@@ +1525,5 @@
> +StopNuwaProcess() {
> +  abort();                      // never call this function
> +  sIsNuwaProcess = false;
> +  pthread_mutex_unlock(&sThreadFreezeLock);
> +}

Why do we have this function if we're never supposed to call it?

@@ +1533,5 @@
> + * the spawned process.
> + */
> +void NS_EXPORT
> +NuwaMarkCurrentThread(void (*recreate)(void *), void *arg) {
> +  if (!sIsNuwaProcess) return;

if (cond) {
  // one line thing
}

please

::: mozglue/build/Nuwa.h
@@ +8,5 @@
> +#define __NUWA_H_
> +
> +#include <unistd.h>
> +
> +

Nit: just one \n instead of two.

@@ +44,5 @@
> + *                                                 5. NuwaSpawn() to clone all
> + *                                                 the threads in the Nuwa
> + *                                                 process (including main & IPC
> + *                                                 threads).
> + *   6. NuwaSpawn() finishes.

I would write

(under the main thread)
6. NuwaSpawn finishes and signals NuwaSpanWait.

(under the IPC thread)

7. NuwaSpawnWait returns.

since NuwaSpawn() does not ever run on the IPC thread.

@@ +81,5 @@
> + * by calling a blocking method such as pthread_mutex_lock() or epoll_wait().
> + *
> + * @param recreate The custom function that will be called in the spawned
> + *                 process, after the thread is recreated.
> + * @param arg The argument passed to the custom function.

Note that nullptr is acceptable for both recreate and arg.

@@ +86,5 @@
> + */
> +void NuwaMarkCurrentThread(void (*recreate)(void *), void *arg) __attribute__((weak));
> +
> +/**
> + * Marks the current thread as not supporting Nuwa. The calling thread will not

Mark, not marks.

@@ +87,5 @@
> +void NuwaMarkCurrentThread(void (*recreate)(void *), void *arg) __attribute__((weak));
> +
> +/**
> + * Marks the current thread as not supporting Nuwa. The calling thread will not
> + * be cloned from the Nuwa process to spawned process. If this thread needs to

not be automatically cloned

@@ +95,5 @@
> +void NuwaSkipCurrentThread() __attribute__((weak));
> +
> +/**
> + * Force to freeze the current thread explicitly.
> + */

Force the current thread to freeze explicitly at the current point.

@@ +99,5 @@
> + */
> +void NuwaFreezeCurrentThread() __attribute__((weak));
> +
> +/**
> + * Sets the point to recover after thread recreation. The calling thread is not

Set, not sets

@@ +123,5 @@
> +
> +/**
> + * Register a method to be invoked after a new process is spawned.
> + *
> + * @param construct The method to be invoked.

State explicitly that the constructor is invoked on the main thread.

@@ +133,5 @@
> +/**
> + * Register a method to be invoked after a new process is spawned and threads
> + * are recreated.
> + *
> + * @param construct The method to be invoked.

Here too.
Attachment #761968 - Flags: review?(khuey)
Attachment #761968 - Flags: review?(justin.lebar+bug)
Comment on attachment 761969 [details] [diff] [review]
Part 2: IPC and glue changes.

Review of attachment 761969 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/base/process_util_posix.cc
@@ +256,5 @@
>  bool DidProcessCrash(bool* child_exited, ProcessHandle handle) {
>    int status;
>    const int result = HANDLE_EINTR(waitpid(handle, &status, WNOHANG));
>    if (result == -1) {
> +    // XXXX: always be here for Nuwa

Do we need this comment?  I don't think it adds anything.
Attachment #761969 - Flags: review?(khuey)
Attachment #761969 - Flags: review?(justin.lebar+bug)
Attachment #761969 - Flags: review?(bent.mozilla)
Attachment #761970 - Flags: review?(khuey)
Attachment #761970 - Flags: review?(justin.lebar+bug)
Attachment #761970 - Flags: review?(bent.mozilla)
Comment on attachment 757837 [details] [diff] [review]
Part 4: clone IPC protocol objects that will be up when the template process is ready

Review of attachment 757837 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, but bent should look at this too.

How was this list of protocols determined?  By trial and error?  Do we have code that will abort or something if we need to clone a protocol that doesn't implement CloneProtocol?

::: dom/indexedDB/ipc/IndexedDBParent.cpp
@@ +137,5 @@
> +
> +    return parent;
> +#else
> +    return nullptr;
> +#endif

CloneProtocol is never called #ifndef MOZ_NUWA_PROCESS, right?  If so, let's add MOZ_CRASH() calls.

::: dom/ipc/CrashReporterParent.cpp
@@ +33,5 @@
> +mozilla::ipc::IProtocolRPC*
> +CrashReporterParent::CloneProtocol(Channel* aChannel,
> +                                   mozilla::ipc::ProtocolCloneContext *aCtx) {
> +    return new CrashReporterParent();
> +}

Do we not need to run the code in ContentParent::RecvPCrashReporterConstructor here?  The thread ID on the child side will have changed after the call to fork(), no?

::: gfx/layers/ipc/CompositorParent.h
@@ +53,5 @@
> +  // IToplevelProtocol::CloneToplevel()
> +  virtual IToplevelProtocol *
> +  CloneToplevel(const InfallibleTArray<mozilla::ipc::ProtoFdMapping> &aFds,
> +                base::ProcessHandle aPeerProcess,
> +                mozilla::ipc::ProtocolCloneContext *aCtx);

Why are some of these functions #ifdef MOZ_NUWA_PROCESS and others aren't?  Seems messy.
Attachment #757837 - Flags: review?(justin.lebar+bug)
Attachment #757837 - Flags: review?(bent.mozilla)
Attachment #757837 - Flags: review+
Comment on attachment 761971 [details] [diff] [review]
Part 5: PContent protocol changes

Review of attachment 761971 [details] [diff] [review]:
-----------------------------------------------------------------

r- for the ContentParent refcounting issue

::: dom/ipc/ContentParent.cpp
@@ +235,5 @@
> +static StaticRefPtr<ContentParent> sNuwaProcess;
> +// Nuwa process is ready for creating new process.
> +static bool sNuwaReadyOnParent = false;
> +static Monitor *sSpareMonitor = nullptr;
> +static nsAutoTArray<nsAutoPtr<ContentParent>, 4> sSpareProcesses;

ContentParent is refcounted.  You should be using nsRefPtr, not nsAutoPtr here.  Properly refcounting ContentParent should also get rid of the need for all the manual NS_RELEASE, .forget(), etc.

@@ +368,1 @@
>  ContentParent::StartUp()

Please don't add several blank lines here.
Attachment #761971 - Flags: review?(khuey)
Attachment #761971 - Flags: review?(justin.lebar+bug)
Attachment #761971 - Flags: review-
Comment on attachment 761972 [details] [diff] [review]
Part 6: mark threads created in the template process.

Review of attachment 761972 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

I'd like somebody from the JS team to sign off on the JS changes even though they're pretty simple.

::: js/src/configure.in
@@ +4232,5 @@
>  AC_SUBST(NSPR_LIBS)
>  AC_SUBST(MOZ_NATIVE_NSPR)
>  
> +if test -n "$MOZ_NUWA_PROCESS"; then
> +	AC_DEFINE(MOZ_NUWA_PROCESS)

No 8 space indenting!

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +1116,5 @@
> +    if (IsNuwaProcess()) {
> +        NS_ASSERTION(MarkThreadNuwa != nullptr,
> +                     "MarkThreadNuwa is undefined!");
> +        NuwaMarkCurrentThread(nullptr, nullptr);
> +        NuwaFreezeCurrentThread();

How come we need to explicitly freeze this thread, but not the timer thread?

::: netwerk/base/src/nsStreamTransportService.cpp
@@ +445,5 @@
> +NS_IMETHODIMP
> +STSThreadPoolListener::OnThreadCreated()
> +{
> +    if (IsNuwaProcess()) {
> +        NuwaMarkCurrentThread((void (*)(void *))nullptr, nullptr);

Do you really need the cast here?

::: xpcom/base/nsCycleCollector.cpp
@@ +1130,5 @@
> +            NS_ASSERTION(MarkThreadNuwa != nullptr,
> +                         "MarkThreadNuwa is undefined!");
> +            NuwaMarkCurrentThread(nullptr, nullptr);
> +        }
> +#endif

No 8 space indenting please!
Attachment #761972 - Flags: review?(khuey)
Attachment #761972 - Flags: review?(justin.lebar+bug)
Attachment #761972 - Flags: review?(jorendorff)
Attachment #761972 - Flags: review+
Comment on attachment 755327 [details] [diff] [review]
Part 7: don't schedule timer in the message pump if template process is ready.

Review of attachment 755327 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/MessagePump.cpp
@@ +103,5 @@
>  
>      if (did_work && delayed_work_time_.is_null())
> +#ifdef MOZ_NUWA_PROCESS
> +      if (!IsNuwaReady() || !IsNuwaProcess())
> +#endif

I'd write this as

if (did_work && delayed_work_time_.is_null()
#ifdef MOZ_NUWA_PROCESS
    && (!IsNuwaReady() || !IsNuwaProcess())
#endif
    )

So that the indenting is correct.

@@ +130,1 @@
>    mDelayedWorkTimer->Cancel();

Indent mDelayedWorkTimer->Cancel();
Attachment #755327 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 761973 [details] [diff] [review]
Part 8: process initialization flow changes

Review of attachment 761973 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/app/MozillaRuntimeMain.cpp
@@ +62,5 @@
> +#ifdef MOZ_NUWA_PROCESS
> +    if (!isNuwa) {
> +        android::ProcessState::self()->startThreadPool();
> +    } else {
> +        NuwaAddFinalConstructor((void (*)(void *))&InitializeBinder, nullptr);

Do you really need the cast here?

::: toolkit/xre/nsEmbedFunctions.cpp
@@ +85,5 @@
>  
> +#ifdef MOZ_NUWA_PROCESS
> +#include "nsITimer.h"
> +#define NUWA_PREPARATION_TIME 1000
> +#endif

We need to file a bug before landing on replacing the 1s timer with something more deterministic.

@@ +546,5 @@
> +	  CommandLine::ForCurrentProcess()->HasSwitch(L"nuwa")) {
> +	  // Wait the Nuwa process for NUWA_PREPARATION_TIME ms.
> +	  timer = do_CreateInstance(NS_TIMER_CONTRACTID);
> +	  NS_ASSERTION(timer != NULL,
> +		       "Can not allocate nsITimer!");

Don't bother.  Just assume it never fails and if it does crash with the null dereference.
Attachment #761973 - Flags: review?(khuey)
Attachment #761973 - Flags: review?(justin.lebar+bug)
Attachment #761973 - Flags: review+
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #98)
> Comment on attachment 761968 [details] [diff] [review]
> Part 1: the Nuwa API and low-level wrappers.
> 
> Review of attachment 761968 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> One thing to consider that hadn't occurred to me before is how we want to
> handle cleaning up the Nuwa data/structs/etc in the forked process.  Freeing
> that stuff is the natural thing to do but it means writing to pages and
> possibly defeating copy-on-write.  If we leave things alone then we're
> technically using more memory but because of the copy-on-write sharing it
> may not hurt us.  Maybe we should try to allocate all of our memory for Nuwa
> on separate pages from the regular heap and then just drop it all after we
> fork()?  We can deal with this in a followup bug though.
> 
Maybe we can leverage static variable's dtor to free the resources on process exit?

> @@ +219,5 @@
> > +
> > +  return tinfo;
> > +}
> > +
> > +#if !defined(HAVE_THREAD_TLS_KEYWORD)
> 
> Does B2G HAVE_THREAD_TLS_KEYWORD or not?  Why are we implementing whichever
> one is not B2G?  For testing on desktop linux?
We don't have HAVE_THREAD_TLS_KEYWORD on b2g. Thinker to clarify this?

> @@ +403,5 @@
> > +  *prev = tinfo;
> > +
> > +  sThreadCount++;
> > +  pthread_mutex_unlock(&sThreadCountLock);
> > +  pthread_cond_signal(&sThreadChangeCond);
> 
> Hmm, shouldn't we signal and then unlock?  I know it doesn't really matter
> but it feels more natural.  There are a bunch of places we do this so if we
> decide to change it we should change all of them.
> 
Signal and unlock will be slower on platforms that doesn't support wait morphing optimization. NPTL has a similar optimization. I need to check if b2g has this. Anyway since this is not performance critical part of code I will change that to make it feel more natural.

> @@ +670,5 @@
> > + */
> > +#define RECREATE_START()                          \
> > +  do {                                            \
> > +    REAL(pthread_mutex_lock)(&sRecreateWaitLock); \
> > +    REAL(pthread_mutex_lock)(&sRecreateGateLock); \
> 
> Switch these (acquire the gate lock then the wait lock) to maintain a lock
> hierarchy.
> 
Actually sRecreateWaitLock is unlocked by REACREATE_STOP() in the method later then sRecreateGateLock. I will rename the macro to RECREATE_FINISH() to make it clearer.

> @@ +873,5 @@
> > +      // The thread was frozen in pthread_cond_wait() in the Nuwa process and
> > +      // released mtx. The following pthread_cond_wait() call will release mtx
> > +      // and we need the main thread's help to reacquire mtx to make it in a
> > +      // valid state.
> > +      tinfo->reacquireMutex = mtx;
> 
> It took me and Justin some time to realize how this actually works, and I
> have to say I'm a little scared.  This is pretty clever, but it relies on a
> lot of behavior that's technically undefined. I'm willing to go along with
> it if it works in b2g ...
> 
> Have you tried this in a debug build? I wouldn't be surprised if pthreads
> has assertions about unlocking mutexes on threads that didn't lock them, etc.
> 
Bionic doens't assert this if the mutex type is PTHREAD_MUTEX_NORMAL. It returns an error if the mutex type is PTHREAD_MUTEX_RECURSIVE or PTHREAD_MUTEX_ERRORCHECK. I know we depend on the implementation that is technically undefined, but we haven't seen an error on b2g. On a platform with a more rigorous pthread implementation this could backfire. We could make it a followup.

> @@ +1447,5 @@
> > + * Let IPC thread wait until fork action on the main thread has completed.
> > + */
> > +void NS_EXPORT
> > +NuwaSpawnWait() {
> > +  REAL(pthread_cond_wait)(&sForkWaitCond, &sForkLock);
> 
> This needs to check some sort of condition, no?  Otherwise pthread_cond_wait
> could get a spurious wakeup and we might proceed too early.
> 
You mean pthread_cond_wait() interrupted by a signal. That's tricky. From the man page this call doesn't return EINTR, but actually a signal can interrupt a blocking call and result in a spurious wakeup. Then we could need to introduce status variables and use a while loop to do this. This could significantly complicate the code if we need to do this wherever pthread_cond_wait() is called...
Flags: needinfo?(tlee)
(In reply to Cervantes Yu from comment #105)
> Maybe we can leverage static variable's dtor to free the resources on
> process exit?

Freeing it on process exit seems like a waste of time.  It isn't much different than just leaking it.

> Bionic doens't assert this if the mutex type is PTHREAD_MUTEX_NORMAL. It
> returns an error if the mutex type is PTHREAD_MUTEX_RECURSIVE or
> PTHREAD_MUTEX_ERRORCHECK. I know we depend on the implementation that is
> technically undefined, but we haven't seen an error on b2g. On a platform
> with a more rigorous pthread implementation this could backfire. We could
> make it a followup.

I don't care if it works on other pthreads implementations as long as it works on b2g in opt and debug builds :-)

> You mean pthread_cond_wait() interrupted by a signal. That's tricky. From
> the man page this call doesn't return EINTR, but actually a signal can
> interrupt a blocking call and result in a spurious wakeup. Then we could
> need to introduce status variables and use a while loop to do this. This
> could significantly complicate the code if we need to do this wherever
> pthread_cond_wait() is called...

I don't think it matters for any pthread_cond_wait() call except this one.  The others are either already checking some condition or are inside __wrap_pthread_cond_wait() where the caller needs to deal with the wakeups.  Seems like we should just be able to stick a boolean static variable in somewhere, set it just above pthread_cond_signal(&sForkWaitCond), and poll it in NuwaSpawnWait.
Changes:

* Use moz.build for exporting Nuwa.h.
* Use mfbt linked list for sAllThreads.
* Clean up sConstructors and sFinalConstructors after forking from the Nuwa process.
* Cleanup EpollManager after forking from the Nuwa process.
* Handle spurious wakeup of pthread_cond_wait() with sForkWaitCond.
* GetThreadInfo() and GetCurThreadInfo() refactoring.
* pthread_cond_signal() with mutex locked.
* Fixing the nits.
Attachment #761968 - Attachment is obsolete: true
Attachment #768140 - Flags: review?(khuey)
> @@ +163,5 @@
> > + * calling thread.
> > + */
> > +static pthread_mutex_t sThreadFreezeLock = PTHREAD_MUTEX_INITIALIZER;
> > +
> > +static thread_info_t *sAllThreads = NULL;
> 
> I pointed this out somewhere else, but this list is never cleaned up.
> 
Freeing sAllThreads is kind of tricky in the spawned process. The threads we recreate are expected to live as long as the process. Cleaning up an thread_info_t instance on thread termination might not be much better than during process exit. I will deal with it in a followup.
> @@ +345,5 @@
> > +  EpollInfo *FindEpollInfo(int aEpollFd) {
> > +    iterator it = mEpollFdsInfo.find(aEpollFd);
> > +    if (it == mEpollFdsInfo.end()) {
> > +      return NULL;
> > +    }
> 
> Does this branch ever happen?  Most of the functions that call FindEpollInfo
> abort() if they get NULL.  Some of the other functions might be cleaner if
> this returned an EpollInfo& or maybe even an iterator&.
> 
Yes, it happens in the close() wrapper. The Nuwa process can close() Fds, most of which might not be epoll Fds.
(In reply to Cervantes Yu from comment #105)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #98)
> > Comment on attachment 761968 [details] [diff] [review]
> > Part 1: the Nuwa API and low-level wrappers.
> > 
> > Review of attachment 761968 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > @@ +219,5 @@
> > > +
> > > +  return tinfo;
> > > +}
> > > +
> > > +#if !defined(HAVE_THREAD_TLS_KEYWORD)
> > 
> > Does B2G HAVE_THREAD_TLS_KEYWORD or not?  Why are we implementing whichever
> > one is not B2G?  For testing on desktop linux?
> We don't have HAVE_THREAD_TLS_KEYWORD on b2g. Thinker to clarify this?

I really like to leverage TLS if it is there.  That is why it is there.
Flags: needinfo?(tlee)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #99)
> Comment on attachment 761969 [details] [diff] [review]
> Part 2: IPC and glue changes.
> 
> Review of attachment 761969 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/chromium/src/base/process_util_posix.cc
> @@ +256,5 @@
> >  bool DidProcessCrash(bool* child_exited, ProcessHandle handle) {
> >    int status;
> >    const int result = HANDLE_EINTR(waitpid(handle, &status, WNOHANG));
> >    if (result == -1) {
> > +    // XXXX: always be here for Nuwa
> 
> Do we need this comment?  I don't think it adds anything.
Will add comment about its -1 and errno for processes spawned from the Nuwa process.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #100)
> Comment on attachment 757837 [details] [diff] [review]
> Part 4: clone IPC protocol objects that will be up when the template process
> is ready
> 
> Review of attachment 757837 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me, but bent should look at this too.
> 
> How was this list of protocols determined?  By trial and error?  Do we have
> code that will abort or something if we need to clone a protocol that
> doesn't implement CloneProtocol?
> 
It's by observing which protocols are created in the Nuwa process (and basically, trial and error). Because the Nuwa process is only used for forking new processes, it only uses a small set of protocols and this is OK. We will fallback to the default implementation in ProtocolUtils.cpp and abort() if some protocol needs to be cloned but the implementation is absent.

> ::: dom/indexedDB/ipc/IndexedDBParent.cpp
> @@ +137,5 @@
> > +
> > +    return parent;
> > +#else
> > +    return nullptr;
> > +#endif
> 
> CloneProtocol is never called #ifndef MOZ_NUWA_PROCESS, right?  If so, let's
> add MOZ_CRASH() calls.
> 
We don't have to. Class IProtocolManager declares pure virtual function CloneRouted() and it is implemented by generated code. For example, PContentParent::CloneRouted() has the following snippet:

        InfallibleTArray<PIndexedDBParent*>& kids = (other)->mManagedPIndexedDBParent;
        PIndexedDBParent* actor;
        for (uint32_t i = 0; (i) < ((kids).Length()); (++(i))) {
            actor = static_cast<PIndexedDBParent*>((kids[i])->CloneProtocol((&(mChannel)), aCtx));
            if ((actor) == (0)) {
                NS_RUNTIMEABORT("can not clone an PIndexedDB actor");
                return;
            }

So if CloneProtocol() of the managed protocol returns null, either by default or from the overridden version, it aborts from the generated code.

> ::: dom/ipc/CrashReporterParent.cpp
> @@ +33,5 @@
> > +mozilla::ipc::IProtocolRPC*
> > +CrashReporterParent::CloneProtocol(Channel* aChannel,
> > +                                   mozilla::ipc::ProtocolCloneContext *aCtx) {
> > +    return new CrashReporterParent();
> > +}
> 
> Do we not need to run the code in
> ContentParent::RecvPCrashReporterConstructor here?  The thread ID on the
> child side will have changed after the call to fork(), no?
> 
Yes, this is a bug. I will fix it.

> ::: gfx/layers/ipc/CompositorParent.h
> @@ +53,5 @@
> > +  // IToplevelProtocol::CloneToplevel()
> > +  virtual IToplevelProtocol *
> > +  CloneToplevel(const InfallibleTArray<mozilla::ipc::ProtoFdMapping> &aFds,
> > +                base::ProcessHandle aPeerProcess,
> > +                mozilla::ipc::ProtocolCloneContext *aCtx);
> 
> Why are some of these functions #ifdef MOZ_NUWA_PROCESS and others aren't? 
> Seems messy.
Will make it consistent if there are no build-related issues.
Attached patch Part 1: the Nuwa API and low-level wrappers. (obsolete) — — Splinter Review
Updated for the changes to part 4. Carry over r+
Attachment #768140 - Attachment is obsolete: true
Attachment #770738 - Flags: review+
Move class ProtocolCloneContext to ProtocolUtils.h
Attachment #761970 - Attachment is obsolete: true
Attachment #761970 - Flags: review?(bent.mozilla)
Attachment #770739 - Flags: review?(bent.mozilla)
Attachment #757837 - Attachment is obsolete: true
Attachment #757837 - Flags: review?(bent.mozilla)
Attachment #770742 - Flags: review?(bent.mozilla)
Comment on attachment 770742 [details] [diff] [review]
Part 4: clone IPC protocol objects that will be up when the template process is ready

Carry over r+ from khuey
Attachment #770742 - Flags: review+
(In reply to Cervantes Yu from comment #111)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #100)
> > ::: dom/ipc/CrashReporterParent.cpp
> > @@ +33,5 @@
> > > +mozilla::ipc::IProtocolRPC*
> > > +CrashReporterParent::CloneProtocol(Channel* aChannel,
> > > +                                   mozilla::ipc::ProtocolCloneContext *aCtx) {
> > > +    return new CrashReporterParent();
> > > +}
> > 
> > Do we not need to run the code in
> > ContentParent::RecvPCrashReporterConstructor here?  The thread ID on the
> > child side will have changed after the call to fork(), no?
> > 
> Yes, this is a bug. I will fix it.
After tracing the flow of crash reporter parent, I found that we don't need to set thread ID for the cloned CrashReporterParent. It's used for the crash reporter for the plugin process. B2G doesn't go over this path: info about a crash in the content process will be sent from CrashGenerationClient to the CrashGenerationServer in the chrome process. I put a comment here to say that for Nuwa the cloned instance doesn't need the info set in SetChildData().

> 
> > ::: gfx/layers/ipc/CompositorParent.h
> > @@ +53,5 @@
> > > +  // IToplevelProtocol::CloneToplevel()
> > > +  virtual IToplevelProtocol *
> > > +  CloneToplevel(const InfallibleTArray<mozilla::ipc::ProtoFdMapping> &aFds,
> > > +                base::ProcessHandle aPeerProcess,
> > > +                mozilla::ipc::ProtocolCloneContext *aCtx);
> > 
> > Why are some of these functions #ifdef MOZ_NUWA_PROCESS and others aren't? 
> > Seems messy.
> Will make it consistent if there are no build-related issues.
I move class ProtocolCloneContext to ProtocolUtils.h, which is more natural, and remove the #ifdef blocks for CloneProtocol() and CloneToplevel(). The overloaded ContentParent ctor already aborts if MOZ_NUWA_PROCESS is not enabled. The protocols doesn't have to have #ifdef blocks.
Attached patch Part 5: PContent protocol changes (obsolete) — — Splinter Review
Use nsRefPtr in nsTArray for sSpareProcesses.
Attachment #761971 - Attachment is obsolete: true
Attachment #771250 - Flags: review?(khuey)
Carry over r+
Attachment #761972 - Attachment is obsolete: true
Attachment #761972 - Flags: review?(jorendorff)
Attachment #771296 - Flags: review+
Attachment #771296 - Flags: review?(jorendorff)
Attached patch Part 2: IPC and glue changes. (obsolete) — — Splinter Review
Nit fix.
Attachment #761969 - Attachment is obsolete: true
Attachment #761969 - Flags: review?(bent.mozilla)
Attachment #771298 - Flags: review?(bent.mozilla)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #102)
> Comment on attachment 761972 [details] [diff] [review]
> Part 6: mark threads created in the template process.
> 
> Review of attachment 761972 [details] [diff] [review]:
> ::: js/xpconnect/src/XPCJSRuntime.cpp
> @@ +1116,5 @@
> > +    if (IsNuwaProcess()) {
> > +        NS_ASSERTION(MarkThreadNuwa != nullptr,
> > +                     "MarkThreadNuwa is undefined!");
> > +        NuwaMarkCurrentThread(nullptr, nullptr);
> > +        NuwaFreezeCurrentThread();
> 
> How come we need to explicitly freeze this thread, but not the timer thread?
> 
I think we need to freeze it explicitly so it doesn't get frozen in the pthread_cond_timedwait() for 1 sec. and triggers the operation callback in the spawned process incorrectly.
The timer thread cannot be frozen explicitly because we schedule a 1 sec. delay to signal the start of thread freezing.
(In reply to Cervantes Yu from comment #120)
> I think we need to freeze it explicitly so it doesn't get frozen in the
> pthread_cond_timedwait() for 1 sec. and triggers the operation callback in
> the spawned process incorrectly.

Ah, that makes sense.
Carry over r+
Attachment #755327 - Attachment is obsolete: true
Attachment #772016 - Flags: review+
Attached patch Part 8: process initialization flow changes (obsolete) — — Splinter Review
Carry over r+
Attachment #761973 - Attachment is obsolete: true
Attachment #772017 - Flags: review+
Blocks: 890870
Comment on attachment 771250 [details] [diff] [review]
Part 5: PContent protocol changes

Review of attachment 771250 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, though I think bent should review some of the IPC stuff here.

::: dom/ipc/ContentChild.cpp
@@ +1114,5 @@
>  {
> +#ifdef MOZ_NUWA_PROCESS
> +    if (IsNuwaProcess()) {
> +      // Don't flush memory in the nuwa process: the GC thread could be frozen.
> +      return true;

4 space indents

@@ +1248,5 @@
> +#ifdef MOZ_NUWA_PROCESS
> +    if (IsNuwaProcess()) {
> +      // Don't minimize memory in the nuwa process: it will perform GC, but the
> +      // GC thread could be frozen.
> +        return true;

here too

@@ +1285,5 @@
>  }
>  
> +#include <setjmp.h>
> +#ifdef MOZ_NUWA_PROCESS
> +#include "ipc/Nuwa.h"

You're already including ipc/Nuwa.h.  Move setjmp.h up to the top of the file too.

@@ +1315,5 @@
> +    }
> +};
> +#endif
> +
> +// This number should be carefully choosed.

should be chosen carefully

@@ +1323,5 @@
> +static void
> +DoParasitePoint(jmp_buf *aEnv)
> +{
> +    // Dark Art!!
> +    // Reserve enough stack space for DoNuwaFork() to avoid corrupting

corrupting the stack frame

@@ +1326,5 @@
> +    // Dark Art!!
> +    // Reserve enough stack space for DoNuwaFork() to avoid corrupting
> +    // stack frame of NuwaCheckpointCurrentThread(), or we don't return
> +    // from NuwaCheckpointCurrentThread() correctly on the parasited
> +    // thread.

I think we decided to say "recreated thread" instead of "parasited thread"?

@@ +1338,5 @@
> +    NS_ASSERTION(ParasitePointThreadNuwa != nullptr,
> +                 "NuwaCheckpointCurrentThread() is not available!");
> +    rv = NuwaCheckpointCurrentThread();
> +    if (!rv) {
> +        // On the parasited thread

Same here.

@@ +1381,5 @@
> +    jmp_buf env;
> +    int rv;
> +
> +    rv = setjmp(env);
> +    if(rv == 0) {

if (

@@ +1396,5 @@
> +ContentChild::RecvNuwaFork()
> +{
> +#ifdef MOZ_NUWA_PROCESS
> +    if (sNuwaForking)           // no reentry
> +        return true;

Brace single line ifs.

@@ +1412,5 @@
>  } // namespace mozilla
> +
> +extern "C" {
> +/**
> + * AddNewIPCProcess() are called by Nuwa process to note the parent

to tell the parent

@@ +1415,5 @@
> +/**
> + * AddNewIPCProcess() are called by Nuwa process to note the parent
> + * process there is a new process was created.
> + *
> + * For new created process, it call ResetContentChildTransport() to

calls

@@ +1417,5 @@
> + * process there is a new process was created.
> + *
> + * For new created process, it call ResetContentChildTransport() to
> + * reset fd for the IPC Channel and the session.
> + */

Shouldn't this comment be in front of AddNewIPCProcess?

@@ +1435,5 @@
> +    for (IToplevelProtocol *actor = content->GetFirstOpenedActors();
> +         actor != nullptr;
> +         actor = actor->NextToplevel()) {
> +        if (i >= size)
> +            NS_RUNTIMEABORT("To much top level protocols!");

Braces.

@@ +1443,5 @@
> +            actor->GetTransport()->GetPipeFileDescriptor();
> +        i++;
> +    }
> +
> +    *aInfoSize = i;

We should add an assertion here, or in the function that calls GetProtoFdInfos (which I think is back in part 1) that i is less than or equal to NUWA_TOPLEVEL_MAX.

@@ +1453,5 @@
> +    ~RunAddNewIPCProcess() {};
> +    NS_IMETHOD Run();
> +
> +    pid_t mPid;
> +    InfallibleTArray<mozilla::ipc::ProtoFdMapping> mMaps;

nsTArray is infallible by default, so there's no need to explicitly use InfallibleTArray here.

@@ +1462,5 @@
> +{
> +    bool sendok = mozilla::dom::ContentChild::GetSingleton()->
> +        SendAddNewProcess(mPid, mMaps);
> +    if (!sendok)
> +        NS_RUNTIMEABORT("fail to call SendAddNewProcess()");

Braces.

@@ +1485,5 @@
> +
> +    nsCOMPtr<nsIThread> mainThread = do_GetMainThread();
> +    mainThread->Dispatch(runner,
> +                         NS_DISPATCH_NORMAL);
> +    return true;

NS_DispatchToMainThread(runner);
return true;

Also this function doesn't ever return false so you could just make it return void ...

::: dom/ipc/ContentParent.cpp
@@ +1212,5 @@
> +    IToplevelProtocol::SetTransport(mSubprocess->GetChannel());
> +
> +    std::vector<std::string> extraArgs;
> +    if (aIsNuwaProcess)
> +        extraArgs.push_back("-nuwa");

Braces
Attachment #771250 - Flags: review?(khuey)
Attachment #771250 - Flags: review?(bent.mozilla)
Attachment #771250 - Flags: review+
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #124)
> Comment on attachment 771250 [details] [diff] [review]
> Part 5: PContent protocol changes
> 
> Review of attachment 771250 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +1326,5 @@
> > +    // Dark Art!!
> > +    // Reserve enough stack space for DoNuwaFork() to avoid corrupting
> > +    // stack frame of NuwaCheckpointCurrentThread(), or we don't return
> > +    // from NuwaCheckpointCurrentThread() correctly on the parasited
> > +    // thread.
> 
> I think we decided to say "recreated thread" instead of "parasited thread"?
> 
Yes. Will change that.
> 
> @@ +1453,5 @@
> > +    ~RunAddNewIPCProcess() {};
> > +    NS_IMETHOD Run();
> > +
> > +    pid_t mPid;
> > +    InfallibleTArray<mozilla::ipc::ProtoFdMapping> mMaps;
> 
> nsTArray is infallible by default, so there's no need to explicitly use
> InfallibleTArray here.
> 
nsTArray and InfallibleTArray are now synonyms. I change most occurrences but leave overrides of IPDL-generated methods, which still use InfallibleTArray.
Attached patch Part 5: PContent protocol changes (obsolete) — — Splinter Review
Fix the nits and carry over r+.
Attachment #771250 - Attachment is obsolete: true
Attachment #771250 - Flags: review?(bent.mozilla)
Attachment #773213 - Flags: review+
Attachment #773213 - Flags: review?(bent.mozilla)
Comment on attachment 771298 [details] [diff] [review]
Part 2: IPC and glue changes.

Review of attachment 771298 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/chrome/common/child_process_host.cc
@@ +84,5 @@
>  
>    return true;
>  }
>  
> +bool ChildProcessHost::CreateChannel(FileDescriptor &aFile) {

Shouldn't we make sure the old channel is closed before replacing it here?

@@ +87,5 @@
>  
> +bool ChildProcessHost::CreateChannel(FileDescriptor &aFile) {
> +  FileDescriptor::PlatformHandleType fd = aFile.PlatformHandle();
> +  channel_id_ = GenerateRandomChannelID(this);
> +  // XXXX: use OpenDescriptor()

This needs to be fixed or a followup bug needs to be filed and referenced here.

::: ipc/chromium/src/chrome/common/ipc_channel.h
@@ +119,5 @@
> +  // Reset the client side of the socketpair.
> +  void ResetPipeFileDescriptor(int fd);
> +
> +  // Get fd of this side of the socketpair.
> +  int GetPipeFileDescriptor();

What's the difference between this and GetServerFileDescriptor() ? It seems like this is identical with the possible exception of an assertion that might not work.

::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc
@@ +353,5 @@
> +/**
> + * Reset the file descriptor for communication with the peer.
> + */
> +void Channel::ChannelImpl::ResetPipeFileDescriptor(int fd) {
> +  if (fd != pipe_) {

Can we assert this instead? At least we can assert that fd is valid, right?

@@ +354,5 @@
> + * Reset the file descriptor for communication with the peer.
> + */
> +void Channel::ChannelImpl::ResetPipeFileDescriptor(int fd) {
> +  if (fd != pipe_) {
> +    int rv = dup2(fd, pipe_);

The docs I have see say that this can be interrupted, maybe need to do HANDLE_EINTR

@@ +355,5 @@
> + */
> +void Channel::ChannelImpl::ResetPipeFileDescriptor(int fd) {
> +  if (fd != pipe_) {
> +    int rv = dup2(fd, pipe_);
> +    NS_ASSERTION(rv != -1, "dup2 fails");

This can't really be asserted, can it? Are you just assuming that if we get to the point where we have too many fds open we are doomed anyway? We probably need to call Close() if anything in here fails.

@@ +363,5 @@
> +    flags |= FD_CLOEXEC;
> +    rv = fcntl(pipe_, F_SETFL, flags);
> +    NS_ASSERTION(rv != -1, "fcntl fails");
> +
> +    close(fd);

Hm, if we're going to dup and then immediately close the fd that we're passed why not just use it directly?

::: ipc/glue/GeckoChildProcessHost.cpp
@@ +427,5 @@
>    return retval;
>  }
>  
> +bool
> +GeckoChildProcessHost::RunPerformAsyncLaunch(std::vector<std::string> aExtraOpts, base::ProcessArchitecture arch) {

Nit: please wrap to make this shorter, and { goes on its own line.

@@ +848,5 @@
> +                         const FileDescriptor &aFile,
> +                         ChildPrivileges aPrivileges)
> +  : mExistingProcessHandle(aProcess)
> +  , mExistingFile(aFile)
> +  , GeckoChildProcessHost(aProcessType, aPrivileges)

Please initialize your base class first, and at least assert that aFile is valid.

@@ +852,5 @@
> +  , GeckoChildProcessHost(aProcessType, aPrivileges)
> +{
> +}
> +
> +GeckoExistingProcessHost::~GeckoExistingProcessHost() {

Nit: { on its own line

@@ +863,5 @@
> +  SetHandle(mExistingProcessHandle);
> +
> +  OpenPrivilegedHandle(base::GetProcId(mExistingProcessHandle));
> +  {
> +    MonitorAutoLock lock(mMonitor);

Nit: The braces aren't really needed here.

@@ +882,5 @@
> +  lock.Notify();
> +}
> +
> +#ifdef OS_POSIX
> +typedef nsUint64HashKey ProcessHostHashKeyType;

This doesn't seem to buy us much and makes the code harder to read... We should probably just have something like this at the top of the header file:

  #if defined(MOZ_NUWA_PROCESS) && !defined(OS_POSIX)
  #error Nuwa only supported on posix!
  #endif

And then use nsUint64HashKey directly.

@@ +888,5 @@
> +#error "Unknow hash key type for process host"
> +#endif
> +typedef nsClassHashtable<ProcessHostHashKeyType,
> +                         GeckoExistingProcessHost> ProcessHostMap;
> +typedef nsAutoTArray<nsAutoPtr<GeckoExistingProcessHost>, 4> ProcessHostArray;

Why 4?

@@ +891,5 @@
> +                         GeckoExistingProcessHost> ProcessHostMap;
> +typedef nsAutoTArray<nsAutoPtr<GeckoExistingProcessHost>, 4> ProcessHostArray;
> +
> +static ProcessHostMap sNotReadyProcesses;
> +static ProcessHostArray sSpareProcesses;

static XPCOM objects are not allowed. These need to be lazily constructed and then cleared at shutdown.

@@ +914,5 @@
> +  GeckoExistingProcessHost *host;
> +  bool success = sNotReadyProcesses.Get(aProcess, &host);
> +  if (!success)
> +    return false;
> +  sNotReadyProcesses.Remove(aProcess);

Woah, this will delete the object. You need to use RemoveAndForget().

@@ +916,5 @@
> +  if (!success)
> +    return false;
> +  sNotReadyProcesses.Remove(aProcess);
> +
> +  sSpareProcesses.AppendElement(host);

Hm, I don't see this being used anywhere. Is this all dead code?

::: ipc/glue/GeckoChildProcessHost.h
@@ +173,5 @@
> +{
> +public:
> +  GeckoExistingProcessHost(GeckoProcessType aProcessType,
> +			   base::ProcessHandle aProcess,
> +			   const FileDescriptor &aFile,

Nit: s/aFile/aFileDescriptor/ here and below

@@ +174,5 @@
> +public:
> +  GeckoExistingProcessHost(GeckoProcessType aProcessType,
> +			   base::ProcessHandle aProcess,
> +			   const FileDescriptor &aFile,
> +			   ChildPrivileges aPrivileges= base::PRIVILEGES_DEFAULT);

Nit: no space after =, here and below.

@@ +179,5 @@
> +
> +  ~GeckoExistingProcessHost();
> +
> +  virtual bool PerformAsyncLaunch(StringVector aExtraOpts=StringVector(),
> +          base::ProcessArchitecture arch=base::GetCurrentProcessArchitecture());

Nit: s/arch/aArch/ (I know this is just copied, but we shouldn't copy bad patterns).

And please mark all your overrides as MOZ_OVERRIDE

@@ +185,5 @@
> +  virtual void InitializeChannel();
> +
> +  static bool RegisterSpawnedProcess(GeckoProcessType aProcessType,
> +				     base::ProcessHandle aProcess,
> +				     const FileDescriptor &aFile,

Nit: ipc/glue puts the * and & on the left everywhere, please update to match existing style.

@@ +195,5 @@
> +      GetSpawnedProcess(base::ProcessHandle aProcess);
> +
> +private:
> +  base::ProcessHandle mExistingProcessHandle;
> +  mozilla::ipc::FileDescriptor mExistingFile;

Nit: s/mExistingFile/mExistingFileDescriptor/
Attachment #771298 - Flags: review?(bent.mozilla) → review-
Comment on attachment 770739 [details] [diff] [review]
Part 3: IPC glue and IPDL codegen to support protocol cloning.

Review of attachment 770739 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/ProtocolParams.ipdlh
@@ +6,5 @@
> +
> +namespace mozilla {
> +namespace ipc {
> +
> +struct ProtoFdMapping {

Nit: Let's call this 'ProtocolFdMapping', 'Proto' is a common thing in JS that we don't want to confuse here.

Also, the file should be renamed. ProtocolTypes.ipdlh is a bit better I think.

::: ipc/glue/ProtocolUtils.h
@@ +19,5 @@
>  #include "mozilla/ipc/Shmem.h"
>  #include "mozilla/ipc/Transport.h"
> +#include "mozilla/ipc/RPCChannel.h"
> +
> +#include "nsArray.h"

Not needed I hope...

@@ +78,5 @@
>      Action mAction;
>      int32_t mMsg;
>  };
>  
> +class ProtocolCloneContext {

Nit: All classes here and elsewhere should have the { on its own line.

@@ +79,5 @@
>      int32_t mMsg;
>  };
>  
> +class ProtocolCloneContext {
> +  mozilla::dom::ContentParent *mContentParent;

Nit: typedef'ing this would be cleaner

@@ +85,5 @@
> +public:
> +  ProtocolCloneContext() : mContentParent(NULL)
> +  {}
> +
> +  void SetContentParent(mozilla::dom::ContentParent *aContentParent) {

Nit: ipc/glue puts * and & on the left, so please update everywhere to match existing style.

@@ +135,5 @@
> +/**
> + * All async protocols should implement this interface.
> + */
> +class IProtocolAsync :
> +        protected AsyncChannel::AsyncListener

Why can't you just extend AsyncListener/SyncListener/RPCListener instead of adding three new interfaces?

@@ +146,5 @@
> +     * new content process forking from a template process (Nuwa).
> +     */
> +    virtual IProtocolAsync *
> +    CloneProtocol(AsyncChannel *aChannel,
> +                  ProtocolCloneContext *aCtx);

I understand that your current model is for all actors to have this base implementation and then override if they support cloning. However, I think we're going to override this on a fair number of classes anyway, and using a base method like this means that we won't be able to tell from a crash report which protocol object failed to support cloning. I'd much prefer that we we make these pure-virtual and then override with *every* protocol so that we have good crash reports and maybe even good abort messages.

@@ +203,5 @@
> +
> +    ~IToplevelProtocol();
> +
> +    /**
> +     * Add an actor to the list of actors had been opened by this protocol.

Nit: s/had/that have/

@@ +228,5 @@
> +
> +    virtual IToplevelProtocol *
> +    CloneToplevel(const InfallibleTArray<ProtoFdMapping> &aFds,
> +                  base::ProcessHandle aPeerProcess,
> +                  ProtocolCloneContext *aCtx);

Similar to above, I think this should be pure-virtual too.

@@ +238,5 @@
> +
> +private:
> +    IToplevelProtocol *mOpenActors; // All protocol actors opened by this.
> +
> +    IToplevelProtocol *mNext, **mPrev;

We have mixin classes for linked lists, please use mozilla::LinkedList instead of rolling your own.

::: ipc/glue/ipdl.mk
@@ +4,5 @@
>  
>  IPDLSRCS = \
>    InputStreamParams.ipdlh \
>    URIParams.ipdlh \
> +  ProtocolParams.ipdlh \

Nit: Please alphabetize

::: ipc/ipdl/ipdl/lower.py
@@ +1197,5 @@
>          return ExprVar('mId')
>  
> +    def stateVar(self, actorThis=None):
> +        if actorThis is not None:
> +            return ExprSelect(actorThis, '->', 'mState')

I'm curious why this is needed. Do we have some kind of name collision somewhere?

@@ +3761,5 @@
>                  p.callOtherProcess(p.managerVar())))
>              getchannel.addstmt(StmtReturn(p.channelVar()))
>  
> +        cloneprotocol.addstmts([
> +            StmtExpr(ExprCall(ExprVar('NS_ASSERTION'),

Let's make this use _runtimeAbort()

@@ +3882,5 @@
>                   destroyshmem,
>                   otherprocess,
>                   getchannel,
> +                 clonerouted,
> +                 cloneprotocol,

I think this looks ok but I'd prefer to look over the generated code to make sure. Can you attach the output for both these functions in toplevel/non-toplevel actors?

@@ +4140,4 @@
>                  iffailopen,
>                  iffailalloc,
> +                settrans,
> +                addopened,

Same here.
Attachment #770739 - Flags: review?(bent.mozilla) → review-
Comment on attachment 770742 [details] [diff] [review]
Part 4: clone IPC protocol objects that will be up when the template process is ready

Review of attachment 770742 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ipc/IndexedDBParent.cpp
@@ +31,5 @@
>  #include "IDBObjectStore.h"
>  #include "IDBTransaction.h"
>  
> +#ifdef MOZ_NUWA_PROCESS
> +#include "ipc/Nuwa.h"

Why is this needed?

@@ +116,5 @@
>  }
>  
> +mozilla::ipc::IProtocolRPC *
> +IndexedDBParent::CloneProtocol(Channel *aChannel,
> +                               mozilla::ipc::ProtocolCloneContext *aCtx)

Assuming that this is defined in ipc/Nuwa.h, this won't compile if MOZ_NUWA_PROCESS isn't defined, right?

I also worry that this is insufficient. IndexedDB makes lots of subactors very quickly, so if you're not adding CloneProtocol to them then I expect we will have intermittent crashes.

@@ +119,5 @@
> +IndexedDBParent::CloneProtocol(Channel *aChannel,
> +                               mozilla::ipc::ProtocolCloneContext *aCtx)
> +{
> +    MOZ_ASSERT(mManagerContent != nullptr);
> +    MOZ_ASSERT(mManagerTab == nullptr);

I don't know how this can be asserted... IDB can be used from both PContent and PBrowser, how can we know for sure that this is a content managee?

@@ +120,5 @@
> +                               mozilla::ipc::ProtocolCloneContext *aCtx)
> +{
> +    MOZ_ASSERT(mManagerContent != nullptr);
> +    MOZ_ASSERT(mManagerTab == nullptr);
> +    MOZ_ASSERT(!mDisconnected);

This too is probably not assertable... It is likely a race.

@@ +121,5 @@
> +{
> +    MOZ_ASSERT(mManagerContent != nullptr);
> +    MOZ_ASSERT(mManagerTab == nullptr);
> +    MOZ_ASSERT(!mDisconnected);
> +

We should be able to add this here:

  MOZ_ASSERT(IndexedDatabaseManager::Get());
  MOZ_ASSERT(IndexedDatabaseManager::IsMainProcess());

@@ +127,5 @@
> +    IndexedDBParent *parent = new IndexedDBParent(contentParent);
> +
> +    nsRefPtr<IDBFactory> factory;
> +    nsresult rv = IDBFactory::Create(contentParent, getter_AddRefs(factory));
> +    MOZ_ASSERT(rv == NS_OK);

There are lots of ways that IDBFactory::Create could fail (OOM is particularly likely). This can't be asserted and must be handled instead.

You should make 'parent' an nsAutoPtr once you handle this.

Or, better yet, why not just do:

  nsAutoPtr<IndexedDBParent> actor = contentParent->AllocPIndexedDBParent();
  if (!actor || !contentParent->RecvPIndexedDBConstructor(actor)) {
    return nullptr;
  }
  return actor.forget();

That way we go through the same code paths for future-proofing and you don't have to worry about partial-initialization.

@@ +128,5 @@
> +
> +    nsRefPtr<IDBFactory> factory;
> +    nsresult rv = IDBFactory::Create(contentParent, getter_AddRefs(factory));
> +    MOZ_ASSERT(rv == NS_OK);
> +    MOZ_ASSERT(factory != nullptr);

This can only be asserted if we're managed by PContent. Otherwise it could actually be null and we'd need to handle it.

::: dom/indexedDB/ipc/IndexedDBParent.h
@@ +193,5 @@
>  
>    bool
>    CheckWritePermission(const nsAString& aDatabaseName);
>  
> +  mozilla::ipc::IProtocolRPC *

Nit: all of dom/ has * and & on the left...

@@ +195,5 @@
>    CheckWritePermission(const nsAString& aDatabaseName);
>  
> +  mozilla::ipc::IProtocolRPC *
> +  CloneProtocol(Channel *aChannel,
> +                mozilla::ipc::ProtocolCloneContext *aCtx);

Please add MOZ_OVERRIDE to everything that should be an override, here and everywhere.

Also, shouldn't this (and all other CloneProtocol functions) be #ifdef MOZ_NUWA_PROCESS? Otherwise we're adding a bunch of dead code.

::: dom/ipc/Blob.h
@@ +232,5 @@
>  
>  } // namespace dom
>  } // namespace mozilla
>  
> +#endif // mozilla_dom_ipc_Blob_h

Weird, is this a line ending change or something?

::: dom/ipc/CrashReporterParent.cpp
@@ +34,5 @@
> +CrashReporterParent::CloneProtocol(Channel* aChannel,
> +                                   mozilla::ipc::ProtocolCloneContext *aCtx) {
> +    // For Nuwa, we don't need the info set in SetChildData(). Just return a new
> +    // instance of CrashReporterParent.
> +    return new CrashReporterParent();

This seems fragile to me. Why isn't this information important for the Nuwa process?

FWIW it looks like mInitialized needs to be taken care of because it could abort the parent if the Nuwa process crashes I think.

::: dom/src/storage/DOMStorageIPC.cpp
@@ +369,5 @@
> +DOMStorageDBParent::CloneProtocol(Channel *aChannel,
> +                                  mozilla::ipc::ProtocolCloneContext *aCtx)
> +{
> +  DOMStorageDBParent *newactor = new DOMStorageDBParent();
> +  return newactor;

This looks fine for now but it will silently break if someone ever overrides ContentParent::RecvPStorageConstructor... I think we should try to always follow the pattern I suggested for IDB. Otherwise we're just begging for trouble down the road.

I'm actually going to wait for your comments on this idea before continuing with this review.

::: ipc/glue/Transport.h
@@ +29,5 @@
>  
>  Transport* OpenDescriptor(const TransportDescriptor& aTd,
>                            Transport::Mode aMode);
>  
> +Transport* OpenDescriptor(const FileDescriptor &aFd,

Nit: * and & on the left in ipc/glue
Attachment #770742 - Flags: review?(bent.mozilla) → review-
Comment on attachment 773213 [details] [diff] [review]
Part 5: PContent protocol changes

Review of attachment 773213 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentChild.cpp
@@ +334,5 @@
>      if (mIsForApp && !mIsForBrowser) {
>          SetProcessName(NS_LITERAL_STRING("(Preallocated app)"));
> +#ifdef MOZ_NUWA_PROCESS
> +        if (IsNuwaProcess()) {
> +            SetProcessName(NS_LITERAL_STRING("(Nuwa)"));

This seems unrelated to the 'mIsForApp' and 'mIsForBrowser' branch.

@@ +1285,5 @@
>      return true;
>  }
>  
> +#ifdef MOZ_NUWA_PROCESS
> +class CallNuwaSpawn : public nsRunnable {

Nit: { on its own line, for the Run method too.

@@ +1293,5 @@
> +        if (IsNuwaProcess()) {
> +            return NS_OK;
> +        }
> +
> +        // In the ew process.

Nit: "ew"?

@@ +1311,5 @@
> +        }
> +        return NS_OK;
> +    }
> +};
> +#endif

No reason to #endif, there's another identical block just below.

@@ +1313,5 @@
> +    }
> +};
> +#endif
> +
> +// This number should be carefully choosen.

... based on what? The comment by itself isn't all that useful.

@@ +1314,5 @@
> +};
> +#endif
> +
> +// This number should be carefully choosen.
> +#define RESERVED_INT_STACK 128

This should go at the top of the file.

@@ +1327,5 @@
> +    // from NuwaCheckpointCurrentThread() correctly on the recreated
> +    // thread.
> +    unsigned reserved[RESERVED_INT_STACK];
> +    bool rv;
> +    jmp_buf * volatile env = aEnv;

What does this do?

@@ +1330,5 @@
> +    bool rv;
> +    jmp_buf * volatile env = aEnv;
> +    unsigned *sentinel = reserved;
> +
> +    *sentinel = (unsigned)&sentinel ^ 0xdeadbeef;

This is just to prevent the array from being optimized away, right? Isn't there a better way to tell gcc that it must keep 'reserved' around?

@@ +1333,5 @@
> +
> +    *sentinel = (unsigned)&sentinel ^ 0xdeadbeef;
> +
> +    NS_ASSERTION(NuwaCheckpointCurrentThread != nullptr,
> +                 "NuwaCheckpointCurrentThread() is not available!");

This assertion doesn't tell us anything.

@@ +1358,5 @@
> +        nsCOMPtr<nsIThread> mainThread = do_GetMainThread();
> +        nsCOMPtr<nsIRunnable> callSpawn(new CallNuwaSpawn());
> +        NS_ASSERTION(NuwaSpawn != nullptr,
> +                     "NuwaSpawn() is not available!");
> +        mainThread->Dispatch(callSpawn, NS_DISPATCH_NORMAL);

NS_DispatchToMainThread

@@ +1379,5 @@
> +    jmp_buf env;
> +    int rv;
> +
> +    rv = setjmp(env);
> +    if (rv == 0) {

Nit: No need for the rv variable.

@@ +1401,5 @@
> +
> +    MessageLoop *ioloop = XRE_GetIOMessageLoop();
> +    ioloop->PostTask(FROM_HERE, NewRunnableFunction(RunNuwaFork));
> +#else
> +    abort();

Please just return false. The abort will happen automatically, along with logging and error messages and all that normal stuff.

@@ +1410,4 @@
>  } // namespace dom
>  } // namespace mozilla
> +
> +extern "C" {

I don't understand why this is needed.

@@ +1412,5 @@
> +
> +extern "C" {
> +
> +#if defined(MOZ_NUWA_PROCESS)
> +void NS_EXPORT

Or this.

@@ +1413,5 @@
> +extern "C" {
> +
> +#if defined(MOZ_NUWA_PROCESS)
> +void NS_EXPORT
> +GetProtoFdInfos(NuwaProtoFdInfo *aInfoList, int size, int *aInfoSize)

Where is this used?

@@ +1443,5 @@
> +    }
> +    *aInfoSize = i;
> +}
> +
> +class RunAddNewIPCProcess : public nsRunnable {

Nit: { on its own line.

@@ +1445,5 @@
> +}
> +
> +class RunAddNewIPCProcess : public nsRunnable {
> +public:
> +    RunAddNewIPCProcess(pid_t aPid) : mPid(aPid) {};

Nit: No ; here.

@@ +1446,5 @@
> +
> +class RunAddNewIPCProcess : public nsRunnable {
> +public:
> +    RunAddNewIPCProcess(pid_t aPid) : mPid(aPid) {};
> +    ~RunAddNewIPCProcess() {};

This is not needed.

@@ +1447,5 @@
> +class RunAddNewIPCProcess : public nsRunnable {
> +public:
> +    RunAddNewIPCProcess(pid_t aPid) : mPid(aPid) {};
> +    ~RunAddNewIPCProcess() {};
> +    NS_IMETHOD Run();

You inlined the Run method in CallNuwaSpawn, why not do the same here?

@@ +1459,5 @@
> +{
> +    bool sendok = mozilla::dom::ContentChild::GetSingleton()->
> +        SendAddNewProcess(mPid, mMaps);
> +    if (!sendok) {
> +        NS_RUNTIMEABORT("fail to call SendAddNewProcess()");

Child messages can never fail to send (if they do the process will be automatically aborted).

@@ +1462,5 @@
> +    if (!sendok) {
> +        NS_RUNTIMEABORT("fail to call SendAddNewProcess()");
> +    }
> +
> +    sNuwaForking = false;

Assert that this is true before setting.

@@ +1475,5 @@
> + * In the newly created process, ResetContentChildTransport() is called to
> + * reset fd for the IPC Channel and the session.
> + */
> +void NS_EXPORT
> +AddNewIPCProcess(pid_t aPid, NuwaProtoFdInfo *aInfoList, int aInfoSize)

Here and elsewhere sizes should be size_t, not int.

::: dom/ipc/ContentChild.h
@@ +194,5 @@
>                                        const nsString& aVolumeName,
>                                        const int32_t& aState,
>                                        const int32_t& aMountGeneration);
>  
> +    virtual bool RecvNuwaFork();

MOZ_OVERRIDE on all your overrides.

::: dom/ipc/ContentParent.cpp
@@ +207,5 @@
> +
> +// XXX Moved to PreallocatedProcessManager. Temporarily add them back. Should be
> +// removed after Nuwa preallocation is integrated into
> +// PreallocatedProcessManager.
> +// initialization off the critical path of app startup.

Is this cleaned up in some later patch? I don't think we should check this kind of comment in.

@@ +233,5 @@
> +// sNuwaProcess points to the Nuwa process which is used for forking new
> +// processes later.
> +static StaticRefPtr<ContentParent> sNuwaProcess;
> +// Nuwa process is ready for creating new process.
> +static bool sNuwaReadyOnParent = false;

Nit: The 'OnParent' is pretty redundant here.

@@ +234,5 @@
> +// processes later.
> +static StaticRefPtr<ContentParent> sNuwaProcess;
> +// Nuwa process is ready for creating new process.
> +static bool sNuwaReadyOnParent = false;
> +static Monitor *sSpareMonitor = nullptr;

This will show up in our leak stats... I think we need a StatucAutoPtr<Monitor> with a corresponding ClearAtShutdown to be correct. We could theoretically use OffTheBooksMutex but we'd need a OffTheBooksConvVar to go with it so, meh...

@@ +235,5 @@
> +static StaticRefPtr<ContentParent> sNuwaProcess;
> +// Nuwa process is ready for creating new process.
> +static bool sNuwaReadyOnParent = false;
> +static Monitor *sSpareMonitor = nullptr;
> +static nsAutoTArray<nsRefPtr<ContentParent>, 4> sSpareProcesses;

This needs to be StaticAutoPtr<...> with ClearAtShutdown too.

Also, why 4?

@@ +236,5 @@
> +// Nuwa process is ready for creating new process.
> +static bool sNuwaReadyOnParent = false;
> +static Monitor *sSpareMonitor = nullptr;
> +static nsAutoTArray<nsRefPtr<ContentParent>, 4> sSpareProcesses;
> +static std::list<CancelableTask*> sNuwaForkWaitTasks;

We never expect this to have more than a handful, right? I'd just use nsTArray again I think.

@@ +238,5 @@
> +static Monitor *sSpareMonitor = nullptr;
> +static nsAutoTArray<nsRefPtr<ContentParent>, 4> sSpareProcesses;
> +static std::list<CancelableTask*> sNuwaForkWaitTasks;
> +
> +#define NUWA_FORK_WAIT_DURATION 2000 // 2 seconds.

#defines and statics that aren't tied to a class should probably get moved back up to the top of the file right after the #includes.

@@ +243,5 @@
> +
> +static Monitor &
> +GetSpareMonitor() {
> +    if (sSpareMonitor == nullptr) {
> +        sSpareMonitor = new Monitor("mozilla.com.ContentParent.sSpareMonitor");

You should assert that the first time you get here you're always on the main thread. Otherwise this is a race.

@@ +282,5 @@
> +
> +    if (sPreallocateAppProcessTask) {
> +        // We were called directly while a delayed task was scheduled.
> +        sPreallocateAppProcessTask->Cancel();
> +        sPreallocateAppProcessTask = nullptr;

This leaks.

@@ +286,5 @@
> +        sPreallocateAppProcessTask = nullptr;
> +    }
> +
> +    sNuwaProcess =
> +        new ContentParent(/*aApp=*/nullptr,

Nit: Let's do it like this:

  new ContentParent(/* aApp = */ nullptr,

@@ +300,5 @@
> +
> +/*static*/ void
> +ContentParent::DelayedNuwaProcess()
> +{
> +    sPreallocateAppProcessTask = nullptr;

This leaks, and you should probably call Cancel too

@@ +309,5 @@
> +
> +/*static*/ void
> +ContentParent::ScheduleDelayedNuwaProcess()
> +{
> +    if (!sKeepAppProcessPreallocated || sPreallocateAppProcessTask) {

What about if sNuwaProcess already exists? You don't want to start this sequence all over again.

@@ +329,5 @@
> +    }
> +}
> +
> +/*static*/ void
> +ContentParent::ScheduleDelayedNuwaFork()

What prevents you from calling this more than once?

@@ +334,5 @@
> +{
> +    MessageLoop::current()->
> +        PostDelayedTask(FROM_HERE,
> +                        NewRunnableFunction(DelayedNuwaFork),
> +                        sPreallocateDelayMs);

Doesn't this mean that you're actually going to wait for |2*sPreallocateDelayMs| here?

@@ +364,5 @@
>  }
>  
>  /*static*/ void
> +
> +

Nit: Remove these extra lines

@@ +371,5 @@
>      if (XRE_GetProcessType() != GeckoProcessType_Default) {
>          return;
>      }
>  
> +    sKeepAppProcessPreallocated = true; // XXX: remove!

What's this all about?

@@ +630,5 @@
>           cp = cp->getNext()) {
>          aArray.AppendElement(cp);
>      }
> +
> +    if (sSpareProcesses.Length()) {

Nit: This is fine but IsEmpty() is actually what you care about.

@@ +901,5 @@
>  
>  void
> +ContentParent::OnNuwaForkTimeout()
> +{
> +    sNuwaForkWaitTasks.pop_front();

This will leak the task, won't it? And how do you know that the front one is the one that timed out?

@@ +905,5 @@
> +    sNuwaForkWaitTasks.pop_front();
> +
> +    // We haven't RecvAddNewProcess() after SendNuwaFork(). Maybe the main
> +    // thread of the Nuwa process is in deadlock.
> +    MOZ_ASSERT(!"Can't fork from the nuwa process.");

In general you should not assert things that we know can happen. This should at most be a warning.

@@ +1171,5 @@
>                               bool aIsForBrowser,
>                               bool aIsForPreallocated,
>                               ChildPrivileges aOSPrivileges,
> +                             ProcessPriority aInitialPriority /* = PROCESS_PRIORITY_FOREGROUND */,
> +                             bool aIsNuwaProcess)

Nit: Go ahead and add /* = false */

@@ +1280,5 @@
> + * This constructor is used for new content process cloned from a template.
> + *
> + * For Nuwa.
> + */
> +ContentParent::ContentParent(ContentParent *aTemplate,

I don't see you adding this to the sContentParents list. Why not?

@@ +1297,5 @@
> +    , mNumDestroyingTabs(0)
> +    , mIsAlive(true)
> +    , mIsDestroyed(false)
> +    , mSendPermissionUpdates(false)
> +    , mIsForBrowser(false)

I get nervous with multiple constructors on objects that have lots of initialization to do (the chance that they get out-of-sync is pretty high). Can you make an 'InitializeMembers' method, inlined in the header, that can take care of these and then make each constructor call it first thing?

@@ +1300,5 @@
> +    , mSendPermissionUpdates(false)
> +    , mIsForBrowser(false)
> +{
> +#ifndef MOZ_NUWA_PROCESS
> +    NS_RUNTIMEABORT("Nuwa process not enabled!");

Hm, why isn't this whole constructor #ifdef MOZ_NUWA_PROCESS? We'll never call this if it's not defined, right?

@@ +1331,5 @@
> +
> +    // Set the subprocess's priority (bg if we're a preallocated process, fg
> +    // otherwise).  We do this first because we're likely /lowering/ its CPU and
> +    // memory priority, which it has inherited from this process.
> +    if (Preferences::GetBool("dom.ipc.processPriorityManager.enabled")) {

You should double check with jlebar but I don't think you need to worry about checking this preference yourself. You should just be able to call SetProcessPriority and it will silently do nothing if the preferences aren't set properly.

@@ +1333,5 @@
> +    // otherwise).  We do this first because we're likely /lowering/ its CPU and
> +    // memory priority, which it has inherited from this process.
> +    if (Preferences::GetBool("dom.ipc.processPriorityManager.enabled")) {
> +        ProcessPriority priority;
> +        if (aAppManifestURL == MAGIC_PREALLOCATED_APP_MANIFEST_URL) {

Nit: Please use IsPreallocated()

@@ +1371,5 @@
>          MOZ_ASSERT(!sAppContentParents ||
>                     sAppContentParents->Get(mAppManifestURL) != this);
>      }
> +
> +    delete mSubprocess;

This seems dangerous... There's a pretty complicated destruction process for mSubprocess, are you sure that this is safe?

If it is then we should change mSubprocess into an nsAutoPtr to make this automatic.

@@ +1592,5 @@
>  bool
>  ContentParent::RecvFirstIdle()
>  {
> +#ifdef MOZ_NUWA_PROCESS
> +    if (sSpareProcesses.Length() == 0 && sNuwaReadyOnParent) {

Nit: IsEmpty()

@@ +1593,5 @@
>  ContentParent::RecvFirstIdle()
>  {
> +#ifdef MOZ_NUWA_PROCESS
> +    if (sSpareProcesses.Length() == 0 && sNuwaReadyOnParent) {
> +        sNuwaProcess->SendNuwaFork();

The old code waits a bit here... Should this be DelayedNuwaFork?

@@ +1706,5 @@
> +    NS_ASSERTION(!sNuwaReadyOnParent, "Nuwa process have created twice or more!");
> +    ProcessPriorityManager::SetProcessPriority(sNuwaProcess,
> +                                               hal::PROCESS_PRIORITY_FOREGROUND);
> +    sNuwaReadyOnParent = true;
> +    SendNuwaFork();

Hm, what if we haven't received the first idle? Is this order guaranteed to be consistent or is there a race here?

Also, we're basically doing waiting for the child to say "I'm ready" and then telling the child to start the fork sequence. Why do we need that latency? Couldn't the child just start forking itself as soon as it's ready?

@@ +1725,5 @@
> +    content->Init();
> +    PublishSpareProcess(content);
> +    return true;
> +#else
> +    NS_RUNTIMEABORT("Not implemented!");

This means that a mischievous child process could kill the main process. Warning and then returning false is all you need to do here.

@@ +2863,5 @@
> +ContentParent::PublishSpareProcess(ContentParent *aContent)
> +{
> +    MonitorAutoLock lock(GetSpareMonitor());
> +
> +    sNuwaForkWaitTasks.front()->Cancel();

Please modify sNuwaForkWaitTasks before grabbing the mutex. Otherwise helgrind will warn that sometimes you use the lock and sometimes you don't.

Also, what guarantees the ordering here? How do you know that the front task is the one that should be canceled?

@@ +2864,5 @@
> +{
> +    MonitorAutoLock lock(GetSpareMonitor());
> +
> +    sNuwaForkWaitTasks.front()->Cancel();
> +    sNuwaForkWaitTasks.pop_front();

This will leak the task.

@@ +2868,5 @@
> +    sNuwaForkWaitTasks.pop_front();
> +
> +    sSpareProcesses.AppendElement(aContent);
> +
> +    lock.Notify();

Hm, I can't find anywhere where this lock is useful. All of the calls to lock it seem to be on the main thread, and I don't see a matching Wait() for this Notify() anywhere. Is there something in another patch that needs this? Otherwise we should be able to get rid of the lock entirely!

@@ +2879,5 @@
> +ContentParent::GetSpareProcess()
> +{
> +    MonitorAutoLock lock(GetSpareMonitor());
> +
> +    if (sSpareProcesses.Length() == 0) {

Nit: IsEmpty(), below too.

@@ +2884,5 @@
> +        return nullptr;
> +    }
> +
> +    nsRefPtr<ContentParent> process = sSpareProcesses.ElementAt(0);
> +    sSpareProcesses.RemoveElementAt(0);

Shouldn't you use the most recently spare process? That one is most likely to be in the cpu cache I'd think.

@@ +2898,5 @@
> +
> +void
> +ContentParent::MaybeForgetSpare(ContentParent *aContent)
> +{
> +    if (sSpareProcesses.RemoveElement(aContent)) {

If the lock turns out to be important then you would need to lock here.

::: dom/ipc/ContentParent.h
@@ +165,5 @@
>       * ContentParent based on the value returned here.
>       */
>      void FriendlyName(nsAString& aName);
>  
> +    virtual void OnChannelError();

MOZ_OVERRIDE on all your overrides

@@ +182,5 @@
>      static void JoinProcessesIOThread(const nsTArray<ContentParent*>* aProcesses,
>                                        Monitor* aMonitor, bool* aDone);
>  
> +    static void RunNuwaProcess();
> +    static void DelayedNuwaProcess();

This seems like it should be hidden. It's only called by ScheduleDelayedNuwaProcess, right?

@@ +184,5 @@
>  
> +    static void RunNuwaProcess();
> +    static void DelayedNuwaProcess();
> +    static void ScheduleDelayedNuwaProcess();
> +    static void DelayedNuwaFork();

Same here.

@@ +185,5 @@
> +    static void RunNuwaProcess();
> +    static void DelayedNuwaProcess();
> +    static void ScheduleDelayedNuwaProcess();
> +    static void DelayedNuwaFork();
> +    static void ScheduleDelayedNuwaFork();

Nit: newline after this.

@@ +208,5 @@
>                    bool aIsForBrowser,
>                    bool aIsForPreallocated,
>                    ChildPrivileges aOSPrivileges = base::PRIVILEGES_DEFAULT,
> +                  hal::ProcessPriority aInitialPriority = hal::PROCESS_PRIORITY_FOREGROUND,
> +                  bool aIsNuwaProcess = false);

Nit: newline here too.

@@ +209,5 @@
>                    bool aIsForPreallocated,
>                    ChildPrivileges aOSPrivileges = base::PRIVILEGES_DEFAULT,
> +                  hal::ProcessPriority aInitialPriority = hal::PROCESS_PRIORITY_FOREGROUND,
> +                  bool aIsNuwaProcess = false);
> +    ContentParent(ContentParent *aTemplate,

Nit: * and & on the left.

@@ +252,5 @@
>      void ShutDownProcess();
>  
> +    static void PublishSpareProcess(ContentParent *aContent);
> +    static already_AddRefed<ContentParent> GetSpareProcess();
> +    static void MaybeForgetSpare(ContentParent *aContent);

Why are these static methods? I think they could probably be just static functions in the file...

Maybe the problem is that there isn't a whole lot of consistency here. You have lots of file-level static variables in ContentParent.cpp, and then several static methods here in ContentParent.h, but they're all just used in ContentParent.cpp. In general I'm a fan of hiding implementation details (rather than exposing them) so can we just move all of this to file-level statics in ContentParent.cpp only?

@@ +481,5 @@
>  
>      nsRefPtr<nsConsoleService>  mConsoleService;
>      nsConsoleService* GetConsoleService();
> +
> +    friend class nsAutoPtr<ContentParent>; // for sSpareProcesses

Please remove.

::: dom/ipc/ContentProcess.cpp
@@ +7,5 @@
>  #include "mozilla/ipc/IOThreadChild.h"
>  
>  #include "ContentProcess.h"
>  
> +#if defined(OS_POSIX)

Is any of this necessary? You don't seem to use any of these things...

::: dom/ipc/PContent.ipdl
@@ +357,5 @@
>      FileSystemUpdate(nsString fsName, nsString mountPoint, int32_t fsState,
>                       int32_t mountGeneration);
>  
> +    // Ask the Nuwa process to create a new child process.
> +    async NuwaFork();

Nit: async is the default, no need to specify it really. Below too.
Attachment #773213 - Flags: review?(bent.mozilla) → review-
Can we enable this by default on Linux and Mac as well?
(In reply to Andreas Gal :gal from comment #131)
> Can we enable this by default on Linux and Mac as well?

We'd have to test it extensively to make sure that all the undefined behavior it's relying on in pthreads works there too.
Attached patch Part 2: IPC and glue changes. (obsolete) — — Splinter Review
Updated patch per review comments.
Attachment #771298 - Attachment is obsolete: true
Attachment #783648 - Flags: review?(bent.mozilla)
(In reply to ben turner [:bent] from comment #127)
> Comment on attachment 771298 [details] [diff] [review]
> Part 2: IPC and glue changes.
> 
> Review of attachment 771298 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/chromium/src/chrome/common/child_process_host.cc
> 
> ::: ipc/chromium/src/chrome/common/ipc_channel.h
> @@ +119,5 @@
> > +  // Reset the client side of the socketpair.
> > +  void ResetPipeFileDescriptor(int fd);
> > +
> > +  // Get fd of this side of the socketpair.
> > +  int GetPipeFileDescriptor();
> 
> What's the difference between this and GetServerFileDescriptor() ? It seems
> like this is identical with the possible exception of an assertion that
> might not work.
> 
It's called on the client side. Using GetServerFileDescriptor() will produce an error log in debug build.

> ::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc
> @@ +353,5 @@
> > +/**
> > + * Reset the file descriptor for communication with the peer.
> > + */
> > +void Channel::ChannelImpl::ResetPipeFileDescriptor(int fd) {
> > +  if (fd != pipe_) {
> 
> Can we assert this instead? At least we can assert that fd is valid, right?
> 
The whole if block here is actually dead code and will be removed. We don't need to reset fd here. It's done in the lower level wrappers after a process is forked (ReplaceIPC() in Nuwa.cpp in part 1 patch).

> @@ +354,5 @@
> > + * Reset the file descriptor for communication with the peer.
> > + */
> > +void Channel::ChannelImpl::ResetPipeFileDescriptor(int fd) {
> > +  if (fd != pipe_) {
> > +    int rv = dup2(fd, pipe_);
> 
> The docs I have see say that this can be interrupted, maybe need to do
> HANDLE_EINTR
> 
This is removed (dead code).

> @@ +355,5 @@
> > + */
> > +void Channel::ChannelImpl::ResetPipeFileDescriptor(int fd) {
> > +  if (fd != pipe_) {
> > +    int rv = dup2(fd, pipe_);
> > +    NS_ASSERTION(rv != -1, "dup2 fails");
> 
> This can't really be asserted, can it? Are you just assuming that if we get
> to the point where we have too many fds open we are doomed anyway? We
> probably need to call Close() if anything in here fails.
>
Same here.
 
> @@ +363,5 @@
> > +    flags |= FD_CLOEXEC;
> > +    rv = fcntl(pipe_, F_SETFL, flags);
> > +    NS_ASSERTION(rv != -1, "fcntl fails");
> > +
> > +    close(fd);
> 
> Hm, if we're going to dup and then immediately close the fd that we're
> passed why not just use it directly?
> 
Same here.

> @@ +882,5 @@
> > +  lock.Notify();
> > +}
> > +
> > +#ifdef OS_POSIX
> > +typedef nsUint64HashKey ProcessHostHashKeyType;
> 
> This doesn't seem to buy us much and makes the code harder to read... We
> should probably just have something like this at the top of the header file:
> 
>   #if defined(MOZ_NUWA_PROCESS) && !defined(OS_POSIX)
>   #error Nuwa only supported on posix!
>   #endif
> 
> And then use nsUint64HashKey directly.
> 
This is actually dead code. Caching of preallocated child process is done in ContentChild.cpp. The static methods here to cache GeckoExistingProcessHost instances are not used.

> @@ +888,5 @@
> > +#error "Unknow hash key type for process host"
> > +#endif
> > +typedef nsClassHashtable<ProcessHostHashKeyType,
> > +                         GeckoExistingProcessHost> ProcessHostMap;
> > +typedef nsAutoTArray<nsAutoPtr<GeckoExistingProcessHost>, 4> ProcessHostArray;
> 
> Why 4?
> 
This is removed (dead code).

> @@ +891,5 @@
> > +                         GeckoExistingProcessHost> ProcessHostMap;
> > +typedef nsAutoTArray<nsAutoPtr<GeckoExistingProcessHost>, 4> ProcessHostArray;
> > +
> > +static ProcessHostMap sNotReadyProcesses;
> > +static ProcessHostArray sSpareProcesses;
> 
> static XPCOM objects are not allowed. These need to be lazily constructed
> and then cleared at shutdown.
> 
Same here.
> @@ +914,5 @@
> > +  GeckoExistingProcessHost *host;
> > +  bool success = sNotReadyProcesses.Get(aProcess, &host);
> > +  if (!success)
> > +    return false;
> > +  sNotReadyProcesses.Remove(aProcess);
> 
> Woah, this will delete the object. You need to use RemoveAndForget().
> 
Same here.

> @@ +916,5 @@
> > +  if (!success)
> > +    return false;
> > +  sNotReadyProcesses.Remove(aProcess);
> > +
> > +  sSpareProcesses.AppendElement(host);
> 
> Hm, I don't see this being used anywhere. Is this all dead code?
> 
Same here.
Status: NEW → ASSIGNED
blocking-b2g: --- → koi?
Keywords: perf
Whiteboard: [MemShrink:P1] → [MemShrink:P1] [c= ]
(In reply to ben turner [:bent] from comment #128)
> Comment on attachment 770739 [details] [diff] [review]
> Part 3: IPC glue and IPDL codegen to support protocol cloning.
> 
> Review of attachment 770739 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +135,5 @@
> > +/**
> > + * All async protocols should implement this interface.
> > + */
> > +class IProtocolAsync :
> > +        protected AsyncChannel::AsyncListener
> 
> Why can't you just extend AsyncListener/SyncListener/RPCListener instead of
> adding three new interfaces?
> 
Logically 'listener' and 'protocol' are two concepts. The channel listeners should be used to handle what happens with the channel, while things like getting protocol Id or cloning protocols should be handled in a separate concept. We know the status quo is that they are mixed and we see the following method in class AsyncListener:

// FIXME/bug 792652: this doesn't really belong here, but a
// large refactoring is needed to put it where it belongs.
virtual int32_t GetProtocolTypeId() = 0;

It's not the best thing to do and here we don't want to add more responsibility that doesn't belong to a listener. IprotocolAsync and the like should be itself a mixin class, but we don't have the QueryInterface() luxury of XPCOM. So to avoid type casting problems we just let it inherit the listeners. If you'd agree,  we can handle the GetProtocolTypeId() in a followup.

> @@ +146,5 @@
> > +     * new content process forking from a template process (Nuwa).
> > +     */
> > +    virtual IProtocolAsync *
> > +    CloneProtocol(AsyncChannel *aChannel,
> > +                  ProtocolCloneContext *aCtx);
> 
> I understand that your current model is for all actors to have this base
> implementation and then override if they support cloning. However, I think
> we're going to override this on a fair number of classes anyway, and using a
> base method like this means that we won't be able to tell from a crash
> report which protocol object failed to support cloning. I'd much prefer that
> we we make these pure-virtual and then override with *every* protocol so
> that we have good crash reports and maybe even good abort messages.
> 
We observed that when the Nuwa process is ready for spawning new processes, the number of active protocols is relatively small. That's why we provide a default implementation that aborts. In addidion, if we make these methods pure virtual, we will also need to implement them in ipdl tests. 

We will make the methods here pure virtual because the code generator implements these functions like:

mozilla::ipc::IProtocolRPC*
PContentParent::CloneProtocol(
        Channel* aChannel,
        mozilla::ipc::ProtocolCloneContext* aCtx)
{
    NS_RUNTIMEABORT("The actor of PContent is not allowed to be cloned");
    return 0;
}

Here is a sample crash stack (assuming PindexedDBParent doesn't implement cloning). We can tell which protocol doesn't implement the necessary clone function from the stack:

> #0  0x41c1a1ae in mozalloc_abort (msg=<value optimized out>) at memory/mozalloc/mozalloc_abort.cpp:30
> #1  0x4156fda8 in Abort (aSeverity=<value optimized out>, aStr=0x41c9db0a "The actor of PIndexedDB is not allowed to be cloned", aExpr=<value optimized out>, aFile=<value ptimized out>, aLine=260) at xpcom/base/nsDebugImpl.cpp:430
> #2  NS_DebugBreak (aSeverity=<value optimized out>, aStr=0x41c9db0a "The actor of PIndexedDB is not allowed to be cloned", aExpr=<value optimized out>, aFile=<value optimized out>, aLine=260) at xpcom/base/nsDebugImpl.cpp:387
> #3  0x41384f8a in mozilla::dom::indexedDB::PIndexedDBParent::CloneProtocol (this=<value optimized out>, aChannel=<value optimized out>, aCtx=<value optimized out>) at ipc/ipdl/PIndexedDBParent.cpp:260
> #4  0x413a84ba in mozilla::dom::PContentParent::CloneRouted (this=0x48b7ac00, aSource=0x478de400, aCtx=0xbecae548) at ipc/ipdl/PContentParent.cpp:1640
> #5  0x4130021c in ContentParent (this=0x48b7ac00, aTemplate=0x478de400, aAppManifestURL=..., aPid=11101, aFds=..., aOSPrivileges=base::PRIVILEGES_DEFAULT) at dom/ipc/ContentParent.cpp:1326
> #6  0x4130129a in mozilla::dom::ContentParent::RecvAddNewProcess (this=0x478de400, aPid=<value optimized out>, aFds=...) at dom/ipc/ContentParent.cpp:1724

> ::: ipc/ipdl/ipdl/lower.py
> @@ +1197,5 @@
> >          return ExprVar('mId')
> >  
> > +    def stateVar(self, actorThis=None):
> > +        if actorThis is not None:
> > +            return ExprSelect(actorThis, '->', 'mState')
> 
> I'm curious why this is needed. Do we have some kind of name collision
> somewhere?
> 
The added parameter is actually never used. We'll remove that.

> @@ +3882,5 @@
> >                   destroyshmem,
> >                   otherprocess,
> >                   getchannel,
> > +                 clonerouted,
> > +                 cloneprotocol,
> 
> I think this looks ok but I'd prefer to look over the generated code to make
> sure. Can you attach the output for both these functions in
> toplevel/non-toplevel actors?
> 
> @@ +4140,4 @@
> >                  iffailopen,
> >                  iffailalloc,
> > +                settrans,
> > +                addopened,
> 
> Same here.
Please refer to the attached PContentParent header and cpp file.
(In reply to ben turner [:bent] from comment #129)
> ::: dom/indexedDB/ipc/IndexedDBParent.cpp
> @@ +31,5 @@
> >  #include "IDBObjectStore.h"
> >  #include "IDBTransaction.h"
> >  
> > +#ifdef MOZ_NUWA_PROCESS
> > +#include "ipc/Nuwa.h"
> 
> Why is this needed?
> 
This isn't needed and will be removed. 

> @@ +116,5 @@
> >  }
> >  
> > +mozilla::ipc::IProtocolRPC *
> > +IndexedDBParent::CloneProtocol(Channel *aChannel,
> > +                               mozilla::ipc::ProtocolCloneContext *aCtx)
> 
> Assuming that this is defined in ipc/Nuwa.h, this won't compile if
> MOZ_NUWA_PROCESS isn't defined, right?
> 
Originally it needs Nuwa.h because class ProtocolCloneContext is misplaced in Nuwa.h and IndexedDBParent::CloneProtocol() needs to access aCtx. Now protocol cloning doesn't need Nuwa.h any more.

> I also worry that this is insufficient. IndexedDB makes lots of subactors
> very quickly, so if you're not adding CloneProtocol to them then I expect we
> will have intermittent crashes.
>
I don't think subactors are needed in the Nuwa process. IndexedDB is created indirectly in PreloadSlowThings() (SettingsManager.js calls IndexedDatabaseManager::InitWindowless). No further IDB operations are performed.

> @@ +119,5 @@
> > +IndexedDBParent::CloneProtocol(Channel *aChannel,
> > +                               mozilla::ipc::ProtocolCloneContext *aCtx)
> > +{
> > +    MOZ_ASSERT(mManagerContent != nullptr);
> > +    MOZ_ASSERT(mManagerTab == nullptr);
> 
> I don't know how this can be asserted... IDB can be used from both PContent
> and PBrowser, how can we know for sure that this is a content managee?
> 
Same as above, in the template process, IndexedDB is initialized in PreloadSlowThings(). It's manager is always PContent.

> @@ +120,5 @@
> > +                               mozilla::ipc::ProtocolCloneContext *aCtx)
> > +{
> > +    MOZ_ASSERT(mManagerContent != nullptr);
> > +    MOZ_ASSERT(mManagerTab == nullptr);
> > +    MOZ_ASSERT(!mDisconnected);
> 
> This too is probably not assertable... It is likely a race.
> 
Same here. IndexedDB is only initialized, not connected.

> @@ +127,5 @@
> > +    IndexedDBParent *parent = new IndexedDBParent(contentParent);
> > +
> > +    nsRefPtr<IDBFactory> factory;
> > +    nsresult rv = IDBFactory::Create(contentParent, getter_AddRefs(factory));
> > +    MOZ_ASSERT(rv == NS_OK);
> 
> There are lots of ways that IDBFactory::Create could fail (OOM is
> particularly likely). This can't be asserted and must be handled instead.
> 
> You should make 'parent' an nsAutoPtr once you handle this.
> 
> Or, better yet, why not just do:
> 
>   nsAutoPtr<IndexedDBParent> actor = contentParent->AllocPIndexedDBParent();
>   if (!actor || !contentParent->RecvPIndexedDBConstructor(actor)) {
>     return nullptr;
>   }
>   return actor.forget();
> 
> That way we go through the same code paths for future-proofing and you don't
> have to worry about partial-initialization.
> 
We'll try to use this pattern (at least IDB and CrashReporter works). We'll need to make AllocPXXX() and RecvPXXXConstructor() public for this pattern to work. There is still a problem with this pattern: the CloneProtocol() methods only has access to ContentParent directly from ProtocolCloneContext. If the protocol is not created by PcontenParent (e.g. PCookieServiceParent) then we can only dynamic_cast to get its manager.

> @@ +195,5 @@
> >    CheckWritePermission(const nsAString& aDatabaseName);
> >  
> > +  mozilla::ipc::IProtocolRPC *
> > +  CloneProtocol(Channel *aChannel,
> > +                mozilla::ipc::ProtocolCloneContext *aCtx);
> 
> Please add MOZ_OVERRIDE to everything that should be an override, here and
> everywhere.
> 
> Also, shouldn't this (and all other CloneProtocol functions) be #ifdef
> MOZ_NUWA_PROCESS? Otherwise we're adding a bunch of dead code.
> 
The ipdl codegen will generate default CloneProtocol() implementations for each protocol that default aborts (please refer to the attachment for generated code). Since its default implementation is generated anyway, I don't think surrounding it by #ifdef MOZ_NUWA_PROCESS a good idea.

> ::: dom/ipc/Blob.h
> @@ +232,5 @@
> >  
> >  } // namespace dom
> >  } // namespace mozilla
> >  
> > +#endif // mozilla_dom_ipc_Blob_h
> 
> Weird, is this a line ending change or something
It adds a blank new line at the end of this file.

> 
> ::: dom/ipc/CrashReporterParent.cpp
> @@ +34,5 @@
> > +CrashReporterParent::CloneProtocol(Channel* aChannel,
> > +                                   mozilla::ipc::ProtocolCloneContext *aCtx) {
> > +    // For Nuwa, we don't need the info set in SetChildData(). Just return a new
> > +    // instance of CrashReporterParent.
> > +    return new CrashReporterParent();
> 
> This seems fragile to me. Why isn't this information important for the Nuwa
> process?
> 
> FWIW it looks like mInitialized needs to be taken care of because it could
> abort the parent if the Nuwa process crashes I think.
> 
Crash reporter for content process doesn't use thread ID in SetChildData() (it's only for plugins). But we need to set mInitialized or it crashes in debug build. We'll change this use the IDB pattern.

> ::: dom/src/storage/DOMStorageIPC.cpp
> @@ +369,5 @@
> > +DOMStorageDBParent::CloneProtocol(Channel *aChannel,
> > +                                  mozilla::ipc::ProtocolCloneContext *aCtx)
> > +{
> > +  DOMStorageDBParent *newactor = new DOMStorageDBParent();
> > +  return newactor;
> 
> This looks fine for now but it will silently break if someone ever overrides
> ContentParent::RecvPStorageConstructor... I think we should try to always
> follow the pattern I suggested for IDB. Otherwise we're just begging for
> trouble down the road.
> 
> I'm actually going to wait for your comments on this idea before continuing
> with this review.
This is managed by PContent. We can make it use AllocPDOMStorageDB() and RecvPDOMStorageDBConstructor().
(In reply to Cervantes Yu from comment #140)
> (In reply to ben turner [:bent] from comment #129)
> 
> > @@ +127,5 @@
> > > +    IndexedDBParent *parent = new IndexedDBParent(contentParent);
> > > +
> > > +    nsRefPtr<IDBFactory> factory;
> > > +    nsresult rv = IDBFactory::Create(contentParent, getter_AddRefs(factory));
> > > +    MOZ_ASSERT(rv == NS_OK);
> > 
> > There are lots of ways that IDBFactory::Create could fail (OOM is
> > particularly likely). This can't be asserted and must be handled instead.
> > 
> > You should make 'parent' an nsAutoPtr once you handle this.
> > 
> > Or, better yet, why not just do:
> > 
> >   nsAutoPtr<IndexedDBParent> actor = contentParent->AllocPIndexedDBParent();
> >   if (!actor || !contentParent->RecvPIndexedDBConstructor(actor)) {
> >     return nullptr;
> >   }
> >   return actor.forget();
> > 
> > That way we go through the same code paths for future-proofing and you don't
> > have to worry about partial-initialization.
> > 
> We'll try to use this pattern (at least IDB and CrashReporter works). We'll
> need to make AllocPXXX() and RecvPXXXConstructor() public for this pattern
> to work. There is still a problem with this pattern: the CloneProtocol()
> methods only has access to ContentParent directly from ProtocolCloneContext.
> If the protocol is not created by PcontenParent (e.g. PCookieServiceParent)
> then we can only dynamic_cast to get its manager.

The code generated by the IPDL is arranged carefully. so all protocols can get their managers from the instance of ProtocolCloneContext.  If a manager is not in the ProtocolCloneContext, all you have to do is adding a new member variable at ProcolCloneContext, and assign a value to the variable during cloning the manager.
(In reply to Cervantes Yu from comment #140)
> (In reply to ben turner [:bent] from comment #129)
> 
> > @@ +120,5 @@
> > > +                               mozilla::ipc::ProtocolCloneContext *aCtx)
> > > +{
> > > +    MOZ_ASSERT(mManagerContent != nullptr);
> > > +    MOZ_ASSERT(mManagerTab == nullptr);
> > > +    MOZ_ASSERT(!mDisconnected);
> > 
> > This too is probably not assertable... It is likely a race.
> > 
> Same here. IndexedDB is only initialized, not connected.

I would like to start with more serious checks, rather than run into unknown error without check.  For now, we only check this scenario for B2G, so I left these checks here to make sure all following the scenario.  If some one later move this to more complicate scenario, he should check if all is fine for the new scenario.
Attachment #770739 - Attachment is obsolete: true
Attachment #786881 - Flags: review?(bent.mozilla)
Carry over r+ from :khuey.
Attachment #786886 - Flags: review+
Attachment #786886 - Flags: review?(bent.mozilla)
Attachment #770742 - Attachment is obsolete: true
Fix CookieServiceParent::CloneProtocol(): AllocPCookieService() and RecvPCookieServiceConstructor() should be called against the PNeckoParent instance in the same protocol tree.
Attachment #786886 - Attachment is obsolete: true
Attachment #786886 - Flags: review?(bent.mozilla)
Attachment #787395 - Flags: review+
Attachment #787395 - Flags: review?(bent.mozilla)
Comment on attachment 771296 [details] [diff] [review]
Part 6: support thread parasite of threads created in the template process.

Review of attachment 771296 [details] [diff] [review]:
-----------------------------------------------------------------

Uh... Well... yes, those are all the JS threads I know about.

I can't say for sure that we don't need a recreate function for any of these, but I can't think of anything that would need it. I assume condvars don't, and mutexes don't, as long as the thread calling NuwaMarkCurrentThread is not holding any mutexes at the time.  I assume NuwaMarkCurrentThread prevents the calling thread from continuing in the parent process.  If these assumptions are correct... well, that's all I can think of.  r=me.

The JS engine does use TLS (not in the two helper threads; only in threads that call JSAPI entry points), just in case that requires extra care.

Sorry for the long delay here.
Attachment #771296 - Flags: review?(jorendorff) → review+
Comment on attachment 783648 [details] [diff] [review]
Part 2: IPC and glue changes.

Review of attachment 783648 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good! Just need some answers to this:

::: ipc/chromium/src/chrome/common/child_process_host.cc
@@ +86,5 @@
>    return true;
>  }
>  
> +bool ChildProcessHost::CreateChannel(FileDescriptor& aFileDescriptor) {
> +  FileDescriptor::PlatformHandleType fd = aFileDescriptor.PlatformHandle();

fd is unused here?

::: ipc/chromium/src/chrome/common/child_process_host.h
@@ +14,5 @@
>  #include "base/waitable_event_watcher.h"
>  #include "chrome/common/child_process_info.h"
>  #include "chrome/common/ipc_channel.h"
>  
> +#include "mozilla/ipc/FileDescriptor.h"

Nit: Please forward-declare this only.

::: ipc/chromium/src/chrome/common/ipc_channel.h
@@ +116,5 @@
>    // Return the server side of the socketpair.
>    int GetServerFileDescriptor() const;
> +
> +  // Reset the client side of the socketpair.
> +  void ResetPipeFileDescriptor(int fd);

If this is the client side then please rename to 'ResetClientFileDescriptor'

@@ +119,5 @@
> +  // Reset the client side of the socketpair.
> +  void ResetPipeFileDescriptor(int fd);
> +
> +  // Get fd of this side of the socketpair.
> +  int GetPipeFileDescriptor();

My point before is that we should just remove the assertion in GetServerFileDescriptor so that we only have one method. The Client/Server distinction is pretty obvious here, but the we're adding this "Pipe" thing that doesn't make a whole lot of sense.

::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc
@@ +355,5 @@
> +/**
> + * Reset the file descriptor for communication with the peer.
> + */
> +void Channel::ChannelImpl::ResetPipeFileDescriptor(int fd) {
> +  NS_ASSERTION(fd > 0 && fd == pipe_, "Invalid file descriptor");

I thought fd being 0 was valid sometimes? Shouldn't this be >= 0?

@@ +357,5 @@
> + */
> +void Channel::ChannelImpl::ResetPipeFileDescriptor(int fd) {
> +  NS_ASSERTION(fd > 0 && fd == pipe_, "Invalid file descriptor");
> +  mozilla::DebugOnly<struct stat> fdStat;
> +  NS_ASSERTION(fstat(fd, &fdStat) == 0, "Invalid file descriptor");

I don't think we should have a debug-only fstat call here. Checking that fd != -1 should be enough.

::: ipc/glue/GeckoChildProcessHost.h
@@ +168,5 @@
>    std::queue<IPC::Message> mQueue;
>  };
>  
> +#ifdef MOZ_NUWA_PROCESS
> +class GeckoExistingProcessHost : public GeckoChildProcessHost

Nit: Mark as MOZ_FINAL
Attachment #783648 - Flags: review?(bent.mozilla) → review-
Attached patch Part 5: PContent protocol changes (obsolete) — — Splinter Review
Updated patch per comment #130
Attachment #773213 - Attachment is obsolete: true
Attachment #791237 - Flags: review?(bent.mozilla)
Comment on attachment 791237 [details] [diff] [review]
Part 5: PContent protocol changes

carry over r+ from :khuey
Attachment #791237 - Flags: review+
(In reply to ben turner [:bent] from comment #130)
> Comment on attachment 773213 [details] [diff] [review]
> Part 5: PContent protocol changes
> 
> @@ +1327,5 @@
> > +    // from NuwaCheckpointCurrentThread() correctly on the recreated
> > +    // thread.
> > +    unsigned reserved[RESERVED_INT_STACK];
> > +    bool rv;
> > +    jmp_buf * volatile env = aEnv;
> 
> What does this do?
> 
We need to save aEnv on the stack for the use in the if() block below. aEnv can (and on Arm, is expected to) be  passed using a register, while the content of the register will be invalid when we longjmp() back to rebuild the stack.  That's why we use the copy on the stack instead of the argument directly.

> @@ +1330,5 @@
> > +    bool rv;
> > +    jmp_buf * volatile env = aEnv;
> > +    unsigned *sentinel = reserved;
> > +
> > +    *sentinel = (unsigned)&sentinel ^ 0xdeadbeef;
> 
> This is just to prevent the array from being optimized away, right? Isn't
> there a better way to tell gcc that it must keep 'reserved' around?
> 
It really serves as the sentinel for runtime check whether we reserve enough stack. RunNuwaFork() calls DoCheckpoint(), which snapshots the stack for recreating the thread after forking a new process. RunNuwaFork() then calls DoNuwaFork() to fork a new process, which will use the reserved space we allocate here. If we don't reserve enough stack, the snapshot are garbled by DoNuwaFork() and recreated thread stack contains garbage and will lead to undefined behavior. That's why we place a sentinel here to check the validity of the snapshot.

> @@ +1333,5 @@
> > +
> > +    *sentinel = (unsigned)&sentinel ^ 0xdeadbeef;
> > +
> > +    NS_ASSERTION(NuwaCheckpointCurrentThread != nullptr,
> > +                 "NuwaCheckpointCurrentThread() is not available!");
> 
> This assertion doesn't tell us anything.
> 
Methods in Nuwa.h are declared as having weak linkage. We check whether these functions are available during runtime.

> @@ +1410,4 @@
> >  } // namespace dom
> >  } // namespace mozilla
> > +
> > +extern "C" {
> 
> I don't understand why this is needed.
> 
The functions here are called from Nuwa.cpp. They perform IPC work and do not belong in Nuwa.cpp (in libmozglue.so) so we place them here.

> @@ +1412,5 @@
> > +
> > +extern "C" {
> > +
> > +#if defined(MOZ_NUWA_PROCESS)
> > +void NS_EXPORT
> 
> Or this.
> 
Same here.

> @@ +1413,5 @@
> > +extern "C" {
> > +
> > +#if defined(MOZ_NUWA_PROCESS)
> > +void NS_EXPORT
> > +GetProtoFdInfos(NuwaProtoFdInfo *aInfoList, int size, int *aInfoSize)
> 
> Where is this used?
> 
Same here.

> @@ +235,5 @@
> > +static StaticRefPtr<ContentParent> sNuwaProcess;
> > +// Nuwa process is ready for creating new process.
> > +static bool sNuwaReadyOnParent = false;
> > +static Monitor *sSpareMonitor = nullptr;
> > +static nsAutoTArray<nsRefPtr<ContentParent>, 4> sSpareProcesses;
> 
> This needs to be StaticAutoPtr<...> with ClearAtShutdown too.
> 
> Also, why 4?
> 
4 is a heuristic that should be both small that we don't waste too much space and large enough that we will only use the inline storage in nsAutoTArray.

> @@ +282,5 @@
> > +
> > +    if (sPreallocateAppProcessTask) {
> > +        // We were called directly while a delayed task was scheduled.
> > +        sPreallocateAppProcessTask->Cancel();
> > +        sPreallocateAppProcessTask = nullptr;
> 
> This leaks.
> 
We don't need the if(sPreallocateAppProcessTask) block here. It's only called in the delayed task callback, ContentParent::DelayedNuwaProcess() and it doesn't leak because the message loop deletes the task after it's been run.

> @@ +300,5 @@
> > +
> > +/*static*/ void
> > +ContentParent::DelayedNuwaProcess()
> > +{
> > +    sPreallocateAppProcessTask = nullptr;
> 
> This leaks, and you should probably call Cancel too
> 
Same as above, this is the delayed task callback. We are already scheduled and the message loop will delete the task.

> @@ +309,5 @@
> > +
> > +/*static*/ void
> > +ContentParent::ScheduleDelayedNuwaProcess()
> > +{
> > +    if (!sKeepAppProcessPreallocated || sPreallocateAppProcessTask) {
> 
> What about if sNuwaProcess already exists? You don't want to start this
> sequence all over again.
> 
> @@ +329,5 @@
> > +    }
> > +}
> > +
> > +/*static*/ void
> > +ContentParent::ScheduleDelayedNuwaFork()
> 
> What prevents you from calling this more than once?
> 
> @@ +334,5 @@
> > +{
> > +    MessageLoop::current()->
> > +        PostDelayedTask(FROM_HERE,
> > +                        NewRunnableFunction(DelayedNuwaFork),
> > +                        sPreallocateDelayMs);
> 
> Doesn't this mean that you're actually going to wait for
> |2*sPreallocateDelayMs| here?
> 
For the above 3 comments, we simplify the design regarding these functions. DelayedNuwaFork() and ScheduleDelayedNuwaFork() are removed to simplify the design and resolve the double delay problem. ScheduleDelayedNuwaFork() becomes the entry for requesting delayed fork request, and there will only be one scheduled delayed task.

> @@ +371,5 @@
> >      if (XRE_GetProcessType() != GeckoProcessType_Default) {
> >          return;
> >      }
> >  
> > +    sKeepAppProcessPreallocated = true; // XXX: remove!
> 
> What's this all about?
> 
It's useless and is removed.

> @@ +901,5 @@
> >  
> >  void
> > +ContentParent::OnNuwaForkTimeout()
> > +{
> > +    sNuwaForkWaitTasks.pop_front();
> 
> This will leak the task, won't it? And how do you know that the front one is
> the one that timed out?
> 
Same as above, tasks will be freed by the message loop so no leak concerns.

The fork requests are processed in a FIFO manner in the message loop, IPC, and in the Nuwa process. Even this is not the case, it doesn't hurt because what we care here is that “some fork request times out”, which means some unknow IPC request is made to the Nuwa process and causes deadlock in it. That's what we worry about and thus use the tasks to check that.

> @@ +905,5 @@
> > +    sNuwaForkWaitTasks.pop_front();
> > +
> > +    // We haven't RecvAddNewProcess() after SendNuwaFork(). Maybe the main
> > +    // thread of the Nuwa process is in deadlock.
> > +    MOZ_ASSERT(!"Can't fork from the nuwa process.");
> 
> In general you should not assert things that we know can happen. This should
> at most be a warning.
> 
If the task times out then we consider the Nuwa process has deadlock in the main thread or in the IPC thread (they are the only 2 threads that are kept alive for the Nuwa process to work correctly), which we consider a bug. Instead of generating warnings that are easily ignored, we fail in an obvious manner and force the user to notice this bug.

> @@ +1280,5 @@
> > + * This constructor is used for new content process cloned from a template.
> > + *
> > + * For Nuwa.
> > + */
> > +ContentParent::ContentParent(ContentParent *aTemplate,
> 
> I don't see you adding this to the sContentParents list. Why not?
> 
It's an error. We'll change that.

> @@ +1297,5 @@
> > +    , mNumDestroyingTabs(0)
> > +    , mIsAlive(true)
> > +    , mIsDestroyed(false)
> > +    , mSendPermissionUpdates(false)
> > +    , mIsForBrowser(false)
> 
> I get nervous with multiple constructors on objects that have lots of
> initialization to do (the chance that they get out-of-sync is pretty high).
> Can you make an 'InitializeMembers' method, inlined in the header, that can
> take care of these and then make each constructor call it first thing?
> 
I add  InitializeMembers() for the common member initialization lists. There are still redundancy in both ctors. Maybe we need to include them.

> @@ +1300,5 @@
> > +    , mSendPermissionUpdates(false)
> > +    , mIsForBrowser(false)
> > +{
> > +#ifndef MOZ_NUWA_PROCESS
> > +    NS_RUNTIMEABORT("Nuwa process not enabled!");
> 
> Hm, why isn't this whole constructor #ifdef MOZ_NUWA_PROCESS? We'll never
> call this if it's not defined, right?
> 
Right. We'll change that.

> @@ +1371,5 @@
> >          MOZ_ASSERT(!sAppContentParents ||
> >                     sAppContentParents->Get(mAppManifestURL) != this);
> >      }
> > +
> > +    delete mSubprocess;
> 
> This seems dangerous... There's a pretty complicated destruction process for
> mSubprocess, are you sure that this is safe?
> 
> If it is then we should change mSubprocess into an nsAutoPtr to make this
> automatic.
> 
This is not necessary. mSubprocess is deleted on the IO thread using a task and mSubprocess is set to nullptr. That is, the deletion here is just a noop.

> @@ +1593,5 @@
> >  ContentParent::RecvFirstIdle()
> >  {
> > +#ifdef MOZ_NUWA_PROCESS
> > +    if (sSpareProcesses.Length() == 0 && sNuwaReadyOnParent) {
> > +        sNuwaProcess->SendNuwaFork();
> 
> The old code waits a bit here... Should this be DelayedNuwaFork?
> 
Changed to ScheduleDelayedNuwaFork() to not compete with the launching app.

> @@ +1706,5 @@
> > +    NS_ASSERTION(!sNuwaReadyOnParent, "Nuwa process have created twice or more!");
> > +    ProcessPriorityManager::SetProcessPriority(sNuwaProcess,
> > +                                               hal::PROCESS_PRIORITY_FOREGROUND);
> > +    sNuwaReadyOnParent = true;
> > +    SendNuwaFork();
> 
> Hm, what if we haven't received the first idle? Is this order guaranteed to
> be consistent or is there a race here?
> 
I think there is confusion here. This is sNuwaProcess receiving 'NuwaReady' message, not 'FirstIdle'. The Nuwa process doesn't have PBrowser protocol up and running.

> Also, we're basically doing waiting for the child to say "I'm ready" and
> then telling the child to start the fork sequence. Why do we need that
> latency? Couldn't the child just start forking itself as soon as it's ready?
> 
This is the initial NuwaFork operation: 1) we create the Nuwa process, 2) it runs until it's ready to receive NuwaFork request, 3) it sends out NuwaReady message, 4) parent sends out the NuwaFork reques. The delay compared to not having 3) and 4) where the child just forks without notifying the parent that it's ready is of 2 IPC messages, but the design is more complicated: we could receive the 'AddNewProcess' message when the Nuwa process is ready or when the parent sends out the 'NuwaFork' request.

Instead, we opt to make each request do one small thing and make the interaction clear and simple. The 'AddNewProcess' response is always the result of 'NuwaFork' request. The added delay is the cost we are willing to pay.

> 
> @@ +2863,5 @@
> > +ContentParent::PublishSpareProcess(ContentParent *aContent)
> > +{
> > +    MonitorAutoLock lock(GetSpareMonitor());
> > +
> > +    sNuwaForkWaitTasks.front()->Cancel();
> 
> Please modify sNuwaForkWaitTasks before grabbing the mutex. Otherwise
> helgrind will warn that sometimes you use the lock and sometimes you don't.
> 
> Also, what guarantees the ordering here? How do you know that the front task
> is the one that should be canceled?
> 
Same as above, tasks are processed in a FIFO manner. Even they aren't we are just fine.

> @@ +2864,5 @@
> > +{
> > +    MonitorAutoLock lock(GetSpareMonitor());
> > +
> > +    sNuwaForkWaitTasks.front()->Cancel();
> > +    sNuwaForkWaitTasks.pop_front();
> 
> This will leak the task.
> 
Same as above. The task will be deleted by the message loop.

> @@ +2868,5 @@
> > +    sNuwaForkWaitTasks.pop_front();
> > +
> > +    sSpareProcesses.AppendElement(aContent);
> > +
> > +    lock.Notify();
> 
> Hm, I can't find anywhere where this lock is useful. All of the calls to
> lock it seem to be on the main thread, and I don't see a matching Wait() for
> this Notify() anywhere. Is there something in another patch that needs this?
> Otherwise we should be able to get rid of the lock entirely!
> 
I don't think we need this monitor. It'll be removed.

> @@ +252,5 @@
> >      void ShutDownProcess();
> >  
> > +    static void PublishSpareProcess(ContentParent *aContent);
> > +    static already_AddRefed<ContentParent> GetSpareProcess();
> > +    static void MaybeForgetSpare(ContentParent *aContent);
> 
> Why are these static methods? I think they could probably be just static
> functions in the file...
> 
> Maybe the problem is that there isn't a whole lot of consistency here. You
> have lots of file-level static variables in ContentParent.cpp, and then
> several static methods here in ContentParent.h, but they're all just used in
> ContentParent.cpp. In general I'm a fan of hiding implementation details
> (rather than exposing them) so can we just move all of this to file-level
> statics in ContentParent.cpp only?
> 
Because in the end of the calling chain some ContentParent's private methods (ctor and SendNuwaFork()) are called. I removed almost all these static methods from the header and make them static in ContentParent.cpp, with the cost of 1) making ContentParent::RunNuwaProcess() public and place a stronger assertion in it and 2) making ContentParent::SendNuwaFork() public. But I think this is the cost we are willing to pay.
(In reply to ben turner [:bent] from comment #130)
> Comment on attachment 773213 [details] [diff] [review]
> Part 5: PContent protocol changes
> @@ +1331,5 @@
> > +
> > +    // Set the subprocess's priority (bg if we're a preallocated process, fg
> > +    // otherwise).  We do this first because we're likely /lowering/ its CPU and
> > +    // memory priority, which it has inherited from this process.
> > +    if (Preferences::GetBool("dom.ipc.processPriorityManager.enabled")) {
> 
> You should double check with jlebar but I don't think you need to worry
> about checking this preference yourself. You should just be able to call
> SetProcessPriority and it will silently do nothing if the preferences aren't
> set properly.
> 
Justin, I think the check of "dom.ipc.processPriorityManager.enabled" property is already done in calling ProcessPriorityManager::SetProcessPriority(), right?
Flags: needinfo?(justin.lebar+bug)
Attachment #784947 - Attachment mime type: text/x-chdr → text/plain
Attachment #784948 - Attachment mime type: text/x-c++src → text/plain
Comment on attachment 786881 [details] [diff] [review]
Part 3: IPC glue and IPDL codegen to support protocol cloning.

Review of attachment 786881 [details] [diff] [review]:
-----------------------------------------------------------------

I have been over this code a bunch of times now and I can't figure out why ITopLevelProtocol exists. It looks like CloneToplevel is unused and CloneOpenedToplevels basically just duplicates the work that the IPDL-generated CloneRouted already does. What am I missing here?

::: ipc/glue/ProtocolUtils.cpp
@@ +16,5 @@
>  
>  namespace mozilla {
>  namespace ipc {
>  
> +IToplevelProtocol::~IToplevelProtocol() {

Nit: { on its own line, everywhere.

::: ipc/glue/ProtocolUtils.h
@@ +45,5 @@
> +class ContentParent;
> +}
> +}
> +
> +namespace mozilla {

Nit: No need to close and then re-enter the mozilla namespace here.

@@ +77,5 @@
>      Action mAction;
>      int32_t mMsg;
>  };
>  
> +class ProtocolCloneContext

Why does this need to exist if it only holds a single ContentParent* ? You can just pass it instead.

@@ +84,5 @@
> +
> +  ContentParent* mContentParent;
> +
> +public:
> +  ProtocolCloneContext() : mContentParent(NULL)

Nit: nullptr

@@ +137,5 @@
> +/**
> + * All async protocols should implement this interface.
> + */
> +class IProtocolAsync :
> +        protected AsyncChannel::AsyncListener

I understand your reasoning for keeping this in a separate interface so I'm ok with going this route if you want. However, please be aware of bug 901789 which will merge these different channel types. Please try to do whatever makes future changes easier for everyone.

Also, please fit this all on one line, and do it for the other classes too.

@@ +143,5 @@
> +public:
> +    /**
> +     * This function is used to clone this protocol actor.
> +     *
> +     * The function is called for recreate a new copy of actors for a

Nit: s/for recreate/to recreate/

@@ +169,5 @@
> +                  ProtocolCloneContext* aCtx) = 0;
> +};
> +
> +
> +/**

Nit: Only one newline above.

@@ +230,5 @@
> +        return mOpenActors.getFirst();
> +    }
> +
> +    virtual IToplevelProtocol*
> +    CloneToplevel(const InfallibleTArray<ProtocolFdMapping>& aFds,

I can't see where this is ever called. Dead code?

::: ipc/ipdl/ipdl/lower.py
@@ +1111,5 @@
>              fn = ExprSelect(actorThis, '->', fn.name)
>          return ExprCall(fn)
>  
> +    def cloneRouted(self):
> +        return ExprVar('CloneRouted')

This name should just be 'Clone'. The routed part isn't really helpful.

@@ +1113,5 @@
>  
> +    def cloneRouted(self):
> +        return ExprVar('CloneRouted')
> +
> +    def callCloneRouted(self, actorThis=None):

This is not used.

@@ +3761,5 @@
>                  p.callOtherProcess(p.managerVar())))
>              getchannel.addstmt(StmtReturn(p.channelVar()))
>  
> +        cloneprotocol.addstmts([
> +            _runtimeAbort('The actor of ' + 

Nit: trailing whitespace

@@ +3763,5 @@
>  
> +        cloneprotocol.addstmts([
> +            _runtimeAbort('The actor of ' + 
> +                          p.name +
> +                          ' is not allowed to be cloned'),

This should say "Clone() for 'PFooParent' has not yet been implemented".

"Not allowed" sounds like some kind of security warning.

@@ +3784,5 @@
> +            manageecxxtype = _cxxBareType(actortype, self.side)
> +            manageearray = p.managedVar(manageeipdltype, self.side)
> +            abortstmt = StmtIf(ExprBinary(actorvar, '==', ExprLiteral.NULL))
> +            abortstmt.addifstmts([
> +                _runtimeAbort('can not clone an ' + actortype.name() + ' actor'),

This isn't ok, we have to be able to handle failures in the parent process in some way other than aborting (otherwise your phone reboots!).

In general failures experienced in the parent result in the child process being killed, but I'm not sure what the best idea here is. Perhaps we should just kill the nuwa process and fall back to launching a new process from the scratch?

Regardless we can't simply abort.

@@ +3809,5 @@
> +                StmtExpr(ExprAssn(
> +                    _actorChannel(actorvar),
> +                    p.channelForSubactor())),
> +                StmtExpr(_callCxxArrayInsertSorted(manageearray, actorvar)),
> +                StmtExpr(ExprAssn(_actorState(actorvar), _actorState(ithkid))),

Please assign state before inserting into the array.

@@ +3821,5 @@
> +                ])
> +            block.addstmts([
> +                StmtDecl(Decl(_cxxArrayType(manageecxxtype, ref=1),
> +                              kidsvar.name),
> +                         init=ExprSelect(othervar, '->', manageearray.name)),

This is only questionably safe. You're assuming that the calling CloneProtocol() and CloneRouted() on 'kids[i]' can't modify the 'kids' array (you're iterating a reference to the array here, not a copy). Can we guarantee that? Given that each protocol object is going to implement their own CloneProtocol() function I think this assumption is going to be violated eventually. We either need strong debugging help here or we should just copy the array.
Attachment #786881 - Flags: review?(bent.mozilla) → review-
Attachment #784949 - Attachment mime type: text/x-chdr → text/plain
Attachment #784950 - Attachment mime type: text/x-c++src → text/plain
(In reply to Cervantes Yu from comment #151)
> (In reply to ben turner [:bent] from comment #130)
> > Comment on attachment 773213 [details] [diff] [review]
> > Part 5: PContent protocol changes
> > @@ +1331,5 @@
> > > +
> > > +    // Set the subprocess's priority (bg if we're a preallocated process, fg
> > > +    // otherwise).  We do this first because we're likely /lowering/ its CPU and
> > > +    // memory priority, which it has inherited from this process.
> > > +    if (Preferences::GetBool("dom.ipc.processPriorityManager.enabled")) {
> > 
> > You should double check with jlebar but I don't think you need to worry
> > about checking this preference yourself. You should just be able to call
> > SetProcessPriority and it will silently do nothing if the preferences aren't
> > set properly.
> > 
> Justin, I think the check of "dom.ipc.processPriorityManager.enabled"
> property is already done in calling
> ProcessPriorityManager::SetProcessPriority(), right?

It's pretty circuitous, but the only place that ProcessPriorityManager calls hal::SetProcessPriority from are ProcessPriorityManagerImpl::Init() and ParticularProcessPriorityManager::SetPriorityNow().

The latter is guarded by a call to ProcessPriorityManagerImpl::PrefsEnabled(), and the former is also guarded by that check, via StaticInit().

So yes, I think bent is right.
Flags: needinfo?(justin.lebar+bug)
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #147)
> Comment on attachment 783648 [details] [diff] [review]
> Part 2: IPC and glue changes.
> 
> Review of attachment 783648 [details] [diff] [review]:
> -----------------------------------------------------------------
I'll update the patch once we're fine with how the changes to the IPC::Channel should be made.

> ::: ipc/chromium/src/chrome/common/ipc_channel.h
> @@ +116,5 @@
> >    // Return the server side of the socketpair.
> >    int GetServerFileDescriptor() const;
> > +
> > +  // Reset the client side of the socketpair.
> > +  void ResetPipeFileDescriptor(int fd);
> 
> If this is the client side then please rename to 'ResetClientFileDescriptor'
> 
Currently it is only on the client side, but for nested content process to work a content process may act as both a client and a server. Since file descriptor acts as the communication channel to the peer, we opt to use a name regardless of which side it is on.

> @@ +119,5 @@
> > +  // Reset the client side of the socketpair.
> > +  void ResetPipeFileDescriptor(int fd);
> > +
> > +  // Get fd of this side of the socketpair.
> > +  int GetPipeFileDescriptor();
> 
> My point before is that we should just remove the assertion in
> GetServerFileDescriptor so that we only have one method. The Client/Server
> distinction is pretty obvious here, but the we're adding this "Pipe" thing
> that doesn't make a whole lot of sense.
> 
Same as above, since we are going to get the file descriptor from the channel in both client and server, what about we rename GetServerFileDescriptor() to a neutral one and remove the check inside it?

> ::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc
> @@ +355,5 @@
> > +/**
> > + * Reset the file descriptor for communication with the peer.
> > + */
> > +void Channel::ChannelImpl::ResetPipeFileDescriptor(int fd) {
> > +  NS_ASSERTION(fd > 0 && fd == pipe_, "Invalid file descriptor");
> 
> I thought fd being 0 was valid sometimes? Shouldn't this be >= 0?
> 
Theoretically yes. But practically file descriptor 0, 1, and 2 are better not used in the POSIX system, or we are running the risk of mixing valid data with anything going through stdin/stdout/stderr. Channel::ChannelImpl::CreatePipe() also has a check of fd > 0, but a checking of fd != -1 is enough here.
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #152)
> Comment on attachment 786881 [details] [diff] [review]
> Part 3: IPC glue and IPDL codegen to support protocol cloning.
> 
> Review of attachment 786881 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I have been over this code a bunch of times now and I can't figure out why
> ITopLevelProtocol exists. It looks like CloneToplevel is unused and
> CloneOpenedToplevels basically just duplicates the work that the
> IPDL-generated CloneRouted already does. What am I missing here?
> 
IToplevelProtocol allows us to keep track of the toplevel protocols so that we can clone them. For the nuwa process, the opened toplevel actors will be added to PContentParent's mOpenedActors in the generated code.

The protocols forms a tree (incomplete) like this:

 PContent
   +
   |--->(managees)
   |        +
   |        |-->PNecko
   |        |      +
   |        |      +->(managees)
   |        |            +
   |        |            +----->PCookieService
   |        +-->PHal
   |
   +--->(mOpenedActors)
            +
            |--->PCompositor
            |
            +--->PImageBridge

The 'clone' methods are used in ContentParent::RecvAddNewProcess(), which is the beginning of protocol cloning. It calls the overloaded ContentParent ctor, which:
- calls PcontentParent::CloneRouted(): for each managee, calls CloneProtocol() to clone the managee and then CloneRouted() to descend into the subtree.
- calls CloneOpenedToplevels(), which iterates over the linked list consisting of toplevel protocols and calls CloneToplevel() against each of them.
- in CloneToplevel(), the protocol clones itself and then callls CloneRouted() to descend into its subtree.

> @@ +230,5 @@
> > +        return mOpenActors.getFirst();
> > +    }
> > +
> > +    virtual IToplevelProtocol*
> > +    CloneToplevel(const InfallibleTArray<ProtocolFdMapping>& aFds,
> 
> I can't see where this is ever called. Dead code?
> 
No, it's used in ProtocolUtils::CloneOpenedToplevels(), which is called in ContentParent's overloaded ctor to iterate over opened toplevel actors and initiates cloning with each of them.

> ::: ipc/ipdl/ipdl/lower.py
> @@ +1111,5 @@
> >              fn = ExprSelect(actorThis, '->', fn.name)
> >          return ExprCall(fn)
> >  
> > +    def cloneRouted(self):
> > +        return ExprVar('CloneRouted')
> 
> This name should just be 'Clone'. The routed part isn't really helpful.
> 
'Clone' doesn't fully reflect what this method does. Generated 'CloneRouted' is used to clone the managees, which calls into managees' CloneProtocol() and recursively

> @@ +3784,5 @@
> > +            manageecxxtype = _cxxBareType(actortype, self.side)
> > +            manageearray = p.managedVar(manageeipdltype, self.side)
> > +            abortstmt = StmtIf(ExprBinary(actorvar, '==', ExprLiteral.NULL))
> > +            abortstmt.addifstmts([
> > +                _runtimeAbort('can not clone an ' + actortype.name() + ' actor'),
> 
> This isn't ok, we have to be able to handle failures in the parent process
> in some way other than aborting (otherwise your phone reboots!).
> 
> In general failures experienced in the parent result in the child process
> being killed, but I'm not sure what the best idea here is. Perhaps we should
> just kill the nuwa process and fall back to launching a new process from the
> scratch?
> 
> Regardless we can't simply abort.
> 
The runtime abort here is used as an assertion. It's a bug (some protocol needs to be cloned but doesn't implement it). This is considered as a bug and cannot/shouldn't be handled at runtime; restarting the nuwa process is not going to help. If we are to bail out from this error then we can only fall back to default forking mechanism (likely to be unnoticed). That's why we choose to abort here. Can't we abort even for this reason?

> @@ +3821,5 @@
> > +                ])
> > +            block.addstmts([
> > +                StmtDecl(Decl(_cxxArrayType(manageecxxtype, ref=1),
> > +                              kidsvar.name),
> > +                         init=ExprSelect(othervar, '->', manageearray.name)),
> 
> This is only questionably safe. You're assuming that the calling
> CloneProtocol() and CloneRouted() on 'kids[i]' can't modify the 'kids' array
> (you're iterating a reference to the array here, not a copy). Can we
> guarantee that? Given that each protocol object is going to implement their
> own CloneProtocol() function I think this assumption is going to be violated
> eventually. We either need strong debugging help here or we should just copy
> the array.
It's hard to imagine that the actor would ever change the managee array in cloning, but we can make a copy of the array to make it safe.
Attachment #786881 - Attachment is obsolete: true
Attachment #795372 - Flags: review?(bent.mozilla)
Attached patch Part 2: IPC and glue changes. (obsolete) — — Splinter Review
Attachment #783648 - Attachment is obsolete: true
Attachment #796485 - Flags: review?(bent.mozilla)
Attachment #787395 - Attachment is obsolete: true
Attachment #787395 - Flags: review?(bent.mozilla)
Attachment #796490 - Flags: review?(bent.mozilla)
Comment on attachment 796490 [details] [diff] [review]
Part 4: clone IPC protocol objects that will be up when the template process is ready

carry over r+ from khuey
Attachment #796490 - Flags: review+
Attached patch Part 5: PContent protocol changes (obsolete) — — Splinter Review
Attachment #791237 - Attachment is obsolete: true
Attachment #791237 - Flags: review?(bent.mozilla)
Attachment #796492 - Flags: review?(bent.mozilla)
Comment on attachment 796492 [details] [diff] [review]
Part 5: PContent protocol changes

carry over r+ from khuey.
Attachment #796492 - Flags: review+
Attached patch Part 5: PContent protocol changes (obsolete) — — Splinter Review
Update per comment #130 plus changes rippled from other parts.
Attachment #796492 - Attachment is obsolete: true
Attachment #796492 - Flags: review?(bent.mozilla)
Attachment #796501 - Flags: review?(bent.mozilla)
Comment on attachment 796485 [details] [diff] [review]
Part 2: IPC and glue changes.

Review of attachment 796485 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great! r=me with these addressed:

::: ipc/chromium/Makefile.in
@@ +117,3 @@
>    $(NULL)
> +ifneq ($(MOZ_WIDGET_TOOLKIT),gonk)
> +CSRCS += epoll_sub.c

Hm, a very similar change is working its way through bug 915129. Can you remove this here and suggest any changes you'd like in that bug?

::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc
@@ +31,5 @@
>  #include "chrome/common/file_descriptor_set_posix.h"
>  #include "chrome/common/ipc_logging.h"
>  #include "chrome/common/ipc_message_utils.h"
>  
> +#include "mozilla/DebugOnly.h"

It looks like this is unused now?
Attachment #796485 - Flags: review?(bent.mozilla) → review+
Comment on attachment 795372 [details] [diff] [review]
Part 3: IPC glue and IPDL codegen to support protocol cloning.

Review of attachment 795372 [details] [diff] [review]:
-----------------------------------------------------------------

I'm still nervous about a runtime abort when we hit an actor that we can't clone... Hopefully the crash reports will be enough to know if we ever mess this up!

r=me with these addressed:

::: ipc/glue/ProtocolUtils.cpp
@@ +27,5 @@
> +#ifdef DEBUG
> +  for (const IToplevelProtocol* actor = mOpenActors.getFirst();
> +       actor;
> +       actor = actor->getNext()) {
> +    NS_ASSERTION(actor->GetProtocolId() != aActor->GetProtocolId(),

I'm not sure that this is correct... Isn't it possible to have multiple actors per protocol id? Maybe you should just assert that the same aActor pointer doesn't get inserted more than once?

@@ +40,5 @@
> +IToplevelProtocol::CloneToplevel(const InfallibleTArray<ProtocolFdMapping>& aFds,
> +                                 base::ProcessHandle aPeerProcess,
> +                                 ProtocolCloneContext* aCtx)
> +{
> +  NS_ASSERTION(false, "Clone() for this protocol actor is not implementde");

Nit: NS_NOTREACHED instead of NS_ASSERTION

@@ +51,5 @@
> +                                        base::ProcessHandle aPeerProcess,
> +                                        ProtocolCloneContext* aCtx)
> +{
> +  for (IToplevelProtocol* actor = aTemplate->GetFirstOpenedActors();
> +       actor != nullptr;

Nit: No '!= null' needed, just check 'actor' like above.

::: ipc/glue/ProtocolUtils.h
@@ +205,5 @@
> +class IToplevelProtocol : public LinkedListElement<IToplevelProtocol>
> +{
> +protected:
> +    IToplevelProtocol(ProtocolId aProtoId): mProtocolId(aProtoId)
> +                                          , mTrans(nullptr) {

Nit: Please move the initializer list back to the left margin.

@@ +218,5 @@
> +    void AddOpenedActor(IToplevelProtocol* aActor);
> +
> +public:
> +    IToplevelProtocol* NextToplevel() { return getNext(); }
> +    const IToplevelProtocol* NextToplevel() const { return getNext(); }

Nit: Can you remove these and just call getNext() directly?

::: ipc/ipdl/ipdl/lower.py
@@ +1111,5 @@
>              fn = ExprSelect(actorThis, '->', fn.name)
>          return ExprCall(fn)
>  
> +    def cloneRouted(self):
> +        return ExprVar('CloneRouted')

How about we call this 'CloneManagees' then? 

(The reason 'routed' is bad is that all IPC messages have a routing id buried inside them, so 'Routed' gets a little confusing.)

@@ +3463,5 @@
> +        sourcevar = ExprVar('aSource')
> +        ivar = ExprVar('i')
> +        kidsvar = ExprVar('kids')
> +        ithkid = ExprIndex(kidsvar, ivar)
> +        clonecontexttype = Type('mozilla::ipc::ProtocolCloneContext', ptr=1)

Can you add 'mozilla::ipc::ProtocolCloneContext' to the list of typedefs that get added to the header, and then just use 'ProtocolCloneContext' here? That way subclasses can just use 'ProtocolCloneContext' rather than having to know the namespace info.
Attachment #795372 - Flags: review?(bent.mozilla) → review+
Comment on attachment 796490 [details] [diff] [review]
Part 4: clone IPC protocol objects that will be up when the template process is ready

Review of attachment 796490 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! r=me with this:

::: dom/ipc/ContentParent.h
@@ +167,5 @@
>      void FriendlyName(nsAString& aName);
>  
> +    virtual PIndexedDBParent* AllocPIndexedDB();
> +    virtual bool
> +    RecvPIndexedDBConstructor(PIndexedDBParent* aActor);

Nit: Please add MOZ_OVERRIDE to these two.

::: dom/ipc/CrashReporterParent.cpp
@@ +43,5 @@
> +    nsAutoPtr<PCrashReporterParent> actor(
> +        contentParent->AllocPCrashReporter(childThreadId, childProcessType)
> +    );
> +    if (!actor ||
> +        !contentParent->RecvPCrashReporterConstructor(actor.get(),

Nit: no need for .get()

::: gfx/layers/ipc/CompositorParent.cpp
@@ +800,5 @@
>  
> +IToplevelProtocol*
> +CompositorParent::CloneToplevel(const InfallibleTArray<mozilla::ipc::ProtocolFdMapping>& aFds,
> +                                base::ProcessHandle aPeerProcess,
> +                                mozilla::ipc::ProtocolCloneContext* aCtx) {

Nit: { on its own line, here, below, and in ImageBridgeParent.

::: ipc/glue/Transport_win.cpp
@@ +60,5 @@
>  
> +Transport*
> +OpenDescriptor(const FileDescriptor& aFd, Transport::Mode aMode)
> +{
> +  NS_RUNTIMEABORT("Not implemented!");

NS_NOTREACHED should be ok here.
Attachment #796490 - Flags: review?(bent.mozilla) → review+
Comment on attachment 796501 [details] [diff] [review]
Part 5: PContent protocol changes

Review of attachment 796501 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with these addressed:

::: dom/ipc/ContentChild.cpp
@@ +1322,5 @@
> +    }
> +};
> +
> +static void
> +DoCheckpoint(jmp_buf *aEnv)

Nit: Please go through and move all your * to the left (next to the type).

@@ +1334,5 @@
> +    bool rv;
> +    jmp_buf * volatile env = aEnv;
> +    unsigned *sentinel = reserved;
> +
> +    *sentinel = (unsigned)&sentinel ^ 0xdeadbeef;

Nit: since you use 0xdeadbeef twice please #define it somewhere and then use that.

@@ +1360,5 @@
> +
> +    {
> +        nsCOMPtr<nsIRunnable> callSpawn(new CallNuwaSpawn());
> +        NS_ASSERTION(NuwaSpawn != nullptr,
> +                     "NuwaSpawn() is not available!");

Nit: new is infallible, no need to assert its result here.

@@ +1380,5 @@
> +RunNuwaFork()
> +{
> +    jmp_buf env;
> +
> +    if (setjmp(env) == 0) {

I don't understand why we wouldn't use sigsetjmp/siglongjmp. Don't we want our signal mask to persist in the forked process?

@@ +1414,5 @@
> +extern "C" {
> +
> +#if defined(MOZ_NUWA_PROCESS)
> +void NS_EXPORT
> +GetProtoFdInfos(NuwaProtoFdInfo *aInfoList, size_t size, size_t *aInfoSize)

Nit: Can you change 'size' to 'aInfoListSize'?

@@ +1447,5 @@
> +
> +class RunAddNewIPCProcess : public nsRunnable
> +{
> +public:
> +    RunAddNewIPCProcess(pid_t aPid) : mPid(aPid) {}

The normal way we pass nsTArray to other classes is like this:

  class RunAddNewIPCProcess : public nsRunnable
  {
    RunAddNewIPCProcess(pid_t aPid,
                        nsTArray<mozilla::ipc::ProtocolFdMapping>& aMap)
    : mPid(aPid)
    {
      mMap.SwapElements(aMap);
    }
    // ...
  };

Then you just create a nsTArray on the stack, fill it, and then pass it to the constructor.

@@ +1450,5 @@
> +public:
> +    RunAddNewIPCProcess(pid_t aPid) : mPid(aPid) {}
> +    NS_IMETHOD Run()
> +    {
> +      bool sendok = mozilla::dom::ContentChild::GetSingleton()->

Nit: 'sendok' in unused.

@@ +1460,5 @@
> +      return NS_OK;
> +    }
> +
> +    pid_t mPid;
> +    nsTArray<mozilla::ipc::ProtocolFdMapping> mMaps;

These should be private.

::: dom/ipc/ContentParent.cpp
@@ +131,5 @@
>  
>  static NS_DEFINE_CID(kCClipboardCID, NS_CLIPBOARD_CID);
>  static const char* sClipboardTextFlavors[] = { kUnicodeMime };
>  
> +#define NUWA_FORK_WAIT_DURATION 2000 // 2 seconds.

Nit: NUWA_FORK_WAIT_DURATION_MS makes it clear what the time is.

@@ +234,5 @@
>  // Can't be a static constant.
>  #define MAGIC_PREALLOCATED_APP_MANIFEST_URL NS_LITERAL_STRING("{{template}}")
>  
> +void
> +ContentParent::RunNuwaProcess()

Please add main thread assertions here (and, really, in all these other static functions - DelayedNuwaFork, ScheduleDelayedNuwaFork, maybe others?) so that we know we're not racing.

@@ +258,5 @@
> +static CancelableTask* sPreallocateAppProcessTask;
> +// This number is fairly arbitrary ... the intention is to put off
> +// launching another app process until the last one has finished
> +// loading its content, to reduce CPU/memory/IO contention.
> +static int sPreallocateDelayMs;

This is not set to anything so it will end up 0... Is this intentional?

@@ +323,5 @@
> +{
> +    MOZ_ASSERT(NS_IsMainThread());
> +
> +    sNuwaForkWaitTasks.ElementAt(0)->Cancel();
> +    sNuwaForkWaitTasks.RemoveElementAt(0);

Are you sure it's not possible for sNuwaForkWaitTasks to be empty here? Even if the timeout removes an element from this array?
Attachment #796501 - Flags: review?(bent.mozilla) → review+
Comment on attachment 796501 [details] [diff] [review]
Part 5: PContent protocol changes

Review of attachment 796501 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentParent.cpp
@@ +927,5 @@
> +    sNuwaForkWaitTasks.RemoveElementAt(0);
> +
> +    // We haven't RecvAddNewProcess() after SendNuwaFork(). Maybe the main
> +    // thread of the Nuwa process is in deadlock.
> +    MOZ_ASSERT(!"Can't fork from the nuwa process.");

Yikes, missed this. Please MOZ_ASSERT(false, "")
Cervantes,

What is the current plan for the pre-allocated process?  Is that still going to be forked from the Nuwa template?  Or will the pre-allocated process be removed in favor of creating the end process directly from the template?

I'm curious because keeping the pre-allocated/pre-forked process would mean we probably want to do some additional work to address bug 892965.  If the pre-allocated process goes away then bug 892965 essentially goes away for free.

By the way, this is awesome work and I can't wait to see this land!

Thanks!
Ben,

If bug 890870 is fixed, it would be more easy to fix bug 892965 by forking every content process from the Nuwa process.  Without bug 890870, Nuwa has a more conservative number (longer) of seconds for waiting the Nuwa process ready.  I think bug 890870 can shorten the waiting time, and is an ideal solution for your bug.
(In reply to Thinker Li [:sinker] from comment #169)
> If bug 890870 is fixed, it would be more easy to fix bug 892965 by forking
> every content process from the Nuwa process.  Without bug 890870, Nuwa has a
> more conservative number (longer) of seconds for waiting the Nuwa process
> ready.  I think bug 890870 can shorten the waiting time, and is an ideal
> solution for your bug.

Hmm, it sounds like bug 890870 would only introduce delay once at device boot time.  Is that correct?

If that is the case then I think its a related, but not quite the same issue.  In this case, it is less likely to affect our test infrastructure because we have a long settling time after booting the device.  There is lots of activity after booting the device which is somewhat non-deterministic from the test infrastructure's point of view.

Or are you saying we can't fork directly from the Nuwa template process ever because we're never sure when its frozen?

Sorry for my confusion.

Thanks!
Right now the Nuwa process freezes after N seconds (I don't remember what N is, but it's not important).  We need to switch that to a deterministic point (that's what bug 890870 is for).  Once that's done (and we fix whatever other bugs we find), we should be able to replace the preallocated app process with forking from the Nuwa process.  That should be much quicker than starting up an entire Gecko process so if bug 892965 is indeed caused by us "missing" the preallocated app process (because it takes too long) this may resolve it.

I can't imagine any reason why we would want to keep a separate preallocated app process that was not forked from the Nuwa process around once we have Nuwa.
Try submission with the patches rebased to the latest m-c: https://tbpl.mozilla.org/?tree=Try&rev=b6c541b63b71

I am fixing regressions after rebase such as:
- stlport is now using jemalloc and causes a deadlock in the Nuwa process
- new IPC protocol needs cloning

I am still testing to check for other broken things.
Attached patch Part 1: the Nuwa API and low-level wrappers. (obsolete) — — Splinter Review
Fixes after rebase:

- The vector uses malloc and free in libc to fix a deadlock.
- Nuwa API Changes:
  NuwaCheckpointCurrentThread() is implemented as a macro instead of a function to place the setjmp() call in the calling function without changing the stack pointer. This is to remove the stack dark art in ContentChild.cpp (part 5) so we are not affected by compiler's aggressive optimizations.
  NuwaSpawnBlock() renamed to NuwaSpawnPrepare() to reflect the fact that it initializes sForkWaitCondChanged to false before requesting the main thread to spawn. This fixes a scheduling-related bug that the IPC thread fails to get blocked and we get an invalid stack after fork().
- place NS_EXPORT before function return types, or the linker can remove NuwaCheckpointCurrentThread1() from libmozglue.so.
Attachment #770738 - Attachment is obsolete: true
Attachment #806701 - Flags: review?(khuey)
Attached patch Part 2: IPC and glue changes. — — Splinter Review
Patch after rebase
Attachment #796485 - Attachment is obsolete: true
Attachment #806706 - Flags: review+
Attachment #806701 - Attachment description: 770738: Part 1: the Nuwa API and low-level wrappers. → Part 1: the Nuwa API and low-level wrappers.
Patch after rebase.
Attachment #795372 - Attachment is obsolete: true
Attachment #806712 - Flags: review+
Patch after rebase. Adds cloning of JavaScriptParent.
Attachment #796490 - Attachment is obsolete: true
Attachment #806716 - Flags: review?(khuey)
Attached patch Part 5: PContent protocol changes (obsolete) — — Splinter Review
Patch after rebase. Changes:
- The dark art of stack reservation and longjmp() is removed. ContentChild.cpp is now more robust against compiler optimizations like inlining or stack layout stuffs.
- Rename of NuwaSpawnBlock() to NuwaSpawnPrepare().
- NS_EXPORT placed before return type.
- Don't remove from sContentParents in MarkAsDead().
- Initialize ContentParent's new data members.
Attachment #796501 - Attachment is obsolete: true
Attachment #806734 - Flags: review?(khuey)
Patch after rebase. Changes:
- Marking of source compressor thread and CycleCollectorRunner is removed.
- NS_DECL_THREADSAFE_ISUPPORTS in thread pool listeners.
Attachment #771296 - Attachment is obsolete: true
Attachment #806737 - Flags: review?(khuey)
Patch after rebase.
Attachment #772016 - Attachment is obsolete: true
Attachment #806739 - Flags: review+
Patch after rebase. The logic of changing thread priority when initializing the binder thread pool is refactored to InitializeBinder().
Attachment #772017 - Attachment is obsolete: true
Attachment #806748 - Flags: review+
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #166)
> Comment on attachment 796501 [details] [diff] [review]
> Part 5: PContent protocol changes
> 
> @@ +1380,5 @@
> > +RunNuwaFork()
> > +{
> > +    jmp_buf env;
> > +
> > +    if (setjmp(env) == 0) {
> 
> I don't understand why we wouldn't use sigsetjmp/siglongjmp. Don't we want
> our signal mask to persist in the forked process?
> 
This dark art is removed in the latest revision.

> 
> @@ +258,5 @@
> > +static CancelableTask* sPreallocateAppProcessTask;
> > +// This number is fairly arbitrary ... the intention is to put off
> > +// launching another app process until the last one has finished
> > +// loading its content, to reduce CPU/memory/IO contention.
> > +static int sPreallocateDelayMs;
> 
> This is not set to anything so it will end up 0... Is this intentional?
> 
It's changed to 1000 ms as the original default. We need to integrate some logic here with PreallocatedProcessManager and we can make it a followup.

> @@ +323,5 @@
> > +{
> > +    MOZ_ASSERT(NS_IsMainThread());
> > +
> > +    sNuwaForkWaitTasks.ElementAt(0)->Cancel();
> > +    sNuwaForkWaitTasks.RemoveElementAt(0);
> 
> Are you sure it's not possible for sNuwaForkWaitTasks to be empty here? Even
> if the timeout removes an element from this array?

We add a check to not undderrun the array. It requires changes to the message loop and the PContent protocol if we want to be 100% precise about the
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #164)
> Comment on attachment 795372 [details] [diff] [review]
> Part 3: IPC glue and IPDL codegen to support protocol cloning.
> 
> Review of attachment 795372 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm still nervous about a runtime abort when we hit an actor that we can't
> clone... Hopefully the crash reports will be enough to know if we ever mess
> this up!
> 
> r=me with these addressed:
> 
> ::: ipc/glue/ProtocolUtils.cpp
> @@ +27,5 @@
> > +#ifdef DEBUG
> > +  for (const IToplevelProtocol* actor = mOpenActors.getFirst();
> > +       actor;
> > +       actor = actor->getNext()) {
> > +    NS_ASSERTION(actor->GetProtocolId() != aActor->GetProtocolId(),
> 
> I'm not sure that this is correct... Isn't it possible to have multiple
> actors per protocol id? Maybe you should just assert that the same aActor
> pointer doesn't get inserted more than once?
> 
This is top level protocol actors. Currently we see only one instance for each top level actor type, but it should suffice to assert the same actor pointer is not inserted into the list twice.
Comment on attachment 806701 [details] [diff] [review]
Part 1: the Nuwa API and low-level wrappers.

Review of attachment 806701 [details] [diff] [review]:
-----------------------------------------------------------------

Instead of "NS_EXPORT void" or "void NS_EXPORT" do "NS_EXPORT_(void)"

::: mozglue/build/Nuwa.cpp
@@ +88,5 @@
> +struct LibcAllocator: public std::allocator<T>
> +{
> +  LibcAllocator()
> +  {
> +    void * libcHandle = dlopen("libc.so", RTLD_LAZY);

nit: void* (no space)

@@ +92,5 @@
> +    void * libcHandle = dlopen("libc.so", RTLD_LAZY);
> +    mMallocImpl = reinterpret_cast<void*(*)(size_t)>(dlsym(libcHandle, "malloc"));
> +    mFreeImpl = reinterpret_cast<void(*)(void*)>(dlsym(libcHandle, "free"));
> +
> +    if (!(mMallocImpl && mFreeImpl)) {

If this fails, we're going to deadlock, right?  Why not just MOZ_CRASH?
Attachment #806701 - Flags: review?(khuey) → review+
Comment on attachment 806734 [details] [diff] [review]
Part 5: PContent protocol changes

Review of attachment 806734 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentChild.cpp
@@ +1538,5 @@
> +extern "C" {
> +
> +#if defined(MOZ_NUWA_PROCESS)
> +NS_EXPORT void
> +GetProtoFdInfos(NuwaProtoFdInfo* aInfoList, size_t size, size_t* aInfoListSize)

bent was talking about renaming 'size' to 'aInfoListSize', not 'aInfoSize'

@@ +1580,5 @@
> +    }
> +
> +    NS_IMETHOD Run()
> +    {
> +      mozilla::dom::ContentChild::GetSingleton()->

nit: 4 space indentation in this function to be consistent.

::: dom/ipc/ContentParent.cpp
@@ +1090,5 @@
> +    }
> +
> +    // We haven't RecvAddNewProcess() after SendNuwaFork(). Maybe the main
> +    // thread of the Nuwa process is in deadlock.
> +    MOZ_ASSERT(!"Can't fork from the nuwa process.");

This should be MOZ_ASSERT(false, "Can't fork ...
Attachment #806734 - Flags: review?(khuey) → review+
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #183)
> Comment on attachment 806701 [details] [diff] [review]
> Part 1: the Nuwa API and low-level wrappers.
> 
> Review of attachment 806701 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Instead of "NS_EXPORT void" or "void NS_EXPORT" do "NS_EXPORT_(void)"

In fact, stuff in mozglue/ should use MFBT_API, not NS_EXPORT.
Jonas, Vivien:  do we need this for 1.2?
Flags: needinfo?(jonas)
Flags: needinfo?(21)
This would be quite risky to take in 1.2.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #187)
> This would be quite risky to take in 1.2.

agree... too risky for 1.2, let's try to make it in 1.3.
blocking-b2g: koi? → 1.3?
We have a breakage when updating to today's m-c because of https://hg.mozilla.org/mozilla-central/rev/f36185787040

We might need to reverse its effect if there is no other concerns in it.
We need the allocation of toplevel protocols to return the actor itself so we can keep track of the open actors in the Nuwa process.
Attachment #810370 - Flags: review?(dzbarsky)
(In reply to Andrew Overholt [:overholt] from comment #186)
> Jonas, Vivien:  do we need this for 1.2?

Nope. For my part I add this as a Haida dependency because of bug 920448. Which would be a nice to have for 1.3 to improve the dialer call screen load time as well as other cases where there are multiple outer windows for the same app (like dialer/contacts), and definitively something needed for 1.4.
Flags: needinfo?(21)
Comment on attachment 810370 [details] [diff] [review]
Part 9: allocation of toplevel protocols should return the actor

Review of attachment 810370 [details] [diff] [review]:
-----------------------------------------------------------------

I wrote that patch because nobody was using the return values and it was cleaner.  If you need them, we should just bring them back. r=me
Attachment #810370 - Flags: review?(dzbarsky) → review+
I agree, this shouldn't block. We've not made any architectural changes that depend on the memory savings here.
Flags: needinfo?(jonas)
Update per comment #183 and rebase to latest.
Attachment #806701 - Attachment is obsolete: true
Attachment #811978 - Flags: review+
Rebase to 8f805d3ef377 and update per bug 901789
Attachment #806712 - Attachment is obsolete: true
Attachment #811980 - Flags: review+
Rebase to 8f805d3ef377 and update per bug 901789
Attachment #806716 - Attachment is obsolete: true
Attachment #811981 - Flags: review+
Rebase to 8f805d3ef377
Attachment #806734 - Attachment is obsolete: true
Attachment #811982 - Flags: review+
Rebase to 8f805d3ef377 and regression fixes.
Attachment #806737 - Attachment is obsolete: true
Attachment #811984 - Flags: review+
Rebase to 8f805d3ef377
Attachment #810370 - Attachment is obsolete: true
Please note that this is disabled by default. Users in interest can turn on this feature by exporting MOZ_NUWA_PROCESS in .userconfig.

Tests on try looks good. I think we are ready to land.
Keywords: checkin-needed
This break ASan builds (tier 1):
https://tbpl.mozilla.org/php/getParsedLog.php?id=28562224&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=28562321&tree=B2g-Inbound

If we can have a quick (<10mins until checked in) fix, then happy to avoid backing, otherwise sadly we'll need to.
s/backing/backing out/
Also, this urgently needs an automated test. If someone adds a new thread that is active before the fork we need to adjust the nuwa code so make sure this is tested somehow.
Comment on attachment 811980 [details] [diff] [review]
Part 3: IPC glue and IPDL codegen to support protocol cloning.

>Bug 771765: support template content process, part 3: IPC glue and IPDL codegen to support protocol cloning. r=bent.
>
>diff --git a/dom/ipc/Blob.h b/dom/ipc/Blob.h
[...]
>+    // The implementation of function is generated by code generator.
>+    virtual void CloneManagees(ListenerT* aSource,
>+                               ProtocolCloneContext* aCtx) = 0;
[...]
>+    def cloneManagees(self):
>+        return ExprVar('CloneManagees')
>+
[...]
>+        clonemanagees = MethodDefn(MethodDecl(
>+            p.cloneManagees().name,
>+            params=[ Decl(protocolbase, sourcevar.name),
>+                     Decl(clonecontexttype, clonecontextvar.name) ],
>+            virtual=1))


This patch apparently ends up generating code like this:
> void
> PBlobStreamChild::CloneManagees(
>         ProtocolBase* aSource,
>         ProtocolCloneContext* aCtx)
> {
>     PBlobStreamChild* other = static_cast<PBlobStreamChild*>(aSource);
> }
(in $OBJDIR/ipc/ipdl/PBlobStreamChild.cpp)
...which is a no-op function, and which spams a build warning for unused variable 'other'.

Was this function supposed to actually do something?
\o/

Great work Cervantes.

Daniel can you file a followup bug to fix that?
(In reply to Daniel Holbert [:dholbert] from comment #208)
> Comment on attachment 811980 [details] [diff] [review]
> Part 3: IPC glue and IPDL codegen to support protocol cloning.
> 
> >Bug 771765: support template content process, part 3: IPC glue and IPDL codegen to support protocol cloning. r=bent.
> >
> 
> Was this function supposed to actually do something?

This is to clone the protocol tree from the template process. PBlobStream doesn't have managees so this method should be empty. We'll fix that in a followup bug.
(In reply to Andreas Gal :gal from comment #205)
> Also, this urgently needs an automated test. If someone adds a new thread
> that is active before the fork we need to adjust the nuwa code so make sure
> this is tested somehow.

Yes, we are working on automated tests.
Blocks: 922461
Blocks: 922465
Depends on: 925195
Regarding 1.3, can we confirm that Nuwa has landed. If so, how can this be enabled?
(In reply to Mandyam Vikram from comment #212)
> Regarding 1.3, can we confirm that Nuwa has landed. If so, how can this be
> enabled?

It's already in m-c, just not enabled by default. We'll flip the switch to make it enabled by default once the test cases are ready.

You can enable it in your build by adding the following line:
export MOZ_NUWA_PROCESS=1

to your .userconfig and then rebuild gecko.
Lets create a new bug, if it doesn't already exist, for enabling Nuwa by default in v1.3.
Blocks: 930282
Blocks: 928995
No longer depends on: 928995
(In reply to Cervantes Yu from comment #211)
> (In reply to Andreas Gal :gal from comment #205)
> > Also, this urgently needs an automated test. If someone adds a new thread
> > that is active before the fork we need to adjust the nuwa code so make sure
> > this is tested somehow.
> 
> Yes, we are working on automated tests.

So, do we have any update about the automated tests? Thank you.
(In reply to Kevin Hu [:khu] from comment #215)
> So, do we have any update about the automated tests? Thank you.

I have a patch for bug 922465 test can test creating process of nuwa. But if nuwa enabled, current test cases (not just the test case we are going to add) on try server would not be able to terminate correctly, we are still fixing it.
Depends on: 948485
Depends on: 948545
Depends on: 948774
blocking+ for 1.3 - NuWa has been made a commitment for 1.3
blocking-b2g: 1.3? → 1.3+
Depends on: 967967
Depends on: 968297
Depends on: 968991
Depends on: 969224
Depends on: 980750
Depends on: 1149098
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: