As we spin up more content processes we'd like to increase the amount of data shared across processes. On Linux, read-only portions of the binary such as .text and .rodata can be shared, but portions that must be relocated, such as .data.rel.ro, cannot. .data.rel.ro accounts for ~4MB of unsharable data. vtables account for a fair amount of this data, and while we do have bugs on file for reducing the number of vtables, that's a rather tedious process with diminishing returns.

Instead I propose implementing a system that loads a minimal content process (essentially just a main loop) that is then used to fork real content processes. This should give us a sizeable memory win for the relocations, as well as other possibilities for sharing memory pages marked as copy-on-write. Prior art can be found in Chrome's zygote process [1] as well as our previous attempt at this, Nuwa, for B2G [2]. I'm proposing a less aggressive version of Nuwa in that we would perform the fork before initializing XPCOM, avoiding the need to deal with threading, mutexes, polling, etc. We might be able to get larger wins by initializing some of our core libraries such as ICU, NSS, libav, and portions of SpiderMonkey prior to forking. Additionally, if we can implement something that works for Mac as well, we'd see at least a 15MB improvement.

I'm filing this in IPC, but it clearly has implications for sandboxing and XPCOM as well.

[1] https://chromium.googlesource.com/chromium/src/+/master/docs/linux_zygote.md
[2] https://wiki.mozilla.org/NuwaTemplateProcess
Jed, when you get a chance can you sketch out some of your thoughts on this?
In theory this should be a perf win as there's less initialization required. The Chrome folks measured ~56ms/GHz [1].

[1] https://chromium.googlesource.com/chromium/src/+/master/docs/linux_zygote.md#appendix-a_runtime-impact-of-relocations
Whiteboard: [overhead:>4MB] → [overhead:>4MB][qf]
For a basic proof-of-concept, we should be able to hook in early in main() to check for a command line flag or env var and, without starting threads (or using XPCOM, probably), run a little server that receives packets containing:

1. a list of fds (as SCM_RIGHTS) and a list of destination fds to map them to
2. environment variable settings
3. argv
4. [reserved for future expansion]

I think the IPC Pickle / ParamTraits stuff can be safely used to deserialize the data, but the fd passing would have to be hand-written. At the risk of stating the obvious: it then forks, and the child applies the fd mapping (see [1], although the CloseSuperfluousFds is a little unnecessary here) and sets the env vars (setenv is safe, because single-threaded), and continues with the provided argv; the server would send back the pid or error.

This server would be launched normally with LaunchApp (maybe lazily the first time it's needed?) and GeckoChildProcessHost::PerformAsyncLaunchInternal would use it instead. On IRC I suggested adding options to LaunchApp, but on further thought I think it makes more sense just to write something specialized.

Things that are broken with this:

* Sandboxing as it currently exists can work, but at the moment it's factored kind of badly for this — we just want to send down the SandboxFork constructor params, but that's all abstracted inside SandboxLaunch and hidden behind the ForkDelegate abstract class. (Those params are computed by poking at a lot of XPCOM stuff in the parent process; that part needs to stay where it is.)
* Sandboxing in the future was (at some point) going to allow launching processes via a setuid helper for distributions like Debian, Arch, and RHEL7 that don't allow unprivileged user namespaces by default. Chrome appears to handle this by sandbox-launching the entire zygote, which also means the renderers *start* without filesystem access if I'm reading the code correctly (among other quirks). Not insurmountable, but definitely makes this harder. Alternately, those setups could take the memory overhead of per-process ASLR.
* Waiting for processes to exit. On Linux the server could use CLONE_PARENT to create a sibling instead of a child; portably, it could handle it as a second RPC message. (I wouldn't mind throwing out and rewriting the child process watcher code.)
* Thread creation at initializer time. This can happen if people follow NVIDIA's advice about multithreaded GL, which isn't needed for Firefox; we could detect that and scrub LD_PRELOAD. In general we'd want to be able to detect this and fall back; I don't know if there's anything more portable than interposing pthread_create. (On Linux there's a trick with the link count of /proc/self/task, but the Tor Browser people want to run with /proc unmounted. On the other hand they might also want to sacrifice memory for per-process ASLR.) TSan also creates extra threads, but we can just turn this all off.
* Not exactly broken, but doing a blocking read on the I/O thread to wait for the pid isn't ideal. Making that async or moving it to a dedicated thread would be nice; this is entangled with making the main thread not sync wait to get the pid from the I/O thread.
* Mac, maybe. I've heard that fork-without-exec can cause problems involving Mach ports, but I don't understand the details or whether it applies to us / if there's some initialization we could defer to prevent it. (Mac sandboxing doesn't need any magic at launch time.)

A thing that is good:

* This also means that we're not forking the parent process, which imposes time costs proportional to how much writable private memory it has, which is usually a lot. I wanted to do something about this anyway. (Corollary: that blocking read to get the pid might actually be less jank than forking directly.)

The other idea I mentioned on IRC was using mozglue/linker to do the loading and modifying it to use shared memory (or MADV_MERGEABLE?) for the relocated things. That would be ELF-only (and Linux-only with KSM), but it avoids some of the fork-related problems. Also there might be reasons we can't or shouldn't do our own loading on desktop.

[1] https://searchfox.org/mozilla-central/rev/93d2b9860b3d341258c7c5dcd4e278dea544432b/ipc/chromium/src/base/process_util_linux.cc#34-54
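The fd-passing half of the protocol described above can be sketched roughly as follows. This is a minimal Python stand-in for what would really be hand-written C++ in Gecko; the message format (a comma-separated list of destination fd numbers, with the real fds attached via SCM_RIGHTS) is invented for illustration and is not Gecko's Pickle/ParamTraits encoding. Requires Python ≥ 3.9 for socket.send_fds/recv_fds.

```python
import os
import socket

def fork_server(sock):
    # Receive one launch request: fds via SCM_RIGHTS, plus the fd
    # numbers the child should map them to. Then fork, remap fds in
    # the child, and report the child's pid back to the requester.
    msg, fds, _, _ = socket.recv_fds(sock, 4096, 16)
    dest = [int(n) for n in msg.decode().split(",")]
    pid = os.fork()
    if pid == 0:
        for src, dst in zip(fds, dest):
            os.dup2(src, dst)          # child applies the fd mapping
        os.write(100, b"hello from child\n")
        os._exit(0)                    # real server would continue into main(argv)
    for fd in fds:
        os.close(fd)                   # server drops its copies of the fds
    sock.sendall(str(pid).encode())    # send back the pid (or an error)
    return pid

server_end, launcher_end = socket.socketpair()
r, w = os.pipe()
# "Parent process" side: pass our pipe's write end, asking the server
# to install it as fd 100 in the new child.
socket.send_fds(launcher_end, [b"100"], [w])
pid = fork_server(server_end)
os.close(w)
reported_pid = int(launcher_end.recv(32))
child_msg = os.read(r, 64).decode()
os.waitpid(pid, 0)
```

In the real thing the server would be a separate process reached over a socket rather than a function call, and the child would exec or continue into the content-process main loop instead of exiting.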
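The /proc/self/task trick for detecting initializer-time thread creation, mentioned in the bullets above, looks roughly like this (Linux-only; the fork server would refuse to fork, and the caller would fall back to a plain launch, if more than one task shows up):

```python
import os
import threading

def thread_count():
    # Linux-only: every thread of this process appears as an entry
    # under /proc/self/task. (The "link count" variant of the trick is
    # that on most kernels os.stat("/proc/self/task").st_nlink is the
    # thread count plus 2, avoiding a readdir.)
    return len(os.listdir("/proc/self/task"))

before = thread_count()
# Simulate a library starting a thread behind our back.
done = threading.Event()
t = threading.Thread(target=done.wait)
t.start()
during = thread_count()
done.set()
t.join()
```

As noted above, this fails if /proc is unmounted, which is why a fallback (or interposing pthread_create) is still needed.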
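The MADV_MERGEABLE idea at the end of the comment would amount to something like this: opt the relocated pages into KSM so the kernel can merge identical pages across processes. A minimal sketch (Linux-only; whether anything actually merges depends on the kernel being built with CONFIG_KSM and /sys/kernel/mm/ksm/run being enabled):

```python
import mmap

# An anonymous writable region standing in for pages the linker has
# just relocated; identical content across processes could be merged
# by KSM once marked mergeable.
length = 16 * mmap.PAGESIZE
buf = mmap.mmap(-1, length)
buf.write(b"\x2a" * length)
buf.seek(0)
first = buf.read_byte()
buf.madvise(mmap.MADV_MERGEABLE)    # opt these pages into KSM scanning
buf.madvise(mmap.MADV_UNMERGEABLE)  # and it can be undone
buf.close()
```

The fork-based approach gets the sharing for free via copy-on-write; KSM trades that for ongoing kernel-side scanning cost, which is part of why it's the fallback idea here.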
I'm hearing a lot of talk about Linux (and maybe Mac), but none about Windows...and our platform priorities run in roughly the opposite direction. I guess we would win on...Android?
(In reply to Nathan Froyd [:froydnj] from comment #4)
> I'm hearing a lot of talk about Linux (and maybe Mac), but none about
> Windows...and our platform priorities run in roughly the opposite direction.
> I guess we would win on...Android?

The main issue we're trying to solve here is relocated data not being sharable across processes. That isn't a problem on Windows, because relocations are shared across separate processes. It is a big problem on Linux and OS X, though, and we can't really ignore it there. Same goes for Android.
Android/GeckoView is… interesting. We're currently launching child processes as Android services, which means that we already have Android Runtime stuff when we're started (so, probably threads), and if we want N content processes we'd have to declare ≥N services in an XML file. At present we support only one content process. It's apparently also possible to use fork/exec, but there's concern that this isn't really supported and whatever we do with that could be arbitrarily broken by OS updates. Also, exec'ing means no Android Runtime, which means no way to get a GL context, which means we'd have to do WebGL remoting, which Chrome (last I heard) does on desktop but *not* on mobile because of the overhead. (This is all secondhand from :snorp; I hope I haven't mangled it too much.)
(At this point, this doesn't sound like it's in the [qf] umbrella, but feel free to renominate with more details if needed. Knee-jerk triage decision: there will be lots of work around fission to avoid incurring perf regressions as we increase the number of content processes, and that's all worthwhile work, but we also don't want the [qf] project to scope-creep to encompass all of that work.)
Whiteboard: [overhead:>4MB][qf] → [overhead:>4MB][qf-]
(In reply to Daniel Holbert [:dholbert] from comment #7)
> (At this point, this doesn't sound like it's in the [qf] umbrella, but feel
> free to renominate with more details if needed. Knee-jerk triage decision:
> there will be lots of work around fission to avoid incurring perf regressions
> as we increase the number of content processes, and that's all worthwhile
> work, but we also don't want the [qf] project to scope-creep to encompass
> all of that work.)

I don't think this is scope creep. This is a project that benefits both memshrink and qf in unrelated ways. It benefits memshrink by allowing us to share relocated data (and some data touched by static initializers) between child processes. It benefits qf by making it much cheaper/faster to spawn new content processes and, importantly, by moving the janky fork() step from the parent process (where it's user-visible) to the fork server (where it's not).
For what it's worth, moving the fork() to a dedicated server with a minimal amount of private writable data should greatly decrease the amount of jank (and CPU usage), in addition to moving it out of the parent process.

There are plans (bug 1348361, bug 1461459, bug 1446161) to stop making the main thread block waiting for the I/O thread to finish the launch operation; it may also be possible to move that off the I/O thread so it doesn't block IPC message passing either, but that's a little more complicated.

Profiling on Linux, I'm seeing a gap in samples from the parent process main thread in LaunchSubprocess, flanked by pthread_cond_wait blocking on the I/O thread. I'd understand that if I were profiling the I/O thread as well, because it blocks SIGPROF in order to ensure it can make progress on forking, and I believe that will hang the entire profiler for the duration… but I'm not doing that. So this suggests that the entire process gets suspended (either explicitly or as a side effect of blocking on page faults) in order to remove write permissions and do TLB shootdown. In any case, I'm seeing 11ms of jank there in a test profile, and it would probably be more in a heavily used browser; offloading the fork() to another process is the only real solution.

Also, the parent process is going to take an ongoing performance hit as it incurs page faults to flip the momentarily copy-on-write memory back to writable. I've observed this with perf(1) but I don't have numbers at the moment; I remember the total time was on the same order of magnitude as the fork itself.

On Mac the situation is different: we're using posix_spawn, which in theory doesn't need to do anything like fork() and can just create the new process /de novo/, but I haven't tried profiling it yet.

tl;dr: this is a jank problem on Linux (and async launch probably won't help), it may not be one on Mac but there's no data yet, and Windows is out of scope for this bug (see comment #5).
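The claim above that fork() cost scales with the parent's writable private memory can be illustrated with a crude micro-benchmark. This is a sketch, not a faithful reproduction of the profiles discussed here: Python stands in for Gecko, the 64MB ballast stands in for the parent process's dirty heap, and the absolute numbers are machine-dependent.

```python
import os
import time

def time_fork():
    # Time a bare fork(); the child exits immediately, so what's
    # measured is mostly the parent-side work (page-table copying,
    # write-protecting pages for copy-on-write).
    t0 = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    elapsed = time.perf_counter() - t0
    os.waitpid(pid, 0)
    return elapsed

small = time_fork()

# Dirty ~64MB of private writable memory -- the kind of thing a busy
# parent process accumulates and a minimal fork server would not.
ballast = bytearray(64 * 1024 * 1024)
for i in range(0, len(ballast), 4096):
    ballast[i] = 1  # touch each page so it's actually dirty

large = time_fork()
print(f"fork with little dirty memory: {small * 1e3:.2f} ms")
print(f"fork with ~64MB dirty memory:  {large * 1e3:.2f} ms")
```

The second fork is typically noticeably slower, and the effect grows with the amount of dirty memory, which is the argument for doing the fork in a process that owns almost none.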
glandium points out in bug 1480401 that we may need SandboxFork to call pthread_atfork handlers to use it like this. The fork server will definitely be single-threaded (unless we're using TSan, but in that case sandboxing is disabled and the real fork() will always be used), so the usual problems with multithreaded fork don't apply, but there might be something.
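For reference, the shape of the pthread_atfork contract being discussed: prepare handlers run before the fork, and parent/child handlers run after it on each side. CPython's os.register_at_fork is the analogous hook and makes for a self-contained illustration (this is not Gecko code; the real concern is about C-level handlers that SandboxFork's clone() path may bypass):

```python
import os

calls = []

# Analogous to pthread_atfork(prepare, parent, child): these run
# around every subsequent fork() made through CPython.
os.register_at_fork(
    before=lambda: calls.append("prepare"),
    after_in_parent=lambda: calls.append("parent"),
    after_in_child=lambda: calls.append("child"),
)

pid = os.fork()
if pid == 0:
    # In the child, "prepare" ran pre-fork and "child" ran post-fork;
    # a real handler here might reinitialize locks or per-process state.
    os._exit(0 if calls == ["prepare", "child"] else 1)
status = os.waitpid(pid, 0)[1]
child_ok = (os.WEXITSTATUS(status) == 0)
```

A fork path that skips these handlers leaves any library relying on them (to reacquire or reset locks, for instance) in an inconsistent state in the child, which is glandium's point even for a mostly single-threaded server.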
Assignee: nobody → jld