Slow content process initialization or startup on OSX

NEW
Assigned to

Status

()

defect
P3
normal
2 years ago
2 years ago

People

(Reporter: gkrizsanits, Assigned: gkrizsanits)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [e10s-multi:+] sb+)

https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1559654,1,1&series=mozilla-inbound,1559657,1,1&series=mozilla-inbound,1559658,1,1&series=mozilla-inbound,1559655,1,1

The new cpstartup test shows surprisingly high numbers for OSX builds and some big noise as well compared to other tests. As a comparison the tabpaint test for example does not show such difference between OSX and other platforms.
Assignee: nobody → gkrizsanits
Priority: -- → P1
Whiteboard: [e10s-multi:+]
This only happens on OSX and on my local machine I could not detect such a significant difference between OSX and Windows... Are there any other talos tests as well that performs a lot slower on OSX than on other platforms? Is there anything special about the OSX machine that runs talos tests compared to the linux/windows ones (number of CPU or anything that would explain such a huge difference)?
Flags: needinfo?(jmaher)
here are the hardware specs for the machines:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation

I would say the osx machines are slightly faster than the linux/windows machines.  I have found in general osx is noisier than other platforms.  I think the end result is we won't be able to detect regressions in cpstartup on osx unless they are really large regressions.
Flags: needinfo?(jmaher)
Actually, only on try, on inbound it's working fine: https://treeherder.mozilla.org/perf.html#/graphs?series=try,1558970,1,1&series=mozilla-inbound,1559655,1,1&highlightedRevisions=8a3b8491838e

Joel, do you know what's the difference between inbound and try in this respect? Seems like something disables multi on try for this test but not on inbound. I'm trying to figure out what has changed on oct 5-6.
Flags: needinfo?(jmaher)
thanks for pointing this out, it appears some code was accidentally changed and we were using different preferences/addons in production, I believe this is fixed in bug 1402426.
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #5)
> thanks for pointing this out, it appears some code was accidentally changed
> and we were using different preferences/addons in production, I believe this
> is fixed in bug 1402426.

Well, thanks for the fix :)
This seems to be an activity:Stream issue https://perf-html.io/public/8b9ef9d4fe5227e4923158220ff05f374463d032/calltree/?hiddenThreads=&thread=3&threadOrder=0-2-3-1

The weird thing about this that I don't see this issue locally on my OSX machine. Is there any special setup for activity stream on OSX on inbound?

Also is there a way to wait for the activity:stream to completely load everything it has to load before I run this perf test? Or is it always active whenever I open a new tab and load some content in it (the test does not load about:newtab in the new tab)?
Flags: needinfo?(dmose)
:krizsa, how is it possible to tell what the cause is?  When I click that link, about half of the time seems to be spent in the event loop code itself (?), and the vast majority of the rest of the symbols with the most time attributed to them in the call tree are unsymbolicated hex addresses inside of libXUL.

:mstange, i was somehow under the impression that the key symbolication bugs had been squashed at this point.  Am I misremembering and misinformed?
Flags: needinfo?(mstange)
Flags: needinfo?(gkrizsanits)
Flags: needinfo?(dmose)
:mstange That was only half finished before I hit submit; sorry!  I'm curious whether this is a known bug, or if we've stumbled on a new one that needs to be filed...
(In reply to Dan Mosedale (:dmose) from comment #8)
> :krizsa, how is it possible to tell what the cause is?  When I click that
> link, about half of the time seems to be spent in the event loop code itself
> (?), and the vast majority of the rest of the symbols with the most time
> attributed to them in the call tree are unsymbolicated hex addresses inside
> of libXUL.

Sorry, I expected the link to do a better job at pointing at what I was seeing... So you want to select the 2nd content process (2/2) and then select the part of the time frame where that process is active and then just keep pressing the right key on the call tree until you see TabChild::RecvAsyncMessage RemotePage:Message and then go even further down until you see it calling into activity stream code. There are other factors here for sure, but I don't expect any activity stream stuff to show up here (which might be a false assumption ofc)

> 
> :mstange, i was somehow under the impression that the key symbolication bugs
> had been squashed at this point.  Am I misremembering and misinformed?

Is that true for perf data obtained from try builds as well? That would be awesome news... I think there is some black magic to get that working too, I might be looking into it soon.
Flags: needinfo?(gkrizsanits)
The other part that seems to be a big issue is in the 1st content process, we spend an awful lot of time in PContent::Msg_SetProcessSandbox. Haik, do we know about any sandbox perf issue on OSX?
Flags: needinfo?(haftandilian)
sandbox_compile_string takes over 100ms to compute... that's not a very fast function...
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #11)
> The other part that seems to be a big issue is in the 1st content process,
> we spend an awful lot of time in PContent::Msg_SetProcessSandbox. Haik, do
> we know about any sandbox perf issue on OSX?

I don't know of anything, but we haven't been profiling the sandbox startup code. You mentioned sandbox_compile_string and it makes sense that that would be relatively slow because it is compiling the sandbox policy string (which is a lisp program) that includes regular expressions and loading the result into the kernel.

I wonder how much of that 100ms is due to the content of the policy and how much is unavoidable overhead. Eventually our policy will get smaller. I don't know of anything that dictates which thread we have to use to setup the sandbox. I think it's worth filing a bug to investigate/improve.
Flags: needinfo?(haftandilian)
In looking into font loading for bug 1393259, I noticed that we load some font files from the filesystem on the main thread at content startup. The loading is from ActivateFontsFromDir() called from gfxMacPlatformFontList::gfxMacPlatformFontList(). I did some quick timing and the two calls to ActivateFontsFromDir() take a combined 50 milliseconds on my mid 2015 MacBook Pro. We might be able to move that off the main thread because we already pre-load other fonts on a non-main thread. Jonathan, do you know if those could be moved off-main-thread.
Flags: needinfo?(jfkthame)
(In reply to Dan Mosedale (:dmose) from comment #8)
> :mstange, i was somehow under the impression that the key symbolication bugs
> had been squashed at this point.  Am I misremembering and misinformed?

Where did this happen? Talos? Locally? What kind of build? I've lost sight of the situation on Talos, but symbolicating Nightly builds locally with the gecko profiler add-on should be fairly solid at this point.
Flags: needinfo?(mstange)
:mstange, this appears to be talos builds on mozilla-inbound.  Do those get symbolicated?
Flags: needinfo?(mstange)
They should, and if they don't, it's a bug. Can you file one?
Flags: needinfo?(mstange)
> Also is there a way to wait for the activity:stream to completely load everything
> it has to load before I run this perf test? Or is it always active whenever I
> open a new tab and load some content in it (the test does not load about:newtab
> in the new tab)?

So what's going on here, I suspect, is that as part of opening a new tab, the preload browser for the _next_ about:newtab is created and immediately starts loading invisibly behind the scenes.  mconley's patch in bug 1353013 intends to make that happen at idle time, rather than immediately as part of the tab load.

You can suppressing preloading entirely by setting browser.newtab.preload to false.

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #10)
> 
> Sorry, I expected the link to do a better job at pointing at what I was
> seeing... So you want to select the 2nd content process (2/2) and then
> select the part of the time frame where that process is active and then just
> keep pressing the right key on the call tree until you see
> TabChild::RecvAsyncMessage RemotePage:Message and then go even further down
> until you see it calling into activity stream code. There are other factors
> here for sure, but I don't expect any activity stream stuff to show up here
> (which might be a false assumption ofc)

Assuming that Content Process two is running either about:home or about:newtab, that's entirely normal.  The way those pages work, once the most minimal bits are in place, they send a message to chrome asking for the state that they need to render.  The message you've captured is the response from chrome, and in the message handler for that it puts the state in place and causes React to render, which is what's taking place those 53 ms (drill further down the stack, and you'll see lots of react-dom).  Note that at this point, the chrome/content boundary has been crossed, and the handlers will be running in the content page.  So does make things more in line with what you would expect?


We are continuing to work on improving performance in general, (AS focus for 58 is entirely performance).  It's not obvious to me from what I understand at this moment whether there's anything that's worth attacking for uplift to 57.  Do you have thoughts on that?
:gabor, am I correct that the link in comment 7 was generated against a mozilla-inbound build?  If not, where did it come from?  (I'd like to file the bug that :mstange requested).
Flags: needinfo?(gkrizsanits)
(In reply to Dan Mosedale (:dmose) from comment #19)
> :gabor, am I correct that the link in comment 7 was generated against a
> mozilla-inbound build?  If not, where did it come from?  (I'd like to file
> the bug that :mstange requested).

It is from a try push to be precise (OSX opt), I believe from this one: 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0dfbf237910025a34f867e773c46a12ab8f08a9e&selectedJob=135350647

(In reply to Dan Mosedale (:dmose) from comment #18)
> 
> You can suppressing preloading entirely by setting browser.newtab.preload to
> false.

That was my first bet as well, I did try that but it didn't change much: https://treeherder.mozilla.org/#/jobs?repo=try&revision=944290fa1f4e01909e99984f34eb943b50fce12b&selectedJob=137211352

I didn't have the profiler turned on for this one, so I will have to do it again and see if the acivity:stream work is still happening (so it's mostly unrelated to the regression) or something else is going on. I'll post here my results.

> 
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #10)
> Assuming that Content Process two is running either about:home or
> about:newtab, that's entirely normal.

I _hope_ that's the case and then browser.newtab.preload will fix this part of the problem.

> 
> 
> We are continuing to work on improving performance in general, (AS focus for
> 58 is entirely performance).  It's not obvious to me from what I understand
> at this moment whether there's anything that's worth attacking for uplift to
> 57.  Do you have thoughts on that?

I'm not sure yet, but at this point I would place my bet on the sandbox issue to be the real issue here, and the activity:stream bits are OK just the test should filter it out. If browser.newtab.preload would not work then maybe there is a more serious issue, but I hope that's not the case.
Flags: needinfo?(gkrizsanits)
(In reply to Haik Aftandilian [:haik] from comment #13)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #11)
> > The other part that seems to be a big issue is in the 1st content process,
> > we spend an awful lot of time in PContent::Msg_SetProcessSandbox. Haik, do
> > we know about any sandbox perf issue on OSX?
> 
> I don't know of anything, but we haven't been profiling the sandbox startup
> code. You mentioned sandbox_compile_string and it makes sense that that
> would be relatively slow because it is compiling the sandbox policy string
> (which is a lisp program) that includes regular expressions and loading the
> result into the kernel.
> 
> I wonder how much of that 100ms is due to the content of the policy and how
> much is unavoidable overhead. Eventually our policy will get smaller. I
> don't know of anything that dictates which thread we have to use to setup
> the sandbox. I think it's worth filing a bug to investigate/improve.

This is a pretty big deal I'm afraid, let's triage this for sandbox bugs.
Whiteboard: [e10s-multi:+] → [e10s-multi:+] sb?
(In reply to Haik Aftandilian [:haik] from comment #14)
> In looking into font loading for bug 1393259, I noticed that we load some
> font files from the filesystem on the main thread at content startup. The
> loading is from ActivateFontsFromDir() called from
> gfxMacPlatformFontList::gfxMacPlatformFontList(). I did some quick timing
> and the two calls to ActivateFontsFromDir() take a combined 50 milliseconds
> on my mid 2015 MacBook Pro. We might be able to move that off the main
> thread because we already pre-load other fonts on a non-main thread.
> Jonathan, do you know if those could be moved off-main-thread.

The other "font preloading" that we do on a non-main thread is a different type of "loading" -- it's not affecting which fonts are present in our font list, but preloading some internal details of each font in the list, to avoid a potential delay the first time layout needs to use them. Importantly, it doesn't have to complete before we render stuff.

ActivateFontsFromDir is different in that it is registering additional resources with the OS font manager, such that they'll appear in the system's installed font collection. Changes to the system font collection will trigger a notification to Firefox (just like if the user manually installed or removed fonts while Firefox is running). In response, we discard and rebuild our font list, as the old one is no longer valid.

Therefore, while AFAIK there's no fundamental reason ActivateFontsFromDir couldn't happen on a different thread, I'm concerned that doing so might actually be worse overall. It would mean that the platform initialization happening on the main thread could proceed, building a font list based on the currently-available system font collection; but shortly thereafter, the activation of the additional fonts happens, we get notified that the font collection has changed, and so we throw away our list and rebuild it.

So although we'd have taken 50ms out of the main thread during startup, the effect would be to create an extra chunk of slightly-deferred work (the rebuild of the entire font list) that still has to happen before we actually render anything.
Flags: needinfo?(jfkthame)
So for activity stream I forgot to kill the preloaded browser on top of disabling it by the flag. But even turning off activity stream and content sandbox entirely the issue didn't go away.

I triggered a few more rounds and it turns out that the problem is intermittent (recently, it used to be pretty solid when I filed the bug) and the profile I was examining is the good case (which means there is plenty of space for improvements).

After found a profile that is from the slower case: https://perfht.ml/2ipb64n and compared with the faster path the real culprit is the GetCurrentProcess call. It calls into _RegisterApplication and sometimes that takes ~53ms sometimes a good ~253ms this is mainly the reason for that fluctuation.

I'm not a big OSX expert Haik, any ideas what might be the problem here?
Flags: needinfo?(haftandilian)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #23)
> I'm not a big OSX expert Haik, any ideas what might be the problem here?

Discussed with Gabor on IRC. Gabor is going to experiment with using alternatives for GetCurrentProcess() which is deprecated. It looks like _RegisterApplication() is doing some setup that requires talking to the Launch Services daemon over IPC, but I don't know anything more.
Flags: needinfo?(haftandilian)
(In reply to Haik Aftandilian [:haik] from comment #24)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #23)
> > I'm not a big OSX expert Haik, any ideas what might be the problem here?
> 
> Discussed with Gabor on IRC. Gabor is going to experiment with using
> alternatives for GetCurrentProcess() which is deprecated. It looks like
> _RegisterApplication() is doing some setup that requires talking to the
> Launch Services daemon over IPC, but I don't know anything more.

I tried using the pid version of the API but that didn't help. 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=af958804ab83efe40ffbf46a74273c5013f733a2&selectedJob=138891118

But this can be because of the other place where GetCurrentProcess is called:
https://dxr.mozilla.org/mozilla-central/rev/69e24c678a28dc46a4c1bda3ff04b2f6106ff71a/dom/plugins/ipc/PluginUtilsOSX.mm#276

which we call right after process creation from ContentChild::SetProcessName (the child process is using the same method as the plugin process), this call looks like a no-op, but the rename does not work without it...

So the only way to address this issue is to delay that call. Here is an attempt let's see how it goes:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bec165199727fcb4b3c27b7c9fa233a77ac1cf76&selectedJob=139138593
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #25)
> So the only way to address this issue is to delay that call. Here is an
> attempt let's see how it goes:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=bec165199727fcb4b3c27b7c9fa233a77ac1cf76&selectedJob=1
> 39138593

And even that does not work: https://perfht.ml/2h5184N
OSX is kind enough to call it for us in NSApplication Init internally. So this is just something we have to pay for starting up a content process it seems.
Whiteboard: [e10s-multi:+] sb? → [e10s-multi:+] sb+
Filed bug 1412855 for the sandbox_compile_string case. Since this only affects 10.10 and I don't see any way to fix it, I'm lowering its priority for now.
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.