Closed Bug 781613 Opened 12 years ago Closed 12 years ago

Tracking: Optimize app startup

Categories

(Core :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: cjones, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: meta)

Attachments

(3 files)

App startup targets are on the order of hundreds of milliseconds, but we're currently 1s+ for startup.  The work here crosses gaia and gecko.

Startup here is defined as user-perceived first paint; how long it takes to put meaningful "app pixels" on screen.

I'm using three apps as exemplars: (i) blank pink screen, which measures pure gecko overhead; (ii) Calculator, which is minimal gaia/packaging overhead; (iii) Calendar, heavy gecko and gaia utilization.  Patch for blank-pink app is attached.

I'm starting this work on top of glandium's elf-hack fix for b2g, which shaves 200ms or so off app-process startup.  I only care about OOP apps.

Without eideticker (!!), I'm using pseudo-scientific stopwatch timings: tap app icon and start stopwatch, see pixels and stop stopwatch.  Close app, repeat.  Each set of timings is after a full device reboot.
Note that these timings add ~200ms to the real time, since they're partially measuring my reflexes ;).

Initial timings for Pink

1.68s, 1.74, 1.45, 1.71, 1.65

throw out high/low

avg 1.68s
(Also need to add: measuring with USB disconnected.)
Initial timings for Calculator

1.85s, 1.76, 1.56, 1.75, 1.68

throw out high/low

avg 1.73s
Initial timings for Calendar

2.73s, 2.71, 2.57, 2.67, 2.64

throw out high/low

avg 2.67s
I should also note I'm testing on a Nexus S.  My otoro is on loan.

Based on earlier measurements, it looks like the startup profile is currently

   0 -  ~300ms : fork()+exec()+link
~300 -  ~750   : XPCOM init
~750 - ~1500   : ???

Need to quantify the ???.
This should remove ~500ms from startup time.
Without patch

I/Gecko   (  924): [ 7s:878ms] LAUNCH ContentParent: loading app process
I/Gecko   (  924): [ 7s:912ms] LAUNCH TabParent: sending load URL
I/Gecko   (  924): [ 8s:372ms] LAUNCH TabParent: sending load URL
I/Gecko   (  977): [ 8s:887ms] LAUNCH TabChild: recvd load URL
I/Gecko   (  977): [ 8s:989ms] LAUNCH TabChild: recvd load URL

With patch

I/Gecko   ( 1008): [32s:723ms] LAUNCH ContentParent: loading app process
I/Gecko   ( 1008): [32s:755ms] LAUNCH TabParent: sending load URL
I/Gecko   ( 1008): [32s:762ms] LAUNCH TabParent: sending load URL
I/Gecko   ( 1066): [33s:740ms] LAUNCH TabChild: recvd load URL
I/Gecko   ( 1066): [33s:816ms] LAUNCH TabChild: recvd load URL

So without the patch, we wait ~500ms to SendLoadURL() for the app content.  With the patch, we don't wait that 500ms, but startup still takes the same amount of user-perceived time, because we don't process the LoadURL() until about a second after the process starts up in both cases.

That means there's something after startup, and apparently after XPCOM init, that's taking ~500ms :(.

I'm thinking about using a nuclear weapon here.  But we should still fix incidentally.
With a hack to preload app processes in order to make ContentParent::CreateBrowser() "free", 

I/Gecko   (   77): [39s:  4ms] LAUNCH ContentParent: creating app browser
I/Gecko   (   77): [39s: 24ms] LAUNCH TabParent: sending load URL
I/Gecko   (   77): [39s: 26ms] LAUNCH TabParent: sending load URL
[SNIP --- I/Gecko   (   77): [39s:510ms] LAUNCH ContentParent: loading app process ---]
I/Gecko   (  159): [39s:618ms] LAUNCH TabChild: recvd load URL
I/Gecko   (  159): [39s:703ms] LAUNCH TabChild: recvd load URL

I see a small improvement in Pink

[1.43s], 1.33, [1.31], 1.41, 1.35 = avg 1.36s

so we've knocked out most of the XPCOM init overhead, but now we know that there's a ~600ms overhead in TabParent initialization (!!!!).  Poking ...
s/TabParent/TabChild/
We're spending ~650ms in TabChild::InitTabChildGlobal().  Let's see what's eating the time ...

I/Gecko   (  470): [14s:801ms] LAUNCH TabChild: starting init global ...
I/Gecko   (  470): [15s:244ms] LAUNCH TabChild: finished  init global and moar init
(In reply to Chris Jones [:cjones] [:warhammer] from comment #5)
> I should also note I'm testing on a Nexus S.  My otoro is on loan.
> 
> Based on earlier measurements, it looks like the startup profile is currently
> 
>    0 -  ~300ms : fork()+exec()+link
> ~300 -  ~750   : XPCOM init
> ~750 - ~1500   : ???
> 

So the timeline for Pink looks like

    0 -  ~300ms : fork()+exec()+link
 ~300 -  ~750   : XPCOM init
 ~750 - ~1350   : TabChild init
~1350 - ~1500   : Load+paint app (???)

These numbers are somewhat inaccurate, there are little spaces of 10-20ms interspersed.

I have a patch that hides XPCOM init overhead, so if we can fix/hide InitTabChildGlobal(), it looks like we can start up apps in 150-200ms :D.
InitTabChildGlobal compiles and runs BrowserElementChild.js, which is far from trivial
code.
I/Gecko   (  246): [44s:670ms] INITTCGLOBAL: start ...
I/Gecko   (  246): [44s:670ms] INITTCGLOBAL: creating TCG ...
I/Gecko   (  246): [44s:699ms] INITTCGLOBAL: loading BEC.js ...
I/Gecko   (  246): [45s:171ms] INITTCGLOBAL: done

So we spend ~30ms initializing TabChildGlobal, and then ~500ms loading BrowseElementChild.js.  Ouch! :(
(In reply to Olli Pettay [:smaug] from comment #13)
> InitTabChildGlobal compiles and runs BrowserElementChild.js, which is far
> from trivial
> code.

It's nowhere near 500ms of nontrivial.  That's an incredibly long time!
We do have a simple cache for frame scripts, and it works ok in Firefox where tabs end up
having the same frame scripts in the same process. But, if I've understood b2g architecture correctly,
apps end up having just one tabchildglobal per process usually. So the existing cache doesn't
really work in that case.

I wonder if it was possible to compile BrowserElementChild.js to some bytecode format and
put it to shared memory from where child processes could read it and execute it.
(though, would be good to check whether it is compilation or execution which takes time)
With all process priorities set equal

I/Gecko   (  336): [16s:437ms] INITTCGLOBAL: start ...
I/Gecko   (  336): [16s:438ms] INITTCGLOBAL: creating TCG ...
I/Gecko   (  336): [16s:462ms] INITTCGLOBAL: loading BEC.js ...
I/Gecko   (  336): [16s:825ms] INITTCGLOBAL: done

this cuts down BEC.js init to ~350ms.  So we're being starved a little bit by the opening animation, but not /starved/ starved.
So here's what I see on startup

I/Gecko   (  157): [30s:679ms] INITTCGLOBAL: start ...
I/Gecko   (  157): [30s:679ms] INITTCGLOBAL: creating TCG ...
I/Gecko   (  157): [30s:712ms] INITTCGLOBAL: loading BEC.js ...
I/Gecko   (  157): [30s:714ms] INITTCGLOBAL: load frame script ...
I/Gecko   (  157): [30s:715ms] INITTCGLOBAL: created channel ...
I/Gecko   (  157): [30s:715ms] INITTCGLOBAL: opened channel ...
I/Gecko   (  157): [30s:719ms] INITTCGLOBAL: read stream ...
I/Gecko   (  157): [30s:729ms] INITTCGLOBAL: converted to UTF16 ...
I/Gecko   (  157): [30s:816ms] INITTCGLOBAL: compiled script ...

[Ed: ouch, this is ~90ms.]

I/Gecko   (  157): [30s:816ms] INITTCGLOBAL: put script in cache ...
I/Gecko   (  157): [31s:184ms] INITTCGLOBAL: executed script ...

[Ed: GAH!  here's the overhead.]

I/Gecko   (  157): [31s:190ms] INITTCGLOBAL: done loading frame script
I/Gecko   (  157): [31s:190ms] INITTCGLOBAL: done

I/Gecko   (  157): [31s:190ms] INITTCGLOBAL: load frame script ...

[Ed: hm, loading another?  I guess this forms.js]

I/Gecko   (  157): [31s:191ms] INITTCGLOBAL: created channel ...
I/Gecko   (  157): [31s:191ms] INITTCGLOBAL: opened channel ...
I/Gecko   (  157): [31s:191ms] INITTCGLOBAL: read stream ...
I/Gecko   (  157): [31s:192ms] INITTCGLOBAL: converted to UTF16 ...
I/Gecko   (  157): [31s:201ms] INITTCGLOBAL: compiled script ...
I/Gecko   (  157): [31s:205ms] INITTCGLOBAL: put script in cache ...
I/Gecko   (  157): [31s:211ms] INITTCGLOBAL: executed script ...
I/Gecko   (  157): [31s:213ms] INITTCGLOBAL: done loading frame script
I/Gecko   (  157): [31s:237ms] INITTCGLOBAL: load frame script ...

[Ed: 40ms, so not "hot" yet.]

So prefetching and precompiling BrowserElementChild.js, which would be pretty easy in our prelaunched content process, would save up to 100ms or so on startup.  Not bad, but all the time is going to executing the script.
I'll see what I can do with dom/browser-element/BrowserElementChild.js, but it really wants a jseng ninja who knows how to optimize (interpreted?) JS.
With METHODJIT_ALWAYS + TYPE_INFERENCE:

I/Gecko   (  252): [36s:659ms] INITTCGLOBAL: converted to UTF16 ...
I/Gecko   (  252): [36s:743ms] INITTCGLOBAL: compiled script ...

~90ms, about the same.  Somewhat surprising ...

I/Gecko   (  252): [36s:744ms] INITTCGLOBAL: put script in cache ...
I/Gecko   (  252): [37s:122ms] INITTCGLOBAL: executed script ...

Also about the same, ~350ms.  Maybe I did something wrong?
Stopwatch profile of BEC itself

I/Gecko   (  504): [0ms] INITTCGLOBAL: BEC.js executing ...
I/Gecko   (  504): [102ms] INITTCGLOBAL: imported modules ...
I/Gecko   (  504): [108ms] INITTCGLOBAL: defined BEC ...
I/Gecko   (  504): [338ms] INITTCGLOBAL: created BEC instance ...
I/Gecko   (  504): [340ms] INITTCGLOBAL: defined ContentPanning ...
I/Gecko   (  504): [342ms] INITTCGLOBAL: inited ContentPanning ...
I/Gecko   (  504): [343ms] INITTCGLOBAL: defined KineticPanning ...
I/Gecko   (  504): [343ms] INITTCGLOBAL: defined ElementTouchHelper ...
I/Gecko   (  504): [344ms] INITTCGLOBAL: BEC.js done

So importing modules is hurty, but BrowserElementChild:_init() is costing 200ms+.  Youch.

Interestingly enough, this logging also ran for master-process content, where I see

I/Gecko   (  469): [0ms] INITTCGLOBAL: BEC.js executing ...
I/Gecko   (  469): [35ms] INITTCGLOBAL: imported modules ...
I/Gecko   (  469): [37ms] INITTCGLOBAL: defined BEC ...
I/Gecko   (  469): [65ms] INITTCGLOBAL: created BEC instance ...
I/Gecko   (  469): [65ms] INITTCGLOBAL: defined ContentPanning ...
I/Gecko   (  469): [65ms] INITTCGLOBAL: inited ContentPanning ...
I/Gecko   (  469): [66ms] INITTCGLOBAL: defined KineticPanning ...
I/Gecko   (  469): [66ms] INITTCGLOBAL: defined ElementTouchHelper ...
I/Gecko   (  469): [66ms] INITTCGLOBAL: BEC.js done

for the first load of BrowserElementChild.js, (presumably for the in-process system app).

So BEC.js takes 5x longer to run in a content process?  W.T.F.!!

Smells like a mjit/TI issue ... maybe we're not setting up our runtime properly or something.  Could use help here.
Maybe dmandelin has an idea for somebody who can look into this.
With all potential wins from dependent bugs
 - Pink startup time goes to limit of measurement error
 - Calculator goes to same limit
 - Calendar could improve to ~2.4s or so

Calendar will need further followups.
It makes me sad that we have to bend over backwards to work around 600ms of startup overhead caused by core code written in JS for programmer productivity that would have been microseconds-millisecond if written in C++.  Even if we pull off heroics for the JS code, we'll still be maybe ~100x slower than equivalent C++.

We shouldn't have to choose based on that.  How do we fix it?
Chris, the BEC.js call a sends a sync message to the parent process to get the manifest URL and set the app id and such. How is the timing if you comment from "let appManifestURL = [..]" to "windowUtils.setIsApp()"?
Also, the securityUI initialization could take some time. It doesn't seem trivial: https://mxr.mozilla.org/mozilla-central/source/security/manager/boot/src/nsSecureBrowserUIImpl.cpp
With all dependent patches applied

Pink: 1.03, [1.07], [0.92], 1.01, 0.96 avg 1.00 (below the limit of measurement error)

Calculator: [1.07], 1.23, 1.22, 1.24, [1.27] avg 1.22

Calendar: 2.73, [2.78], 2.70, [2.68], 2.75 avg 2.73

I'm not testing with the same version of gaia, so looks like calendar startup regressed considerably in the meantime.  Now we're back where we started :(.
(In reply to Olli Pettay [:smaug] from comment #13)
> InitTabChildGlobal compiles and runs BrowserElementChild.js, which is far
> from trivial
> code.

If time is taken up by compilation should look into breaking up startup-optional parts of that file into another file.
Depending how how much js is loaded on startup, should look into XDR for js files loaded on startup. I hear JS team plans to lazily parse function bodies in the future, that might avoid avoid having to go with XDR uglyness.
Bug 781725 works around this, but not in a happy-making way.
Not knowing about this bug I filed bug 812437. This one seems to already develop into a full blown bug with patches, should we use that one for pure tracking and make this more specific?
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
See Also: → AppStartup
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: