Closed
Bug 834622
Opened 12 years ago
Closed 7 years ago
[meta] More Gecko startup win potpourri
Categories
(Core :: General, defect, P2)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: cjones, Unassigned)
References
(Depends on 2 open bugs)
Details
(Keywords: meta, perf, Whiteboard: [c=progress p= s= u=])
Attachments
(7 files, 10 obsolete files)
1.16 MB,
image/jpeg
|
Details | |
4.08 KB,
patch
|
Details | Diff | Splinter Review | |
58.24 KB,
image/jpeg
|
Details | |
98.24 KB,
patch
|
Details | Diff | Splinter Review | |
506 bytes,
patch
|
Details | Diff | Splinter Review | |
6.26 KB,
patch
|
Details | Diff | Splinter Review | |
10.69 KB,
patch
|
Details | Diff | Splinter Review |
I applied the patch in bug 797189 that I (half forgot I) made a while ago to drop nice sample labels onto IPC work, and the scales immediately fell from my eyes!
The following profile of Template app startup is captured with 1ms sample interval in a 10s buffer
http://people.mozilla.com/~bgirard/cleopatra/#report=65a7832015a3c3e3335f17e322da8f3a5c150b6b
There's quite a bit we can do here. Roughly speaking, in units of samples that are 2ms+ each, the time is going to these significant things (in chronological order)
21: something to do with setting app ID
24: blocked on parent in NotifyIMEFocus() (???)
51: running BrowserElementChild.js
35: compiling two chrome JS scripts
5: running forms.js
24: running UserAgentOverrides.jsm
21: something to do with AppProtocolHandler.js
Now we've actually started loading content. We spent about 181 samples to get there, which is around 350ms.
6: more BrowserElementChild.js stuff; setVisible/nextPaintListener
5: UpdateDimensions; first layout flush
7: TabChild::RecvActivate (focus), mostly in forms.js
26: nsInputStreamPump::OnStateStart, broken down like so
6: BrowserElementChild.js and AppsService.js stuff (maybe different)
5: blocked on parent getting DPI (???)
15: potpourri
16: flushing results of HTML parsing, including selector matching
20: first refresh driver tick, second reflow and first paint
7: style computation
5: BrowserElementChild.js titleChangedHandler() (???)
28: second refresh driver tick, broken down like so
20: painting
7: BrowserElementChild.js onMozAfterPaint() handler
At this point the page should be fully loaded and painted.
We spend a bit of time sitting in the event loop after that, then the screenshotting code runs.
We have a lot of winning we can do here.
Reporter | ||
Comment 1•12 years ago
|
||
Mostly a checkpoint.
This gets almost all of the PBrowser::RecvLoadRemoteScript samples out of the profile, 66 -> 5. This is theoretically a 100ms+ win, and I maybe measure 40ms-100ms win on stopwatch but it's tricky.
http://people.mozilla.com/~bgirard/cleopatra/#report=311208ba64bfc846c5e1f558e757faaea95aca19
Reporter | ||
Comment 2•12 years ago
|
||
One more thing that "ain't it" before I forget: I modified the Template application.zip to include a giant mp3 (9.5MB), and it didn't impact startup time at all. So looks like our JAR reading code is doing a good job.
Reporter | ||
Comment 3•12 years ago
|
||
Attachment #706299 -
Attachment is obsolete: true
Reporter | ||
Comment 4•12 years ago
|
||
Initial wins.
Reporter | ||
Comment 5•12 years ago
|
||
Rebased profile with above patches applied
http://people.mozilla.com/~bgirard/cleopatra/#report=d8156a72df7c213ed9821f8f29b31d4e29dd9cc3
Reporter | ||
Comment 6•12 years ago
|
||
The rebased profile was rebuilt without --disable-elf-hack, and now the resolved libxul.so symbols look less believable. Known issue?
Comment 7•12 years ago
|
||
> Known issue?
Yes.
Reporter | ||
Comment 8•12 years ago
|
||
This seems to shave 5-8 samples off the processing there, but I don't have enough data yet to know if that's statistically significant.
Comment 9•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #6)
> The rebased profile was rebuilt without --disable-elf-hack, and now the
> resolved libxul.so symbols look less believable. Known issue?
Seems to be. See bug 827846
There is a bionic patch which apparently fixes things.
Reporter | ||
Comment 10•12 years ago
|
||
We have three or four synchronous IPC's during PBrowser::RecvShow() that are causing us to hang on the b2g process for a nontrivial number of cycles (25-50). Two of those calls are necessary and can't really be avoided, but this one is totally pointless in b2g.
For some reason, this call seemed to be sticking more than the others, so a couple profiles showed it knocking off ~20 samples. Take with a grain of salt though.
Reporter | ||
Comment 11•12 years ago
|
||
Stepping back a bit, the biggest remaining offenders are
50-60 samples: RecvShow -> BrowserElementChild.js init
20-30 samples: sometimes we see 1 refresh driver tick, and we spend ~20 samples in reflow+painting stuff. Sometimes we see 3 ticks and spend ~50 samples there. We shouldn't do unnecessary work here.
~25 samples: nsInputStreamPump::OnInputStreamReady. We do a variety of things here and I don't see any low-hanging fruit.
~20 samples: flushing HTML parse state. This seems high for a trivial test like the Template app, but there's probably not a lot we can feasibly do here.
But, there's not much left here that's OOP specific. Will compare against in-process load of Template.
Reporter | ||
Comment 12•12 years ago
|
||
It's very hard to measure any difference by stopwatch with Template in- vs. out-of-process. This may be due to regressions in the gaia window manager.
Reporter | ||
Comment 13•12 years ago
|
||
Hmmmm lots of good stuff here ...
out-of-process (with 3 refresh driver ticks behavior)
http://people.mozilla.com/~bgirard/cleopatra/#report=3cb37bcffa8e2b4cf4e8cf332ce5f8acec2f9a08
in-process
http://people.mozilla.com/~bgirard/cleopatra/#report=6e2d6a5b5523b6e5748f324716a32e06c185688b
Reporter | ||
Comment 14•12 years ago
|
||
Quick notes
- average sample interval is a bit over 2ms for OOP, about 1.3ms for in-process. So each OOP sample is about 50% more clock time.
- BEC startup seems to take about the same amount of *samples* in/out-of-process
- gaia window_manager is triggering a stupid-expensive sync reflow in setDisplayedApp(), apparently caused by
var cssHeight = window.innerHeight - StatusBar.height;
^^^ this
I'm not 100% sure we've already started the child load when that happens, but we should have.
- we're processing *12* expensive refresh driver ticks, each pretty expensive.
The rest is a bit cloudier because the samples aren't organized as neatly, but
- InputStream, html5 parsing, and CSS style computation "stuff" seems to consume about the same number of samples in/OOP
- loading the app screenshot shows up in the profile
Reporter | ||
Comment 15•12 years ago
|
||
A bit earlier back in the tap-icon phase of launch,
- system app is chewing ~30 samples processing touch events that seem to be eventually forwarded to the Homescreen process. That's a bit odd and probably adds ~50ms latency to Homescreen processing the tap.
- after the touchend is sent to Homescreen, b2g process sits idle for ~50 samples before it gets the WebApps:launchApp message. Seems to be ~75ms that Homescreen is doing ... something.
- homescreen profile shows ~60 samples between touchstart and WebApps::launch. (Homescreen samples come in at ~1.3ms apart.)
- homescreen hits a pretty exensive refresh driver tick (~40 samples) after touchstart but before touchend. Most likely the app icon being highlighted.
- homescreen spends another ~40 samples reflowing and restyling after sending launch(). Probably resetting the icon tap state. This is happening while the b2g process starts chewing CPU so probably appears more expensive than it really is.
- homescreen sits at SendGetInputContext() for *174* samples after the post-launch reflow/restyle. I have no idea what's happening here, and I don't think it's necessary.
- homescreen spends ~20 samples in something CSP-related around an nsInputStreamPump::OnInputStreamReady. Maybe this is the code actually fetching the highlighted icon?
- then homescreen finishes off with another expensive refresh driver tick
Reporter | ||
Comment 16•12 years ago
|
||
After some more gecko fixes, the parts of the Homescreen app on the critical startup path are looking /OK/, but some room for improvement.
- touchstart is handled quickly by Homescreen.
- ... but then BrowserElementScrolling does 20-30ms unnecessary work. Bug 834996.
- refresh driver tick. Reflow and repaint. 40-50ms. Not sure why this is happening yet.
- touchmove dispatched. mouse_event_shim seems to be doing ~10ms unnecessary work, bug 834997.
- homescreen handles synthetic mousemove quickly.
- touchend dispatched. Homescreen takes about 7ms to call launch().
Based on the profile, no idle event-loop time, it appears that the touchmove and touchend are sitting in the event queue. So processing those more quickly would shorten the critical path here, but by how much I can't say without better instrumentation.
Reporter | ||
Comment 17•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #16)
> Based on the profile, no idle event-loop time, it appears that the touchmove
> and touchend are sitting in the event queue. So processing those more
> quickly would shorten the critical path here, but by how much I can't say
> without better instrumentation.
The main b2g process sits idle for a little over 40ms between sending touchend and receiving WebApps:launchApp(). So we may have ~30ms to win here.
Reporter | ||
Comment 18•12 years ago
|
||
The b2g process is not behaving as well as the homescreen. Rough breakdown is
- small amount of useless work on touchend, ~5ms. Then sends to homescreen.
- receives launchApp(). Then lots of fun stuff happens, total of about 200 samples ~250ms
- eventually dispatch to gaia window_manager.js
- 60 samples ~75ms in manifest manipulation (bug 834999)
- 75 samples ~85ms in BrowserElementParent and other iframe-related C++ init (bug 835001)
- 40 samples ~50ms mostly doing a sync reflow forced by |StatusBar.height| in gaia setDisplayedApp()
We SendLoadURL() to the app process just before the sync reflow, but everything else is on the critical path.
- after that a smattering of refresh driver ticks and JS callbacks, mostly cheap though one tick is 20 samples ~25ms
- ~30 samples ~35ms synchronously decoding an image on the main thread (app screenshot?), interrupted by refresh driver tick
- another relatively expensive refresh driver tick, ~15 samples
- relatively expensive repaint, 30 samples
- expensive transitionend handler, 45 samples
The last four aren't on the critical startup path but will compete for and win CPU over the loading app process.
Reporter | ||
Comment 19•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> Created attachment 706633 [details] [diff] [review]
> Patch from fabrice to speed up RecvPBrowserCtor
>
> This seems to shave 5-8 samples off the processing there, but I don't have
> enough data yet to know if that's statistically significant.
From what I've seen so far this is on the borderline of significant so this patch is probably worth taking.
Updated•12 years ago
|
Summary: More Gecko startup win potpourri → [meta] More Gecko startup win potpourri
Comment 20•12 years ago
|
||
If this is indeed a meta bug we won't block on it but if not please remove [meta] and re-nom. I don't know if we have a clear bar for perf work to be blockers but nomming perf fixes is very much appreciated.
blocking-b2g: tef? → ---
Reporter | ||
Comment 21•12 years ago
|
||
Reporter | ||
Comment 22•12 years ago
|
||
Maybe BenWa will be able to help figure out what's wrong here.
Reporter | ||
Comment 23•12 years ago
|
||
Comment on attachment 706605 [details] [diff] [review]
Precompile some more scripts
Tracked in bug 835584 now.
Attachment #706605 -
Attachment is obsolete: true
Reporter | ||
Comment 24•12 years ago
|
||
Still not getting markers in my profile ... but this is more for other folks's benefit, so I'm going to bail for now.
Reporter | ||
Comment 25•12 years ago
|
||
Comment on attachment 706649 [details] [diff] [review]
Don't initialize cross-process IME when we don't need it
Tracked in bug 835591 now.
Attachment #706649 -
Attachment is obsolete: true
Reporter | ||
Comment 26•12 years ago
|
||
Comment on attachment 707288 [details] [diff] [review]
Backport bug 773428 part 2 to b2g18
Tracked in bug 835592 .
Attachment #707288 -
Attachment is obsolete: true
Reporter | ||
Comment 27•12 years ago
|
||
Comment on attachment 707298 [details] [diff] [review]
Apparently nonfunctional patch to add sample markers to critical startup path
Tracked in bug 835592 .
Attachment #707298 -
Attachment is obsolete: true
Reporter | ||
Comment 28•12 years ago
|
||
Comment on attachment 707329 [details] [diff] [review]
Backport bug 773428 part 1
Tracked in bug 835592 .
Attachment #707329 -
Attachment is obsolete: true
Reporter | ||
Comment 29•12 years ago
|
||
before:
0.94, 0.97, 0.91
after:
0.85, 0.75, 0.88
My stopwatch timings are getting extremely difficult to do, but we should be sitting on ~100ms win so far.
Reporter | ||
Comment 30•12 years ago
|
||
Comment on attachment 706633 [details] [diff] [review]
Patch from fabrice to speed up RecvPBrowserCtor
Tracked in bug 835596 .
Attachment #706633 -
Attachment is obsolete: true
Reporter | ||
Comment 31•12 years ago
|
||
With some manual annotations on relevant startup methods, I see
01-29 01:38:54.015 I/Gecko ( 1584): STARTUP: TabParent::SendLoadURL
01-29 01:38:54.235 I/Gecko ( 1666): STARTUP: TabChild::RecvLoadURL
01-29 01:39:25.375 I/Gecko ( 1584): STARTUP: TabParent::SendLoadURL
01-29 01:39:25.556 I/Gecko ( 1684): STARTUP: TabChild::RecvLoadURL
so the TabParent is sending LoadURL about 200ms before TabChild starts processing it. The startup "stuff" before that mostly runs in parallel, so we should look for ways to rearrange things so that we can start processing LoadURL earlier.
Unfortunately, the child process isn't idle during any of this time.
But let me check when we reset the new app to foreground priority ...
Reporter | ||
Comment 32•12 years ago
|
||
This timeline starts at PContent:SendBrowserCtor. The "stuff" on the critical path in the b2g process before this diagram totals for about 90 samples, or ~110ms. (Will be a bit lower outside the profiler.)
This diagram shows the b2g process finishing its work up to TabParent::SendLoadURL() relatively quickly, ~30ms. It continues to chew a lot of CPU after its last entry on the diagram here though.
The app process is twiddling its thumbs when it finally gets TabChild::RecvBrowserCtor(); it starts processing that in the same ms it's sent.
But then TabChild gets backed up for a long time on TabChild::RecvShow() (BrowserElementChild initialization mainly). The LoadURL message is sitting in the content process's IPC queue for almost 200ms before the content process dequeues it.
So by my count, we have about 320ms of garbage work on the critical path to TabChild::RecvLoadURL(). At a high level, the wins will come from
- make BrowserElementParent/nsFrameLoader init faster
- make TabChild::RecvShow()/BrowserElementChild init faster
Those get more CPU cycles off the current critical path.
We can try to reshape the critical path by starting the SendBrowserCtor/SendLoadURL code much earlier in the b2g process. Preferably, before we do the BrowserElementParent init. Maybe we can rejigger gaia to help with this. If we could reorder the pipeline to look like
b2g | SendPBrowserCtor/SendLoadURL | | (expensive post-setup)
App | | .../RecvLoadURL | (load content)
Then we can kick off the load of app:// resources as early in startup as possible, and hopefully hide the IO latency there behind the CPU bound b2g process "(expensive post-setup)", which includes BEP init and starting the app-open animation.
Reporter | ||
Comment 33•12 years ago
|
||
Patch used to generate above diagram, along with |adb logcat -v time|.
Reporter | ||
Comment 34•12 years ago
|
||
I don't think it matters based on the timeline above, but FTR the preallocated process is running with background priority during its early startup work through and beyond RecvLoadURL(). I think jlebar fixed that, so I should probably rebase everything now ...
Reporter | ||
Comment 35•12 years ago
|
||
Indeed, we now load the app with fg priority, but as expected the stopwatch timings don't change.
Reporter | ||
Comment 36•12 years ago
|
||
The b2g process profile during startup didn't change at all
http://people.mozilla.com/~bgirard/cleopatra/#report=dfb87035451fbaf36d064d2957903b6de2b6ad07
but the Template app profile looks a lot different. There are now 370 samples in the profile vs. 260 samples before, however the stopwatch results are identical.
http://people.mozilla.com/~bgirard/cleopatra/#report=427594015c91a5fedfa918262af7c4767efa68c5
Now that the Template process runs with higher priority during startup, it appears that it's doing a "hurry up and wait" (as they say in the Army) to locations it blocks on the b2g process. So the end result is that the Template process gets more time slices to capture a profiler sample, but the critical path is still about the same.
Reporter | ||
Comment 37•12 years ago
|
||
Update: the patch from bug 835548 (along with numerous others applied) gets us pretty consistently under 800ms startup time. So we're closing in on 200ms win now.
Reporter | ||
Comment 38•12 years ago
|
||
It looks like we might not be drawing app-frame pixels to screen until 200-300ms after they're ready. However, I need to measure more carefully. That would be a nice win!
Depends on: 835809
Reporter | ||
Updated•12 years ago
|
Blocks: Settings-Startup
Reporter | ||
Comment 39•12 years ago
|
||
I went to clean up a branch and replicate results and have better news. I had left in a patch that did some logging on critical paths and measuring without improves results measurably.
We're now around 750ms for Template startup on otoro.
Rollup patch is included, which includes these commits
$ git log -11
commit b6d61fdecfb1fc39fe1907f05a1246a093c2f40c
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 20:43:10 2013 -0800
bug 835641: only send updatedims when they change
commit 0ffc6c3ac15c7aafbb79e579e8f8549de9577a0e
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 20:38:17 2013 -0800
bug 835591: remove cross-process ime
commit c9366b9cb1c021792856f0d2b763201da29906cc
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 20:36:49 2013 -0800
EXPERIMENTAL bug 835681: remove sync from bec init
commit 815cf7a009bc6f83d91e9028a0c79dce72ab20c4
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 20:35:34 2013 -0800
bug 835679: improve getting dpi and renderframe ctor
commit 5ec8ed109ee5a198d168ec21ce94b7b8987d377e
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 20:25:54 2013 -0800
bug 835596: speed up getappbylocalid
commit 0438acdfdf9d704c5f999dbdf1abb75d2ea99568
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 20:24:50 2013 -0800
bug 835584: precompile moar things
commit 1758a5c110e8008d75c728841b563bbb46ffebfc
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 20:23:04 2013 -0800
bug 835548: make browserelement* init cheaper
commit ed1ab4af4e8899ec7a253e86e4b758394166b0b8
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 20:21:16 2013 -0800
bug 835013: make appprotocolhandler not block on parent
commit 50daada643a5f3684d7aa968f4f17ba4aca3acb6
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 20:01:51 2013 -0800
bug 834999: make manifest wrapping cheaper
commit 1869741e6d04dcab0c680f34d0eef4b44fddf877
Author: Chris Jones <jones.chris.g@gmail.com>
Date: Tue Jan 29 19:58:13 2013 -0800
bug 834622: better startup info from profiler
Based on
commit cedbc43dfa7602f37ba451d2bdb6c3536fea823b
Author: Doug Turner <dougt@dougt.org>
Date: Tue Jan 22 21:26:00 2013 -0800
Bug 833178 - Increase Stream Transport thread pool max threads. r=bsmith, a=overholt
Reporter | ||
Comment 40•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #38)
> It looks like we might not be drawing app-frame pixels to screen until
> 200-300ms after they're ready. However, I need to measure more carefully.
> That would be a nice win!
Better data is pointing this way, but to a slightly lesser extent.
01-30 08:34:14.407 I/Gecko (20198): STARTUP: child process blocking txn posted for 0
01-30 08:34:14.407 I/Gecko (20198): STARTUP: resolving child process tree 2
[ touchstart? icon -> blue highlight? ]
-- snip --
01-30 08:34:14.757 I/Gecko (20198): STARTUP: child process blocking txn posted for 4
[ first paint of new process ~350ms later; not sure what was painted though ]
01-30 08:34:14.757 I/Gecko (20198): STARTUP: resolving child process tree 2
01-30 08:34:14.777 I/Gecko (20198): STARTUP: resolving child process tree 2
01-30 08:34:14.787 I/Gecko (20198): STARTUP: child process blocking txn posted for 0
01-30 08:34:14.787 I/Gecko (20198): STARTUP: resolving child process tree 2
-- snip --
01-30 08:34:14.787 I/Gecko (20198): STARTUP: resolving child process tree 4
[ new process pixels in layer tree, ~480ms later; not sure if actually composited ]
-- snip --
01-30 08:34:14.968 I/Gecko (20198): STARTUP: child process blocking txn posted for 4
[ second paint of new process ~560ms later; also not sure what was painted ]
-- snip --
01-30 08:34:14.968 I/Gecko (20198): STARTUP: resolving child process tree 4
[ new process pixels in layer tree, ~560ms later; not sure if actually composited ]
-- snip --
01-30 08:34:15.008 I/Gecko (20198): STARTUP: child process blocking txn posted for 0
01-30 08:34:15.008 I/Gecko (20198): STARTUP: resolving child process tree 4
[ maybe first sample seen onscreen, allowing for reaction lag; ~600ms later ]
-- snip --
01-30 08:34:15.118 I/Gecko (20198): STARTUP: child process blocking txn posted for 0
01-30 08:34:15.118 I/Gecko (20198): STARTUP: resolving child process tree 4
[ first sample unequivocably seen onscreen; ~700ms later ]
The "unequivocably seen onscreen" sample was me eyeballing the phone while watching logcat.
What this data shows is that between the first composite after launching Template, and the /last/ repaint of Template before I'm sure it was onscreen, ~560ms elapsed. There was one paint of Template before that, so judging by bug 836198 the second paint was the first "real paint". So that means it takes us ~560ms from ~touchstart to compositor seeing first real Template pixels painted. Not great but getting there.
Then we have another 150ms before I'm 100% sure those Template pixels made it to screen. If I guess that's the right tick and add that to the almost real delay to paint, 560ms, and then add in a little fudge for the human reaction time to press "Stop" on a stopwatch, then we're around the same times I'm seeing in comment 39.
So my guess is that gaia + gecko is adding about 150ms overhead to putting the pixels we've already painted on startup to screen.
Reporter | ||
Comment 41•12 years ago
|
||
The highlighted blue area (if you can distinguish that in the horrific white balance) is the contested delay.
Comment 42•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #39)
> Created attachment 707993 [details] [diff] [review]
> Rollup of wins so far
>
> I went to clean up a branch and replicate results and have better news. I
> had left in a patch that did some logging on critical paths and measuring
> without improves results measurably.
With the patch applied the homescreen never ends loading with:
E/profiler( 352): Registering start signal
E/GeckoConsole( 352): [JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIXPCComponents_Utils.import]" {file: "jar:file:///system/b2g/omni.ja!/components/BrowserElementParent.js" line: 14}]
E/GeckoConsole( 352): [JavaScript Warning: "Unknown property '-moz-align-self'. Declaration dropped." {file: "resource://gre-resources/ua.css" line: 44}]
Reporter | ||
Comment 43•12 years ago
|
||
Do you have a BrowserElementParent.jsm in your tree?
Reporter | ||
Comment 44•12 years ago
|
||
With bug 836605 knocked out, the only significant time in one spot on the critical path to PBrowser::SendLoadURL() in the parent process will be 10 samples or so in BrowserElementParentFactory.observe(). There'll of course still be a smattering of samples distributed about.
Reporter | ||
Comment 45•12 years ago
|
||
I've been wondering my stopwatch timings have been getting increasingly unreliable, and I just accidentally figured out why: if I load Template, put it into the background, and then tap its icon again (just putting it into foreground, not relaunching) I measured a 0.63s startup time. Right now we're just under 0.75s for cold start, so we're going to need better tools to measure firstpaint wins much below that.
Comment 46•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #45)
> we're going to need better tools
> to measure firstpaint wins much below that.
We also should soon stop using Template app as a reference "minimum overhead" app since it's startup characteristics are becoming less of a predictor of what happens with most apps.
For example, I was able to get Template startup below 300ms (another 200ms win over reference build) by just disabling setFrameBackground (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/window_manager.js#L493), but didn't manage to reproduce any win on any app with actual scripts/links and content.
Reporter | ||
Comment 47•12 years ago
|
||
That definition of startup isn't what's being measured here.
That said, I don't necessarily disagree, but launching a Hello World app in 750ms is still not acceptable.
Reporter | ||
Comment 48•12 years ago
|
||
NBB!: for this patch to be meaningful, you need to have the fix for bug 836198 in your tree.
This resets a stopwatch every time gecko sees touchstart (as close to driver as we can see it), and then when a new layer subtree is seen, the elapsed time is reported.
This corroborates the analysis of comment 40 (but much more simply with bug 836198 gone!): Template's layer tree appears at the compositor quite noticeably before it comes onscreen. This is lag due to gaia.
The first-layer-tree-appears-in-compositor time for Template is right around 500ms. With appropriate gaia fixes, we can make that the real firstpaint time.
Comment 49•12 years ago
|
||
Would it make sense to use this number for apploadtime event (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/window_manager.js#L1030) ?
Also, is firstPaint always with DOM painted? When I was testing using mozbrowserfirstpaint event reporter, I think I got the event fired with blank screen appearing and the DOM has been painted later. (same for any imgs within the DOM)
Comment 50•12 years ago
|
||
firstpaint is not always with DOM painted.
Because of the asynchronicity of painting, it's hard to (read, I don't know how to) fire a "first paint" event that guarantees that DOM was painted.
Reporter | ||
Comment 51•12 years ago
|
||
Oh, sorry, when I said "firstpaint" I meant, "first display of app pixels on screen", not mozbrowsefirstpaint. Sorry for confusing things.
(In reply to Zbigniew Braniecki [:gandalf] from comment #49)
> Would it make sense to use this number for apploadtime event
> (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/
> window_manager.js#L1030) ?
The number reported by the above patch relies on measuring after a touchstart, so it's a little more specific than what code wants, I think. The patch is geared towards measuring launching off of user input.
> Also, is firstPaint always with DOM painted? When I was testing using
> mozbrowserfirstpaint event reporter, I think I got the event fired with
> blank screen appearing and the DOM has been painted later. (same for any
> imgs within the DOM)
The definition of "DOM painted" is not very precise, unfortunately :(. Which is why writing this kind of measuring code is hard.
Reporter | ||
Comment 52•12 years ago
|
||
(In reply to Zbigniew Braniecki [:gandalf] from comment #42)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #39)
> > Created attachment 707993 [details] [diff] [review]
> > Rollup of wins so far
> >
> > I went to clean up a branch and replicate results and have better news. I
> > had left in a patch that did some logging on critical paths and measuring
> > without improves results measurably.
>
> With the patch applied the homescreen never ends loading with:
Building on a different machine, I reproduce this now.
Comment 53•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #52)
> Building on a different machine, I reproduce this now.
https://bugzilla.mozilla.org/show_bug.cgi?id=835809#c10 - it may be this
Reporter | ||
Comment 54•12 years ago
|
||
Attachment #707993 -
Attachment is obsolete: true
Reporter | ||
Comment 55•12 years ago
|
||
It wasn't filling the full frame, so many (most?) of my stopwatch timings were actually counting when gaia replaced the screenshot with the white background! (The "Implement me" text got a layer sized to it, ~100x20 pixels.)
With this change, I measure startup at ~660-690ms on vanilla gecko-18, and ~540ms with all dependent patches applied. Getting there!
Reporter | ||
Comment 56•12 years ago
|
||
This measurement had a bug: it was counting the first time the subtree was *resolved*, but not the first time it had a leaf layer that was composited. I found out in bug 837881 that we sometimes get a few frames with just a container layer, no leaf layers. The gecko profiler wasn't catching these txns.
With this patch, we're counting startup more accurately. The reported times are within 100ms or so of what my stopwatch is measuring, and the appears-in-logcat to seen-with-eye lag is hard to notice.
Attachment #708941 -
Attachment is obsolete: true
Reporter | ||
Comment 57•12 years ago
|
||
(In reply to Zbigniew Braniecki [:gandalf] from comment #46)
> We also should soon stop using Template app as a reference "minimum
> overhead" app since it's startup characteristics are becoming less of a
> predictor of what happens with most apps.
> For example, I was able to get Template startup below 300ms (another 200ms
> win over reference build) by just disabling setFrameBackground
> (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/
> window_manager.js#L493), but didn't manage to reproduce any win on any app
> with actual scripts/links and content.
FWIW I ran a (quick) experiment similar to bug 837881 comment 15 with the Dialer and saw very similar results to what I saw with Template. So it'd be good to understand what's different in our testing setups.
Depends on: 840538
Depends on: 840627
Depends on: 841353
Depends on: 841760
Depends on: 842627
Reporter | ||
Comment 58•12 years ago
|
||
Attachment #706602 -
Attachment is obsolete: true
Depends on: 847352
Depends on: 847354
Updated•11 years ago
|
Whiteboard: [FFOS_perf] → [FFOS_perf] c=
Updated•11 years ago
|
Whiteboard: [FFOS_perf] c= → [c=progress p= s= u=]
Updated•11 years ago
|
Priority: -- → P2
Comment 59•7 years ago
|
||
Closing this old B2G bug.
Updated•7 years ago
|
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•