Closed Bug 914913 Opened 6 years ago Closed 6 years ago

Contacts App First Time Launch Latency

Categories

(Firefox OS Graveyard :: Gaia::Contacts, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:koi+, b2g-v1.2 fixed)

RESOLVED FIXED
1.2 C4(Nov8)
blocking-b2g koi+
Tracking Status
b2g-v1.2 --- fixed

People

(Reporter: skamat, Assigned: bkelly)

References

Details

(Keywords: perf, regression, Whiteboard: [c=progress p=5 s= u=1.2])

Attachments

(4 files, 3 obsolete files)

Based on partner expectations, First time launch latency for contacts app is 1500ms. The bug is to track the user story for 1.2 release. 1.2 should cause no regression from 1.1.
Blocks: 915068
Blocks: 915073
Whiteboard: PERF [UCID:Perf5, FT:Perf, KOI:P1] [c= u=1.2][oslo] → PERF [c=progress u=1.2][oslo]
Summary: [B2G] [Performance User Story][v1.2]: Contacts App First Time Launch Latency → Contacts App First Time Launch Latency
Whiteboard: PERF [c=progress u=1.2][oslo] → [c=progress u=1.2][oslo]
Depends on: 871823
David - you're engineering mgr for comms so starting with you to find a proper owner.
Assignee: nobody → david.scravaglieri
Contact launch latency is 2119 in 1.1 (lowest is 1928 and highest is 2300)
Contact launch latency is 3358 in 1.2.1 (lowest is 2575 and highest is 3358).

I tired to find cause of regression from gecko point of view :
I found following things happens during touch down to contact launch:

1) nsInputStreamPump::OnInputStreamReady () takes 565ms in 1.2.1
nsInputStreamPump::OnInputStreamReady () takes 459ms in 1.1
2) html5::RunFlushLoop() takes 609ms in 1.2.1
html5::RunFlushLoop() takes 334.6ms in 1.1
3) PresShell::FlushPendingNotifications() [1] takes 818ms in 1.2.1
PresShell::FlushPendingNotifications() takes 447ms in 1.2.1
4) PresShell::Paint()[2] takes 350ms in 1.2
PresShell::Paint() takes 194ms in 1.1

it seems to me that Contact is doing more layout work in 1.2 than 1.1 .
I copied whole gaia/app/communications folder from 1.1 to 1.2.1 . But I don't see any improvement in Contact app launch latency . I still see that Contact launch is taking >3000ms with stopwatch.

So I am assuming that regression is coming from gecko .
Or we may need to optimize gaia Contact app more to hide latency from gecko regression in 1.2.
Depends on: 865747, 769037
(In reply to Tapas Kumar Kundu from comment #2)
> Contact launch latency is 2119 in 1.1 (lowest is 1928 and highest is 2300)
> Contact launch latency is 3358 in 1.2.1 (lowest is 2575 and highest is 3358).

I'm sorry if this was stated somewhere and I missed it, but can you say:

a) What reference workload you are using?  Is it something like our gaia "make reference-workload-heavy"?  Or b2gpopulate?
b) What are you looking for when you stop the stopwatch?  Contact names to appear?  Images to appear?
c) What device are you running on?

Also, I'm not familiar with what 1.2.1 is.  Can you say what revision of gecko this?

Note, when working on contacts previously I ran into many gralloc lock failures which would introduce large amounts of jank.  Do you see anything like that in your logcat?
Also, can you say roughly how much time passes between subsequent launches of the same app?  I know you're probably not timing this, but is it around 5 seconds or less?  Does spacing out the app launches help?

Thanks!
Flags: needinfo?(tkundu)
Lets not optimize Gaia to cover Gecko defects. We need to detailed profile of the Gecko path. Can you guys try to capture a delta profile?
(In reply to Ben Kelly [:bkelly] from comment #5)
> (In reply to Tapas Kumar Kundu from comment #2)
> > Contact launch latency is 2119 in 1.1 (lowest is 1928 and highest is 2300)
> > Contact launch latency is 3358 in 1.2.1 (lowest is 2575 and highest is 3358).
> 
> I'm sorry if this was stated somewhere and I missed it, but can you say:
> 
> a) What reference workload you are using?  Is it something like our gaia
> "make reference-workload-heavy"?  Or b2gpopulate?

I used "UI Test" app to insert 500 FAKE contacts.

> b) What are you looking for when you stop the stopwatch?  Contact names to
> appear?  Images to appear?

I waited till I see 1st full page of contact appears on screen.

> c) What device are you running on?
>
I am using QRD 7x27a 512MB device. 
 
> Also, I'm not familiar with what 1.2.1 is.  Can you say what revision of
> gecko this?
Gecko Revision number: 97cedd4a0ee4afdf71e929dabbfc4cf1cce0daa8 . 
> 
> Note, when working on contacts previously I ran into many gralloc lock
> failures which would introduce large amounts of jank.  Do you see anything
> like that in your logcat?
I also saw same thing earlier. But now Gralloc is fine.
Flags: needinfo?(tkundu)
Tapas or Ben, we need a proper delta profile of 1.1 and 1.2 for the identical workload (the 1.1 contacts app). With that this should be easy.
(In reply to Andreas Gal :gal from comment #9)
> Tapas or Ben, we need a proper delta profile of 1.1 and 1.2 for the
> identical workload (the 1.1 contacts app). With that this should be easy.

I can upload |adb logcat| logs for delays from certain gecko functions with identical loads in both 1.1 and 1.2. 

Please suggest.
Flags: needinfo?(gal)
Have you used ./profile.sh before? It should give you a nice profile and its pretty easy to use.

https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler

If you don't mind giving it a try, that would be great.
Flags: needinfo?(gal)
(In reply to Andreas Gal :gal from comment #9)
> Tapas or Ben, we need a proper delta profile of 1.1 and 1.2 for the
> identical workload (the 1.1 contacts app). With that this should be easy.

I can do this later today.
Assignee: david.scravaglieri → bkelly
Status: NEW → ASSIGNED
As a data point, we actually do see a launch time regression in Eideticker for contacts of about 15%.  (I missed this yesterday because I was looking for an across the board regression, which does not appear to be the case in the Eideticker results.)

This is much smaller than the 50%+ regression reported in comment 2.  Some differences between the tests to investigate:

1) Eideticker was only using 200 contacts vs the 500 contacts here.
2) Eideticker uses marionette to launch the app which excludes the whole touch event and homescreen-to-system messaging paths.

In particular I'm wondering if we have a regression in our touch handling that might explain why we are seeing so many regressions in manual tests that don't show in our automated tests.

Note, I believe Eideticker was using the same criteria for the app being launched.  It determined that its "first stable frame" was when the first full page of contacts appeared.  Thanks to wlach's awesome framework we even have a video of the test here:

  http://people.mozilla.org/~wlachance/v12-contacts-load.webm
Before getting the profiles I wanted to see if I could at least see a similar regression to know if I was measuring the correct thing.

Since I don't trust my stopwatch finger, I added debug to gaia to display times at touchstart and in contacts when it has rendered above-the-fold.  Not exactly the same, but should capture most of the work being done.

Here are median values for different revisions:

b2g18 rev b459ac3ebe24:  2049ms
m-c   rev c4bcef90cef9:  2314ms, 13% regression  (This is the rev used by Tapas above.)
m-c   rev 4471e331bb4f:  2233ms,  9% regression  (This rev is close to when 1.2 moved to m-a.)

So I believe some improvements have already been made relative to the gecko revision reference in comment 8.

Of course, we still don't see anything close to the 50% regression being reported.  Tapas, it would be interesting to know if you see similar times using the debug statements as I have here.

If it shows only ~10% regression as we have here, then it would suggest the stopwatch test is catching an issue this measurement does not cover.  If the debug does show a ~50% regression, though, then we would know there is something more significantly different between our test configurations.

If you're willing to try this, you just need to add statements like:

  console.log("### ### homescreen touchstart: " + Date.now());
  console.log("### ### contacts above-the-fold: " + Date.now());

At these locations:

  https://github.com/mozilla-b2g/gaia/blob/master/apps/homescreen/js/dock.js#L43
  https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/js/views/list.js#L526

And in v1-train:

  https://github.com/mozilla-b2g/gaia/blob/v1-train/apps/homescreen/js/dock.js#L43
  https://github.com/mozilla-b2g/gaia/blob/v1-train/apps/communications/contacts/js/contacts_list.js#L383

Note, this assumes you launch contacts from the "dock" at the bottom of the screen.

Meanwhile, I will next get those profiles.  I am going to do them with latest mozilla-b2g18 and mozilla-aurora.
Flags: needinfo?(tkundu)
Priority: -- → P1
Here is a profile of the contacts app running on mozilla-aurora at rev 0322470077b7 with gaia master.
Profile for launching contacts app on mozilla-b2g18 at rev b459ac3ebe24 on gaia v1-train.
In case it matters, I also see a lot of these in my logcat for both b2g18 and m-c:

E/memalloc( 5048): /dev/pmem: No more pmem available
E/msm7627a.gralloc( 5048): gralloc failed err=Out of memory
W/GraphicBufferAllocator( 5048): alloc(320, 385, 1, 00000133, ...) failed -12 (Out of memory)

All my testing is on a buri.
Sotari, any idea why we are seeing more gralloc failures?
That would be Sotaro of course. Sorry about that.
Note, I'm running with vendor gonk/firmware that was tagged US 9/12/2013.  If there is a better vendor image I should use for my buri, please let me know.
Here is another mozilla-aurora profile with MOZ_PROFILING=1 which I believe should turn on jld's native stack unwinding.
(In reply to Andreas Gal :gal from comment #18)
> Sotaro, any idea why we are seeing more gralloc failures?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Andreas Gal :gal from comment #18)
> any idea why we are seeing more gralloc failures?

MozBuild ROM bans implicit fallback from pmem to system memory. It is banned in Bug 905784. From Bug 916626 Comment 4, buri/hamachi have only 8 M pmem limit, it is easy to exhaust pmem.
Flags: needinfo?(sotaro.ikeda.g)
And it is banned in MozBuild ROM because it's already being done in the partner ROM - and apparently has been all along.
Unfortunately its somewhat difficult to compare the profiles between b2g18 and m-c at the moment.  I have not been able to get gaia/master to run on b2g18, so these profiles include significant differences in the app as well as gecko changes.

Since nothing was jumping out at me there I decided to take a look at the gaia side of things anyway.

The main improvement I came up with was bug 918174.  The contacts app is currently lazy loading a number of CSS files prior to displaying the list.  Due to the style recalculations this causes we are better off simply loading the CSS from the index.html.

The patch in bug 918174 moved us from a 9% regression compared to b2g18 to a 6% improvement over b2g18.

I also noticed that the tag_visibility_monitor seems to trigger a lot of reflows since it looks at scrollTop on every DOM mutation.  I wrote bug 918179 for this.  I don't have a working patch for that yet, but I also don't think it should effect our results here either.  The tag_visibility_monitor is not activated by contacts until after the first page of contacts is displayed.
Depends on: 918174
Next steps for tomorrow:

 - Try copying the m-c contacts app alone over to run on b2g18 instead of full gaia/master
 - Try to perform some stopwatch tests myself to see if I can observe a larger regression
Mike, your team should talk about some of the things they discover and file bugs for and fix. I think the Gaia team would benefit a lot from hearing about some of this (to avoid it next time).
I should note that while bug 918174 improved our load times, it regressed the web activity case of opening a single contact.  (For example, from call log.)  I opened bug 918347 to track implementing a better long term solution.  I plan to stay focused on the main contacts app launch time issue for now.
Actually, bug 918174 has been reverted until we can address the web activity regression.  Sorry for the churn here.
Ok, re-landed bug 918174 after some more investigation.  Due to the way the web activity is currently implemented we actually did not regress there with this change.
Whiteboard: [c=progress u=1.2][oslo] → [c=progress u=1.2 p=5][oslo]
Whiteboard: [c=progress u=1.2 p=5][oslo] → [c=progress p=5 s= u=1.2]
Consolidating with bug 915073.  Here is the information from that bug:

Steps to reproduce:

Launch the Contacts App (QRD 7x27 device). Measurements are made using a high speed camera measuring the time from when the touch event was recognized to first app paint.


Actual results:

Measured time on (1.2) is:
3317 ms
Measured time on (1.1):
2119 ms
Keywords: regression
No longer blocks: 915073
Duplicate of this bug: 915073
I did some manual measurements today.  I used the very advanced setup of my iphone 4s and a stack of warhammer books.  I think the 30 fps / 33ms resolution of the camera should be adequate for the numbers we are talking here.

With v1.1 revs:

  Gecko mozilla-b2g hg rev:  119918:ddd56f6a924d
  Gaia v1-train git rev:     edd44ef32a8fcae6e937ee349a11e7f847456ed4

I got this video:

  https://www.dropbox.com/s/cp56w7xwvxfzupn/20130920_b2g18%20119918.mov

Which gives me a time of ~2.2 seconds.

With latest v1.2 revs:

  Gecko mozilla-aurora hg rev:  155449:0ff4dcd55258
  Gaia v1.2 git rev:            7b6147372cbf560744a02be50e0a862a825caef6

(Note, these include recent graphics fixes and my CSS improvement for contacts.)

I got this video:

  https://www.dropbox.com/s/1xc9ggyuctm0k21/20130920_m-a%20155449.mov

Which gives me a time of ~2 seconds.

This suggests to me that, while we are not meeting our goal of 1500ms, we are at least no longer regressing against the v1.1 times.
Attached file Updated 1.2 profile (obsolete) —
This is a profile of v1.2 revs:

  Gecko mozilla-aurora:  155449:0ff4dcd55258
  Gaia v1.2 branch:      7b6147372cbf560744a02be50e0a862a825caef6

To view only the app launch, zoom in on 9284 to 11419.
Attachment #806817 - Attachment is obsolete: true
Attachment #806850 - Attachment is obsolete: true
Attachment #807970 - Attachment mime type: text/plain → text/html
Here is a video using what I believe are the revisions from the QC report:

  Gecko mozilla-central hg rev:  145142:4ba8dda1ee31
  Gaia master branch git rev:    407fbfb6a9de68ec4db2f0f3dc6c67463e293f47

Video:

  https://www.dropbox.com/s/nry71426c8tlsc7/20130920_m-c%20145142.mov

Time measured at ~2.7 seconds.

While this does not exactly match the QC report numbers, I think it does show we have made considerable progress.
That profile shows that the majority of time we are spending doing reflows from the refresh driver. What CSS animations or DOM changes are happening at load time?
(In reply to Andreas Gal :gal from comment #36)
> That profile shows that the majority of time we are spending doing reflows
> from the refresh driver. What CSS animations or DOM changes are happening at
> load time?

In terms of animations, the only one I know of during load is the main system launch animation.  There is CSS translation that occurs during scrolling to keep the group header at the top of the page, but that only occurs from the scroll event.  I wonder if we could somehow be triggering that logic at load.

In terms of DOM, the app uses a mix of pre-defined structure in index.html and dynamically generated content.  I believe all of the main panels are defined up front in the index.html.  The list view is a <section> that has child <section> elements added for each group; "A", "B", "#", "Favorites", etc.  Each group then consistents of a header and the list of items.  These group structures are built at load time.

In terms of items being added to the list, however, the first page is added first and fully "rendered" with its structure.  Items off screen are then placed in the appropriate group list with DOM placeholders.  The placeholders later get rendered into full elements when the visibility_monitor indicates they are coming into view.

I think Kevin makes a good case that avoiding generating the overall DOM list structure would be better.  He has a purely cursor based solution prototyped over in bug 909935.  That particular solution requires modifications to the contacts API to work correctly.  We may be able to create a hybrid approach by filling an in-memory structure from the API and then using a "cursor" to render only a handful of DOM list items at any one time.

Kevin also has some ideas to restructure the app to be better lazy loaded.  See bug 907907.  As you can see in that bug, though, there are some concerns about how much refactoring of the app to do at this point in the 1.2 release cycle.

In addition to the largish index.html, many javascript files which are lazy loaded unfortunately still block the loading of the list.  See here:

  https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/js/contacts.js#L645

In addition, while the CSS is separated into separate files I found that it is structured around functional concepts instead of panels.  This means to really lazy load it well we would need to do more extensive refactoring.  In the meantime, if we are going to block the list anyway we should definitely load CSS from the index.html to avoid recomputing styles.  (This is what I did in bug 918174.)  I wrote bug 918347 to track longer term CSS refactoring.

None of these are impossible problems to solve, but they require different amounts of refactoring in the app.  I think we're trying to determine the right balance here given the comms team's concerns with changing things this late in the release.  And I do understand those concerns since the new 1.2 features in contacts are still stabilizing and wholesale refactoring could make that job intractable.

One thought I had was to experiment with scoped styles.  Since we have all the panels and CSS defined up front it seems we could attach specific stylesheets to only the scopes where they are used.  While this doesn't save us any loading time, it should reduce the amount of time in the style engine running computations.  This would not be very intrusive and might get us a win both at load time and while otherwise using the app.  This is probably where I will investigate next.
(In reply to Ben Kelly [:bkelly] from comment #37)
> One thought I had was to experiment with scoped styles.  Since we have all
> the panels and CSS defined up front it seems we could attach specific
> stylesheets to only the scopes where they are used.  While this doesn't save
> us any loading time, it should reduce the amount of time in the style engine
> running computations.  This would not be very intrusive and might get us a
> win both at load time and while otherwise using the app.  This is probably
> where I will investigate next.

Actually, scratch this.

I'm going to see if I can write a customized version of the building blocks CSS first.  There are a fair number of descendent selectors in the shared CSS.  In particular, there are descendent selectors on tags like <p>, <li>, and <section> which are heavily used in the contacts DOM.  More targeted classes could help reduce our style computations.

To get a sense of the upper bound I tried commenting out the building blocks style.  This reduces load time by ~350ms.  Obviously we won't gain that much through optimization, but if we can save even a fraction of that it could still be worth it.
Hmm, we do use getBoundingClientRect() in a couple of places to try to compute the number of rows on the screen.  I thought that could be triggering an early reflow, but removing it does not seem to help.  The profile still shows that large reflow in the middle of the load.

I don't see scrollTop, offsetTop, etc being used before the first page is displayed either.  I may not have the full list of attributes/functions that trigger reflows in gecko, though.
Is the DOM mutated during load from an event handler? We must be constantly mutating the dom for this many reflows from the refresh timer. The performance of this app totally stinks. Can you please get help from the team owning it? This shouldn't be just on you.
(In reply to Andreas Gal :gal from comment #40)
> Is the DOM mutated during load from an event handler? We must be constantly
> mutating the dom for this many reflows from the refresh timer.

I do think there is opportunity here to remove DOM mutations, but the largest reflow seems to be from the initial html render.

Since there is a good chance I am misreading something or confused about the gecko pipeline, let me explain what I am seeing.  From attachment 807970 [details]:

T 9524  to  9829: initial static content parsing in html5::RunFlushLoop()
T 9829  to  9940: static content pauses while the non-defer'd l10n.js loads
T 9940  to 10080: RunFlushLoop() finishes processing static content
T 10080 to 10515: Initial style flush and layout occurs (~450ms!)
T 10515 to 10530: Painting
T 10530 to 10654: l10n startup and locale load
T 10670 to 10680: initContacts()
T 10680 to 10710: More painting
T 10720         : mozAfterPaint()
T 10720 to 11015: Continue contacts init including lazy loading files
T 11015 to 11080: Flush and paint caused by DOM modifications during contacts init
T 11080 to ...  : Loading contacts

I suspect the reflow at time 11015 could be caused by one or more of these:

1) contacts_shortcuts.js scrollbar code which uses templates to modify the DOM
2) list.js renderGroupHeader() code which renders the sub lists into the DOM
3) getBoundingClientRect() and window.innerWidth uses early in contacts init

Overall, though, it seems like that first style flush at T 10080 is taking the most time at almost half a second.  I don't see events or DOM modifications occurring during that time window, but perhaps I'm just missing it.  That's why it feels perhaps we need to dig into the CSS and static DOM a bit to optimize that first flush.

Unfortunately I won't be able to really work on this more until Monday morning.
(In reply to Andreas Gal :gal from comment #40)
> Can you please get help from the team owning it? This shouldn't be just on you.

Looping in Francisco and Jose.

I do feel personal responsibility here, though, as I have been looking at Contacts load performance for the last month or two.  However, I've been focused more on after initial page load times (bug 865747 and friends).

I've also perhaps focused too much on incremental improvements.  I think Kevin makes a convincing argument that we may need larger changes as he proposed in bug 907907 and bug 909935.  There has been an ongoing discussion about some patches in bug 907907 and when would be a good time to make structural changes.
Depends on: 907907
450ms for the initial reflow is nuts. Why is that so slow?
(In reply to Andreas Gal :gal from comment #43)
> 450ms for the initial reflow is nuts. Why is that so slow?

It's not that bad as the time needed for the initial search for a person. On my Unagi this takes about 15s! Sadly bug 865747 hasn't gotten that much traction. Not sure if it will, but I thought it's worth mentioning here, given that you state 450ms as too slow.
(In reply to Henrik Skupin (:whimboo) from comment #44)
> It's not that bad as the time needed for the initial search for a person. On
> my Unagi this takes about 15s! Sadly bug 865747 hasn't gotten that much
> traction. Not sure if it will, but I thought it's worth mentioning here,
> given that you state 450ms as too slow.

I do agree the total load time is also frustratingly slow.  I'd like to respond with an update, but I think its a bit off topic for this particular bug.  I ended up writing it over in bug 865747 comment 41.  I hope that should at least address concerns that total load time is being ignored, while also acknowledging we have not made the progress we would like.
I was finally able to get native stacks correctly enabled.  Here is another profile with the additional info.

See range 32700 to 35010 for the events leading up to first page of results.

At a course level, around 2/3 of the time of that initial flush is spent in:

  nsCSSFrameConstructor::CreateNeededFrames(nsIContent* aContent)

The other 1/3 of the time is in:

  PresShell::ProcessReflowCommands(bool)
Attachment #807970 - Attachment is obsolete: true
Adding class=hide to <section> elements for the offscreen panels in index.html seems to save us 200+ ms per load.  I think this argues that bug 907907 should land and will probably be a big win.
Here's a video of v1.2 gaia/gecko with the patches from bug 907907 applied:

  https://www.dropbox.com/s/g11ykbfu8ga66w5/20130923_lazy_html.mov

Using my poor man's video editing tools, this shows a 1.9s launch time.  So a significant improvement, but perhaps not as large as I hoped in comment 47.

Profile with these patches to come next.
Here's a profile with the patches from bug 907907 applied.  For the initial load see range 133660 to 135500.

At first glance it appears that the initial style flush is greatly reduced down to ~100ms.
Attachment #808726 - Attachment mime type: text/plain → text/HTML
Attachment #808634 - Attachment mime type: text/plain → text/HTML
Depends on: 919692
In bug 919692 I propose defining the scrollbar hit targets statically in index.html instead of dynamically generating them through a templating process.  This avoids a late reflow and doesn't cost us anything since the scrollbar hit targets are hardcoded anyway.

This video shows bug 919692 combined with bug 907907:

  https://www.dropbox.com/s/6x10ycv5chwv65q/20130923_static_scrollbar.mov

Approximate load time is 1.7 seconds.

I think there is another similar change we can make to avoid dynamically generating the group lists.  The resulting reflow seems much smaller, however.  I'll investigate that tomorrow.
Profile of bug 907907 and bug 919692 with pseudo stacks.  Load range is range 7750 to 9500.
Attachment #808766 - Attachment mime type: text/plain → text/html
FxOS Perf: Issues identified in bug 920713 may be impacting this bug but further investigation is needed to confirm.
Bug 865747 is really about total time to load the contacts list which mainly occurs after the first page of results are shown.  While this is load related, its different from this bug.  Therefore I am removing as a dependency.
No longer depends on: 865747
Depends on: 921683
Depends on: 921685
Depends on: 921802
Depends on: 921808
Depends on: 922677
It sounds like we're looking at different things, but I'll leave this here just in case: in bug 915083, I'm looking at an unexplained 1.1s lag between the touch event (RecvRealTouchEvent) and the b2g parent process doing the app.launch().
(In reply to Mike Habicher [:mikeh] from comment #54)
> It sounds like we're looking at different things, but I'll leave this here
> just in case: in bug 915083, I'm looking at an unexplained 1.1s lag between
> the touch event (RecvRealTouchEvent) and the b2g parent process doing the
> app.launch().

Mike, are you launching via marionette, b2gperf, etc?  I believe we saw ~1 second delay in eideticker that I don't see manually launching the app by tapping the screen.  See bug 920097.  Not sure if those are related.

Otherwise I have not seen a reliable 1 second delay while working this bug.  I previously had a bunch of debug to check that, but it always showed minimal (~10ms) delay, so I took it out.
Ben, I'm running a profiler-enabled build manually, tapping on the Camera icon on the Homescreen.

If you look at the following profile:
  http://people.mozilla.org/~bgirard/cleopatra/#report=6deb68a49898a0ca7062b6c516f06cf6587797b0

...you can see the last RecvRealTouchEvent in the Homescreen process at relative timestamp 59751ms, and the app.launch() in the b2g process/Webapps.jsm doesn't happen until ~60904ms; the Camera app launches shortly afterwards.

During this (mostly) idle time, Homescreen, b2g, and Preallocated/Camera are blocked in epoll_wait().
With the proposed pull request in bug 921683 we're moving into the range of the target 1500ms:

  https://www.dropbox.com/s/r76uhsg39z6x8jy/20131004_dynamic_group.mov

That movie shows ~1.5 seconds, although I am not confident that this test method is super accurate.  It seems +/- 100ms is reasonable.  We should retest with a more accurate camera once this lands and gets uplifted to v1.2.

Note, that movie was with 1.3 m-c and gaia/master on a buri with reference-workload-medium.
Blocks: 913775
Depends on: 923871
Depends on: 923873
Another movie with today's v1.2 gaia and mozilla-aurora.  I believe this has all our change to this point uplifted.

  https://www.dropbox.com/s/3tcju4sl9zelcfd/20131008_v12uplift.m4v

Time is again around 1.5 seconds.
Dave, I understand you have the only working eideticker camera at the moment.  If you have any time, could you run some app launch tests with contacts using the latest gaia v1.2 branch and mozilla-aurora?  Also, if this could include Will's improvements to the launch process using orangutang, that would be great.

The workload should be 500 contacted.  The "make reference-workload-medium" or b2gpopulate methods should both be fine.

I'd just like to have a more accurate measurement of where we are in order to prioritize further effort on this bug.

Thanks!
Flags: needinfo?(dave.hunt)
I spoke with Dave on irc and it sounds like we're a bit backed up with work on eideticker due to some of the equipment breaking.  Also, Dave points out he only has an inari running on eideticker which may not be the best comparison for our partner tests.  I believe we prefer hamachi/buri.

Vikram, would it be possible to get a run on your test rig of the contacts app launch using a gaia v1.2 rev at or newer than:

  https://github.com/mozilla-b2g/gaia/commit/e9533a66b441b902a77966da87144f89e1db808e
Flags: needinfo?(mvikram)
We're profiling a build from 10/8 night.
Flags: needinfo?(mvikram)
Target Milestone: 1.2 FC (16sep) → 1.2 C3(Oct25)
Depends on: 926752
No longer depends on: 926752
I now have a hamachi, do you still need some eideticker results? If so, I can try to get them by the end of the week.
Flags: needinfo?(dave.hunt)
Dave, we just got some updated results from our partner, so I think we are good for now.  Thanks!

From those results it appears we're at 1615ms.  Its not clear yet which rev this was tested with, though.  Asking for clarification.
Ok, the 1615ms time is as of gaia rev:

  313599c293f596519604070fa378ffc89e61e98f

Checking the log it looks like this tests up to around Oct 7 on v1.2 branch and does not yet account for these fixes:

  bug 921683
  bug 921808
  bug 923871
  bug 923873
Actually, I think some eideticker tests would be helpful to know how much those last 4 bugs will really help us.  Since its late for Dave's timezone now I asked Will to do some with his new camera.
I have a patch in for review on bug 921685 that removes another small sync reflow from launch.  Also, bug 924032 landed which should help.

To check where we are I ran a couple runs on v1.2:

  Gecko mozilla-aurora:  155912:4253a66382fa
  Gaia v1.2:             a34f1a7903715 + patch from bug 921685

I made two movies:

  https://www.dropbox.com/s/mzt7xtejsssf6x0/20131025_contacts_1_2_cookie.mov
  https://www.dropbox.com/s/h0fuu8mxnfzs756/20131025_contacts_1_2_cookie_2.mov

Movie 1 shows us at ~1.4 seconds and movie 2 shows us at ~1.5 seconds.  I believe this shows we are either at our goal or slightly faster.

Based on this I intend to mark this resolved once bug 921685 lands.
Depends on: 924032
Based on measurements made on the following build, the launch time is now on par with 1.1.

Gaia version:
845801e0c74badf0b96657a864935ef1a983cf47
Gecko version:
bbb4c0a8fa65cf1546a6cedc4c20fea16cd63ef2

marking this as not blocking for 1.2.
No longer blocks: 916996
Bug 921685 has landed.  I retested on the latest code and got values similar to comment 66.  Combined with Vikram's comment in comment 67, I think its safe to mark this one fixed.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Setting Target Milestone for this sprint's koi+ fixes.
Target Milestone: 1.2 C3(Oct25) → 1.2 C4(Nov8)
You need to log in before you can comment on or make changes to this bug.