Closed Bug 1072621 Opened 10 years ago Closed 10 years ago

[Cost Control] ~250ms regression in launch latency for 2.1 on 9/22

Categories

(Firefox OS Graveyard :: Gaia::Cost Control, defect)

Other
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:-, b2g-v2.1 affected, b2g-v2.2 affected)

RESOLVED DUPLICATE of bug 1093847
blocking-b2g -
Tracking Status
b2g-v2.1 --- affected
b2g-v2.2 --- affected

People

(Reporter: gmealer, Unassigned)

References

Details

(Keywords: perf, regression)

Attachments

(3 files)

[Blocking Requested - why for this release]:

This is affecting 2.1 performance. 

Mai, Salva, can you guys help investigate why this is happening? Was there a regression from Cost Control that landed around the 9/22 timeframe?
blocking-b2g: --- → 2.1?
Flags: needinfo?(salva)
Flags: needinfo?(mri)
triage: significant perf regression
blocking-b2g: 2.1? → 2.1+
The only patch that landed on those dates is a CSS refresh where another CSS file is being loaded. Geo, can we check whether this behaviour has been seen in other apps? Maybe it's caused by other patches outside Cost Control... 

And another question, how is this time measured?

We'll continue looking for reasons.
Flags: needinfo?(salva) → needinfo?(gmealer)
(In reply to Salvador de la Puente González [:salva] from comment #3)
> The only patch that landed at those dates is a CSS refresh where another CSS
> file is being loaded. Geo, can we check if this behaviour has been seen in
> other apps. Maybe it's caused by other patches outside Cost Control... 
> 
> And another question, how is this time measured?
> 
> We'll continue looking for reasons.

Salvador, if you navigate to the Datazilla link provided, you can toggle checkboxes for other apps. While I have seen other apps have a slight rise around the 9/17-9/19 range, there's nothing this dramatic. The change there is pretty obvious, and seems almost certainly linked with that checkin (keep in mind we test each app on each tinderbox build, whether or not it has a checkin directly affecting it, so the previous low result would have been immediately before the CSS merge). 

It's possible the CSS is causing a performance issue when the window is created. In the past, I think we've seen small changes in CSS sometimes prevent internal optimizations. The best bet is probably to profile that area.

Re: the test,

The timing is based on internal instrumentation, starting from triggering the launch of the app and ending when the moz-app-visually-complete event is emitted, done in costcontrol here:

https://github.com/mozilla-b2g/gaia/blob/a912833545f5f024f9e2f09030072d3f8b29e309/apps/costcontrol/js/common.js#L92

See the test code at https://github.com/mozilla-b2g/gaia/blob/a912833545f5f024f9e2f09030072d3f8b29e309/tests/performance/startup_events_test.js and further explanation at https://developer.mozilla.org/en-US/Apps/Build/Performance/Firefox_OS_app_responsiveness_guidelines#App_startup_stages_and_recommended_time_goals
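
For reference, the app-side marker is just a DOM event dispatched on the window once the above-the-fold UI is rendered; a minimal sketch (the helper name is illustrative, the real costcontrol code wraps this in its own utilities):

  // Sketch only: announce that the above-the-fold content has been rendered.
  // The performance test listens for this event to stop the startup timer.
  function dispatchVisuallyComplete() {
    window.dispatchEvent(new CustomEvent('moz-app-visually-complete'));
  }

  // Called once the main view (or, here, the FTE wizard) has finished loading.
  dispatchVisuallyComplete();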
Flags: needinfo?(gmealer)
(In reply to Geo Mealer [:geo] from comment #4)
> (In reply to Salvador de la Puente González [:salva] from comment #3)
> > The only patch that landed at those dates is a CSS refresh where another CSS
> > file is being loaded. Geo, can we check if this behaviour has been seen in
> > other apps. Maybe it's caused by other patches outside Cost Control... 
> > 
> > And another question, how is this time measured?
> > 
> > We'll continue looking for reasons.
> 
> Salvador, if you navigate to the Datazilla link provided, you can toggle
> checkboxes for other apps. While I have seen other apps have a slight rise
> around the 9/17-9/19 range, there's nothing this dramatic. The change there
> is pretty obvious, and seems almost certainly linked with that checkin (keep
> in mind we test each app on each tinderbox build, whether or not it has a
> checkin directly affecting it so the previous low result would have been
> immediately before the css merge). 

Yes, we already checked that. I was asking in case we missed something. Thank you.

> 
> Possible the css is causing a performance issue when the window is created.
> In the past, think we've seen small changes in css sometimes prevent
> internal optimizations. Best bet is probably to profile that area.

OK. I confirmed these changes were applied only to our application and include the use of flex layout, which is consistent with other apps not being affected.

> 
> Re: the test,
> 
> The timing is based on internal instrumentation, starting from triggering
> launch of the app and ending at the emitting of the moz-visually-complete
> event, done in costcontrol here:
> 
> https://github.com/mozilla-b2g/gaia/blob/
> a912833545f5f024f9e2f09030072d3f8b29e309/apps/costcontrol/js/common.js#L92

The main changes from the suspicious commit are to the First Time Experience, which is where the event is triggered. I would lower the priority because this is a flow the user goes through only once, though it's true that it is their first impression of the app.

> 
> See the test code at
> https://github.com/mozilla-b2g/gaia/blob/
> a912833545f5f024f9e2f09030072d3f8b29e309/tests/performance/
> startup_events_test.js and further explanation at
> https://developer.mozilla.org/en-US/Apps/Build/Performance/
> Firefox_OS_app_responsiveness_guidelines#App_startup_stages_and_recommended_t
> ime_goals

Thank you very much for the explanations.
Hi,
I've been testing with:
Environment:
  Phone     Flame (319MB)
  Gaia      86905e1
  Gecko     a38df34
  BuildID   20140926015055
  Version   34.0a2

STR:
  1. Flash the device
  2. Activate the "time to load" tool (Settings > Developer > Developer HUD > Show time to load).
  3. Launch the Usage app and collect the load times (ms):

     856
     836
     818
     842
     837
Avg: 837.8

 Repeat the same procedure after reverting the patch from bug 907700:

     847
     864
     834
     851
     790

Avg: 837.2

I don't see a big difference in the times.

Which Gecko version are you using for your tests?


Regards,
Mai
Flags: needinfo?(mri)
Salvador, the results don't support an "only FTE" conclusion on this. 

The way the test works, only the first repetition of each set of 30 is immediately after FTE. The balance are relaunched within the same execution session after hard-killing the app. 

I consider the first result to be anomalous because of the FTE confounding factor (we need to patch the test, to be honest). In order to get around that, we look at medians instead of averages, which has the effect of excluding that first outlying result.
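
To illustrate (this is not the harness code itself), taking the median of a replicate set naturally discards a single slow first run:

  // Sketch only: median of launch-time replicates, in milliseconds.
  function median(replicates) {
    var sorted = replicates.slice().sort(function (a, b) { return a - b; });
    var mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  }

  // e.g. median([3632, 2700, 2680, 2710, 2695]) === 2700, ignoring the FTE outlier.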

Upshot is, your test results are generally slower, even for subsequent launches.

Marina,

The HUD value you're looking at only measures time to first paint, and isn't considered authoritative. Lazy-loaded UI, for example, won't show up there. Our acceptance criterion is "visually complete" above the fold, using the instrumented events previously mentioned and linked to.

You can run this test locally using these instructions:

https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/Gaia_performance_tests

The test result in question is startup > moz-app-visually-complete.

The Gecko revision is listed in the "Test Replicates" pane in Datazilla, if you click on a test point. The tests are run against each Gaia/Gecko combo that's pushed into CI testing, and will generally be the latest of each. The Gecko revision listed for the particular build I flagged was 2c6e3261c47b.
M. Angeles, do you know which engineer can take this bug?

Thx.
Flags: needinfo?(oteo)
Hi,
I've been running the performance tests locally on 2.1, both with the patch from bug 907700 applied and with it reverted. The table below shows the results:

Time current v2.1 (ms)   | Time current v2.1 reverting bug 907700 (ms)
-------------------------|---------------------------------------------
  2467.909937            |   2317.2746558
  2423.7473014           |   2299.745833
  2426.648739            |   2319.6528848
-------------------------|---------------------------------------------
  Avg: 2439.435326       |   Avg: 2312.224458

The regression exists, but in my tests it is smaller than reported, about 127 ms.
Regards
Hi Marshall,

Not sure if this is relevant to Bug 1033549.
Would you mind giving us some comments?

Cheers,
Wesley
Flags: needinfo?(marshall)
The launch latency regression reported in this bug only affects the FTE, so it will only happen once per SIM; IMHO this bug is not that critical.

There is a more important launch latency regression in the graph loading, introduced by feature bug 1033549, which affects all launches after the FTE (~500 ms). 
It was reported in bug 1064491, and Salva suggested that Marshall split the total data usage and the per-app statistics into two separate queries in order to keep the app and widget synchronized (https://bugzilla.mozilla.org/show_bug.cgi?id=1064491#c7). With that fix in bug 1064491 we could avoid that latency regression.
Flags: needinfo?(oteo)
[Blocking Requested - why for this release]:

Taking comment 13 into account, I'm renominating for the triage group to evaluate.

Since it's a problem we hit just once in the whole app lifetime, IMHO we should not block, but let's get that settled in the triage meeting.

Thanks!
blocking-b2g: 2.1+ → 2.1?
Mai and I have dug deeper into the causes of this bug and discovered it's related to the flex layout added by the patch [1]. This patch adds the new flex-based layout from shared, provides new interaction features for switches, and updates some visual styles.

There are two options:
 1.- Add a specific CSS file providing the interaction enhancements and visual updates but without the new flex layout.
 2.- Do nothing, since the regression happens only once, in the First Time Usage, and is barely noticeable to the user.


The problem with option 1 is that we would need to maintain a special file duplicating the changes from shared, whereas keeping the patch as-is produces no real loss of responsiveness (only in the one-time FTE) and means future improvements in the Gecko flex engine will be inherited automatically.

[1] https://github.com/mozilla-b2g/gaia/commit/05f7d41f53339f9b007d6fec8b7dc81a87339561
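
To make option 1 concrete, it would mean carrying an app-local override along these lines (the selectors and properties below are purely illustrative, not the actual rules from shared):

  /* Hypothetical costcontrol-only override: keep the visual refresh but
     opt out of the shared flex-based layout for the FTE screens. */
  .fte-screen .switch-row {
    display: block; /* instead of the display: flex rule coming from shared */
  }
  .fte-screen .switch-row label {
    float: left; /* approximate the previous layout without flexbox */
  }

The maintenance cost described above comes from keeping such a file in sync with every future change to the shared stylesheet.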
I currently have a patch for Bug 1064491 that attempts to resolve the startup latency introduced by all the new app usage requests. That perf fix should be separate from this one, though.
Flags: needinfo?(marshall)
Wilfred will check with Qualcomm whether this is a blocker for them.
Flags: needinfo?(wmathanaraj)
Note that fixing bug 1078654 is likely to reduce start up time (per Marshall).
In which case, let's focus on bug 1078654 and not block on this bug.
blocking-b2g: 2.1? → -
Flags: needinfo?(wmathanaraj)
(In reply to Wilfred Mathanaraj [:WDM] from comment #19)
> in which case lets focus on bug 1078654 and not block on this bug.

Though we're not sure it will make up the difference.
(In reply to Peter Dolanjski [:pdol] from comment #18)
> Note that fixing bug 1078654 is likely to reduce start up time (per
> Marshall).

I fear that is not correct. The performance regression is measured for the First Time Usage wizard and not for the regular scenario covered in bug 1078654. The regular scenario is only enabled once the FTU has finished, i.e., after the measurements are already taken.
Per comment 13, I'm not concerned enough about this regression to block, since it only affects the FTE case (along with bug 1064491).
[Blocking Requested - why for this release]:

(In reply to Salvador de la Puente González [:salva] from comment #21)
> (In reply to Peter Dolanjski [:pdol] from comment #18)
> > Note that fixing bug 1078654 is likely to reduce start up time (per
> > Marshall).
> 
> I fear that is not correct. The performance regression is measured for the
> First Time Usage wizard and not for the regular scenario covered in bug
> 1078654. The regular scenario is enabled only when FTU is finished and so,
> when the measures are already taken.

I've addressed this in comment 9. Out of N repetitions within the test, only the first one is immediately after FTU. The remaining repetitions come after that first initializing run, whatever extra work it did.

I'm attaching the screenshot of the replicates in the most recent Datazilla run. Note the first outlier at 3632 ms. That's the first run after FTU, which presumably does additional initialization. We eliminate that by looking at the median. 

After that run (and every other replicate) we hard-kill the app from memory and then relaunch. The rest of the launches are steady at around 2700 ms, which is representative of a user's typical cold-launch performance (and over the 1000 ms acceptance threshold), *not* counting FTU.

In comparison, 2.0 results from around the same time show steady state at 1800 ms.

I'm re-nominating, pending at least a comment on this analysis.
blocking-b2g: - → 2.0?
Screenshot of replicates from the 2014-10-08 Datazilla performance run
[Blocking Requested - why for this release]:
blocking-b2g: 2.0? → 2.1?
Attachment #8501862 - Attachment description: Screen Shot 2014-10-08 at 10.05.28 AM.png → Replicates for 2014-10-08 Datazilla 2.1
I realized I just conflated my concerns above re: the regression within 2.1 and acceptance for 2.1.

Re: regression, I still feel the analysis is flawed, though I understand the confusion better.

I've attached the replicates from the 9/22 build, which I originally flagged, and the 9/16 build prior to that.

In the 9/22 build, we had a clear hop up to 2700-2800 or so, without even having the first outlier there. That was the impetus for the bug. Since then, the results have come back down to ~2600 (I said 2700 above, sorry).

However, in the 9/16 build our median/steady is around 2200 ms. All the builds around that time show 2200-2300 ms median for runs not counting the first after FTU. Even the slight rise around 9/19 is ~2400 ms steady.

From these results there's not much question to me that there's been a regression since 9/16, and possibly another one since 9/19, and that it affects performance generally.
:geo, sorry, just to clarify. After launching Usage, do you perform an automatic procedure to complete the FTU? The FTU is a wizard, so it would have to be completed manually. I'm not referring to the first-time execution but to a user procedure, the FTU wizard.
Flags: needinfo?(gmealer)
(In reply to Salvador de la Puente González [:salva] from comment #27)
> :geo, sorry, just to clarify. After launching Usage, do you perform an
> automatic procedure to complete the FTU? FTU is a wizard so it should be
> completed manually. I'm not referring to the first time execution but to a
> user procedure, the FTU Wizard.

Gaia is reset with the "NOFTU=1" flag prior to testing, which I believe skips the wizard entirely.

Would that somehow slow all of the runs after startup?
Flags: needinfo?(gmealer)
Oh, I think we have some name confusion here. Sorry.

This FTU is not related to Gaia's FTU. It is the First Time Experience wizard for the Usage application. If you run the tests starting from a freshly installed Gaia and open the Usage application, a wizard appears. We dispatch the performance events when that wizard finishes loading. If nobody steps through the wizard to completion, it doesn't matter that you hard-kill the app, because the next launch will show the same wizard again.

Did you get my point?
Flags: needinfo?(gmealer)
(In reply to Salvador de la Puente González [:salva] from comment #29)
> Oh, I think we have some name confusion here. Sorry.
> 
> This FTU is not related with the Gaia's FTU. It is the First Time Experience
> Wizard for Usage application. If you pass the tests starting from a freshly
> installed Gaia and you open Usage application, a Wizard appears. We dispatch
> performance events when finishing to load that wizard. If nobody is passing
> through the wizard steps until completion, it is unimportant if you hard
> kill the app because the next time you will be launching the same wizard
> again.
> 
> Did you get my point?

Yep, absolutely, and you're right--we have semantic confusion. My apologies about that as well.

So, OK, that's good to know. Let me see if I can stage a post-usage-FTU version of this test then, at least for a one-time run to see where we're at.

We load a reference "workload" (i.e., data fixture) to initialize other apps for testing. Is there any way we can add something to this workload that would fast-forward us past your FTU?
Flags: needinfo?(gmealer)
Let me think about how to circumvent the FTU, but we still need to add performance event dispatching in the normal UI flow as well.
Wilfred, 
Could you check with the partner on whether this is going to be blocking?
Flags: needinfo?(wmathanaraj)
triage: leave the nomination until the partner's expectation is confirmed. (NI? to Wilfred)

Also, does bug 1079194 cover the scope of this already?
Flags: needinfo?(wmathanaraj) → needinfo?(rdandu)
Michael - can you tell me, based on comments #29 and #30, if this bug would block a release?
Flags: needinfo?(mvines)
I defer to Moz triage on the v2.1 blocking decision here.
Flags: needinfo?(mvines)
Let's fix this without blocking the FC release. We'll nominate this as blocking the CC release; can we plan on fixing it by then?
Flags: needinfo?(rdandu)
blocking-b2g: 2.1? → -
We are evaluating how to reduce the cold start time with the data in bug 1093847, trying to get under the 1000 ms threshold. So, by solving that bug we will be solving this one as well.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE