SMS's startup time is over the 1000 ms acceptance threshold for 2.1

RESOLVED WONTFIX

Status

defect
P2
normal
RESOLVED WONTFIX
5 years ago
4 years ago

People

(Reporter: gmealer, Unassigned)

Tracking

({perf})

Firefox Tracking Flags

(tracking-b2g:backlog)

Details

(Whiteboard: [priority])

[Blocking Requested - why for this release]:

See test results at:

https://wiki.mozilla.org/B2G/QA/2014-10-02_Performance_Acceptance#SMS

Median startup time was 1674 ms for 480 launches. This is an improvement from 2.0's median startup time of 1751 ms for 480 launches.

Graphs, raw data and other testing details are on the wiki.

When examining the graph, note that it's bimodal. This is unusual for a launch time test. That means some operation might be happening during startup only sometimes, and slowing performance when it does.

Seems unlikely this'll hit 1000 ms for release, but noming for an explicit decision and guidance on a more realistic acceptance threshold if appropriate.
triage: it has actually improved compared with 2.0. Practically, it's not possible to meet 1000 ms in one week. Putting this into [priority]; we should plan it for the next release.
blocking-b2g: 2.1? → backlog
Whiteboard: [priority]
(In reply to Geo Mealer [:geo] from comment #0)
> [Blocking Requested - why for this release]:
> 
> See test results at:
> 
> https://wiki.mozilla.org/B2G/QA/2014-10-02_Performance_Acceptance#SMS
> 
> Median startup time was 1674 ms for 480 launches. This is an improvement
> from 2.0's median startup time of 1751 ms for 480 launches.
> 
> Graphs, raw data and other testing details are on the wiki.
> 
> When examining the graph, note that it's bimodal. This is unusual for a
> launch time test. That means some operation might be happening during
> startup only sometimes, and slowing performance when it does.

The explanation is quite simple:
When we've displayed the first panel (I think our threshold is set to 9 threads, I can check if necessary), then we kick off additional lazy loading.

Maybe we should move the marker before the call to firstViewDone in [1] to get more consistent results. "firstViewDone" is where we do the lazy loading.

But see bug 1079700 for the real needed fix.

[1] https://github.com/mozilla-b2g/gaia/blob/138032603c07df82bb80ad7e113561c97f2aff6d/apps/sms/js/thread_list_ui.js#L585

> 
> Seems unlikely this'll hit 1000 ms for release, but noming for an explicit
> decision and guidance on a more realistic acceptance threshold if
> appropriate.
Depends on: 1079700
Note that bug 1079700 likely means we should look at the lower peak only. The higher peak is an artifact of the testing. We're still not at 1000ms :) but that's better.
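
One way to act on the "lower peak only" idea is to split the launch samples into two groups and take the median of the lower group. The following is a minimal Python sketch, not the method used for the wiki reports: the function name split_two_modes, the basic two-means split, and the sample values are all assumptions made up for illustration.

```python
from statistics import median


def split_two_modes(samples, iterations=20):
    """Split 1-D samples into a lower and an upper group.

    Uses a basic two-means pass: start with the min and max as centers,
    assign each sample to the nearer center, recompute centers, repeat.
    """
    if len(set(samples)) < 2:
        raise ValueError("need at least two distinct values")
    lo_center, hi_center = float(min(samples)), float(max(samples))
    lower, upper = [], []
    for _ in range(iterations):
        lower = [s for s in samples if abs(s - lo_center) <= abs(s - hi_center)]
        upper = [s for s in samples if abs(s - lo_center) > abs(s - hi_center)]
        new_lo = sum(lower) / len(lower)
        new_hi = sum(upper) / len(upper)
        if (new_lo, new_hi) == (lo_center, hi_center):
            break  # assignment is stable
        lo_center, hi_center = new_lo, new_hi
    return lower, upper


# Hypothetical launch times in ms (NOT the real test data from the wiki).
launch_ms = [1010, 1025, 1040, 1030, 1700, 1720, 1690, 1710]
lower_peak, upper_peak = split_two_modes(launch_ms)

print(median(launch_ms))   # 1365.0, falls between the peaks and describes neither
print(median(lower_peak))  # 1027.5, the lower peak only
```

With a bimodal sample the overall median can land in the gap between the two peaks, which is why reporting the lower group separately may be more informative here.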
I'm duping to bug 1074783. I have some work in a dependent bug that can make a difference in v2.1, but let's track this in one place only.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1074783
As of 10-31, median launch time is 1206 ms, which is over guidelines and a regression from 2.0 (now that the timing fix for 2.0 seems to have landed correctly).

https://wiki.mozilla.org/B2G/QA/2014-10-31_Performance_Acceptance#SMS

Per Julien's comment in comment 4, I appreciate the difficulty of tracking the work across multiple bugs and think it's fine to track it there. 

However, this bug represents QA's acceptance of the release based on internal guidelines, and shouldn't be closed out unless we either decide to suspend or change the acceptance threshold for 2.1, or get the numbers under it. 

My suggestion would be to spin the actual work out into implementation bugs, and make both partner acceptance bugs and internal acceptance bugs dependent on that actual work, but in any case please leave this one open.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
See bug 1089920 and bug 1087329; there are some ideas to make it work better. I even have some things that are nearly ready to land (e.g. bug 1086529, bug 1087981), but they don't fix anything that could appear in v2.1 because that's code from a long time ago.

My main culprit is bug 1089920, and Wilson has a patch in bug 1074783 that I'll try this week as well.
Julien, I'd like to assign an updated target time to https://wiki.mozilla.org/index.php?title=FirefoxOS/Performance/Release_Acceptance and use it as the pass/fail goal going forward.

We saw 1206 ms on 10/31 and I'll be reporting 1238 ms for 11/07. Other times have been giving us a target, with the understanding that 100 ms is the margin for a regression bug.

With that in mind, what should we align on here? 1225 ms?
Flags: needinfo?(felash)
Correcting myself above, "other teams have been giving us"
I think 1000ms should still be our target, but I don't think we'll achieve this in v2.1. I have some work on master currently that I want to possibly uplift.
Flags: needinfo?(felash)
(In reply to Julien Wajsberg [:julienw] from comment #9)
> I think 1000ms should still be our target, but I don't think we'll achieve
> this in v2.1.

Right. What we want to do is identify the point we can achieve for 2.1 so QA has something achievable to sign off against. We'll revise the target for 2.2, and if 1000 ms sounds achievable there I think that'd be perfect.

So with that in mind, what's a good temporary target? My suggestion amounted to holding the line, though I admit I shot a little under the current numbers since you said there might be more optimization.
Flags: needinfo?(felash)
It all depends on the outcomes of setting such target.

If it's an informative target and the only consequence is saying "SMS failed and is above the target", then I don't really care. We can stay at 1000ms because this is the overall goal I want to keep.

Yet if staying at 1000ms means we need to make this bug a blocker, then let's choose a higher target.

We can't regress from v2.0 so if you need a hard limit I'd use the result from v2.0. But then we'll never improve :)
Flags: needinfo?(felash)
(Let me know if the ni's bug you. A lot of people (myself included) find responses easier with them.)

(In reply to Julien Wajsberg [:julienw] from comment #11)

> Yet if staying at 1000ms means we need to make this bug a blocker, then let's
> choose a higher target.

Yeah, it's this. We want a target by which we can draw a line in the sand for 2.1 and say "ok, this and no more, or we don't ship." Of course, IRL, it really means at ship time we make a call, so being within a margin of the target is probably OK. But the idea is to represent what we can realistically enforce for 2.1.

This in no way replaces the overall 1000 ms perception-of-progress target; it just acknowledges that we'll bump most further improvement to 2.2.

Put succinctly, we want all apps with tests and criteria to be within criteria by the time we ship. Loosening the criteria with considered decisions acknowledging context is one route to that. Much better to acknowledge that up front than stick to unrealistic criteria and therefore have no checks at all.

> 
> We can't regress from v2.0 so if you need a hard limit I'd use the result
> from v2.0. But then we'll never improve :)

Yeah, but unless we're going to tighten the time up by 125 ms or so, we're going to regress from 2.0 (currently tests at ~1100 ms). Comes back to whether it's realistic that we'll improve that much in 2.1 scope.

In future releases I suspect we're going to more strictly enforce the "no regression" criterion, as well as try to fold more external expectations (Android parity, etc.) into our internal numbers. But that's something that really has to be decided and enforced from the beginning to be effective.

Anyway, getting back to the specific request, it would be great to get a number that we know is practically achievable in the 2.1 timeframe that holds a line. It will be used in my acceptance reports with the over/under/indeterminate decision based on a 25 ms margin, and if we stray from it by 100 ms or more, there will definitely be a blocking regression bug filed. If you feel that margin for a blocker should be lower, we can do that too--I'm basing the 100 ms on what other teams have requested.
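
The decision rule described above could be sketched roughly as follows. This is an illustration only, not the actual reporting tooling: the function name acceptance_call and its labels are hypothetical, the handling of values exactly at a margin is an assumption, and "stray by 100 ms or more" is read here as being over the target, since that is the regression case.

```python
def acceptance_call(median_ms, target_ms, margin_ms=25, blocker_ms=100):
    """Classify a measured median launch time against a target.

    Assumed rule: within +/- margin_ms of the target is indeterminate;
    beyond it is over or under; over by blocker_ms or more is a
    blocking regression.
    """
    delta = median_ms - target_ms
    if delta >= blocker_ms:
        return "over, blocking regression"
    if delta > margin_ms:
        return "over"
    if delta < -margin_ms:
        return "under"
    return "indeterminate"


# Figures quoted earlier in this bug, with 1225 ms as a trial target.
print(acceptance_call(1238, 1225))  # indeterminate (13 ms over, inside the margin)
print(acceptance_call(1100, 1225))  # under
print(acceptance_call(1674, 1000))  # over, blocking regression
```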

We've sought this sort of info from all the apps not naturally under 1000 ms, and I've specifically been asked to get it for SMS and update the release criteria. Let me know if I should approach this in a different forum to do that.
Flags: needinfo?(felash)
(In reply to Geo Mealer [:geo] from comment #12)
> (Let me know if the ni's bug you. A lot of people (myself included) find
> responses easier with them.)

I don't mind :) I usually answer all my bugmail but if you want to be sure that I answer, a NI is still the best ;)

> 
> (In reply to Julien Wajsberg [:julienw] from comment #11)
> 
> > 
> > We can't regress from v2.0 so if you need a hard limit I'd use the result
> > from v2.0. But then we'll never improve :)
> 
> Yeah, but unless we're going to tighten the time up by 125 ms or so, we're
> going to regress from 2.0 (currently tests at ~1100 ms). Comes back to
> whether it's realistic that we'll improve that much in 2.1 scope.

I still hope to fix the regressions we had from v2.0. I think I'll know more after today.

I'll give you an answer tomorrow, is it good enough?

Also, in bug 1089145 I'll slightly move where the 'moz-app-visually-complete' marker is located. The main reason is that it's closer to what happens in reality (for example, without moving it, we wouldn't see the improvements this patch gives). But as a side effect it will also likely mechanically improve the launch time. To make the comparison to v2.0 fair, maybe we should do the same move in v2.0?
Geo, do you know why there is no data in datazilla anymore?
Eli gave me some answer but the result is that I don't really know how much my patches are helping here.

Let's use 1200ms as a target then, and if we're better than that, then good.
Flags: needinfo?(felash)
No idea re: Datazilla (that may have been what Eli answered?). OK, I'll put it down at 1200 ms, and let's see how it goes.
Per https://wiki.mozilla.org/B2G/QA/2014-11-14_Performance_Acceptance#SMS, we're still over the revised target at 1254 ms. 

I do know the two patches that didn't get uplifted in bug 1089145 and bug 1086529 probably would have put us within margin.

We're approaching CC, so are starting to make acceptance calls. NI'ing Julien for recommendations on further action, cc'ing Tony.
Flags: needinfo?(felash)
Patches in bug 1089145 and bug 1086529 are IMO safe to uplift. It's a pity approval was rejected :( TBH I counted on these when I said 1200ms. If we can't uplift anything then obviously we won't improve anything.

See also the video in http://www.youtube.com/watch?v=XnEQy2ajSvI that shows the patch I made for bug 1089920. This is IMO quite impressive. That patch is more scary but I took great care to make the component behave like it used to when we don't use the new attributes. So given the improvements I'd really want to uplift it too.
Flags: needinfo?(felash)
blocking-b2g: backlog → ---
Priority: -- → P2

Comment 19

4 years ago
Mainline development is moving to 2.5 on Flame and Aries (Sony Z3C). FxOS 2.1 is low priority now. Marking as WONTFIX. Please reopen if there is a specific request.
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago → 4 years ago
Resolution: --- → WONTFIX