1002847 - Messages cold launch time regressed since 1.3, above 1000ms acceptance threshold

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Description

•

11 years ago

Per http://mzl.la/1jaLWxI, Messages is currently at/above 1200ms for its load time. Our acceptance metric for performance is either no regression from last release, or within the responsiveness guidelines, whichever is greater.

This is a regression from 1.3's behavior (http://mzl.la/1jaMd3I) at 1000ms, which is right at the 1000ms responsiveness guideline for cold launch (https://wiki.mozilla.org/FirefoxOS/Performance/UserStories, Perception of Progress).

Cold launch of Messages should be brought within 1000ms. Requesting 1.4+ due to importance of the app.
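A minimal sketch of the acceptance rule as stated above, assuming "whichever is greater" means the looser of the two limits applies. The function names are hypothetical and not part of any Mozilla tooling; the numbers are the ones quoted in this description.

```python
def acceptance_threshold_ms(previous_release_ms: float, guideline_ms: float) -> float:
    """Return the looser (greater) of the two limits, per the stated rule."""
    return max(previous_release_ms, guideline_ms)


def meets_acceptance(measured_ms: float, previous_release_ms: float, guideline_ms: float) -> bool:
    """A launch time is accepted if it does not exceed the threshold."""
    return measured_ms <= acceptance_threshold_ms(previous_release_ms, guideline_ms)


# Figures from the description: 1.3 at 1000ms, guideline 1000ms, current 1200ms.
threshold = acceptance_threshold_ms(previous_release_ms=1000, guideline_ms=1000)
accepted = meets_acceptance(measured_ms=1200, previous_release_ms=1000, guideline_ms=1000)
print(threshold, accepted)
```

Under this reading, a 1200ms cold launch fails because both limits sit at 1000ms.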

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Updated

•

11 years ago

blocking-b2g: --- → 1.4?

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Updated

•

11 years ago

Keywords: regression

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 1

•

11 years ago

It's been done on purpose in bug 947234 because it made the global load time (time to show the first thread) better. It's easy to see by comparing 2 devices side by side, the difference is quite obvious.

Before bug 947234 we had a low cold load time but we still loaded all files after "onload" before doing anything. This was stupid, and thus we removed it. In the future we want to introduce lazy loading again (after bug 881469 lands) but in a smarter way.

I'd like to resolve this WONTFIX for 1.4. We may be able to make this better in 2.0 if bug 881469 lands in a timely fashion, but I'd see this more in 2.1.

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 2

•

11 years ago

In Performance's discussion on acceptance yesterday, Messages was considered a very high priority app. I'd prefer we at least look into this in the 1.4 time frame to see if there are other ways to cut the load time, rather than assuming that the referenced bugs are the only root cause or possible solutions.

bhavana bajaj [:bajaj]

Comment 3

•

11 years ago

Waiting on Julien to get back before removing it from the nom queue.

Flags: needinfo?

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 4

•

11 years ago

If possible, I'd like first a side by side video comparison of v1.3 vs v1.4 on the same device and same workload to prove there is a regression. What matters to me is the "time to display the first thread".

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Updated

•

11 years ago

Flags: needinfo?

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 5

•

11 years ago

(In reply to Julien Wajsberg [:julienw] from comment #4)
> If possible, I'd like first a side by side video comparison of v1.3 vs v1.4
> on the same device and same workload to prove there is a regression. What
> matters to me is the "time to display the first thread".

Julien, I can do this but it's a little expensive compared to just looking at the automated results I posted. Are you concerned that they're inaccurate? I'd like us to be able to rely on them in the future.

Also worth noting that in bug 947234 comment 1 and elsewhere, you noted 1000ms as an appropriate 1.4 requirement. Bug 881469 also looks slated for a 1.4 milestone.

What's changed there to push reaching the goal to possibly 2.1?

Flags: needinfo?(felash)

Ben Kelly [:bkelly, not reviewing]

Comment 6

•

11 years ago

(In reply to Geo Mealer [:geo] from comment #5)
> (In reply to Julien Wajsberg [:julienw] from comment #4)
> > If possible, I'd like first a side by side video comparison of v1.3 vs v1.4
> > on the same device and same workload to prove there is a regression. What
> > matters to me is the "time to display the first thread".
>
> Julien, I can do this but it's a little expensive compared to just looking
> at the automated results I posted. Are you concerned that they're
> inaccurate? I'd like us to be able to rely on them in the future.

Yes, the datazilla tests do not show the whole picture. They only show static load time, but not any of the dynamic work that must occur before presenting a usable interface to the user. It would be more informative to look at eideticker or other visual "time to above the fold" tests for true load time for the user.

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 7

•

11 years ago

(In reply to Geo Mealer [:geo] from comment #5)
> (In reply to Julien Wajsberg [:julienw] from comment #4)
> > If possible, I'd like first a side by side video comparison of v1.3 vs v1.4
> > on the same device and same workload to prove there is a regression. What
> > matters to me is the "time to display the first thread".
>
> Julien, I can do this but it's a little expensive compared to just looking
> at the automated results I posted. Are you concerned that they're
> inaccurate? I'd like us to be able to rely on them in the future.

I think they don't show what's needed in real life. In the future, we need to rely on the performance framework instead, which is sadly not precise enough until bug 846909 is resolved (which should happen soon now!).

The cold load time measures the time from launch to the "onload" event. The "onload" event fires when all resources are loaded (scripts, css, images).

What we did in 1.3 was wait for the "onload" event, and only then request the needed scripts. Obviously, this means we were waiting some time before requesting the scripts that implement the app's behavior! Moreover, these scripts were not minified/concatenated, whereas they are now!

All this leads to a bigger onload time, but a smaller "time to first thread" in v1.4.

> Also worth noting that in bug 947234 comment 1 and elsewhere, you noted
> 1000ms as an appropriate 1.4 requirement. Bug 881469 also looks slated for a
> 1.4 milestone.
>
> What's changed there to push reaching the goal to possibly 2.1?

I don't set the priority of bugs... Other bugs had higher priority and the performance improvements usually come last, when we're already late in the process. I agree this is not ideal :/

The good thing is that bug 881469 is close to a first-review state now, that's why I was saying it's possible that it lands in 2.0 already, and that would enable us to add useful and reliable lazy loading.

We could do good lazy loading now, but that would be more hacky and in my opinion error-prone. Still, we can make a profile and see if there are some easy wins.
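To illustrate why the onload-based metric and the "time to first thread" can move in opposite directions, here is a toy timeline model. It is only a sketch: the durations are hypothetical placeholders rather than measurements from either release, the function names are invented for illustration, and the model ignores details such as module init still waiting on onload in the current code.

```python
from dataclasses import dataclass


@dataclass
class LaunchTimeline:
    onload_ms: float        # what the cold-launch metric reports
    first_thread_ms: float  # what the user actually waits for


def deferred_loading(shell_ms: float, scripts_ms: float, render_ms: float) -> LaunchTimeline:
    """1.3-style: onload fires after the bare shell, scripts are requested afterwards."""
    onload = shell_ms
    return LaunchTimeline(onload, onload + scripts_ms + render_ms)


def static_loading(shell_ms: float, scripts_ms: float, render_ms: float) -> LaunchTimeline:
    """1.4-style: scripts are static resources, so onload waits for them."""
    onload = shell_ms + scripts_ms
    return LaunchTimeline(onload, onload + render_ms)


# Hypothetical durations. Scripts are assumed cheaper in the 1.4-style case
# because they are minified/concatenated, as described in the comment above.
v13 = deferred_loading(shell_ms=600, scripts_ms=700, render_ms=300)
v14 = static_loading(shell_ms=600, scripts_ms=400, render_ms=300)

print(v13)
print(v14)
```

With these placeholder numbers the 1.4-style timeline reports a later onload but reaches the first thread sooner, which is the pattern described in this comment.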

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Updated

•

11 years ago

Flags: needinfo?(felash)

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 8

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #6)
> (In reply to Geo Mealer [:geo] from comment #5)
> > (In reply to Julien Wajsberg [:julienw] from comment #4)
> > > If possible, I'd like first a side by side video comparison of v1.3 vs v1.4
> > > on the same device and same workload to prove there is a regression. What
> > > matters to me is the "time to display the first thread".
> >
> > Julien, I can do this but it's a little expensive compared to just looking
> > at the automated results I posted. Are you concerned that they're
> > inaccurate? I'd like us to be able to rely on them in the future.
>
> Yes, the datazilla tests do not show the whole picture. The only show
> static load time, but not any of the dynamic work that must occur before
> presenting a usable interface to the user. It would be more informative to
> look at eideticker or other visual "time to above the fold" tests for true
> load time for the user.

Yeah, but I don't want to muddy the issue. For acceptance purposes, we decided to go with first paint/first chrome, whatever's being shown in Datazilla. We're not chasing above the fold as acceptance until 2.0.

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 9

•

11 years ago

But OK, reading Julien's comment, I'll do the video and report back.

Ben Kelly [:bkelly, not reviewing]

Comment 10

•

11 years ago

(In reply to Geo Mealer [:geo] from comment #8)
> Yeah, but I don't want to muddy the issue. For acceptance purposes, we
> decided to go with first paint/first chrome, whatever's being shown in
> Datazilla. We're not chasing above the fold as acceptance until 2.0.

Well, that will lead developers to worsen the user experience in order to optimize a bad metric. If a resource is always going to be used it will be better for total load time to do it as a static resource. That will make the datazilla cold_load time worse, however.

:Eli Perelman

Comment 11

•

11 years ago

For what it's worth, on Bug 996038 there is currently an effort under way to change at which point some of these metrics determine the concept of loaded [1]. The goal will be to use new events triggered by the applications to use as indicators to the profiling platforms for garnering a better measurement of startup time based on when actual usability points are available.

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 12

•

11 years ago

(In reply to Eli Perelman, :Eli from comment #11)
> For what it's worth, on Bug 996038 there is currently an effort under way to
> change at which point some of these metrics determine the concept of loaded
> [1]. The goal will be to use new events triggered by the applications to use
> as indicators to the profiling platforms for garnering a better measurement
> of startup time based on when actual usability points are available.

How is this different from the existing performance framework?

:Eli Perelman

Comment 13

•

11 years ago

Apologies Julien, but could you clarify what you mean by your statement? I'm not sure I follow what you mean. :)

Gabriele Svelto [:gsvelto]

Comment 14

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #10)
> Well, that will lead developers to worsen the user experience in order to
> optimize a bad metric.

Which is precisely what happened to the SMS app before bug 947234: the cold load time was looking good but that actually harmed both the time spent before showing the first thread and the first time the app would respond to user input.

There's another thing that I'd like to stress here: not only does the SMS app actually start much faster than it used to, it also scales much better. An x-heavy workload required 10+ seconds before the app would show the first thread and become responsive before 1.3, while now it's around 1.5s.

Preeti Raghunath(:Preeti)

Comment 15

•

11 years ago

Mike, Can you please weigh in for perf? What is the acceptable range?

Flags: needinfo?(mlee)

:Eli Perelman

Comment 16

•

11 years ago

The guidelines that we use for perf are outlined here:

https://developer.mozilla.org/en-US/Apps/Build/Performance/Firefox_OS_app_responsiveness_guidelines#Goals

The app must be ready for user interaction within 1.25s of the user selecting to start the application. If this is not possible, then we have to indicate to the user that it is not ready, e.g. through the use of loading interactions, or some other UI perception trick to give the user the ability to interact with a smaller subset and load remaining interactions.

Flags: needinfo?(mlee)

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 17

•

11 years ago

(In reply to Eli Perelman, :Eli from comment #16)
> The guidelines that we use for perf are outlined here:
>
> https://developer.mozilla.org/en-US/Apps/Build/Performance/Firefox_OS_app_responsiveness_guidelines#Goals
>
> The app must be ready for user interaction by 1.25s of user selecting to
> start the application. If this is not possible, then we have to indicate to
> the user that it is not ready, e.g. through the use of loading interactions,
> or some other UI perception trick to give the user the ability to interact
> with a smaller subset and load remaining interactions.

Worth mentioning that until Bug 996038 lands, there isn't a very good way to measure user/interaction-ready. I could see chasing it by trying to spam key or click events into the app and looking for when they reflect, but I think just having the app tell us is probably far better.

So upshot is, yeah, 1250 ms for user-ready, but we're not prepared to measure that so I wouldn't throw it out as the acceptance stat.

To that point, the assumption was that Datazilla represented something close to first-chrome, which is what we were pegging at 1000 ms. I understand we lazy-load content, but that isn't the concern here. How much chrome is loaded and showing to the user by the time the onload event fires?

As for the accuracy:

I'm certainly aware that the phases/instrumentation will give us better and more accurate results, but if what Ben and Julien are saying is that the current results in Datazilla are both wrong and misleading, there's a bigger conversation here. We shouldn't stand them up, if we can't effectively use them.

:Eli Perelman

Comment 18

•

11 years ago

(In reply to Geo Mealer [:geo] from comment #17)
> As for the accuracy:
>
> I'm certainly aware that the phases/instrumentation will give us better and
> more accurate results, but if what Ben and Julien are saying is that the
> current results in Datazilla are both wrong and misleading, there's a bigger
> conversation here. We shouldn't stand them up, if we can't effectively use
> them.

This is indeed what I am saying at 996038#c10. Since b2gperf essentially measures `window.onload`, which doesn't actually measure when an application is loaded and ready to interact with, it can be completely arbitrary between applications. So yes, I believe that Datazilla's measurement of app launch is wrong. Bug 996038 will standardize the notion of loading across applications.

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 19

•

11 years ago

(In reply to Eli Perelman, :Eli from comment #18)
> which doesn't actually measure when an application
> is loaded and ready to interact with, it can be completely arbitrary between
> applications. So yes, I believe that Datazilla's measurement of app launch

I want to be careful here. Could still be arbitrary (and that's why I'm asking what it approximates) but we know the timeline is more nuanced than just the end goal of "ready to go."

We've discussed in performance the need to track some of those earlier aspects, such as chrome-ready and above-the-fold, which have earlier targets which we will also no doubt eventually want to track and consider as acceptance targets (in fact, there are some pretty good reasons to stick with above-the-fold as a primary acceptance target, but that's a different discussion).

But in this case, we're just trying to peg or at least approximate chrome-ready, understanding that isn't the whole story. It's an intentional compromise, reflecting current capabilities.

Given Julien's concerns, though, I think we can probably take the video and make a call from there. To Ben's comment re: Eideticker, the current test results for messages load on Will's dashboard (albeit on Inari) show a large amount of variability, so I think Julien's side-by-side video is going to work out better for the immediate task.

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 20

•

11 years ago

Also, to be clear, when I ask what onload approximates, I mean only in terms of Messages. I get the point that it can be inconsistent, but what I want to know is whether the chrome is ready and showing when that event fires.

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 21

•

11 years ago

(In reply to Eli Perelman, :Eli from comment #13)
> Apologies Julien, but could you clarify what you mean by your statement? I'm
> not sure I follow what you mean. :)

I mean, https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/Gaia_performance_tests :) You can talk to Hubert about them.

That's why I don't understand why we're building yet another way to measure the application readiness.

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 22

•

11 years ago

(In reply to Geo Mealer [:geo] from comment #20)
> Also, to be clear, when I ask what onload approximates, I mean only in terms
> of Messages. I get the point that it can be inconsistent, but what I want to
> know is whether the chrome is ready and showing when that event fires.

In the current code, the chrome is near-ready when onload fires. (We still wait for onload before calling the various modules' init functions; maybe we should switch to DOMContentLoaded, which could admittedly be a good low-hanging fruit.)

In v1.3, the chrome was not loaded at all when onload fired.

We're supposed to be able to measure the readiness of the app thanks to bug 837666 and the performance test framework, but until bug 846909 is done it's not very reliable.

Preeti Raghunath(:Preeti)

Comment 23

•

11 years ago

1.4+ for perf improvement.

blocking-b2g: 1.4? → 1.4+

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 24

•

11 years ago

I'd like the side-by-side video before blocking.

blocking-b2g: 1.4+ → 1.4?

:Eli Perelman

Comment 25

•

11 years ago

QA, could you please produce a video demonstrating this functionality, using 2 hamachis, one running 1.3 and the other running 1.4? We can then just compare side-by-side whether any further outcome on this bug is necessary.

Keywords: qawanted

Jason Smith [:jsmith]

Comment 26

•

11 years ago

(In reply to Julien Wajsberg [:julienw] (away until 5th May) from comment #24)
> I'd like the side-to-side video before blocking.

We've already made a decision here - this is a requirement to hit performance criteria. I don't think there's value in getting videos here to keep analyzing this.

blocking-b2g: 1.4? → 1.4+

Keywords: qawanted

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 27

•

11 years ago

I'm going to supply the video by tomorrow. I do agree it would be valuable in this case, especially given comment 22. If chrome-ready moved from after the event to before, we could easily have the same performance with the framework showing the wrong thing. That's why Julien wants the video, and it's perfectly valid. In fact, *I* want to see how the timeline compares at this point.

Let's keep qawanted off the bug, since our contract team largely picks those up. Instead, needinfoing myself to deliver the video.

In the meantime, let's please deliver that bootstrap instrumentation ASAP. I'm certainly in agreement it's going to be the better way to do this in 2.0.

Flags: needinfo?(gmealer)

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 28

•

11 years ago

Please see http://people.mozilla.org/~gmealer/msgs-comparison.mov

1.4 is orange border on left. 1.3 is green border on right.

Both are using medium workload, with Messages launched once post-flash/workload to do any processing needed there, then phone rebooted for the cold-launch test. I let some time pass after reboot for launch process cleanup.

I synced the two videos based on the first frame of launch animation. Since the sequence changed between 1.3 and 1.4, that's the button fading in 1.3 and the zoom in 1.4.

Video was shot at 240fps and shown at 30fps, so 1 sec of video is 125ms in real life.

With that, you can see that Julien is correct. In fact, 1.4 looks around 200ms faster to chrome-ready than 1.3 (and with a much cleaner sequence).

Above the fold looks like it's behind by a small amount, but I wouldn't recommend blocking on it. I'm not quite sure enough that the systems were in equivalent enough states to go off one single video to call out a <100ms regression there.

I'm resolving this as INVALID. I'll leave the blocking flag in case someone decides that the videos are flawed and reopens the bug.
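For anyone reading timings off the slow-motion video, the conversion between video time and real time can be sketched as follows. The helper names are hypothetical; the 240fps capture and 30fps playback rates are the ones stated in this comment.

```python
def slowdown_factor(capture_fps: float, playback_fps: float) -> float:
    """How many times slower the video plays than real life."""
    return capture_fps / playback_fps


def real_ms(video_seconds: float, capture_fps: float = 240, playback_fps: float = 30) -> float:
    """Convert a duration measured on the video into real-life milliseconds."""
    return video_seconds * 1000 / slowdown_factor(capture_fps, playback_fps)


def video_seconds(real_duration_ms: float, capture_fps: float = 240, playback_fps: float = 30) -> float:
    """Convert a real-life duration in milliseconds into seconds of video."""
    return real_duration_ms * slowdown_factor(capture_fps, playback_fps) / 1000


print(slowdown_factor(240, 30))  # 8x slow motion
print(real_ms(1.0))              # one second of video, in real milliseconds
print(video_seconds(200))        # a 200ms real-life gap, in seconds of video
```

At an 8x slowdown, the roughly 200ms chrome-ready difference mentioned above should show up as about 1.6 seconds of video.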

Status: NEW → RESOLVED

Closed: 11 years ago

Flags: needinfo?(gmealer)

Resolution: --- → INVALID

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Reporter

Comment 29

•

11 years ago

Attached video messages-comparison.m4v

Also added an m4v.

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 30

•

11 years ago

Thanks Geo !

Mike Lee [:mlee]

Updated

•

11 years ago

Whiteboard: [c=progress p= s= u=] → [c=progress p= s=2014.05.09.t u=1.4]

Mike Lee [:mlee]

Updated

•

11 years ago

Target Milestone: --- → 2.0 S1 (9may)