Closed Bug 1041967 Opened 6 years ago Closed 6 years ago

[Messages] do some safe lazy load to improve launch latency

Categories

(Firefox OS Graveyard :: Gaia::SMS, defect, P1, blocker)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.0+, b2g-v2.0 fixed, b2g-v2.1 fixed)

RESOLVED FIXED
2.1 S1 (1aug)
blocking-b2g 2.0+
Tracking Status
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: julienw, Assigned: steveck)

References

Details

(Keywords: perf, Whiteboard: [c=progress p=2 s= u=2.0])

Attachments

(2 files)

46 bytes, text/x-github-pull-request
julienw
: review+
julienw
: feedback+
Details | Review
46 bytes, text/x-github-pull-request
julienw
: review+
Details | Review
+++ This bug was initially created as a clone of Bug #1038176 +++

SMS app launch latency has regressed compared to v1.3.

For v1.3:
gaia: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v1.3&id=e08beb0297aff472b5c84437e9d7eaf8c0400a29
gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v1.3&id=e0bfab94fd20e4dcbb2fecf19d7c2dc606189f81




For v2.0:
gaia: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=ef67af27dff3130d41a9139d6ae7ed640c34d922
gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=d3eae03cdf4e6944e646d05938a22dc1380a0d95


Test Steps:
1) Flush v1.3 and v2.0 build on two flame device (512MB) device. Launch Message app and compare side by side.

2) launch latency = touchdown to displaying Messages. Message app contains 80 messages.
   In v1.3 launch latency is : 694 msecs
   In v2.0 launch latency is : 1694 msecs
Taking this since both bug 1038176 & bug 1040020 might get some benefit from this one.
Assignee: nobody → schung
Some profiling from my flame on master/2.0, tested with light workload(200 messages):
               master         v2.0
non-minified   840 ms         834 ms
minified       781 ms         765 ms

I made a patch that moving some scripts(10 js files) that unnecessary for app launch into lazyloader:
               master         v2.0
non-minified   812 ms         812 ms
minified       765 ms         747 ms

We could see the improvement isn't that much, around 10~30 ms. I've tried it on tarako, the result is almost the same: The minified result is 1135ms/1118ms

Next step I might try to lazyload everything and init modules after that, or simply load thread list view related module first and put others to lazyload.
(In reply to Steve Chung [:steveck] from comment #2)
> Next step I might try to lazyload everything and init modules after that, or
> simply load thread list view related module first and put others to lazyload.

Careful with lazy loading as it's a double-edged sword. In general unless you're lazy-loading a *lot* of stuff you won't be much benefit so it's often not worth the fuss. Additionally if you lazy-load something that is needed in the startup path you will actually increase load time as lazy-loading has a small but incremental cost for every file you load. You can have a look at bug 947234 where I made all files that were in the startup path greedy loaded. I had a patch that was lazy-loading more stuff in attachment 8358504 [details] [diff] [review] but it didn't give much benefit and it introduced some non insignificant risks.

In general my suggestion here would be first to take profiles of both master and v1.3 (multiple profiles, at least 5 as there's always some variation), then identify what regressed and only then decide what to do to optimize it again.
(In reply to Gabriele Svelto [:gsvelto] from comment #3)
> (In reply to Steve Chung [:steveck] from comment #2)
> > Next step I might try to lazyload everything and init modules after that, or
> > simply load thread list view related module first and put others to lazyload.
> 
> Careful with lazy loading as it's a double-edged sword. In general unless
> you're lazy-loading a *lot* of stuff you won't be much benefit so it's often
> not worth the fuss. Additionally if you lazy-load something that is needed
> in the startup path you will actually increase load time as lazy-loading has
> a small but incremental cost for every file you load. You can have a look at
> bug 947234 where I made all files that were in the startup path greedy
> loaded. I had a patch that was lazy-loading more stuff in attachment 8358504 [details] [diff] [review]
> [details] [diff] [review] but it didn't give much benefit and it introduced
> some non insignificant risks.
> 
> In general my suggestion here would be first to take profiles of both master
> and v1.3 (multiple profiles, at least 5 as there's always some variation),
> then identify what regressed and only then decide what to do to optimize it
> again.

Thanks for the reminder, I knew we used lazyloader for everything before, and just like you said, it actually takes more time for everything ready in lazyloader. Since this issue is focused on safe lazyloader, I just profile there extreme cases the for comparison to see if there is still some room for lazyloader. In bug 1038176 #c12, it seems that every step takes longer time, we'll need to investigate separately.
I got another profiling result from lazyload/init everything and lazyload everything except thread list rendering related modules. Since the the result from mater/2.0 didn't have much difference, I just show the data with master minified result:
                lazyload/init everything           lazyload everything except thread list
Dom loaded          506 ms                                  638 ms
lazyload time      ~350 ms                                 ~300 ms 

For the preload thread list cases, I put 36 scripts into the lazyloader. We might get ~150ms improvement from loading(will do more profiling for time to render on the view), but the drawback is:
 - Can't switch view before lazyload complete
 - Activity case will take a little more time to complete

I didn't test very carefully yet, so I will create patch tomorrow. Not quite sure whether we should take this risk for hundred of ms improve or not.
Severity: blocker → major
Status: NEW → ASSIGNED
Whiteboard: [p=2] → [c=progress p=2 s= u=]
A WIP testing patch about lazy load modules unrelated to thread list rendering:
https://github.com/steveck-chung/gaia/tree/message-safe-lazyloader

I also set timeout for lazyloader to load after the first panel ready. If we start lazyload and thread rendering at the same time, it will take additional ~50ms to start the rendering and ~250ms to complete the first panel, that means the total time actually longer than original message app. If we could delay the lazyload and init after rendering, the rendering performance could be improved:

                 | original | lazyload | lazyload after timeout
  loaded         |          |          |
    to           |  231 ms  |  286 ms  |    218 ms
start render     |          |          |
------------------------------------------------------------
start render     |          |          |
    to           |  755 ms  |  1031 ms |    634 ms
first panel ready|          |          |

The comparison video for original case and lazyload after timeout:
https://www.youtube.com/watch?v=uyN3Q1j9aB4&list=UUkM1XSMB9c0p9684qCa6ZAA

Hi julien, since we will separate the view and lazy load in bug 1009545 eventually, do you think we should take this step(lazyload/init the module that not in the thread render call path) first and even uplift to v2.0(for Bug 1038176)?  Maybe we could try only delay init in this patch, but user might not discover the improvement because effect is limited.
Flags: needinfo?(felash)
Attached file Link to github
Hi Julien, I put more stuff in lazyloader in this patch. Now the message app cold load time could reduce to ~450ms or less, it was 780ms originally. The time from dom loaded to first panel ready also decrease ~100ms. Although we need to wait for first panel loaded and lazyloader complete in this patch, it will take  additional 300~500ms for totally accessible. For my POV it's worth to try because user actually need some time to read the thread then react, so this latency won't affect the user experience. But it depends on you ;)
Attachment #8462672 - Flags: feedback?(felash)
Comment on attachment 8462672 [details] [review]
Link to github

I think this is a good step to go; we'll do better later (lazy loading only when we need it).

However, using the Navigation object we can easily avoid to use "pointer-events: none" and postpone all user interactions until we're ready. So let's do this :)
Attachment #8462672 - Flags: feedback?(felash) → feedback+
Flags: needinfo?(felash)
Comment on attachment 8462672 [details] [review]
Link to github

Hi Julien, I adjusted the patch per your suggestion. It might has nearly 80ms regression it we don't lazy load css and navigation/EventDispatcher, but I think the overall result(including load time and rendering time) is still pleasant. But it still has a drawback that we need a little bit longer time for entering message conversation view in share case.
Attachment #8462672 - Flags: review?(felash)
Comment on attachment 8462672 [details] [review]
Link to github

As far as I tested, this seems to work fine.
The delay at the start of the application is now about the same as the v1.3 version (I ran it on a v2.0 Gecko to clearly compare).

Added some minor comments and then I think we can check it in. I know we will do better in v2.1, but I hope we won't be asked to do better for the activity for v2.0 :/
Attachment #8462672 - Flags: review?(felash)
Comment on attachment 8462672 [details] [review]
Link to github

Patch for 2nd round. That's true that we could do better in v2.1, but having less noise might be able to save our life from blockers ;)
Attachment #8462672 - Flags: review?(felash)
Comment on attachment 8462672 [details] [review]
Link to github

r=me with nits in unit tests
please merge with a green travis or gaia try !
Attachment #8462672 - Flags: review?(felash) → review+
Thanks for the reminder!

In master: cb79df33dbe01425d9b2e2c0581d98b002586b15
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment on attachment 8462672 [details] [review]
Link to github

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
[Bug caused by] (feature/regressing bug #): App load time regressing (This patch is part of fixing for blocker bug 1038176)
[User impact] if declined: App load time increase 300 ~ 400ms. Thread rendering time also increase as well.
[Testing completed]: Yes
[Risk to taking this patch] (and alternatives if risky): Low
[String changes made]: No
Attachment #8462672 - Flags: approval-gaia-v2.0?
[Blocking Requested - why for this release]: blocking a blocker
blocking-b2g: --- → 2.0?
Be careful wen uplifting: depends on bug 1039585
Depends on: 1039585
Ah fine, bug 1039585 was uplifted this night already :)
blocks a CAF blocker, so +
blocking-b2g: 2.0? → 2.0+
Attachment #8462672 - Flags: approval-gaia-v2.0?
I had to revert this from v2.0 in https://github.com/mozilla-b2g/gaia/commit/4c7d129f556824addfc73856ab5b760190644eca for being the likely cause of these Gaia-ui test timeouts on TBPL: https://tbpl.mozilla.org/php/getParsedLog.php?id=44930367&tree=Mozilla-B2g32-v2.0

The other two commits included in that TBPL run were bug 1043856 and bug 1038630, and neither of those seemed likely.
Flags: needinfo?(schung)
Being that this was sandboxed to the sms app and had a green Gip gaia-try run, I highly doubt that this is the cause of failure...

https://tbpl.mozilla.org/?rev=aaf8f0b52e7bb03176af1cee259617bb2f57aef4&tree=Gaia-Try
Yeah, it's still failing on the revert.
Since gaia unit test passed on Linux x64 Opt & OS X, it's most like the environment issue in x64 debug build. You can see the log the crash happened in environment setting and none of the test is triggered.

Sorry Ryan, could you please help the uplifting again? Thanks.
Flags: needinfo?(schung) → needinfo?(ryanvm)
Of course, now that we got the cause of the mass perma-fail reverted, another perma-fail appeared that appears related to this push. I've reverted it again to confirm. I'll update the bug one way or another once I have test results.
v2.0: https://github.com/mozilla-b2g/gaia/commit/cbf288dafe5350d775a4b597dbce880fa5f3e3a2

https://tbpl.mozilla.org/php/getParsedLog.php?id=44976067&tree=Mozilla-B2g32-v2.0
Indeed the failures in comment 25 are gone post-backout.
I'd say it's not unexpected ;)

For such a patch, it's likely better to do the uplift ourself and test on the device.

Steve, you can do it? Otherwise tell me and I can take this task.
Flags: needinfo?(schung)
Attached file Link to github
It actually has a bug if we auto uplift to 2.0: There are 2 non-existent scripts in lazyloader and cause lazyloader unable to callback successfully. Hi Julien, could you please take a look again on 2.0? Thanks.
Attachment #8466028 - Flags: review?(felash)
Flags: needinfo?(schung)
Comment on attachment 8466028 [details] [review]
Link to github

v2.0: 72853343a3387e587492cdfb87808a2823f76dee
Attachment #8466028 - Flags: review?(felash) → review+
Manually checked that the app was launchable, tested various cases (activities, composing...), Gip looks fine in TBPL.
Severity: major → blocker
Whiteboard: [c=progress p=2 s= u=] → [c=progress p=2 s= u=2.0]
You need to log in before you can comment on or make changes to this bug.