Closed Bug 968276 Opened 6 years ago Closed 6 years ago

300ms startup time regression in v1.3 branch Usage app

Categories

(Firefox OS Graveyard :: Gaia::Cost Control, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:1.3+, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

RESOLVED FIXED
1.4 S1 (14feb)
blocking-b2g 1.3+
Tracking Status
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: huseby, Assigned: mai)

Details

(Keywords: perf, regression, Whiteboard: [c=progress p= s= u=1.3] [b2gperf_regression])

Attachments

(2 files)

46 bytes, text/x-github-pull-request
salva
: review+
Details | Review
46 bytes, text/x-github-pull-request
salva
: review+
Details | Review
v1.3 branch

Between revision: https://github.com/mozilla-b2g/gaia/commit/c8aad64c2639cecd
and: https://github.com/mozilla-b2g/gaia/commit/bb36b4164d3e51ca

The cold launch time of the Usage app increased 300ms.

Here's the perf-o-matic graph: http://mzl.la/1fHI2u8
I found this in datazilla this morning.  Who should take a look at this?
Flags: needinfo?(dscravaglieri)
Priority: -- → P1
Whiteboard: [b2gperf_regression] → [c=progress p= s= u=] [b2gperf_regression]
blocking-b2g: --- → 1.4?
Component: Gaia → Gaia::Cost Control
Missed the fact this was pointing to 1.3, so moving to 1.3?
blocking-b2g: 1.4? → 1.3?
We need a performance improvement period. This is probably caused by DSDS related bugs because now we need to ask settings for which SIM is the data SIM.

My advice is not to nominate this for 1.3 and wait for the US in bug 967582 (we need UX input here). This will lead to a refactor in the start up process (bug 968087) and better cold launch times.
(In reply to Salvador de la Puente González [:salva] from comment #3)
> We need a performance improvement period. This is probably caused by DSDS
> related bugs because now we need to ask settings for which SIM is the data
> SIM.
> 
> My advice is not to nominate this for 1.3 and wait for the US in bug 967582
> (we need UX input here). This will lead to a refactor in the start up
> process (bug 968087) and better cold launch times.

We aren't taking a 300ms startup regression - that's absolutely unacceptable. The regressing bug needs to be backed out here & reworked to take perf considerations into account.
The regression window leads unequivocally to bug 963682. So we can backout that bug if you want.

Anyway, my former comment is not wrong. The problem is related with the functions getting the SIM configuration in addition to some extra resources (already filed too under bug 968091) being loaded now. I don't know how the cold start is measured but I think this has less impact on the user than the problem fixed in bug 963682.

WDYT?
(In reply to Jason Smith [:jsmith] from comment #4)
> (In reply to Salvador de la Puente González [:salva] from comment #3)
> > We need a performance improvement period. This is probably caused by DSDS
> > related bugs because now we need to ask settings for which SIM is the data
> > SIM.
> > 
> > My advice is not to nominate this for 1.3 and wait for the US in bug 967582
> > (we need UX input here). This will lead to a refactor in the start up
> > process (bug 968087) and better cold launch times.
> 
> We aren't taking a 300ms startup regression - that's absolutely
> unacceptable. The regressing bug needs to be backed out here & reworked to
> take perf considerations into account.

Then I want to know how exactly the time is being measured in order to, first, determine there is no inconsistencies because CC performs a lot of HTML lazy loading and I fear we are stopping measurements before that loading and second, because I need to know how to measure if we want to avoid regressions.
Flags: needinfo?(dhuseby)
Attached file proposal v-1.0
Yesterday, I found a bug, whose resolution improves the startup time.
Salva, do you mind review the patch?

WDYT?
Attachment #8371513 - Flags: review?(salva)
(In reply to Salvador de la Puente González [:salva] from comment #6)
> Then I want to know how exactly the time is being measured in order to,
> first, determine there is no inconsistencies because CC performs a lot of
> HTML lazy loading and I fear we are stopping measurements before that
> loading and second, because I need to know how to measure if we want to
> avoid regressions.

The datazilla cold_load_time test measures time from app launch to the mozbrowseloadend event.  This is effectively just the static content in the page.  It does not measure lazy loaded content.  So it does not really measure "true" load time.

For that we have eideticker tests which use a high speed camera:

  http://eideticker.mozilla.org/b2g/#/inari/b2g-contacts-startup/timetostableframe

Unfortunately usage is not being measured in eideticker to compare.

That being said, while datazilla cold_load_time is not "true" load time, these measures are often correlated.  Most of the time cold_load_time is regressed we do indeed see a true load time regression as well.

If you can make the case that datazilla cold_load_time increased, but true load time stayed the same or decreased then I think its ok.  I think we would want to see some videos, though.  These can just be done with a 30fps camera like in an iphone or high end android phone.  This is usually enough to detect the difference.

For example, settings app recently purposely removed some lazy loads because the true load was faster loading things statically.  Because of this we accepted the regression on the datazilla graph.

Did something similar happen in usage?
Flags: needinfo?(dhuseby)
triage: 1.3+ for regression
blocking-b2g: 1.3? → 1.3+
Comment on attachment 8371513 [details] [review]
proposal v-1.0

This should not affect the cold start-up as Ben is defining it but it can improve user perception so let's make a video before the regression and now.

Good catch Mai! Thank you.
Attachment #8371513 - Flags: review?(salva) → review+
clear n? as the bug is actively worked out
Flags: needinfo?(dscravaglieri)
Marina - I'm setting the assignee to you here as you have a patch here to fix this.
Assignee: nobody → mri
Master: fdea06af4276c37cb83fdd4cd3c551be7109bef1
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Hi,
I'm not sure if the bug resolution is completely fixed. I've recorded some videos with the different scenarios:
1. V1.3 with the patch 963682 --> http://youtu.be/BzyxXLvw26A
2. v1.3 with the patch 963682 and 968276 --> http://youtu.be/Fvh37F-tsgk
3. v1.3 without both patchs --> http://youtu.be/BzyxXLvw26A

This patch improves the startup time because it prevents the execution of unnecessary processes during the application loading. This patch is more efficient on master branch than in v1.3 branch, because this process is simpler on 1.3.

Dave, would you mind analyzing the performance of the branch v1.3 with my patch?

Regards, 
Marina
Status: RESOLVED → REOPENED
Flags: needinfo?(dhuseby)
Resolution: FIXED → ---
Mike

Can you please help with the perf analysis here? If there is a drastic fall in perf we'll need a backout of the patch causing it. TEF will take care of both
Flags: needinfo?(mlee)
Dave,

Have you looked at Maria's patch and comment 14? If not please do so asap. This is a 1.3 blocker and needs immediate attention.

Thanks,
Mike
Flags: needinfo?(mlee)
Whiteboard: [c=progress p= s= u=] [b2gperf_regression] → [c=progress p= s= u=1.3] [b2gperf_regression]
I'm on it right now.
Flags: needinfo?(dhuseby)
I just downloaded all four videos you posted to youtube and loaded them up in my video editing software.  Here are the black-box numbers I'm measuring for the time between the app launches to the time the Usage app UI is drawn:

v1.3:                 ~3008 ms
v1.3.patch:           ~3009 ms
actualv1.3 newPatch:  ~2018 ms
1.3 -revert 963682:   ~2017 ms

I'm not sure exactly which videos correspond to which patches so I'll need some feeback from you.  Clearly, there are two videos of the app launching almost a full second faster.
Flags: needinfo?(mri)
Marina,

Your links to videos #1 and #3 are the same.  Video "v1.3 with patch 963682" I measured at ~3008 ms.  Video "v1.3 with the patch 963682 and 968276" I measured at ~3009 ms.  The other two videos "actualv1.3 newPatch" and "1.3 -revert 963682" were in your Youtube channel but you didn't link to them from here.  I'm assuming that video "1.3 -revert 963682" is what you meant to link to for "#3 v1.3 without both patchs" above.  That one I measured at ~2017 ms.

Let me know where'd you like to go from here.
Hi,
I'm sorry for the video url mess.

These videos reflect the same situation. I recorded the video several times because the first time that i recorded the application on v1.3 branch it took a lot of time. Due to this reason, I did a few videos to test what the normal behaviour is.
V1.3 with the patch 963682 :
           v1.3: ~3008 ms
           v1.3.patch: ~3009 ms


v1.3 with the patch 963682 and 968276 :
          actualv1.3 newPatch: ~2018 ms

v1.3 without both patchs:
          1.3 -revert 963682: ~2017 ms

I hope this to clarify the situation,
Regards
Flags: needinfo?(mri)
Hi Dave
 I update the video url:
1. V1.3 with the patch 963682 --> http://youtu.be/BzyxXLvw26A
2. v1.3 with the patch 963682 and 968276 --> http://youtu.be/wB7ik-13wuA
3. v1.3 without both patchs --> http://youtu.be/EaQ4GX2kzVg

Could you make your perf analysis with them?
Regards
Flags: needinfo?(dhuseby)
Hi Marina,

Thanks for clearing up the video links.  So it's clear that path 963682 created a 1 second perf regression and that patch 968276 fixes the regression.  I say we land the patch and close this bug :)

-dave
Flags: needinfo?(dhuseby)
It looks like the patch has already landed.  I'm closing this bug as fixed.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Comment on attachment 8371513 [details] [review]
proposal v-1.0

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

This patch improves the startup time because it prevents the execution of unnecessary processes during the application loading.

[Approval Request Comment]
[Bug caused by] (feature/regressing bug #): Bug 963682
[User impact] if declined: worse performance on the startup
[Testing completed]: yes
[Risk to taking this patch] (and alternatives if risky): low
[String changes made]:
Attachment #8371513 - Flags: approval-gaia-v1.3?(fabrice)
Comment on attachment 8371513 [details] [review]
proposal v-1.0

Given the metrics in comment #22 approving this for uplift.
Attachment #8371513 - Flags: approval-gaia-v1.3?(fabrice) → approval-gaia-v1.3+
Target Milestone: --- → 1.4 S1 (14feb)
This has conflicts on v1.3. Please post a rebased PR for v1.3 merging.
Flags: needinfo?(mri)
Attached file PR for 1.3
Patch for 1.3
Flags: needinfo?(mri)
Comment on attachment 8377154 [details] [review]
PR for 1.3

Salva, could yor review the patch for branch v1.3?
Attachment #8377154 - Flags: review?(salva)
Review ping?
Flags: needinfo?(salva)
Comment on attachment 8377154 [details] [review]
PR for 1.3

Simple change. Thank you :mai!
Attachment #8377154 - Flags: review?(salva) → review+
Flags: needinfo?(salva)
v1.3: a8b2cb177b8a94987dbb13ace1cb454bc64b3486
You need to log in before you can comment on or make changes to this bug.