Closed Bug 1017247 Opened 8 years ago Closed 8 years ago

[Gaia] May 26 - May 27 - 100 - 150ms Launch Regression Across Multiple Apps

Categories

(Firefox OS Graveyard :: Gaia::System, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.0+, b2g-v2.0 fixed, b2g-v2.1 fixed)

RESOLVED FIXED
2.0 S6 (18july)
blocking-b2g 2.0+
Tracking Status
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: mchang, Assigned: kgrandon)

References

Details

(Keywords: perf, regression, Whiteboard: [c=regression p= s=2014.07.18.t u=2.0][systemsfe])

Attachments

(2 files, 1 obsolete file)

good
Gaia Revision: 6a391274cd436f8f
Gecko Revision: ca042038cdee

bad
gaia Revision: bc6f07c149770c6e
Gecko Revision: 448f2153d6d3

50ms comes from 1016483. 100ms coming from something else.
Summary: [Gaia] 100 - 150ms Launch Regression Across Multiple Apps → [Gaia] May 26 - May 27 - 100 - 150ms Launch Regression Across Multiple Apps
I'm pretty sure this is a gaia regression. All tested on the Gecko revision: ca042038cdee. Tested on a hamachi.

Results for Good Gaia and Good Gecko
Results for Phone, cold_load_time: median:1370, mean:1379, std: 39, max:1560, min:1342, all:1560,1388,1388,1360,1360,1358,1342,1392,1366,1360,1382,1367,1353,1350,1383,1365,1392,1375,1380,1386,1344,1398,1369,1450,1352,1372,1357,1379,1365,1404

Results for Settings, cold_load_time: median:1832, mean:1839, std: 37, max:2015, min:1807, all:2015,1825,1821,1815,1832,1807,1841,1832,1883,1836,1809,1838,1836,1874,1825,1838,1819,1816,1851,1840,1875,1831,1835,1834,1830,1823,1829,1813,1841,1819

Results for Bad Gaia, Good Gecko
Results for Phone, cold_load_time: median:1512, mean:1509, std: 32, max:1596, min:1375, all:1596,1527,1518,1495,1525,1496,1500,1547,1516,1510,1516,1505,1513,1507,1515,1488,1508,1375,1501,1535,1499,1520,1519,1481,1493,1499,1541,1514,1526,1511

Results for Settings, cold_load_time: median:1962, mean:1967, std: 27, max:2047, min:1914, all:2013,1965,1943,1960,1952,1975,1983,1955,2047,1983,1972,1964,1914,1960,1971,1986,1957,1940,1968,1953,2007,2005,1946,1958,1952,1943,1989,1919,1998,1955

results for diff gaia, good gecko - 9c34f28aa8e11a0ac90fd2d150e5bfe6a05d3144
Results for Phone, cold_load_time: median:1376, mean:1382, std: 38, max:1543, min:1338, all:1543,1377,1351,1403,1361,1402,1382,1348,1349,1370,1396,1375,1387,1353,1392,1362,1389,1358,1376,1383,1398,1352,1338,1395,1352,1469,1372,1370,1384,1376

Results for Settings, cold_load_time: median:1819, mean:1831, std: 51, max:2070, min:1757, all:2070,1837,1815,1814,1807,1814,1833,1813,1907,1819,1797,1864,1818,1838,1821,1815,1828,1802,1830,1813,1883,1819,1822,1827,1824,1810,1757,1802,1809,1826
Keywords: regression
blocking-b2g: --- → 2.0?
The offending commit is from 8f17ec2afc237bda25615fe74483c6a27f71a717, which is the same commit from bug 1016483. It just effects every app a bit differently. Also, this seems to be unique to hamachi. On a flame, I cannot reproduce this.

Bissection notes
5b90ce4
Results for Phone, cold_load_time: median:1361, mean:1367, std: 40, max:1572, min:1336, all:1572,1361,1349,1367,1351,1362,1351,1361,1367,1357,1351,1363,1360,1336,1364,1398,1365,1390,1336,1368,1350,1352,1340,1349,1370,1351,1363,1358,1384,1389

43e063a
Results for Phone, cold_load_time: median:1360, mean:1369, std: 46, max:1606, min:1337, all:1606,1343,1360,1348,1340,1406,1344,1360,1365,1358,1380,1366,1337,1360,1364,1347,1376,1358,1383,1347,1362,1344,1357,1366,1348,1358,1393,1372,1376,1366

8f17ec2afc237bda25615fe74483c6a27f71a717
Results for Phone, cold_load_time: median:1500, mean:1502, std: 18, max:1535, min:1456, all:1529,1495,1490,1516,1488,1524,1506,1498,1456,1503,1524,1513,1500,1500,1508,1487,1495,1507,1500,1489,1535,1532,1504,1492,1515,1534,1479,1485,1491,1481

with fix from 1016483
Results for Phone, cold_load_time: median:1420, mean:1419, std: 22, max:1473, min:1386, all:1473,1421,1425,1425,1432,1419,1397,1440,1431,1404,1431,1388,1399,1456,1464,1422,1386,1397,1409,1444,1423,1404,1391,1389,1444,1430,1407,1392,1413,1418

Etienne, know any other place where you can shave off another 75ms? This may also be eclipsed as current start up times took a large improvement, a 150ms improvement, from bug 950673 relanding.
Flags: needinfo?(etienne)
bug 1016483 finally landed so crossing my fingers and keeping an eye on datazilla :)
Flags: needinfo?(etienne)
blocking-b2g: 2.0? → 2.0+
Etienne, did you have any comments in reply to Mason's question?

(In reply to Mason Chang [:mchang] from comment #2)
> Etienne, know any other place where you can shave off another 75ms? This may
> also be eclipsed as current start up times took a large improvement, a 150ms
> improvement, from bug 950673 relanding.
Flags: needinfo?(etienne)
(In reply to Eli Perelman, :Eli from comment #4)
> Etienne, did you have any comments in reply to Mason's question?
> 
> (In reply to Mason Chang [:mchang] from comment #2)
> > Etienne, know any other place where you can shave off another 75ms? This may
> > also be eclipsed as current start up times took a large improvement, a 150ms
> > improvement, from bug 950673 relanding.

Ha, missed it because of all the b2gperf logs.
Not sure, I know there's at least one other class toggle on #screen happening (.on-homescreen).

Kevin, do you know if we can do without it?
(I'm pretty sure I'm the one who asked for it originally :/ but don't know how it's used in the current homesearch bar)
Flags: needinfo?(etienne) → needinfo?(kgrandon)
I believe this is for custom rocketbar functionality on the homescreen. Is it just that toggling classes on #screen is bad? We may be able to get away with having the class on #statusbar instead, would that help at all? (I do think we'll still need to toggle it depending on homescreen state)
Flags: needinfo?(kgrandon)
(In reply to Kevin Grandon :kgrandon from comment #6)
> I believe this is for custom rocketbar functionality on the homescreen. Is
> it just that toggling classes on #screen is bad? 

Yes, my naive understanding is that we basically redo the rule matching for all the children of the element when toggling a class, and the #screen has a lot of children :)

> We may be able to get away
> with having the class on #statusbar instead, would that help at all? 

I think it would.
Hey Etienne. Can I assign this bug over to you then or Kevin? Thanks!
Flags: needinfo?(etienne)
Whiteboard: [c=regression p=3 s= u=] → [c=regression p=3 s= u=2.0]
(In reply to Mason Chang [:mchang] from comment #8)
> Hey Etienne. Can I assign this bug over to you then or Kevin? Thanks!

Kevin? :)

Also we might want to update the description since I'm not sure the whole 100ms are still at stake here.
Flags: needinfo?(etienne) → needinfo?(kgrandon)
Ben - I think you were the original author of the on-homescreen class so looping you in here. I haven't yet looked into the details of this, but do you think it's possible to implement this class on the statusbar? I don't mind doing the patch unless you want to take it, but I just wanted to check with you in case you remembered. Thanks!
Flags: needinfo?(kgrandon) → needinfo?(bfrancis)
If you take a look in window.css and rocketbar.css you'll see places where the on-homescreen class is used outside of the statusbar. Any replacement solution would need to fulfill those use cases too.

One example is when appWindows are translated down the screen as the Rocketbar expands, and .on-homescreen is used as an exception so that the homescreen's appWindow is never transformed, particularly when transitioning from the lock screen. An alternative implementation would also need to be careful that there are no conflicts with sheets transitions.
Flags: needinfo?(bfrancis)
Ben - due to the fact that this causes a regression, and is for a feature that is not in 2.0, can we take it out, and re-evaluate and land for 2.1? Would you mind taking this?
Assignee: mchang → nobody
Component: Gaia → Gaia::System
Flags: needinfo?(bfrancis)
(In reply to Kevin Grandon :kgrandon from comment #12)
> Ben - due to the fact that this causes a regression, and is for a feature
> that is not in 2.0, can we take it out, and re-evaluate and land for 2.1?
> Would you mind taking this?

Sorry again for suggesting the class in the first place :)
We should probably run a quick b2gperf with and without the toggle at first to make sure the gain is significant enough.
Whiteboard: [c=regression p=3 s= u=2.0] → [c=regression p= s= u=2.0]
Hey Kevin, I'm going to assign this to you. Please let me know if you need help.
Assignee: nobody → kgrandon
The #screen.on-homescreen class is now used in at least 12 places, including by the homescreen search in 2.0. Unfortunately I don't think it's as simple as just "taking it out".

Here are some uses by homescreen search in rocketbar.css:

body.homesearch-enabled #screen.on-homescreen #rocketbar.active
body.homesearch-enabled #screen.on-homescreen.rocketbar-expanded #statusbar-background
body.homesearch-enabled #screen.on-homescreen #rocketbar-form

(As a side note, if #screen.on-homescreen is bad, does that make body.homesearch-enabled even worse performance-wise?)

Mason, I have created a patch which turns this class off here https://github.com/benfrancis/gaia/commit/bebfea9e5395b770b70f09e21de683fee1d2c094 We can't use this patch because it causes regressions on the home search bar, but would you be able to do a comparison with and without this patch applied so we can get data on what effect this change would actually have on app startup time?

Trying to rip out this class at this stage poses significant risk of regressions.
Flags: needinfo?(bfrancis) → needinfo?(mchang)
Hi Ben, can you please test before and after with a hamachi device? Instructions on how to run b2gperf are here - https://wiki.mozilla.org/FirefoxOS/Performance/Automation_Testing. Please ping me if you have issues running it.

Also, please disable the vertical home screen before running the tests. The vertical homescreen causes some other issues. Thanks!
Flags: needinfo?(mchang) → needinfo?(bfrancis)
Whiteboard: [c=regression p= s= u=2.0] → [c=regression p= s= u=2.0][systemsfe]
Target Milestone: --- → 2.0 S4 (20june)
Hi Mason, I tried following the b2g-perf instructions and after some problems I finally managed to get it set up. But when I run...

b2gperf --delay=10 --reset --iterations=30 Settings

I get...

Traceback (most recent call last):
  File "/usr/local/bin/b2gperf", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2707, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 686, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 584, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: requests

Have you seen this error before, or am I doing something wrong?
Flags: needinfo?(bfrancis) → needinfo?(mchang)
Hmm that is odd.. are you building and flashing gecko locally from source? Can you try that. b2gperf requires an engineering build to work.
Flags: needinfo?(mchang)
Mason, I've tried today's nightly engineering build of Gecko for the hamachi and I still get the same error. I can build myself if you think that will make a difference, but what were the exact steps you took to create a build and run b2gperf that worked for you?
Flags: needinfo?(mchang)
Hmm, I've always locally built gecko from source and flashed that. I never built b2gperf myself, I've always used pip. Dave Hunt, any thoughts on the error from comment 17? Thanks!
Flags: needinfo?(dave.hunt)
Flags: needinfo?(mchang)
How did you install b2gperf? It looks like you haven't got the requests package, which is a dependency for b2gperf. It should install when you run |python setup.py install| or |pip install b2gperf|. Also, which version of b2gperf are you using? Please attach the output from |pip freeze| which will show all of the Python packages installed in your environment.
Flags: needinfo?(dave.hunt) → needinfo?(bfrancis)
Attached file pip.txt
I followed the instructions on the wiki. pip shows b2gperf 0.27. I have attached the output of pip freeze.
Flags: needinfo?(bfrancis)
That shows you have requests installed. It might be worth creating a fresh virtual environment and reinstalling b2gperf in that.
See http://virtualenv.readthedocs.org/en/latest/ for more information on virtual environments.
I'll just try the patch from comment 17. Will be back with results soon.
The patch from comment 17 saves about 20ms. Here are the results:

Gecko: 188818:80431d4fd0da
Gaia: 9cc5cd6627c475437886be9ac7f407cd5425a412

Normal:
Results for Settings, cold_load_time: median:826, mean:884, std: 257, max:2141, min:805, all:2141,901,866,904,860,834,830,805,817,826,812,853,833,820,811,826,826,822,823,831,812,826,813,809,815
Results for Settings, cold_load_time: median:829, mean:886, std: 233, max:2017, min:807, all:2017,956,868,901,836,818,837,813,813,808,830,848,849,830,829,807,829,813,826,813,815,816,817,948,829
Results for Settings, cold_load_time: median:820, mean:885, std: 263, max:2165, min:791, all:2165,899,874,886,840,845,817,820,827,813,812,813,836,812,810,815,803,820,820,824,943,819,827,791,813

with Patch
Results for Settings, cold_load_time: median:811, mean:867, std: 238, max:2030, min:783, all:2030,867,864,870,853,837,818,844,786,811,832,810,811,802,833,799,790,811,791,820,805,783,814,798,810
Results for Settings, cold_load_time: median:808, mean:862, std: 218, max:1924, min:786, all:1924,883,864,856,812,797,825,802,821,792,795,808,865,820,808,801,796,808,793,803,793,866,816,833,786
Results for Settings, cold_load_time: median:812, mean:870, std: 250, max:2088, min:779, all:2088,861,873,845,831,816,779,813,804,816,824,806,830,811,807,792,812,802,808,795,941,811,826,789,787

In addition to this, if we can shave another 30ms we're good.
Kevin, or Etienne, know anything else that might shave off another 30ms?
Flags: needinfo?(kgrandon)
Flags: needinfo?(etienne)
Are we talking about the patch in comment 15? I guess we would need to verify that we can actually use that, we may end up having to write a workaround that's even worse. 

Have we tried talking to any platform/layout guys to see if there's any improvements we can do given our displaylist?

I'm also not going to be able to look at this for another 2 weeks or so, so if anyone wants to steal it, feel free to.
Flags: needinfo?(kgrandon)
(In reply to Mason Chang [:mchang] from comment #27)
> Kevin, or Etienne, know anything else that might shave off another 30ms?

Ha! I might have something. I got a ~10ms gain on the flame, it might translate to something more significant on the buri, I'll send a WIP so you can tell me.
Flags: needinfo?(etienne)
Hey Mason could you run a b2gperf against this branch [1] to see if it's worth the trouble?

[1] https://github.com/etiennesegonzac/gaia/compare/bug-1017247-titlechange?expand=1
Flags: needinfo?(mchang)
Hey Etienne, Can you please try running b2gperf on a hamachi first? I have some instructions on how to set it up here - https://wiki.mozilla.org/FirefoxOS/Performance/Automation_Testing#Running_B2g_Perf. If you can run it on a linux machine, that seems to resolve a lot of the set up issues. Thanks!
Flags: needinfo?(mchang) → needinfo?(etienne)
Jim do you think the rest of the issues will be fixed as part of bug 1024779?
Flags: needinfo?(etienne) → needinfo?(squibblyflabbetydoo)
wow sorry, Comment 32 was a wrong window.
Flags: needinfo?(squibblyflabbetydoo)
(In reply to Mason Chang [:mchang] from comment #31)
> Hey Etienne, Can you please try running b2gperf on a hamachi first? I have
> some instructions on how to set it up here -
> https://wiki.mozilla.org/FirefoxOS/Performance/
> Automation_Testing#Running_B2g_Perf. If you can run it on a linux machine,
> that seems to resolve a lot of the set up issues. Thanks!

Got a buri to work and no significant gain (none at all actually on 30 runs) :/

Before
Results for Settings, cold_load_time: median:813, mean:891, std: 198, max:1351, min:571, all:750,1163,829,596,1191,805,839,803,987,822,794,571,1351,787,784,786,791,1055,791,827,829,778,1170,1260,1187,802,726,1147,847,664

After
Results for Settings, cold_load_time: median:891, mean:892, std: 57, max:1013, min:651, all:651,977,907,871,905,859,871,891,892,886,881,865,879,856,880,916,867,909,905,915,902,924,1013,916,864,917,911,885,870,979

BTW, I got a 150+ standard deviation multiple times when testing on master, is this consistent with what you're usually seeing?
Yeah it's really really noisy after the vertical home screen was turned on by default. If you run it multiple times it might settle down. Ideally, the best path forward is to use Eli's new start up events that are more accurate, which might help the noise. Eli is on PTO right now though, so maybe ping him next week?

@Eli - Do we have a bug to track the new start up events on the settings app?
Flags: needinfo?(eperelman)
If it's noisy after the vertical homescreen, I think it would take a framework change and not events to clean that up. We need to isolate the app tests from the homescreen, and it seems like even with events that won't be the case? Unless you change the point from which you measure.
(In reply to Mason Chang [:mchang] from comment #35)
> ...
> Eli is on PTO right now though, so maybe ping him next week?

Hub should be able to help you while Eli's on PTO.
 
> @Eli - Do we have a bug to track the new start up events on the settings app?

Yes. See Bug 1015405 and its parent Bug 996038.
Flags: needinfo?(eperelman) → needinfo?(hub)
Currently settings implement the new startup event tests, so we should get better metrics with now.
Flags: needinfo?(hub)
(In reply to Kevin Grandon :kgrandon from comment #36)
> If it's noisy after the vertical homescreen, I think it would take a
> framework change and not events to clean that up. We need to isolate the app
> tests from the homescreen, and it seems like even with events that won't be
> the case? Unless you change the point from which you measure.

We should be using a new framework with make test-perf and the new events. Hub, can you please help etienne run the new tests for this regression?

From https://bugzilla.mozilla.org/show_bug.cgi?id=1021037#c12, we may actually be doing better with the vertical home screen. It would be good to find out. If we're actually doing better, then we can close this out.
Flags: needinfo?(hub)
it is as simple as running make test-perf. There is nothing else involved. Set the number of RUNS and the APP as you see fit.

All is documented in the freshly update MDN article:

https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/Gaia_performance_tests
Flags: needinfo?(hub)
(In reply to Hubert Figuiere [:hub] from comment #38)
> Currently settings implement the new startup event tests, so we should get
> better metrics with now.

But do we know the root cause?
If the noise comes from the phone being busy during the app launch (maybe while the homescreen icons are loading?) using different events won't fix the noise (but adding a delay should).
Target Milestone: 2.0 S4 (20june) → 2.0 S5 (4july)
Severity: normal → blocker
Attached file Remove on-homescreen class toggling (obsolete) —
Rocketbar stuff has recently regressed, and since this is a 2.1 thing, we should remove the on-homescreen class for now to save on performance. In 2.1 we can find a new way of handling this as to not cause a regression.

Etienne - could you review this if you have a minute? Thanks!
Attachment #8448276 - Flags: review?(etienne)
Comment on attachment 8448276 [details] [review]
Remove on-homescreen class toggling

Good news: looks like a 30-50ms gain (on flame!)
Bad news: the status bar stays transparent when I launch an app (haven't pinpointed it)
Attachment #8448276 - Flags: review?(etienne)
Comment on attachment 8448276 [details] [review]
Remove on-homescreen class toggling

(In reply to Etienne Segonzac (:etienne) from comment #43)
> Comment on attachment 8448276 [details] [review]
> Remove on-homescreen class toggling
> 
> Bad news: the status bar stays transparent when I launch an app (haven't
> pinpointed it)

Ah right, this is because we need the on-homescreen class for the transparent background. We ran into this same problem when we tried to do this in 1.3 or 1.4 as well. Not sure what I was thinking trying to remove this, I don't think it's possible to do in this way. We will probably need to revisit the implementation.
Attachment #8448276 - Attachment is obsolete: true
Attached file Github pull request
Hey Etienne - same patch, but with a few more fixes. The homescreen now always sets statusbar state on visibilitychange, this ensures that app launches while at the top of the statusbar should get the proper statusbar state.

I'm also hooking into the statusbar background with lockscreen-appclosing, so we get an earlier change and avoid a background flash when unlocking.

Let me know if you see any other issues with this. Thanks!
Attachment #8448845 - Flags: review?(etienne)
Comment on attachment 8448845 [details] [review]
Github pull request

STR:
- on homescreen (transparent)
- open an app (opaque, ok)
- lock the screen (transparent, ok)
- unlock the screen (still transparent :/)

I think we need to toggle in system with appopen/homescreenopened/locksreen-opening/closing and make sure the homescreen doesn't postMessage when it's not visible.

Might be missing something still...
Attachment #8448845 - Flags: review?(etienne)
Target Milestone: 2.0 S5 (4july) → 2.0 S6 (18july)
Comment on attachment 8448845 [details] [review]
Github pull request

Hey Etienne - could you take a look one more time? I've added a test case for the app launching thing you noticed before. I also wanted to add one for lockscreen, but was having trouble interfacing with the lockscreen, so that will have to wait. I think the lockscreen thing shouldn't be an issue anymore as I've stopped listening to the lockscreen event. I think the appopened event should be able to take most of the heavy lifting, what do you think?
Attachment #8448845 - Flags: review?(etienne)
Comment on attachment 8448845 [details] [review]
Github pull request

Sorry for the review delay, this is awesome, should have landed yesterday :)

(tiny comment on github)
Attachment #8448845 - Flags: review?(etienne) → review+
Thanks Etienne! With this and bug 1016483 I think we are done here. Perf guys - can you verify?

Master: https://github.com/mozilla-b2g/gaia/commit/0b9620fbaed72b9cfd8d2557a1a8b6c3d59ca015
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Gaia: 7e75314
Gecko: 193437:a18ff2303b09
Device: Hamachi

Results:
Results for Phone, cold_load_time: median:987, mean:996, std: 33, max:1085, min:930, all:930,973,1007,980,986,986,988,973,1085,1022,1031,952,965,947,1052,982,1007,988,997,984,998,979,977,1000,1056,977,1038,1025,983,1029

Before:
Results for Phone, cold_load_time: median:1370, mean:1379, std: 39, max:1560, min:1342, all:1560,1388,1388,1360,1360,1358,1342,1392,1366,1360,1382,1367,1353,1350,1383,1365,1392,1375,1380,1386,1344,1398,1369,1450,1352,1372,1357,1379,1365,1404

Looks good to me! Thanks!
Whiteboard: [c=regression p= s= u=2.0][systemsfe] → [c=regression p= s=2014.07.18.t u=2.0][systemsfe]
Depends on: 1039180
Unable to verify due to b2g-perf automation STR.
QA Whiteboard: [QAnalyst-Triage?][QAnalyst-verify-]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?][QAnalyst-verify-] → [QAnalyst-Triage+][QAnalyst-verify-]
Flags: needinfo?(ktucker)
You need to log in before you can comment on or make changes to this bug.