Closed Bug 952640 Opened 7 years ago Closed 7 years ago

400ms launch regression on datazilla for Settings (Dec 20)

Categories

(Firefox OS Graveyard :: Gaia::Settings, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:1.4+)

RESOLVED WONTFIX
blocking-b2g 1.4+

People

(Reporter: bkelly, Assigned: huseby)

References

Details

(Keywords: perf, regression, Whiteboard: [c=progress p=2 s= u=1.4] [b2gperf_regression])

Attachments

(2 files)

The launch times for settings on datazilla spiked up to the 1200ms range today.  Regression range:

  https://github.com/mozilla-b2g/gaia/compare/8307f797d8209b...5f1d2cbb99f4a9

No gecko changes.  No change in OEM firmware.

Possibly due to APZC being enabled?  It could also just be a manifestation of bug 951185, although this would be a very extreme case.
Duplicate of this bug: 955931
Whiteboard: [c=automation p= s= u=] → [c=automation p= s= u=][b2gperf_regression]
Whiteboard: [c=automation p= s= u=][b2gperf_regression] → [c=progress p= s= u=] [b2gperf_regression]
Tim - this bug came up during the Performance team's sprint planning. Would you be able to find an assignee to verify whether this is an actual regression, or solely the result of APZ and the Settings application?
Flags: needinfo?(timdream)
Arthur, does anything ring a bell?
Flags: needinfo?(timdream) → needinfo?(arthur.chen)
This is the result of APZ, but it only regressed the Settings app. I guess that's because no other app has panels as long as the Settings app's.
Flags: needinfo?(arthur.chen)
Assignee: nobody → dhuseby
Status: NEW → ASSIGNED
Whiteboard: [c=progress p= s= u=] [b2gperf_regression] → [c=progress p=2 s= u=] [b2gperf_regression]
Blocks: 953001
This screenshot shows the 400ms regression in the Settings app launch time.  During launch, we're spending 400ms in convertAppsArray() in gecko's Webapps.js file.  I'm not sure exactly what the fix is for this.  I'm trying to optimize it now.
convertAppsArray() has always been slow because it's creating a lot of xpcom objects. Is there any evidence that it regressed recently?
It's not so much that convertAppsArray() itself is the problem.  In the last known good build in the regression range, the Settings app doesn't make a large number of convertAppsArray() calls during startup.  The revision with the regression spends the 400ms in convertAppsArray() calls.
(In reply to Dave Huseby [:huseby] from comment #7)
> It's not so much that convertAppsArray() itself is the problem.  In the
> last known good build in the regression range, the Settings app doesn't
> make a large number of convertAppsArray() calls during startup.  The
> revision with the regression spends the 400ms in convertAppsArray() calls.

That comes from a call to navigator.mozApps.getAll().
getAll() was called when idling. I thought it should not impact the startup performance.
This is related to Bug 958318.  The patch for that bug tried to be smarter about scanning for home screens and it ended up adding an idle observer to delay the scan for 4 seconds after becoming idle.  For some reason, that 4 second delay in idle isn't long enough and the resulting call to appsMgmt.getAll() happens smack in the middle of the Settings app startup.

I suspect that there is a deeper bug here that is requiring the 4 second delay on the idle timer.  I don't think we're detecting idle correctly.  I'm also skeptical that the idle delay is working properly.  There's no other way I can explain the appsMgmt.getAll() call happening so early in the app startup.

I have confirmed that it is the tryShowHomescreenSection() call that causes the 400ms delay.  I changed the idle timer to a timed callback set for 10 seconds and the 400ms delay caused by appsMgmt.getAll() goes away.

I need to look deeper into the addIdleTimer() to see if I can figure out if it is working correctly.
I should say, the 10 second timer doesn't make the stall go away, it just moves it 10 seconds after launch.
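The two deferral strategies discussed above can be sketched as follows. This is illustrative, not actual Gaia code; the platform hooks are injected as parameters so the logic can be exercised with stubs, but in Gaia they would be navigator.addIdleObserver and window.setTimeout.

```javascript
// Illustrative sketch of deferring the homescreen scan. deferScan and its
// parameters are hypothetical names, not from the Settings app source.
function deferScan(scan, { addIdleObserver, setTimer }) {
  if (addIdleObserver) {
    // Original approach: run the scan once the user has been idle for
    // 4 seconds. In this bug, that still landed mid-startup.
    addIdleObserver({ time: 4, onidle: scan });
  } else {
    // Experiment from the comments above: a fixed 10-second timer. As
    // noted, this doesn't remove the 400ms stall, it only moves it to
    // 10 seconds after launch.
    setTimer(scan, 10000);
  }
}

// Exercise both paths with stubs standing in for the platform APIs:
let observed = null;
deferScan(() => {}, { addIdleObserver: obs => { observed = obs; } });
console.log(observed.time); // 4

let delay = null;
deferScan(() => {}, { setTimer: (fn, ms) => { delay = ms; } });
console.log(delay); // 10000
```

Either way the expensive getAll() work still runs on the main thread; the timers only change when the stall happens, which matches the observation in the previous comment.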
Can't we use a more deterministic "we are ready" event?
Hey guys - needinfoing a few people as I'm not sure who would know this best. Basically we are seeing that navigator.addIdleObserver is delaying the mozbrowserloadend event, which is causing delays in our datazilla measurements. My assumption would be that mozbrowserloadend would fire at the same time, and that the idle observer would fire after that, but maybe my assumptions are wrong.

I was wondering if you guys could verify this - and let us know if that is the expected behavior or not. Also if someone knows the mozbrowserloadend event better, perhaps we could loop them in? Thanks!
Flags: needinfo?(kchen)
Flags: needinfo?(alive)
This bug also exposes the current bad solution for figuring out how many homescreen apps are installed on the device.  Right now we have to go through the navigator.mozApps.mgmt.getAll() call and then for each app returned, we test it to see if it is a homescreen app.  In the settings app we do this just to count the number of homescreen apps to see if there is more than one.

Since the navigator.mozApps.mgmt object handles the dispatching of the install/uninstall events for apps as well as implements the getAll() function, I think it would be appropriate to modify its API to include some more query-type methods that apps can use instead of calling getAll() and inspecting each app returned.

I think functions like getAppsByType(<type>, <callback_fn>) and getAppCountByType(<type>) would be great to have.  It would save the Settings app a ton of time trying to figure out the total count of homescreen apps.  I'm sure it would be useful in other contexts as well.
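The counting pattern described above can be sketched with plain objects standing in for the real App DOM objects returned by getAll(); countAppsByRole and the sample data are illustrative, not Gaia code.

```javascript
// Count installed apps whose manifest declares a given role. In the real
// flow each element would be an App object built by convertAppsArray(),
// which is exactly the expensive part.
function countAppsByRole(apps, role) {
  return apps.filter(app => app.manifest && app.manifest.role === role)
             .length;
}

// Today the Settings app has to materialize every installed app via
// navigator.mozApps.mgmt.getAll() and inspect each one, just to learn
// whether more than one homescreen is installed. Roughly:
//
//   const req = navigator.mozApps.mgmt.getAll();
//   req.onsuccess = () => {
//     const multiple = countAppsByRole(req.result, 'homescreen') > 1;
//   };

const apps = [
  { manifest: { role: 'homescreen' } },
  { manifest: { role: 'system' } },
  { manifest: {} },
  { manifest: { role: 'homescreen' } },
];
console.log(countAppsByRole(apps, 'homescreen')); // 2
```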
Take a look at the pull request in bug 924565.  It modified some of our tests to use a pre-loaded array of app objects to avoid creating new app DOM objects repeatedly.  Could that help here?
Alright, I can't explain this...  I used b2gperf to gather cold_load_time data for the Settings app.  Here are the results:

1. cold_load_time: median:3716, mean:3603, std: 425, max:4002, min:1550
2. cold_load_time: median:3643, mean:3594, std: 458, max:3936, min:1310
3. cold_load_time: median:3709, mean:3604, std: 463, max:4092, min:1289
4. cold_load_time: median:3679, mean:3590, std: 469, max:4050, min:1300

#1 is the default code with the idle timer delay of 4 seconds.  #2 is the idle timer with a 5 second delay.  #3 is the idle timer with a 10 second delay.  #4 is with the idle timer completely commented out so the call never happens.

See my latest attachment for a screenshot of the Cleopatra view of the profile data with the idle timer commented out.  There's no stall at all, and yet the cold_load_time seems unchanged.

What's going on here?  The only thing I can think of is that the callback that scans for homescreen apps isn't affecting the cold load time numbers and is therefore not the cause of the regression at all.

hrm...
Attached image no_getAll_call.png
profile data with idle timer commented out.
AFAIK loadend represents the stop state of web progress. An idle observer should have nothing to do with network request progress.
Flags: needinfo?(alive)
Flags: needinfo?(kchen)
Like Alive said, addIdleObserver shouldn't delay mozbrowserloadend. Maybe something else is triggering additional resource loads or page reflows that delay mozbrowserloadend.
I think the best solution for this regression is to fix the Apps.mgmt.getAll() function to take an optional filter parameter (e.g. { role: String }) that is used to filter down the resulting set of App objects passed back in onsuccess callbacks.  Then we won't have to resort to trying to delay this code from executing until after the UI is up and functioning.
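A hypothetical sketch of the proposed filter shape: getAll() does not actually accept a filter argument today, and filterApps is an illustrative name. The key point is that filtering on manifest fields before constructing the expensive per-app objects would keep that work off the app's startup path.

```javascript
// Hypothetical platform-side filter for the proposal above: keep only the
// apps whose manifest matches every key/value pair in the filter object.
function filterApps(allApps, filter) {
  if (!filter) {
    return allApps.slice(); // no filter: current getAll() behavior
  }
  return allApps.filter(app =>
    Object.keys(filter).every(
      key => app.manifest && app.manifest[key] === filter[key]));
}

// Intended call site, roughly:
//   navigator.mozApps.mgmt.getAll({ role: 'homescreen' });

const installed = [
  { manifest: { role: 'homescreen', name: 'Default Home' } },
  { manifest: { role: 'system', name: 'System' } },
];
const homescreens = filterApps(installed, { role: 'homescreen' });
console.log(homescreens.length); // 1
```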
Depends on: 972076
Hi Mike,

Mind helping to triage this perf bug? Thanks.
Flags: needinfo?(mlee)
blocking-b2g: 1.4? → 1.4+
Flags: needinfo?(mlee)
Whiteboard: [c=progress p=2 s= u=] [b2gperf_regression] → [c=progress p=2 s= u=1.4] [b2gperf_regression]
Depends on: 972075
No longer depends on: 972075
I was trying to run the b2gperf cold_launch_time test against master today and Marionette keeps crashing.  Something changed in Gaia that breaks the Marionette tests:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/b2gperf/b2gperf.py", line 256, in measure_app_perf
    test.run()
  File "/usr/local/lib/python2.7/dist-packages/b2gperf/b2gperf.py", line 354, in run
    self.setup()
  File "/usr/local/lib/python2.7/dist-packages/b2gperf/b2gperf.py", line 401, in setup
    B2GPerfTest.setup(self)
  File "/usr/local/lib/python2.7/dist-packages/b2gperf/b2gperf.py", line 343, in setup
    gaiatest.LockScreen(self.marionette).unlock()
  File "/usr/local/lib/python2.7/dist-packages/gaiatest/gaia_test.py", line 42, in unlock
    result = self.marionette.execute_async_script('GaiaLockScreen.unlock()')
  File "/usr/local/lib/python2.7/dist-packages/marionette/marionette.py", line 1073, in execute_async_script
    filename=os.path.basename(frame[0]))
  File "/usr/local/lib/python2.7/dist-packages/marionette/marionette.py", line 577, in _send_message
    self._handle_error(response)
  File "/usr/local/lib/python2.7/dist-packages/marionette/marionette.py", line 612, in _handle_error
    raise JavascriptException(message=message, status=status, stacktrace=stacktrace)
JavascriptException: TypeError: window.wrappedJSObject.LockScreen.unlock is not a function
        stacktrace:
        execute_async_script @gaia_test.py, line 42
        inline javascript, line 882
        src: "        window.wrappedJSObject.LockScreen.unlock();"
Apps are crashing all over the place.  Homescreen crashes over and over.  I can't get Settings to launch.  This all starts when I enable Marionette.
Do you have the latest b2gperf?
That helped.  It's still crashing when building with both MOZ_PROFILING=1 and ENABLE_MARIONETTE=1.  See Bug 978458.
I just built master tip and re-ran b2gperf to get some more recent data:

Results for Settings, cold_load_time: median:1175, mean:1199, std: 55, max:1417, min:1155

As you'll note, it is MUCH faster now.  I checked the datazilla graphs and the numbers are back in line with previous data.  The regression has gone away.
Since the regression is gone and datazilla's numbers are back to normal, I'm closing this as won't fix.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
No longer depends on: 972076