Closed Bug 952640 Opened 7 years ago Closed 7 years ago

400ms launch regression on datazilla for Settings (Dec 20)

Categories

(Firefox OS Graveyard :: Gaia::Settings, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:1.4+)

RESOLVED WONTFIX
blocking-b2g 1.4+

People

(Reporter: bkelly, Assigned: huseby)

References

Details

(Keywords: perf, regression, Whiteboard: [c=progress p=2 s= u=1.4] [b2gperf_regression])

Attachments

(2 files)

The launch times for settings on datazilla spiked up to the 1200ms range today.  Regression range:

  https://github.com/mozilla-b2g/gaia/compare/8307f797d8209b...5f1d2cbb99f4a9

No gecko changes.  No change in OEM firmware.

Possibly due to APZC being enabled?  It could also just be a manifestation of bug 951185, although this would be a very extreme case.
Duplicate of this bug: 955931
Whiteboard: [c=automation p= s= u=] → [c=automation p= s= u=][b2gperf_regression]
Whiteboard: [c=automation p= s= u=][b2gperf_regression] → [c=progress p= s= u=] [b2gperf_regression]
Tim - this bug came up during the Performance team's sprint planning. Would you be able to find an assignee to verify whether this is an actual regression, or solely the result of APZ and the Settings application?
Flags: needinfo?(timdream)
Arthur, does anything ring a bell?
Flags: needinfo?(timdream) → needinfo?(arthur.chen)
This is the result of APZ, but it only regressed the Settings app. I guess that's because no other app has panels as long as the Settings app's.
Flags: needinfo?(arthur.chen)
Assignee: nobody → dhuseby
Status: NEW → ASSIGNED
Whiteboard: [c=progress p= s= u=] [b2gperf_regression] → [c=progress p=2 s= u=] [b2gperf_regression]
Blocks: 953001
This screenshot shows the 400ms regression in the Settings app launch time.  During launch, we're spending 400ms in convertAppsArray() in gecko's Webapps.js file.  I'm not sure exactly what the fix is for this.  I'm trying to optimize it now.
convertAppsArray() has always been slow because it's creating a lot of xpcom objects. Is there any evidence that it regressed recently?
It's not so much that convertAppsArray() itself is the problem.  In the last known good build in the regression range, the Settings app doesn't make a large number of convertAppsArray() calls during startup.  The revision with the regression spends the 400ms in convertAppsArray() calls.
(In reply to Dave Huseby [:huseby] from comment #7)
> It's not so much that convertAppsArray() itself is the problem.  In the
> last known good build in the regression range, the Settings app doesn't
> make a large number of convertAppsArray() calls during startup.  The
> revision with the regression spends the 400ms in convertAppsArray() calls.

That comes from a call to navigator.mozApps.getAll().
getAll() was called when idling. I thought it should not impact the startup performance.
This is related to Bug 958318.  The patch for that bug tried to be smarter about scanning for home screens and it ended up adding an idle observer to delay the scan for 4 seconds after becoming idle.  For some reason, that 4 second delay in idle isn't long enough and the resulting call to appsMgmt.getAll() happens smack in the middle of the Settings app startup.

I suspect that there is a deeper bug here that is requiring the 4 second delay on the idle timer.  I don't think we're detecting idle correctly.  I'm also skeptical that the idle delay is working properly.  There's no other way I can explain the appsMgmt.getAll() call happening so early in the app startup.

I have confirmed that it is the tryShowHomescreenSection() call that causes the 400ms delay.  I changed the idle timer to a timed callback set for 10 seconds and the 400ms delay caused by appsMgmt.getAll() goes away.

I need to look deeper into the addIdleTimer() to see if I can figure out if it is working correctly.
I should say, the 10 second timer doesn't make the stall go away, it just moves it 10 seconds after launch.
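The two deferral strategies discussed above can be sketched as follows. This is illustrative, not actual Gaia code; the platform hooks are injected as parameters so the logic can be exercised with stubs, but in Gaia they would be navigator.addIdleObserver and window.setTimeout.

```javascript
// Illustrative sketch of deferring the homescreen scan. deferScan and its
// parameters are hypothetical names, not from the Settings app source.
function deferScan(scan, { addIdleObserver, setTimer }) {
  if (addIdleObserver) {
    // Original approach: run the scan once the user has been idle for
    // 4 seconds. In this bug, that still landed mid-startup.
    addIdleObserver({ time: 4, onidle: scan });
  } else {
    // Experiment from the comments above: a fixed 10-second timer. As
    // noted, this doesn't remove the 400ms stall, it only moves it to
    // 10 seconds after launch.
    setTimer(scan, 10000);
  }
}

// Exercise both paths with stubs standing in for the platform APIs:
let observed = null;
deferScan(() => {}, { addIdleObserver: obs => { observed = obs; } });
console.log(observed.time); // 4

let delay = null;
deferScan(() => {}, { setTimer: (fn, ms) => { delay = ms; } });
console.log(delay); // 10000
```

Either way the expensive getAll() work still runs on the main thread; the timers only change when the stall happens, which matches the observation in the previous comment.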
Can't we use a more deterministic "we are ready" event?
Hey guys - needinfoing a few people as I'm not sure who would know this best. Basically we are seeing that navigator.addIdleObserver is delaying the mozbrowserloadend event, which is causing delays in our datazilla measurements. My assumption would be that mozbrowserloadend would fire at the same time, and that the idle observer would fire after that, but maybe my assumptions are wrong.

I was wondering if you guys could verify this - and let us know if that is the expected behavior or not. Also if someone knows the mozbrowserloadend event better, perhaps we could loop them in? Thanks!
Flags: needinfo?(kchen)
Flags: needinfo?(alive)
This bug also exposes the current bad solution for figuring out how many homescreen apps are installed on the device.  Right now we have to go through the navigator.mozApps.mgmt.getAll() call and then for each app returned, we test it to see if it is a homescreen app.  In the settings app we do this just to count the number of homescreen apps to see if there is more than one.

Since the navigator.mozApps.mgmt object handles the dispatching of the install/uninstall events for apps as well as implements the getAll() function, I think it would be appropriate to modify its API to include some more query-type methods that apps can use instead of calling getAll() and inspecting each app returned.

I think functions like getAppsByType(<type>, <callback_fn>) and getAppCountByType(<type>) would be great to have.  It would save the Settings app a ton of time trying to figure out the total count of homescreen apps.  I'm sure it would be useful in other contexts as well.
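The counting pattern described above can be sketched with plain objects standing in for the real App DOM objects returned by getAll(); countAppsByRole and the sample data are illustrative, not Gaia code.

```javascript
// Count installed apps whose manifest declares a given role. In the real
// flow each element would be an App object built by convertAppsArray(),
// which is exactly the expensive part.
function countAppsByRole(apps, role) {
  return apps.filter(app => app.manifest && app.manifest.role === role)
             .length;
}

// Today the Settings app has to materialize every installed app via
// navigator.mozApps.mgmt.getAll() and inspect each one, just to learn
// whether more than one homescreen is installed. Roughly:
//
//   const req = navigator.mozApps.mgmt.getAll();
//   req.onsuccess = () => {
//     const multiple = countAppsByRole(req.result, 'homescreen') > 1;
//   };

const apps = [
  { manifest: { role: 'homescreen' } },
  { manifest: { role: 'system' } },
  { manifest: {} },
  { manifest: { role: 'homescreen' } },
];
console.log(countAppsByRole(apps, 'homescreen')); // 2
```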
Take a look at the pull request in bug 924565.  It modified some of our tests to use a pre-loaded array of app objects to avoid creating new app DOM objects repeatedly.  Could that help here?
Alright, I can't explain this...  I used b2gperf to gather cold_load_time data for the Settings app.  Here are the results:

1. cold_load_time: median:3716, mean:3603, std: 425, max:4002, min:1550
2. cold_load_time: median:3643, mean:3594, std: 458, max:3936, min:1310
3. cold_load_time: median:3709, mean:3604, std: 463, max:4092, min:1289
4. cold_load_time: median:3679, mean:3590, std: 469, max:4050, min:1300

#1 is the default code with the idle timer delay of 4 seconds.  #2 is the idle timer with a 5 second delay.  #3 is the idle timer with a 10 second delay.  #4 is with the idle timer completely commented out so the call never happens.

See my latest attachment for a screenshot of the Cleopatra view of the profile data with the idle timer commented out.  There's no stall at all, and yet the cold_load_time seems unchanged.

What's going on here?  The only thing I can think of is that the callback that scans for homescreen apps isn't affecting the cold load time numbers and is therefore not the cause of the regression at all.

hrm...
Attached image no_getAll_call.png
profile data with idle timer commented out.
AFAIK loadend represents the stop state of web progress. An idle observer should have nothing to do with network request progress.
Flags: needinfo?(alive)
Flags: needinfo?(kchen)
Like Alive said, addIdleObserver shouldn't delay mozbrowserloadend. Maybe something else is triggering additional resource loads or page reflows that delay mozbrowserloadend.
I think the best solution for this regression is to fix the Apps.mgmt.getAll() function to take an optional filter parameter (e.g. { role: String }) that is used to filter down the resulting set of App objects passed back in onsuccess callbacks.  Then we won't have to resort to trying to delay this code from executing until after the UI is up and functioning.
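A hypothetical sketch of the proposed filter shape: getAll() does not actually accept a filter argument today, and filterApps is an illustrative name. The key point is that filtering on manifest fields before constructing the expensive per-app objects would keep that work off the app's startup path.

```javascript
// Hypothetical platform-side filter for the proposal above: keep only the
// apps whose manifest matches every key/value pair in the filter object.
function filterApps(allApps, filter) {
  if (!filter) {
    return allApps.slice(); // no filter: current getAll() behavior
  }
  return allApps.filter(app =>
    Object.keys(filter).every(
      key => app.manifest && app.manifest[key] === filter[key]));
}

// Intended call site, roughly:
//   navigator.mozApps.mgmt.getAll({ role: 'homescreen' });

const installed = [
  { manifest: { role: 'homescreen', name: 'Default Home' } },
  { manifest: { role: 'system', name: 'System' } },
];
const homescreens = filterApps(installed, { role: 'homescreen' });
console.log(homescreens.length); // 1
```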
Depends on: 972076
Hi Mike,

Mind helping to triage this perf bug? Thanks.
Flags: needinfo?(mlee)
blocking-b2g: 1.4? → 1.4+
Flags: needinfo?(mlee)
Whiteboard: [c=progress p=2 s= u=] [b2gperf_regression] → [c=progress p=2 s= u=1.4] [b2gperf_regression]
Depends on: 972075
No longer depends on: 972075
I was trying to run the b2gperf cold_launch_time test against master today and Marionette keeps crashing.  Something changed in Gaia that breaks the Marionette tests:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/b2gperf/b2gperf.py", line 256, in measure_app_perf
    test.run()
  File "/usr/local/lib/python2.7/dist-packages/b2gperf/b2gperf.py", line 354, in run
    self.setup()
  File "/usr/local/lib/python2.7/dist-packages/b2gperf/b2gperf.py", line 401, in setup
    B2GPerfTest.setup(self)
  File "/usr/local/lib/python2.7/dist-packages/b2gperf/b2gperf.py", line 343, in setup
    gaiatest.LockScreen(self.marionette).unlock()
  File "/usr/local/lib/python2.7/dist-packages/gaiatest/gaia_test.py", line 42, in unlock
    result = self.marionette.execute_async_script('GaiaLockScreen.unlock()')
  File "/usr/local/lib/python2.7/dist-packages/marionette/marionette.py", line 1073, in execute_async_script
    filename=os.path.basename(frame[0]))
  File "/usr/local/lib/python2.7/dist-packages/marionette/marionette.py", line 577, in _send_message
    self._handle_error(response)
  File "/usr/local/lib/python2.7/dist-packages/marionette/marionette.py", line 612, in _handle_error
    raise JavascriptException(message=message, status=status, stacktrace=stacktrace)
JavascriptException: TypeError: window.wrappedJSObject.LockScreen.unlock is not a function
        stacktrace:
        execute_async_script @gaia_test.py, line 42
        inline javascript, line 882
        src: "        window.wrappedJSObject.LockScreen.unlock();"
Apps are crashing all over the place.  Homescreen crashes over and over.  I can't get Settings to launch.  This all starts when I enable Marionette.
Do you have the latest b2gperf?
That helped.  It's still crashing when building with both MOZ_PROFILING=1 and ENABLE_MARIONETTE=1.  See Bug 978458.
I just built master tip and re-ran b2gperf to get some more recent data:

Results for Settings, cold_load_time: median:1175, mean:1199, std: 55, max:1417, min:1155

As you'll note, it is MUCH faster now.  I checked the datazilla graphs and the numbers are back in line with previous data.  The regression has gone away.
Since the regression is gone and datazilla's numbers are back to normal, I'm closing this as won't fix.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
No longer depends on: 972076