908261 - Performance regression when running Marionette commands

Reporter

Description

•

12 years ago

I'm not sure if this is Marionette related yet, but we noticed that our B2G performance tests doubled in total run time recently. The results themselves do not appear to have been affected. I added a bunch of debug logging to the tool that's used to run the tests, and found the following: Last good build: 2013-08-22 00:34:14 | DEBUG | Killing all running apps 2013-08-22 00:34:14 | DEBUG | Disabling WiFi 2013-08-22 00:34:14 | DEBUG | Waiting for 5 seconds ... 2013-08-22 01:54:38 | DEBUG | Killing all running apps 2013-08-22 01:54:39 | DEBUG | Disabling WiFi 2013-08-22 01:54:40 | DEBUG | Waiting for 5 seconds First bad build: 2013-08-22 07:39:54 | DEBUG | Killing all running apps 2013-08-22 07:39:54 | DEBUG | Disabling WiFi 2013-08-22 07:39:55 | DEBUG | Waiting for 5 seconds ... 2013-08-22 09:42:57 | DEBUG | Killing all running apps 2013-08-22 09:43:05 | DEBUG | Disabling WiFi 2013-08-22 09:43:09 | DEBUG | Waiting for 5 seconds The above shows a snippet of the console log at the beginning of the test run, and at the end. You can see that in the 'good' build the command go from completing in within one second to being separated by a second. In the 'bad' build they increase by several seconds. We launch at least 10 apps during the tests, and repeat each launch 30 times, so this small increase is enough to make a large impact. The commands in question simply execute JavaScript snippets to kill all running apps, and disable the wifi. Of course, it may be something else that's causing these to run slowly, and may not simply be due to us calling these commands many times. Last good build: Gaia revision: 1a38ad4df2edb9b608f77f1691cb6c4450f98d8e Gecko revision: a62c3344f16902abe6a924fd96eadd2ea14672a2 First bad build: Gaia revision: 1a38ad4df2edb9b608f77f1691cb6c4450f98d8e Gecko revision: f91fb63da2d6cead5c03f8d075afce68d7fe1921

Jonathan Griffin (:jgriffin)

Comment 1

•

12 years ago

That's bad. Unfortunately, there was a significant Marionette landing made between the good and bad builds, the modal dialog support from bug 779284, so my first guess is that this introduced some sort of leak. Malini, can you make a build with your change backed out and see if this problem still reproduces?

Malini Das [:mdas] - Away, not checking bugmail

Comment 2

•

12 years ago

(In reply to Jonathan Griffin (:jgriffin) from comment #1) > That's bad. > > Unfortunately, there was a significant Marionette landing made between the > good and bad builds, the modal dialog support from bug 779284, so my first > guess is that this introduced some sort of leak. > > Malini, can you make a build with your change backed out and see if this > problem still reproduces? Sure, Dave, how do I run these tests?

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Comment 3

•

12 years ago

(In reply to Malini Das [:mdas] from comment #2) > Sure, Dave, how do I run these tests? To run the same as our CI you can pip install b2gperf and execute: $ b2gperf --delay=5 Phone Contacts Messages Settings Gallery Video Music Camera Email Calendar Browser Clock FM Radio Usage However if you would like the additional logging, you can install the latest from https://github.com/mozilla/b2gperf and include the command line argument --log-level=DEBUG Also, the above command will take a while to run (~3hrs). You can reduce the delay and settle time using the --delay and --settle-time arguments, and can avoid a restart using --no-restart, however I'm unsure how these will affect your ability to replicate the issue. You can also reduce the number of apps under test.

Mike Lee [:mlee]

Updated

•

12 years ago

Keywords: perf

Whiteboard: [c= p= s= u=]

Jonathan Griffin (:jgriffin)

Comment 4

•

12 years ago

Dave, can you confirm that this test is restarting B2G between each app launch? If that's the case, I don't see how this could be a Marionette (or gecko) issue at all, since restarting B2G would wipe the slate clean, memory-leak wise. Also, do you notice this same regression in mozilla-b2g18, or does it just affect mozilla-central?

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Comment 5

•

12 years ago

(In reply to Jonathan Griffin (:jgriffin) from comment #4) > Dave, can you confirm that this test is restarting B2G between each app > launch? If that's the case, I don't see how this could be a Marionette (or > gecko) issue at all, since restarting B2G would wipe the slate clean, > memory-leak wise. > > Also, do you notice this same regression in mozilla-b2g18, or does it just > affect mozilla-central? Yes, the b2g process is stopped/started between each block of 30 iterations. It's just affecting mozilla-central.

Malini Das [:mdas] - Away, not checking bugmail

Comment 6

•

12 years ago

(In reply to Dave Hunt (:davehunt) from comment #5) > (In reply to Jonathan Griffin (:jgriffin) from comment #4) > > Dave, can you confirm that this test is restarting B2G between each app > > launch? If that's the case, I don't see how this could be a Marionette (or > > gecko) issue at all, since restarting B2G would wipe the slate clean, > > memory-leak wise. > > > > Also, do you notice this same regression in mozilla-b2g18, or does it just > > affect mozilla-central? > > Yes, the b2g process is stopped/started between each block of 30 iterations. > It's just affecting mozilla-central. I haven't had the time to look at this yet, and I likely will not before I go on PTO, but the modal dialog change was added to both b2g18 and central, so it shouldn't be the cause of this regression. Also, since b2g is being restarted periodically, marionette shouldn't be the cause since we don't persist anything past our log file.

Malini Das [:mdas] - Away, not checking bugmail

Comment 7

•

12 years ago

Ah, jgriffin has shown that it can be leaking frames: https://bugzilla.mozilla.org/show_bug.cgi?id=779284#c51

Jonathan Griffin (:jgriffin)

Comment 8

•

12 years ago

Backing the modal dialog commit out of mozilla-central reduced the runtime of the master perf jobs from about 3hr to about 1.5hr, so apparently this was the cause. :(

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Updated

•

12 years ago

Depends on: 779284

Jonathan Griffin (:jgriffin)

Comment 9

•

12 years ago

This problem is caused by 909129; specifically, the frame manager patch in bug 779284 breaks session teardown in such a way that the imported script cache is never cleared.

Depends on: 909129

Malini Das [:mdas] - Away, not checking bugmail

Comment 10

•

12 years ago

This bug dropped off my radar, but this slowdown issue is partly due to changes in WiFi management: https://bugzilla.mozilla.org/show_bug.cgi?id=910216#c10 There was no traction on it at the time, and since then, the Wifi code has changed. I'd like to confirm that the changes are still causing regression, and then file a new bug against this code.

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Comment 11

•

12 years ago

The b2gperf tests have changed significantly since this bug was raised. I haven't seen any recent regressions in performance, and the data in Jenkins shows no overall increase since October 17th (we only keep data for the last 100 builds). I would be inclined to resolved this as fixed.

Malini Das [:mdas] - Away, not checking bugmail

Comment 12

•

12 years ago

The issue seen here is going to be resolved with Bug 910216.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → DUPLICATE

Mike Lee [:mlee]

Updated

•

12 years ago

Whiteboard: [c= p= s= u=] → [c= p= s=2013.11.08 u=]

Bugzilla

Performance regression when running Marionette commands

Categories

(Firefox OS Graveyard :: General, defect)

Tracking

(Not tracked)

People

(Reporter: davehunt, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [c= p= s=2013.11.08 u=])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Updated

Comment 9

Comment 10

Comment 11

Comment 12

Updated