Closed Bug 910216 Opened 6 years ago Closed 2 years ago

Tests get slower unless you restart the device between each one.

Categories

(Firefox OS Graveyard :: Gaia::UI Tests, defect, P4)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: roy.collings, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [c=automation p= s= u=] unittest)

Attachments

(2 files)

Using today's Unagi v1-train build (unagi-ICS.eng.v1-train.Rel0.4.Sprint12.B-200.Gecko-66fed8d.Gaia-4b2f1a1) with the following script ("test_slowing.py"):

    from gaiatest import GaiaTestCase
    class TestGallery(GaiaTestCase):
        def setUp(self):
            GaiaTestCase.setUp(self)
        def test_gallery_view(self):
            return

... and running it multiple times (8 times for the sake of this test):

	gaiatest --testvars=./testvars.json --address=localhost:2828 ./test_slowing.py


PROBLEM:
=======

Each time I run it, the test takes longer and longer to complete. If you restart the device it 'resets' the timings, but they start getting longer and longer again as soon as you start running multiple tests.
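
The repeated-run timing described above could be automated along these lines. This is only a minimal sketch in modern Python 3, not the method used for this report: the helper name `time_runs` is made up for illustration, and the default command is a harmless placeholder rather than the gaiatest invocation.

```python
import subprocess
import sys
import time


def time_runs(cmd, runs=8):
    """Run cmd `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=False: a failing test run is still timed instead of raising.
        subprocess.run(cmd, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command (assumption) so the sketch runs without a device.
    placeholder_cmd = [sys.executable, "-c", "pass"]
    for i, seconds in enumerate(time_runs(placeholder_cmd), start=1):
        print(f"run {i}: {seconds:.3f}s")
```

To reproduce the report, the placeholder would be swapped for the gaiatest command line shown above, split into a list of arguments.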


However, running "b2gperf Gallery" doesn't show the 'slowing' effect:

Results for Gallery, cold_load_time: avg:1706, max:2569, min:697, all:697,2188,1095,2243,1203,2185,1167,2296,1098,2278,1043,2296,983,2281,1154,2305,1090,2297,1118,2200,1010,2466,1041,2471,1047,2513,1239,2433,1186,2569

As "b2gperf" initializes gaiatest once and then launches the app repeatedly in a loop, it looked like the issue was somewhere in the gaiatest setUp() stages.

After a lot of trial-and-error, I found that commenting out the cleanUp() -> "if self.device.has_wifi:" section of gaia_test.py stopped the test getting slower each time it was executed.

Seems to specifically be the 'enable_wifi()' call that's causing it...

	With the entire section commented out (time in seconds to execute the script as described at the start of this bug):
	11.172s 8.173s 7.545s 8.380s 8.055s 8.283s 8.551s
	
	With just the "self.data_layer.enable_wifi()" uncommented (and the preceding "if self.device.has_wifi:"):
	11.618s 8.388s 8.448s 8.957s 9.175s 9.857s 10.353s 10.414s

If I run exactly the same script using the build from 2 days ago (unagi-ICS.eng.v1-train.Rel0.4.Sprint12.B-202.Gecko-61e302a.Gaia-d71eeb0) the problem isn't there - the timings don't get longer and they are much quicker too:
	7.300s 4.602s 4.647s 4.426s 4.535s 4.644s 4.436s 4.416s
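
One way to put a number on "getting slower" is a least-squares slope of run time against run index. The sketch below is an illustration, not part of the original analysis: the timings are copied from the three series above, the function and variable names are made up, and dropping the first run of each series as warm-up is an assumption.

```python
def slope(values):
    """Least-squares slope of values against their index (seconds per run)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Timings from the report, in seconds. The first run of each series is
# omitted on the assumption that it is a slower warm-up run.
wifi_section_commented_out = [8.173, 7.545, 8.380, 8.055, 8.283, 8.551]
enable_wifi_restored = [8.388, 8.448, 8.957, 9.175, 9.857, 10.353, 10.414]
second_build = [4.602, 4.647, 4.426, 4.535, 4.644, 4.436, 4.416]

for name, series in [
    ("wifi section commented out", wifi_section_commented_out),
    ("enable_wifi() restored", enable_wifi_restored),
    ("second build", second_build),
]:
    print(f"{name}: {slope(series):+.3f} s/run")
```

A clearly positive slope means each run takes longer than the last; a slope near zero means the timings are flat. With so few samples this is only a rough indicator.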
Keywords: regression
So I'm a bit confused by your builds.

The build "from 2 days ago" is tagged with gecko-61e302a.  This is https://git.mozilla.org/?p=releases/gecko.git;a=commit;h=61e302a35458ee70aeaaadc8c9b67d8a7943d99e, from yesterday.

The build "todays Unagi" is tagged with gecko-66fed8d.  This is https://git.mozilla.org/?p=releases/gecko.git;a=commit;h=66fed8dbac07ae7ff1d6042762058daaa353b30f, from Sunday.

Just using the commits, it looks like there *was* a perf problem, but that it no longer occurs.  Can you verify your builds again?  I think what you are seeing is just fallout from bug 779284, which has been backed out.
Ah yes, sorry (I was getting dizzy with all the switching around I was doing!) the builds are the wrong way around in my report there.

To clarify:

'today' = unagi-ICS.eng.v1-train.Rel0.4.Sprint12.B-202.Gecko-61e302a.Gaia-d71eeb0

'2 days ago' = unagi-ICS.eng.v1-train.Rel0.4.Sprint12.B-200.Gecko-66fed8d.Gaia-4b2f1a1
I'm seeing this too and it's affecting our test runs in Jenkins.
A build usually takes 1h 42min to run, but, since yesterday, the builds are aborted because they take longer than 2h 30min to run.

The issue first started in yesterday's v1-train build: 

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/328b3b8158ee
Gaia   4b2f1a103d046c92d201e8fcfb1ae224f59e7cf1
BuildID 20130827041201
Version 18.0

I have to mention that commenting out 'self.data_layer.enable_wifi()' didn't make the tests faster for me.
Last good v1-train build was:

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/c6f7c8ffc535
Gaia   4b2f1a103d046c92d201e8fcfb1ae224f59e7cf1
BuildID 20130826041201
Version 18.0
My last day at Mozilla is tomorrow; it's unlikely I'm the right person to investigate this.  Maybe khuey can help.
Flags: needinfo?(justin.lebar+bug) → needinfo?(khuey)
Are you running these tests on debug or opt builds?
Flags: needinfo?(khuey)
The obvious thing to try is removing the change in DOMWifiManager.js (http://hg.mozilla.org/releases/mozilla-b2g18/rev/bcd70a871ade#l10.11) and seeing if that improves things.  It's conceivable, though unlikely, that implementing those interfaces causes us to take a slower path somewhere else ...
Attached file aboutmem.zip
I can confirm that the changes for Bug 900221 are what's causing the performance regression with wifi. I tested with changeset e51b8e012b43, which was very speedy with the test_slowing.py tests, and then I tested with e85db8ff0a7c, which was progressively slower.

I've attached the about-memory dump. Who can look into this now that jlebar has left?
This is the about memory dump taken during the 20th or so iteration of test_slowing.py.
Duplicate of this bug: 915489
Duplicate of this bug: 908261
Keywords: perf
Whiteboard: [c=automation p= s= u=]
Do we know if this is still a problem?
Flags: needinfo?(roy.collings)
Flags: needinfo?(moz.teodosia)
Both of the ni? requests are to people who no longer work on the project!

Anyone else want to take this and get some fresh data?
Flags: needinfo?(roy.collings)
Flags: needinfo?(moz.teodosia)
Flags: needinfo?
Could somebody on the Gaia-UI team take this task?
Component: Marionette → Gaia::UI Tests
Flags: needinfo?
Product: Testing → Firefox OS
I ran similar tests last month, and we do see memory leaks in parts of setup, like turning wifi on and off. From what I could see, though, it wasn't marionette or gaiatest persisting any memory; it was additional memory being accumulated elsewhere in b2g.

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=931045 to help us track these slowdowns, and as part of that work, I'd like to make tests for each set up step we do, and file appropriate bugs against b2g for leaked memory.
Mdas, does 931045 dupe/obsolete this bug? Can we mark this as a dupe or invalid?
(In reply to Zac C (:zac) from comment #19)
> Mdas, does 931045 dupe/obsolete this bug? Can we mark this as a dupe or
> invalid?

Well, this bug is about how you need to restart the device or else the tests get slower, specifically in the case of enabling/disabling wifi. Bug 931045 isn't about fixing that; it's about making sure the slowdown isn't coming from marionette/gaiatest, and making sure that if we do slow down, we can find the regression that caused it.

For this bug, I think we need to verify that the enabling/disabling of wifi is still causing a leak, and then get someone on the Wifi team to look at the memory dump.
Team, this one would be easy to test using a Jenkins ad hoc job and a long loop.

Make sure we're using Marionette 0.7.2 which fixes the test duration in the HTML report.
Priority: -- → P2
Priority: P2 → P4
Whiteboard: [c=automation p= s= u=] → [c=automation p= s= u=] unittest
QA Whiteboard: [fxosqa-auto-backlog-]
Depends on: 931045
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX