Gaia UI tests suffering from socket timeouts and Marionette status=500 errors (both on nightly and master jobs)

RESOLVED FIXED

Status

Testing
Marionette
5 years ago
5 years ago

People

(Reporter: stephend, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
Created attachment 710899 [details]
Test-run log

Our automation runs have become really unreliable lately, with at least 15 errors (we used to have around 6-8), many of which are either socket timeouts or status=500 errors in Marionette itself.

Here's a sample run: http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.nightly.ui/245/console

I'm pressed for time right now, but there's a partial logcat for this particular run up here: http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.nightly.ui/lastSuccessfulBuild/artifact/logcat.txt
Comment 1

Is this coincident with switching to the rel-eng builds?
(Reporter)

Comment 2

5 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #1)
> Is this coincident with switching to the rel-eng builds?

Doesn't seem to be; in fact, our nightly.ui job is using the one from releases (though it probably shouldn't be):

Copied 1 artifact from "b2g.unagi.download.releases" build number 126, which is:

http://qa-selenium.mv.mozilla.com:8080/job/b2g.unagi.download.releases/126/console

Downloading from: https://releases.mozilla.com/b2g/2013-02-06/unagi_2013-02-06_eng.zip
Comment 3

This shows Marionette somehow getting out of sync while either launching or killing apps.
Comment 4

FYI, this has been happening since around the 30th of January.
Whiteboard: [qa-automation-blocked]
(Reporter)

Updated

5 years ago
Depends on: 839675

Comment 5

5 years ago
Do we need developer help here from the gaia/devtools side?
(Reporter)

Comment 6

5 years ago
(In reply to Clint Talbert ( :ctalbert ) from comment #5)
> Do we need developer help here from the gaia/devtools side?

It wouldn't hurt, but we should land bug 839675 first to see how much that helps us in CI. We also suspect Wi-Fi issues (which all of QA can reproduce manually on Mozilla Guest), and we should work those out too.
Depends on: 831149
Comment 7

Malini and I may have figured out what's going on.

There is a significant memory leak in SpecialPowers (bug 838786) that is difficult to fix because many mochitests rely on the current leaky implementation. We think davehunt's recent changes to gaiatest, which use SpecialPowers more often, are making this leak bad enough to cause out-of-memory problems on the device, which in turn cause apps to freeze, etc.

There is a patch in bug 831149 that reduces the impact of this somewhat for Marionette, but it wasn't uplifted to mozilla-b2g18; it just got approved, so I'll push it now.

Other than that, the options to fix are:
1 - take the time to fix bug 838786
2 - mdas may try experimenting with using chrome calls instead of SpecialPowers to perform the same task
(Reporter)

Comment 8

5 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #7)
> Malini and I may have figured out what's going on.
> 
> There is a significant memory leak in SpecialPowers (bug 838786) that is

ITYM bug 825802, here.  Thanks so much for looking into this!
Depends on: 831367
Comment 9

So we recently took advantage of SpecialPowers to read/write settings without being in the context of the System app, and to create/remove contacts without being in the context of the Contacts app. Both of these could easily be reverted until we work out a way to perform these actions without needing to launch apps or switch frames.
Comment 10

(In reply to Jonathan Griffin (:jgriffin) from comment #7)
> 2 - mdas may try experimenting with using chrome calls instead of
> SpecialPowers to perform the same task

I removed the special_powers requirement, and I'm still seeing the bug. I filed a memleak bug about it here: Bug 841211
Comment 11

This no longer blocks automation as we're now restarting the b2g process between tests.
Whiteboard: [qa-automation-blocked]
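
For anyone reading along, the restart-between-tests workaround amounts to roughly the pattern below. This is only a sketch under assumptions: `run_with_restart` and the `restart` hook are hypothetical names, not gaiatest or Marionette APIs, and how the b2g process actually gets restarted is left to the caller.

```python
from typing import Callable, Dict, Iterable, Tuple


def run_with_restart(
    tests: Iterable[Tuple[str, Callable[[], None]]],
    restart: Callable[[], None],
) -> Dict[str, str]:
    """Run each test right after a restart so leaked state cannot accumulate.

    `restart` is a hypothetical hook supplied by the caller, e.g. something
    that stops and starts the b2g process on the device.
    """
    results: Dict[str, str] = {}
    for name, test in tests:
        restart()  # start every test from a freshly started process
        try:
            test()
            results[name] = "pass"
        except Exception as exc:  # record the failure and keep going
            results[name] = f"fail: {exc}"
    return results
```

The trade-off is a slower run in exchange for bounding how much memory any one test can leak into the next; it works around the leak rather than fixing it.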
Comment 12

Since we moved to restarting b2g, should the severity be moved back down to normal?
(Reporter)

Comment 13

5 years ago
(In reply to Malini Das [:mdas] from comment #12)
> Since we moved to restarting b2g, should the severity be moved back down to
> normal?

Sure, done.
Severity: critical → normal
Comment 14

I think this is fixed; please reopen if not.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED