Bug 1249158
Opened 8 years ago
Closed 8 years ago
Autophone mochitests use a very long no-output timeout
Categories
(Testing Graveyard :: Autophone, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gbrown, Assigned: gbrown)
:bc noted recently that a hung dom/media mochitest caused the autophone job to time out with the 90 minute "application ran for longer than maximum" timeout. We don't see that behavior in the mozharness mochitests because hung tests time out after 330 seconds of not producing output (the "no-output timeout"). We should consider updating the autophone no-output timeout to match the mozharness mochitest behavior.
Comment 1 (Assignee) • 8 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef19173a585f&filter-tier=1&filter-tier=2&filter-tier=3&exclusion_profile=false added some relevant logging.

On autophone, it looks like timeout=4800 is passed to runtestsremote.py; in mozharness jobs, no timeout option is passed to runtestsremote.py, so it uses the defaults. I think the autophone timeout comes from the "time_out" parameter in configs/unittest-defaults.ini, but configs/unittest-defaults.ini.example shows time_out = 2400, so there is a minor mystery here.

:bc -- Is there history here? Is the timeout specification accidental, or is it needed for some reason? Is time_out = 4800 in the production unittest-defaults.ini?
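For illustration only, here is a minimal sketch of how a time_out setting in an ini file could turn into a timeout option for the test runner, and how leaving the setting out would leave the runner on its defaults. This is not Autophone's actual code: the [runtests] section name, the timeout_args helper, and the --timeout flag spelling are all assumptions made for the example.

```python
import configparser

# Hypothetical config fragment; the section name is an assumption.
EXAMPLE_INI = """
[runtests]
time_out = 2400
"""


def timeout_args(ini_text, section="runtests"):
    """Return extra command-line arguments for the test runner.

    If time_out is set, pass it along explicitly (flag name assumed).
    If it is absent, return no arguments so the runner keeps its defaults.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if parser.has_option(section, "time_out"):
        return ["--timeout=%d" % parser.getint(section, "time_out")]
    return []


if __name__ == "__main__":
    print(timeout_args(EXAMPLE_INI))
    print(timeout_args("[runtests]\n"))
```

Under these assumptions, removing time_out from the config is enough to stop overriding the runner's default.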
Flags: needinfo?(bob)
Comment 2 • 8 years ago
It's been a while since I set those. I don't think they are used since the timeout is hardcoded in https://dxr.mozilla.org/mozilla-central/source/build/mobile/remoteautomation.py#290. I'll take a look tonight and see what effect changing these values in the config files has on the tests.
Comment 3 (Assignee) • 8 years ago
(In reply to Bob Clary [:bc:] from comment #2)

https://dxr.mozilla.org/mozilla-central/source/build/mobile/remoteautomation.py#290 is the 90 minute "application timeout", which means: if the browser has been running tests for 90 minutes, kill it.

There's another timeout, the "no-output timeout", which means: if there has been no new output in the test log for X seconds, kill it. That's handy because you can usually set X to about 5 minutes safely, and then fail much faster when there's a hang.

If you don't pass the timeout option to runtestsremote.py, I think you'll start using the default no-output timeout, 330 seconds. https://dxr.mozilla.org/mozilla-central/source/build/mobile/remoteautomation.py#105
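The interaction between the two timeouts can be sketched with a small simulation. This is an illustrative model, not the logic in remoteautomation.py: the find_kill_reason function and its parameters are hypothetical, and only the 90 minute (5400 second), 330 second and 4800 second values come from this bug.

```python
def find_kill_reason(output_times, app_timeout, no_output_timeout, end_time):
    """Model a harness that enforces two timeouts on a test run.

    output_times: seconds since start at which the app logged a line
                  (assumed to be no later than end_time).
    end_time: when the app would exit on its own (float("inf") for a hang).
    Returns (kill_time, reason) if a timeout fires first, else None.
    """
    last_output = 0
    for t in sorted(output_times) + [end_time]:
        # Whichever deadline comes first is the one that fires.
        no_output_deadline = last_output + no_output_timeout
        deadline = min(no_output_deadline, app_timeout)
        if t > deadline:
            if no_output_deadline <= app_timeout:
                return deadline, "no-output timeout"
            return deadline, "application timeout"
        last_output = t
    return None


if __name__ == "__main__":
    # A test that logs every 60 seconds and then hangs at the 900 second mark.
    hang = range(60, 901, 60)
    print(find_kill_reason(hang, 5400, 330, float("inf")))
    print(find_kill_reason(hang, 5400, 4800, float("inf")))
```

In this model, a 330 second no-output timeout ends the hung run at 1230 seconds, while a 4800 second value is never reached before the 5400 second application timeout fires.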
Comment 4 • 8 years ago
I changed the time_out in unittest-defaults.ini to 900 and started Mdm for my local devices. The Nexus One crashed but finished the test in 3 minutes. The GS3 crashed and finished in 20 minutes. It timed out after running for 34 minutes on my 6P: https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=5658512697c4&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=19299725

I'll try a couple more values and see how it looks. If this looks promising, we can commit a change to the time_out parameter and do a try run in production.
Comment 5 (Assignee) • 8 years ago
That looks right to me: https://autophone-dev.s3.amazonaws.com/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-api-15/1455800829/mochitest-dom-media-mochitests-dom-media-settings.ini-1-nexus-6p-8-f58763a6-0780-4e55-ab3a-fe3d320530ca.log

1763 INFO Test timed out. Remaining tests=street.mp4-2
1764 INFO TEST-OK | dom/media/test/test_unseekable.html | took 925308ms

Compare to https://bugzilla.mozilla.org/show_bug.cgi?id=1247027#c11.
Comment 6 • 8 years ago
I think you are right.

300 seconds -> https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=c8c3a6292311&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=19310789
The Nexus 6P took 24 minutes. Still too long to let run in production but much better.

4800 seconds -> https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=5626cc714fb1&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=19319394
The Nexus 6P is at 40 minutes and counting.

I have to run into town for an errand, but when I get back I'll change the timeout on the servers and do a try run. Thanks for finding this.
Comment 7 • 8 years ago
Try run in production with 300s timeout: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7bd17a9edf11&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3
Comment 8 • 8 years ago
OK, that looks much better: 10 minutes for the Nexus 4 and 5, 30 for the Nexus 6. The Nexus 9 is suffering from a backlog due to battery issues. Still too long to run these on real branches in production, though. Anything else you want to do here?
Flags: needinfo?(bob)
Updated (Assignee) • 8 years ago
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated • 2 years ago
Product: Testing → Testing Graveyard