Closed Bug 1249158 Opened 8 years ago Closed 8 years ago

Autophone mochitests use a very long no-output timeout

Categories

(Testing Graveyard :: Autophone, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gbrown, Assigned: gbrown)

:bc noted recently that a hung dom/media mochitest caused the autophone job to time out with the 90-minute "application ran for longer than maximum" timeout. We don't see that behavior in the mozharness mochitests because hung tests time out after 330 seconds without producing output (the "no-output timeout").

We should consider updating the autophone no-output timeout to match the mozharness mochitest behavior.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef19173a585f&filter-tier=1&filter-tier=2&filter-tier=3&exclusion_profile=false added some relevant logging.

On autophone, it looks like timeout=4800 is passed to runtestsremote.py; in mozharness jobs, no timeout option is passed to runtestsremote.py (it uses defaults).

I think the autophone timeout comes from the "time_out" parameter in configs/unittest-defaults.ini, but configs/unittest-defaults.ini.example shows time_out = 2400, so there is a minor mystery here.
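
To make the suspected path concrete, here is a minimal sketch of how a time_out value in an ini file could turn into a timeout argument, and how omitting it would leave the harness on its own default. The section name "runtests", the helper name build_harness_args, and the "--timeout=" spelling are assumptions for illustration, not taken from the autophone source.

```python
import configparser

# Harness default mentioned in this bug; used only when no option is passed.
DEFAULT_NO_OUTPUT_TIMEOUT = 330  # seconds


def build_harness_args(ini_text, section="runtests"):
    """Return extra arguments for the remote harness.

    Hypothetical helper: the section name and flag spelling are assumed.
    If time_out is absent, nothing is passed and the harness default applies.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    value = parser.get(section, "time_out", fallback=None)
    if value is None:
        return []
    return ["--timeout=%d" % int(value)]


def effective_timeout(args):
    """Timeout the harness would end up using for the given arguments."""
    for arg in args:
        if arg.startswith("--timeout="):
            return int(arg.split("=", 1)[1])
    return DEFAULT_NO_OUTPUT_TIMEOUT
```

With time_out = 4800 in the file this yields a 4800 second timeout; with the key removed it falls back to 330.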

:bc -- Is there history here? Is the timeout specification accidental, or is it needed for some reason? Is time_out = 4800 in the production unittest-defaults.ini?
Flags: needinfo?(bob)
It's been a while since I set those. I don't think they are used since the timeout is hardcoded in https://dxr.mozilla.org/mozilla-central/source/build/mobile/remoteautomation.py#290.

I'll take a look tonight and see what effect changing these values in the config files has on the tests.
(In reply to Bob Clary [:bc:] from comment #2)

https://dxr.mozilla.org/mozilla-central/source/build/mobile/remoteautomation.py#290 is the 90-minute "application timeout", which means: if the browser has been running tests for 90 minutes, kill it.

There's another timeout, the "no-output timeout", which means: if there has been no new output in the test log for X seconds, kill it. That's handy because you can usually set X to about 5 minutes safely, and then fail much faster when there's a hang. If you don't pass the timeout option to runtestsremote.py, I think you'll start using the default no-output timeout, 330 seconds.

https://dxr.mozilla.org/mozilla-central/source/build/mobile/remoteautomation.py#105
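
The difference between the two timeouts described above can be sketched roughly as follows. This is not the remoteautomation.py code; it is a standalone illustration using a local subprocess, and the function name run_with_timeouts and its return strings are made up for the example. The defaults mirror the 90 minute and 330 second values mentioned in this bug.

```python
import queue
import subprocess
import sys
import threading
import time


def run_with_timeouts(cmd, max_time=90 * 60, no_output_timeout=330, poll=0.1):
    """Run cmd and kill it on either timeout; return why it stopped."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward each output line; None marks end of output.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()
    start = last_output = time.monotonic()
    while True:
        try:
            line = lines.get(timeout=poll)
        except queue.Empty:
            line = ""  # nothing new within this poll interval
        now = time.monotonic()
        if line is None:
            proc.wait()
            return "finished"
        if line:
            last_output = now  # any output resets the no-output clock
        if now - start > max_time:
            reason = "application timeout"
        elif now - last_output > no_output_timeout:
            reason = "no-output timeout"
        else:
            continue
        proc.kill()
        proc.wait()
        return reason


if __name__ == "__main__":
    # A silent, hung child is caught by the short no-output timeout.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeouts(hung, max_time=60, no_output_timeout=1))
```

A hung test produces no output, so the no-output clock expires long before the overall limit; a test that keeps logging is only stopped by the overall limit.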
I changed time_out in unittest-defaults.ini to 900 and started Mdm for my local devices.

The Nexus One crashed but finished the test in 3 minutes. The GS3 crashed and finished in 20 minutes. It timed out after running for 34 minutes on my 6P: https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=5658512697c4&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=19299725

I'll try with a couple more values and see how it looks. If this looks promising we can commit a change to the time_out parameter and do a try run in production.
See Also: → 1247027
I think you are right.

300 seconds -> https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=c8c3a6292311&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=19310789 the Nexus 6P took 24 minutes. Still too long to let run in production but much better.

4800 seconds -> https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=5626cc714fb1&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=19319394 the Nexus 6P is at 40 minutes and counting.

I have to run into town for an errand, but when I get back I'll change the time out on the servers and do a try run.

Thanks for finding this.
Ok, that looks much better: 10 minutes for the Nexus 4 and 5, 30 for the Nexus 6. The Nexus 9 is suffering a backlog due to battery issues. Still too long to run these on real branches in production, though.

Anything else you want to do here?
Flags: needinfo?(bob)
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard