Closed Bug 953212 Opened 11 years ago Closed 10 years ago

Automation Error: mozprocess timed out after 330 seconds running ['make', 'test-integration', 'NPM_REGISTRY=http://npm-mirror.pub.build.mozilla.org', 'REPORTER=mocha-tbpl-reporter', 'TEST_MANIFEST=./shared/test/integration/tbpl-manifest.json']

Categories

(Testing :: General, defect)

x86
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=32392445&tree=Mozilla-Inbound and a crapload more appear to be test timeouts, but without any harness handling of timeouts they sit until buildbot kills them after 20 minutes without output. That not only makes for useless failure output, it also makes a less-than-20-minute test suite take 40 minutes to finish.
Summary: gaia-integration needs to manage timeouts internally → gaia-integration needs to manage timeouts internally rather than relying on "command timed out: 1200 seconds without output, attempting to kill"
Blocks: 866909
Jonathan, please could you find an owner for this? :-)
Flags: needinfo?(jgriffin)
I have a hunch this owner will be me :p.. I think gaia integration tests are a priority so unless told otherwise I'll need to push some of my other work back in favor of this.
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Hmm, I forgot that this uses the js marionette client. I still don't mind working on this, but it might take less time if someone more familiar with the code base owned it. Feel free to re-assign to me if there is no one else.
Assignee: ahalberstadt → nobody
Status: ASSIGNED → NEW
One thing we could do is add a mozprocess timeout to the command invocation in the mozharness script, to something like 300s.  This would reduce the time it takes for the suite to time out on buildbot.  It's just a band-aid, but it should help until James or Gareth find time to tackle this.
Flags: needinfo?(jgriffin)
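For illustration, the band-aid described above (kill the command once it has produced no output for a fixed window) can be sketched with the Python standard library alone. This is not the mozharness patch or the mozprocess API; `run_with_output_timeout` and its return shape are hypothetical names chosen for this sketch.

```python
import queue
import subprocess
import sys
import threading


def run_with_output_timeout(cmd, output_timeout):
    """Run cmd and kill it if it prints nothing for output_timeout seconds.

    Hypothetical helper for illustration only.
    Returns (returncode, lines, timed_out).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines_q = queue.Queue()

    def pump():
        # Forward each output line to the queue; None marks end of stream.
        for line in proc.stdout:
            lines_q.put(line.rstrip("\n"))
        lines_q.put(None)

    threading.Thread(target=pump, daemon=True).start()

    lines = []
    timed_out = False
    while True:
        try:
            # The window restarts every time a line arrives.
            line = lines_q.get(timeout=output_timeout)
        except queue.Empty:
            # Silence for the whole window: give up and kill the process.
            # Note: this only kills the direct child, not its descendants.
            timed_out = True
            proc.kill()
            break
        if line is None:
            break
        lines.append(line)
    proc.wait()
    return proc.returncode, lines, timed_out


# Example call (command and window taken from this bug's summary):
# run_with_output_timeout(["make", "test-integration"], 330)
```

As the comment above says, this only shortens the wait. The caller still learns nothing about which test hung, and a real harness would also need to clean up any child processes that `make` started.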
This should help; I'm going to test this on ash.
Attachment #8356352 - Flags: review?(ahalberstadt)
Assignee: nobody → jgriffin
Status: NEW → ASSIGNED
Comment on attachment 8356352 [details] [diff] [review]
Run gaia-integration tests with a 330s output timeout,

Review of attachment 8356352 [details] [diff] [review]:
-----------------------------------------------------------------

We'll still need to fix the problem of getting useful failure output (e.g. kill the process and get a stack), but yeah, this should help in the meantime.
Attachment #8356352 - Flags: review?(ahalberstadt) → review+
https://hg.mozilla.org/build/mozharness/rev/efd8f0cabadb

I'm unassigning myself now, since this doesn't fix the actual problem (it just makes its impact less severe).
Assignee: jgriffin → nobody
Merged mozharness (not getting CCed to this bug).
Summary: gaia-integration needs to manage timeouts internally rather than relying on "command timed out: 1200 seconds without output, attempting to kill" → gaia-integration needs to manage timeouts internally rather than relying on "command timed out: 1200 seconds without output, attempting to kill" or "timed out after 330 seconds of no output"
Status: ASSIGNED → NEW
Gareth, Gi is failing frequently enough that I'm seriously considering making it hidden by default on b2g-inbound. This bug is a major contributor to that. Can we please get some traction on this?
Flags: needinfo?(gaye)
Blocks: 960072
Updating summary now that the old timeout no longer occurs. We still need to fix this with proper harness timeouts that print the last test name seen etc.
Summary: gaia-integration needs to manage timeouts internally rather than relying on "command timed out: 1200 seconds without output, attempting to kill" or "timed out after 330 seconds of no output" → gaia-integration needs to manage timeouts internally (and print test names) rather than relying on mozprocess "timed out after 330 seconds of no output"
Summary: gaia-integration needs to manage timeouts internally (and print test names) rather than relying on mozprocess "timed out after 330 seconds of no output" → gaia-integration needs to manage timeouts internally (and print test names) rather than relying on "Automation Error: mozprocess timed out after 330 seconds running ['make', 'test-integration', .....]"
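A rough sketch of the "print the last test name seen" part of that request, in Python. It assumes the reporter emits `TEST-START | <name>` lines and that a `TEST-UNEXPECTED-FAIL | <name> | <message>` line is what the log parser expects; both formats are assumptions for this sketch, as are the class name `LastTestTracker` and the file names in the usage example.

```python
import re

# Assumed reporter line format; not confirmed by this bug.
TEST_START = re.compile(r"^TEST-START \| (?P<name>.+)$")


class LastTestTracker:
    """Remember the most recent test that started (hypothetical helper)."""

    def __init__(self):
        self.last_test = None

    def feed(self, line):
        # Call this for every line of harness output.
        match = TEST_START.match(line.strip())
        if match:
            self.last_test = match.group("name")

    def timeout_message(self, seconds):
        # Build a failure line that names the test running at the timeout.
        name = self.last_test or "(no test started)"
        return (
            "TEST-UNEXPECTED-FAIL | %s | timed out after %d seconds of no output"
            % (name, seconds)
        )


tracker = LastTestTracker()
for output_line in [
    "TEST-START | apps/example/test/first_test.js",
    "TEST-PASS | apps/example/test/first_test.js",
    "TEST-START | apps/example/test/second_test.js",
]:
    tracker.feed(output_line)
print(tracker.timeout_message(330))
```

With something like this fed from the harness output, a timeout would name the hanging test instead of only reporting the `make` command line.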
And we're back.

https://tbpl.mozilla.org/php/getParsedLog.php?id=36369960&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=36370853&tree=Mozilla-Inbound
Summary: gaia-integration needs to manage timeouts internally (and print test names) rather than relying on "Automation Error: mozprocess timed out after 330 seconds running ['make', 'test-integration', .....]" → Automation Error: mozprocess timed out after 330 seconds running ['make', 'test-integration', 'NPM_REGISTRY=http://npm-mirror.pub.build.mozilla.org', 'REPORTER=mocha-tbpl-reporter', 'TEST_MANIFEST=./shared/test/integration/tbpl-manifest.json']
Depends on: 992220
Note that the last ~14 hours' worth of these failures on inbound were actually caused by bug 948269, which has since been backed out. We really need bug 992220 ASAP, so we can deprecate this bug and avoid missing things like this (bug 992220 had a try run that showed the failure, but was ignored since it matched this intermittent failure bug).
It seems this no longer happens on 2.0+. Jonathan - do you know if we are good here now?
Flags: needinfo?(jgriffin)
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(jgriffin)
Resolution: --- → WORKSFORME
Flags: needinfo?(gaye)
Component: New Frameworks → General