Closed Bug 992220 Opened 10 years ago Closed 10 years ago

gaia-integration should handle test timeouts gracefully, outputting a failure message that includes the test name

Categories

(Firefox OS Graveyard :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: emorley, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: sheriffing-P1)

All other test suites currently handle test timeouts, outputting a failure message along the lines of:
TST-UNEXPECTED-FAIL | test-file-path-foo | Test timed out
or
TST-UNEXPECTED-FAIL | test-file-path-foo | Test timed out after 330 seconds with no output

gaia-integration does not do this, and instead outputs:
{
07:00:11     INFO -  TEST-START | Contacts > Form Click phone number Add a simple contact
07:00:24     INFO -  TEST-PASS | Contacts > Form Click phone number Add a simple contact
07:00:24     INFO -  TEST-END | Contacts > Form Click phone number Add a simple contact
07:00:24     INFO -  TEST-PENDING | Contacts > Form Click phone number Can create custom label
07:00:24     INFO -  TEST-END | Contacts > Form Click phone number Can create custom label
07:00:24     INFO -  TEST-START | Contacts > Form Facebook contacts Add phone number from Dialer to existing Facebook contact
07:05:54     INFO - Automation Error: mozprocess timed out after 330 seconds running ['make', 'test-integration', 'NPM_REGISTRY=http://npm-mirror.pub.build.mozilla.org', 'REPORTER=mocha-tbpl-reporter', 'TEST_MANIFEST=./shared/test/integration/tbpl-manifest.json']
07:05:55    ERROR - timed out after 330 seconds of no output
07:05:55    ERROR - Return code: -9
07:05:55     INFO - TinderboxPrint: gaia-integration-tests: <em class="testfail">T-FAIL</em>
07:05:55    ERROR - Tests exited with return code -9: harness failures
07:05:55    ERROR - # TBPL FAILURE #
07:05:55     INFO - Running post-action listener: _resource_record_post_action
07:05:55     INFO - Running post-run listener: _resource_record_post_run
07:05:56     INFO - Total resource usage - Wall time: 779s; CPU: 46.0%; Read bytes: 109092864; Write bytes: 1693396992; Read time: 31840; Write time: 8234968
07:05:56     INFO - install - Wall time: 29s; CPU: 100.0%; Read bytes: 0; Write bytes: 114384896; Read time: 0; Write time: 660068
07:05:56     INFO - run-tests - Wall time: 750s; CPU: 44.0%; Read bytes: 108515328; Write bytes: 1565970432; Read time: 31696; Write time: 7515164
07:05:56     INFO - Copying logs to upload dir...
07:05:56     INFO - mkdir: /builds/slave/test/build/upload/logs
program finished with exit code 2
elapsedTime=866.268347
========= Finished '/tools/buildbot/bin/python scripts/scripts/gaia_integration.py ...' failed (results: 2, elapsed: 14 mins, 26 secs) (at 2014-04-04 07:05:56.161891) =========
}

The generic "Automation Error: mozprocess timed out after 330 seconds ..." line just leads to all timeouts being dumped into a generic bug (in this case bug 953212) and specific tests not being held accountable.

Gracefully handling timeouts is required by:
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#6.29_Outputs_failures_in_a_TBPL-starrable_format

Notably: "Exceptions & timeouts must be handled with appropriate log output (eg: the failure line must state in which test the timeout occurred, not just that the entire run has timed out)."

I won't be hiding the gaia-integration tests immediately, but we need something done here, since the bandaid created in bug 953212 comment 47 onwards (in the first week of January) hasn't been followed up by anything else.
Blocks: 778688
Keywords: sheriffing-P1
Note this bug just caused a failure in bug 948269's try run to not be noticed and cause a half day of breakage in this test suite on inbound.
James, would you mind finding an owner for this? :-)
Flags: needinfo?(jlal)
The current plan is to stand these on top of a common runner, based on the exiting Marionette one.  When that's done, we'll get timeout handling "for free".  Ahal, is there a bug for that work we can link to?
Flags: needinfo?(jlal)
So far just bug 990635, I can file a separate one for tracking progress on the python side if it makes sense. To clarify, I don't plan on trying to shoe-horn this into the existing marionette runner, I think that would be messy and probably wouldn't make a lot of sense.

I do plan on creating a simplistic runner based on the same components that the marionette runner is though (e.g mozbase), timeouts should still be simple.
Great, thank you :-)
cc'ing a few relevant folks; we really need this bug fixed for gaia-integration tests to fully compliant with the visibility guidelines.
I haven't seen output like this in a long time, and I believe timeouts should now be working properly. When we timeout, we should now see some output like: https://tbpl.mozilla.org/php/getParsedLog.php?id=46364924&tree=B2g-Inbound

Gareth/Ed - Can you verify if this is fixed for Gij? I'm not sure of the commit/bug that would've fixed this.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(gaye)
Flags: needinfo?(emorley)
Resolution: --- → WORKSFORME
Looking at the last ~8 days of https://tbpl.mozilla.org/?showall=1&jobname=gaia-integration I can't see any failures of the form in comment 0, though there weren't any timeouts in that range. Happy to call this fixed until proven otherwise :-)
Flags: needinfo?(emorley)
Flags: needinfo?(gaye)
You need to log in before you can comment on or make changes to this bug.