Bug 859065 (Closed). Opened 7 years ago; closed 7 years ago.

Avoid "command timed out: 1200 seconds without output, attempting to kill" by providing an inner xpcshell timeout of 5 minutes

Categories: Testing :: XPCShell Harness (defect)
Tracking: not tracked
Status: RESOLVED FIXED (mozilla23)
People: Reporter: Paolo; Assignee: Paolo
References: blocks 1 open bug
Attachments: 1 file

When an asynchronous xpcshell test times out, we are only able to get the
message "command timed out: 1200 seconds without output, attempting to kill"
and see the name of the last test file (not test function) that was executed.

We can easily get more useful logs by forcing a shorter inner timeout for
each xpcshell file (5 minutes seems long enough). This way, we see the entire
log of the test that times out, and are able to figure out more precisely
what is going on.

The timeout applies to each xpcshell file individually.

Example:
https://tbpl.mozilla.org/php/getParsedLog.php?id=21441817&tree=Try&full=1

I can see that the "test_download_cancel_midway_restart" function is where
the test actually hangs.
Attached patch: The patch
See comment 0 for the patch description.
Assignee: nobody → paolo.mozmail
Status: NEW → ASSIGNED
Attachment #734346 - Flags: review?(jmaher)
I don't really understand this patch.  Are we setting a 5 minute timer and if the test case is still running we time out?
(In reply to Joel Maher (:jmaher) from comment #2)
> I don't really understand this patch.  Are we setting a 5 minute timer and
> if the test case is still running we time out?

Yes, we do_throw when the timeout occurs, forcing the main test function to
quit before the external watchdog terminates the entire xpcshell suite. I can
add this as a comment to the patch if you think it makes things clearer.
So you assert that all tests will finish in 300 seconds instead of 1200 seconds? This just shortens the failure time; if that is the case, we could adjust the buildbot scripts.
When the outer 20-minute timeout is hit, the entire test suite is terminated and
we don't even see the output of the failing test file.

With this patch, the outer timeout is ideally never hit, so we continue with
other tests, and we get the output from the test file that times out, for
example the part between >>>>>>> and <<<<<<< from the log in comment 0.
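The contrast drawn in this reply, between an outer watchdog that kills the whole suite and a per-file inner timeout that fails only the hanging file, can be sketched in Python with asyncio. This is an analogy under stated assumptions, not the patch itself (which works inside the xpcshell harness by calling do_throw): run_test_file, run_suite, the file names, the test function bodies, and the log strings are all hypothetical, and the demo shrinks the 300-second inner timeout to 0.05 seconds so it finishes quickly.

```python
import asyncio

# Hypothetical constant mirroring the bug: a 5-minute inner timeout per
# file, well below the 1200 s "without output" outer watchdog.
INNER_TIMEOUT_S = 300.0


async def run_test_file(name, test_functions, timeout):
    """Run one file's async test functions in order under a single deadline.

    Returns (log_lines, passed). On timeout, the log names the function
    that was still running, and the rest of the file is abandoned.
    """
    log = [f">>>>>>> {name}"]
    current = None

    async def run_all():
        nonlocal current
        for fn in test_functions:
            current = fn.__name__
            log.append(f"running {current}")
            await fn()

    try:
        await asyncio.wait_for(run_all(), timeout)
        passed = True
    except asyncio.TimeoutError:
        # wait_for has cancelled run_all(); record where it was stuck.
        log.append(f"timed out after {timeout} s in {current}")
        passed = False
    log.append(f"<<<<<<< {name}")
    return log, passed


async def run_suite(files, timeout=INNER_TIMEOUT_S):
    """Run every file in turn; a timed-out file fails, the suite continues."""
    full_log, results = [], {}
    for name, test_functions in files:
        log, passed = await run_test_file(name, test_functions, timeout)
        full_log.extend(log)
        results[name] = passed
    return full_log, results


# Demo with hypothetical tests and a tiny timeout so it finishes quickly.
async def test_download_start():
    await asyncio.sleep(0)


async def test_download_cancel_midway_restart():
    await asyncio.Event().wait()  # never set: simulates the hang


async def test_unrelated():
    await asyncio.sleep(0)


files = [
    ("test_hanging.js", [test_download_start, test_download_cancel_midway_restart]),
    ("test_next.js", [test_unrelated]),
]
log, results = asyncio.run(run_suite(files, timeout=0.05))
for line in log:
    print(line)
```

One deadline spans each whole file rather than each function, matching the note in comment 0 that the timeout applies to each xpcshell file individually; the log still shows which function was running when it expired, and the next file runs normally.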
Comment on attachment 734346
The patch

Review of attachment 734346:
-----------------------------------------------------------------

This is good. Please make sure this runs well on the try server for all platforms. I would suggest running just xpcshell (not the other tests) and then retriggering the X jobs a few times each, to ensure this works well and, hopefully, to catch an error.
Attachment #734346 - Flags: review?(jmaher) → review+
Thank you for doing this :-)
Blocks: 778688
https://hg.mozilla.org/integration/mozilla-inbound/rev/47c5b47655c8
Target Milestone: --- → mozilla23
For the record, this patch catches timeouts in asynchronous tests, but does not
affect main-thread hangs, which may still result in the message "1200 seconds
without output". If I understand correctly, bug 597064 will handle this case.
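Why a main-thread hang escapes such a timeout can be illustrated with a Python asyncio analogy. This is an assumption for illustration only and does not describe xpcshell internals or the approach taken in bug 597064: the timeout is itself a callback on the same event loop, so it cannot fire while synchronous code is blocking that loop. The function name blocks_the_loop is hypothetical.

```python
import asyncio
import time


async def blocks_the_loop():
    # Hypothetical stand-in for a main-thread hang: a synchronous call that
    # never yields, so no other callback (including a timeout) can run.
    time.sleep(0.2)
    return "finished"


async def main():
    start = time.monotonic()
    try:
        # Requests a 0.05 s timeout, but it is enforced by the same event
        # loop that blocks_the_loop() is blocking.
        outcome = await asyncio.wait_for(blocks_the_loop(), timeout=0.05)
    except asyncio.TimeoutError:
        outcome = "timed out"
    return outcome, time.monotonic() - start


outcome, elapsed = asyncio.run(main())
# Whichever branch ran, the timeout could not interrupt the blocking call:
# elapsed is about 0.2 s rather than 0.05 s.
print(outcome, round(elapsed, 2))
```

Catching this kind of hang needs a watchdog that does not depend on the blocked loop, such as a separate thread or process.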
Blocks: 597064
https://hg.mozilla.org/mozilla-central/rev/47c5b47655c8
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Blocks: 869638
Blocks: 889317