Closed Bug 1443654 Opened 7 years ago Closed 3 years ago

Poor diagnostics when mozharness script times out waiting for harness

Categories

(Release Engineering :: Applications: MozharnessCore, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Various current intermittent failure bugs occur when a test harness hangs; eventually mozharness notices and reports "Automation Error: mozprocess timed out after X seconds running ...". For example: bug 1436237 bug 1391545 bug 1441401 This originates at https://dxr.mozilla.org/mozilla-central/rev/709eae4e54ffa3f3518745516dd5d27a05255af2/testing/mozharness/mozharness/base/script.py#1383 These bugs are notoriously difficult to resolve. Could we get better diagnostics? At least a screenshot?
I have not made progress on the screenshot idea. I realized that we'd need to introduce some sort of screenshot binary to the mozharness configuration and ensure that those screenshot utilities were available on the test machines -- added to the docker images, etc? That seems like a fair amount of work...and I'm not getting around to it.
I had another idea: On mozharness timeout, instead of just killing the unresponsive harness, try sending SIGINT or similar, to try to coax a python traceback from the harness. I prototyped that but had no luck: After the timeout, mozprocess is no longer collecting or reporting output, so any extra output is just lost anyway.
See Also: → 1450938
I'm still interested in this - at least the screenshot on timeout - but not finding time for it.
Assignee: gbrown → nobody
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.