Closed Bug 1450938 Opened 7 years ago Closed 6 years ago

[meta] Intermittent "Automation Error: mozprocess timed out after 1000 seconds running"

Categories

(Testing :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: whimboo, Unassigned)

References

Details

(Keywords: meta)

Let's have a meta bug for all of the known mozprocess timeout issues after 1000 seconds of runtime. Maybe it gives us a chance to figure out the underlying issue.
All the "Automation Error" failures for mozprocess are actually coming from mozharness: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#1400

Nothing in that file has changed in the last couple of months that would be related to the timeout issues with mozprocess. But one thing that comes to mind is the updates to the copy of mozprocess used by mozharness. Maybe one of those caused a regression. https://hg.mozilla.org/mozilla-central/log/c44f60c43432d468639b5fe078420e60c13fd3de/testing/mozharness/mozprocess/processhandler.py

I wonder if we should instead forward the maximum execution timeout of 1000s to the harnesses themselves, or have more fine-grained timeouts for them. That should help us figure out whether the hang is inside the harness code itself or occurs while the harnesses are controlling the Firefox binary.

Geoff, what do you think?
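To illustrate the idea of fine-grained timeouts, here is a minimal sketch using only Python's standard `subprocess` module, not mozharness or mozprocess. The phase names, commands, and limits are hypothetical; the point is only that a per-phase limit, capped by the overall budget, lets the failure message name the phase that hung instead of reporting one global timeout.

```python
import subprocess
import sys
import time


class PhaseTimeout(Exception):
    """Raised when a single phase exceeds its own time limit."""

    def __init__(self, phase, limit):
        super().__init__("%s timed out after %.0f seconds" % (phase, limit))
        self.phase = phase
        self.limit = limit


def run_with_budget(phases, overall_timeout):
    """Run (name, cmd, limit) phases in order under one overall budget.

    Each phase gets min(its own limit, time left in the overall budget),
    so a hang is attributed to the phase in which it happened.
    """
    deadline = time.monotonic() + overall_timeout
    for name, cmd, limit in phases:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise PhaseTimeout(name, 0)
        effective = min(limit, remaining)
        try:
            # subprocess.run() kills the child and reaps it on timeout.
            subprocess.run(cmd, timeout=effective, check=False)
        except subprocess.TimeoutExpired:
            raise PhaseTimeout(name, effective)


if __name__ == "__main__":
    # Hypothetical phases: a quick setup step and a "browser" that hangs.
    quick = [sys.executable, "-c", "pass"]
    hang = [sys.executable, "-c", "import time; time.sleep(30)"]
    try:
        run_with_budget(
            [("harness setup", quick, 20), ("browser startup", hang, 1)],
            overall_timeout=1000,
        )
    except PhaseTimeout as exc:
        print("Automation Error:", exc)
```

Whether the real harnesses could be split into phases like this is an open question; the sketch only shows the reporting benefit.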
Flags: needinfo?(gbrown)
Summary: [meta] Intermittent "mozprocess timed out after 1000 seconds running" → [meta] Intermittent "Automation Error: mozprocess timed out after 1000 seconds running"
Please also note bug 1444831 comment 6, where we had such a hang because the geckodriver executable produced too much logging output. Once we reduced the number of logged lines, the hang was gone. So maybe we are hitting the deadlock case in `Popen.wait()` caused by the use of PIPE.
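For reference, a minimal sketch of that `Popen.wait()` and PIPE interaction, using only Python's standard `subprocess` module rather than mozprocess. The noisy child command and its output size are assumptions for illustration. If the parent calls `wait()` without reading the pipe, a child that writes more than the OS pipe buffer holds blocks on write and never exits; draining the pipe while waiting avoids that.

```python
import subprocess
import sys

# Hypothetical noisy child: writes far more than a typical pipe buffer holds.
NOISY_CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]


def run_and_drain(cmd, timeout):
    """Run cmd and read its output while waiting.

    Returns (returncode, output, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        # communicate() reads the pipe until EOF and then reaps the child.
        # Calling proc.wait() here instead can block forever, because the
        # child cannot finish writing until someone reads its output.
        output, _ = proc.communicate(timeout=timeout)
        return proc.returncode, output, False
    except subprocess.TimeoutExpired:
        proc.kill()
        # Collect whatever was written before the kill and reap the child.
        output, _ = proc.communicate()
        return proc.returncode, output, True


if __name__ == "__main__":
    returncode, output, timed_out = run_and_drain(NOISY_CHILD, timeout=60)
    print(returncode, len(output), timed_out)
```

This does not show where the mozprocess hang actually occurs; it only demonstrates the failure mode that reducing log output would mask.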
(In reply to Henrik Skupin (:whimboo) from comment #1)
> I wonder if we should better forward the maximum execution timeout of 1000s
> to the harnesses itself, or have better fine-grained timeouts for them. It
> might / should help us to figure out if the hang is inside the harness code
> itself, or when those are controlling the Firefox binary.
>
> Geoff, what do you think?

The reftest "mozprocess timed out after 1000" logs in bug 1436237 look very much like the crashtest/jsreftest "application timed out after 370" logs in bug 1441580: if the logs can be trusted, browser startup is not completing, and the harness is waiting for the browser. It seems like sometimes the harness 370 second timeout is reported correctly and sometimes that mechanism fails -- an intermittent fault in mozprocess, I suppose.

At any rate, since we already have the 370 second timeout in the harnesses (is it actually in mozrunner?), I'm not sure what else we can watch for timeouts, or where. What do you have in mind?
Flags: needinfo?(gbrown)
See Also: → 1443654
Priority: -- → P3
Geoff, OK, so what I'm missing to make further progress on reftests is screenshots similar to those for mochitests. Do we have a plan to get those added?
Flags: needinfo?(gbrown)
I filed bug 1443654 for that, but it seems more complicated than I had hoped for, and I'm not finding time to pursue it.
Flags: needinfo?(gbrown)
Can we please try to classify failures against the dependent bugs for each test harness instead of against this meta bug? Currently this destroys the OF metrics. Thanks.
Flags: needinfo?(aryx.bugmail)
All classifications were made by the same person; I explained it to them last week.
Flags: needinfo?(aryx.bugmail)
All dependencies of this meta bug have been fixed. As such, I don't see why we have to keep this bug open any longer. Closing as WFM.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME