Closed Bug 1318765 Opened 3 years ago Closed Last year

Intermittent TC Windows build Automation Error: mozprocess timed out after 4800 seconds running ['C:\\mozilla-build\\msys\\bin\\bash.exe', 'Z:\\task_1479490521\\build\\src\\mach', '--log-no-times', 'build', '-v']

Categories

(Taskcluster :: General, defect)

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: bulk-close-intermittents, intermittent-failure, Whiteboard: [stockwell unknown])

One big spike on autoland yesterday for several hours. Apparently fixed by backout https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=5c5b0537712323cf1d9610a6fe1b127c65902182.
See Also: → 1147271
One thing we've seen cause problems in the past is bad patches that break DLL loading, which by default pops up a dialog saying "Entry Point Not Found", or other crap like that. That makes the build job hang until it times out. In Firefox itself we call `SetErrorMode` to suppress these dialogs, but the failure can happen at startup before we get to that code, or we can hit the problem in other tools that don't make that call. However! I found out today that the error mode is inherited by child processes by default, so if we simply did something like this in the mozharness script:
http://stackoverflow.com/a/985166/69326

we should be able to suppress those as a cause of hangs, at least.
Specifically (I just tested locally), doing this should suppress those dialogs as well as dialogs from trying to access invalid drive letters:
```
import ctypes

# 0x8001 = SEM_FAILCRITICALERRORS (0x0001) | SEM_NOOPENFILEERRORBOX (0x8000)
ctypes.windll.kernel32.SetErrorMode(0x8001)
```
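For illustration, here is a rough sketch of how a wrapper script might set the error mode before launching the build, relying on the inheritance behaviour described above. The helper names `suppress_error_dialogs` and `run_quietly` are made up for this example and are not part of mozharness or mozprocess; the platform guard is only there so the snippet also runs on non-Windows machines.

```python
import subprocess
import sys

# Win32 error-mode flags; together they give the 0x8001 value used above.
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOOPENFILEERRORBOX = 0x8000
ERROR_MODE = SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX


def suppress_error_dialogs():
    """Set the process error mode on Windows.

    Returns the previous error mode on Windows, or None on other platforms
    (hypothetical helper, not an existing mozharness function).
    """
    if sys.platform != "win32":
        return None
    import ctypes
    return ctypes.windll.kernel32.SetErrorMode(ERROR_MODE)


def run_quietly(cmd, timeout=None):
    """Run cmd as a child process after setting the error mode.

    Assumption: the child inherits the parent's error mode, since
    subprocess does not request the default error mode for the child.
    """
    suppress_error_dialogs()
    return subprocess.run(cmd, timeout=timeout).returncode
```

Something like `run_quietly(['bash.exe', 'mach', 'build', '-v'], timeout=4800)` would then be the rough shape of the call. This does not cover dialogs raised by a child that resets its own error mode.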
See Also: → 1333949
I believe yesterday's spike was caused by a new Taskcluster worker on the Windows machines, which is being backed out.
Assignee: nobody → gbrown
Depends on: 1337807
Assignee: gbrown → nobody
Whiteboard: [stockwell infra]
This appears to happen for all of my try pushes lately (I appear to have logged all 6 failures from comment 10).  I am using artifact builds on try, if that matters here.
See Also: → 1311861
The recent failures look to be buildbot win32 and win64 builds. This looks to be a new problem, and with 24 failures in one day, it seems like a higher priority to fix.

:ted, could you look at this or find someone on the build team to look into this?
Flags: needinfo?(ted)
Whiteboard: [stockwell infra] → [stockwell needswork]
This looks like it peaked and then fell off again. Maybe there was an infra issue related to this? If it shows up again I'll take another look at it.
Flags: needinfo?(ted)
Whiteboard: [stockwell needswork] → [stockwell unknown]
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 2 years ago → 2 years ago
Resolution: --- → INCOMPLETE
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=168680007&repo=mozilla-inbound&lineNumber=36665

11:30:17     INFO -  rm -f jarlog/en-US.log
11:30:17     INFO -  mozmake.EXE[1]: Leaving directory 'z:/build/build/src/obj-firefox'
11:30:17     INFO -  mozmake.EXE[1]: Entering directory 'z:/build/build/src/obj-firefox'
11:30:17     INFO -  mozmake.EXE[1]: Leaving directory 'z:/build/build/src/obj-firefox'
11:30:17     INFO -  mozmake.EXE[1]: Entering directory 'z:/build/build/src/obj-firefox'
11:30:17     INFO -  mozmake.EXE[1]: Leaving directory 'z:/build/build/src/obj-firefox'
12:50:17     INFO - Automation Error: mozprocess timed out after 4800 seconds running ['C:\\mozilla-build\\msys\\bin\\bash.exe', 'z:\\build\\build\\src\\mach', '--log-no-times', 'build', '-v']
[taskcluster:error] Aborting task - max run time exceeded!
[taskcluster:error]    Exit Code: 0
[taskcluster:error] Success Code: 0x4
[taskcluster:error]    User Time: 15.625ms
[taskcluster:error]  Kernel Time: 0s
[taskcluster:error]    Wall Time: 2h52m10.2739998s
[taskcluster:error]  Peak Memory: 1679360
[taskcluster:error]       Result: MAX_RUNTIME_EXCEEDED
[taskcluster 2018-03-17T13:46:06.231Z] === Task Finished ===
[taskcluster 2018-03-17T13:46:06.231Z] Task Duration: 2h59m58.779998s
[taskcluster:error] Uploading error artifact public/build from file public/build with message "Could not read directory 'Z:\\task_1521282458\\public\\build'", reason "file-missing-on-worker" and expiry 2019-03-17T10:45:43.983Z
[taskcluster:error] TASK FAILURE during artifact upload: file-missing-on-worker: Could not read directory 'Z:\task_1521282458\public\build'
[taskcluster 2018-03-17T13:46:07.498Z] Uploading artifact public/logs/certified.log from file generic-worker\certified.log with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2019-03-17T10:45:43.983Z
[taskcluster 2018-03-17T13:46:08.120Z] Uploading artifact public/chainOfTrust.json.asc from file generic-worker\chainOfTrust.json.asc with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2019-03-17T10:45:43.983Z
[taskcluster 2018-03-17T13:46:08.718Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/GNtTkMecRsGbaHME7XP7tA/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2019-03-17T10:45:43.983Z
[taskcluster:error] Task not successful due to following exception(s):
[taskcluster:error] Exception 1)
[taskcluster:error] []
[taskcluster:error] 
[taskcluster:error] Exit code: 0
[taskcluster:error] Exception 2)
[taskcluster:error] file-missing-on-worker: Could not read directory 'Z:\task_1521282458\public\build'
[taskcluster:error]
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 2 years ago → Last year
Resolution: --- → INCOMPLETE