Closed Bug 1349420 Opened 3 years ago Closed 2 years ago

Win64 | Automation Error: mozprocess timed out after 1000 seconds running ... easily

Categories

(Taskcluster :: General, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: ting, Assigned: gbrown)

References

(Blocks 1 open bug)

Details

This spins out from bug 1333003, the tests seem to be timed out easily even with normal windows x64 build on TaskCluster. I can't reproduce the timeout locally, and am trying to remote desktop to a loan machine per :grenade's help, but I haven't managed to connect to any.

Jobs:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=c79d609647bcc939405fe8e19753020c52182513&filter-tier=1&filter-tier=2&filter-tier=3&group_state=expanded

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=7fbb7bf724c009657ee0427ab870f939a764e705&filter-tier=1&filter-tier=2&filter-tier=3&group_state=expanded&selectedJob=85257391

Some logs:
  08:51:00     INFO -  525 INFO TEST-START | dom/base/test/test_audioNotification.html
  08:56:10     INFO -  JavaScript error: resource://gre/modules/FormHistory.jsm, line 373: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]
  08:56:10     INFO -  JavaScript error: resource://gre/modules/PlacesUtils.jsm, line 1968: NS_ERROR_XPC_GS_RETURNED_FAILURE: Component returned failure code: 0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE) [nsIJSCID.getService]
  08:56:24     INFO -  TEST-INFO | started process screenshot
  09:13:04     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1490085136\\build\\venv\\Scripts\\python', '-u', 'Z:\\task_1490085136\\build\\tests\\mochitest\\runtests.py', '--disable-e10s', '--total-chunks', '5', '--this-chunk', '1', '--appname=Z:\\task_1490085136\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=https://queue.taskcluster.net/v1/task/AWkDOxhmSuenQnL--9bLww/artifacts/public/build/firefox-55.0a1.en-US.win64-asan.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--quiet', '--log-raw=Z:\\task_1490085136\\build\\blobber_upload_dir\\plain-chunked_raw.log', '--log-errorsummary=Z:\\task_1490085136\\build\\blobber_upload_dir\\plain-chunked_errorsummary.log', '--screenshot-on-fail', '--cleanup-crashes', '--chunk-by-dir=4']
  09:13:04    ERROR - timed out after 1000 seconds of no output

  09:13:29     INFO -  0 INFO *** Start BrowserChrome Test Results ***
  09:13:29     INFO -  GECKO(11456) | Completed ShutdownLeaks collections in process 11456
  09:13:29     INFO -  1 INFO TEST-START | Shutdown
  09:13:29     INFO -  TEST-INFO | started process screenshot
  09:19:14     INFO -  JavaScript error: resource://gre/modules/FormHistory.jsm, line 373: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]
  09:19:14     INFO -  JavaScript error: resource://gre/modules/PlacesUtils.jsm, line 1968: NS_ERROR_XPC_GS_RETURNED_FAILURE: Component returned failure code: 0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE) [nsIJSCID.getService]
  09:35:54     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1490087410\\build\\venv\\Scripts\\python', '-u', 'Z:\\task_1490087410\\build\\tests\\mochitest\\runtests.py', '--disable-e10s', '--total-chunks', '8', '--this-chunk', '5', '--appname=Z:\\task_1490087410\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=https://queue.taskcluster.net/v1/task/AWkDOxhmSuenQnL--9bLww/artifacts/public/build/firefox-55.0a1.en-US.win64-asan.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--quiet', '--log-raw=Z:\\task_1490087410\\build\\blobber_upload_dir\\mochitest-devtools-chrome-chunked_raw.log', '--log-errorsummary=Z:\\task_1490087410\\build\\blobber_upload_dir\\mochitest-devtools-chrome-chunked_errorsummary.log', '--screenshot-on-fail', '--cleanup-crashes', '--flavor=browser', '--subsuite=devtools', '--chunk-by-runtime']
  09:35:54    ERROR - timed out after 1000 seconds of no output
Maybe pete can help?
Flags: needinfo?(pmoore)
https://public-artifacts.taskcluster.net/VnG93mVrT4G-f3yEufgePA/0/public/logs/live_backing.log

ProcessManager UNABLE to use job objects to manage child processes
ProcessManager NOT managing child processes

reminds me of bug 1303455.
I tried to dig into the logs here, but I couldn't tease out what's going wrong.  There are a few warnings/errors that I think are red herrings in the logs.
:grenade helped me to use loan-t-win10-64-gpu-01 instead with id/pwd for remote desktop, but the test jobs are pending now which I don't have an instance to rdp:

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=f73f52f68a602d134efbecc0c4471f6c021bd635&filter-tier=1&filter-tier=2&filter-tier=3&group_state=expanded

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=910732ee0f2a008bf99c6c707fdda013803c2029&filter-tier=1&filter-tier=2&filter-tier=3&group_state=expanded

This happened once last week, it seems was resolved and I received updated id/pwd from :grenade, but it happens again.
(In reply to Geoff Brown [:gbrown] from comment #2)
> https://public-artifacts.taskcluster.net/VnG93mVrT4G-f3yEufgePA/0/public/
> logs/live_backing.log
> 
> ProcessManager UNABLE to use job objects to manage child processes
> ProcessManager NOT managing child processes
> 
> reminds me of bug 1303455.

Thanks Geoff.

Technically, mozprocess *can* use job objects on Windows 10 (which supports nested jobs) but it doesn't, as it was written before nested job objects were a thing. So we were anticipating some fallout on Win 10 due to this in bug 1330718. See https://github.com/taskcluster/runlib/pull/2#pullrequestreview-17247538 for details. In short, mozprocess should ideally be updated to fix this. IIRC, If on Windows 8 or higher, mozprocess shouldn't care if a process is in a job group or not, because it can assign it to an additional job group, regardless.

Some further details in: https://msdn.microsoft.com/en-us/library/windows/desktop/hh448388(v=vs.85).aspx
Ted, do you know if any work is already underway in mozprocess to adapt job group handling based on windows version?
Flags: needinfo?(ted.clancy)
Flags: needinfo?(ted)
Flags: needinfo?(ted.clancy)
Flags: needinfo?(pmoore)
(In reply to Pete Moore [:pmoore][:pete] from comment #6)
> Ted, do you know if any work is already underway in mozprocess to adapt job
> group handling based on windows version?

Nobody is doing that work that I know of. I don't think it would be a large amount of work, though.
Flags: needinfo?(ted)
We have lots of "timed out after 1000 seconds" bugs open currently. This one seems inactive, so closing.
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.