Open
Bug 1325609
Opened 9 years ago
Updated 3 years ago
Intermittent xpcshell timeout during xpcom/tests/unit/test_nsIProcess.js
Categories
(Core :: XPCOM, defect, P3)
Core
XPCOM
Tracking
()
NEW
People
(Reporter: gbrown, Unassigned)
References
Details
(Whiteboard: [test disabled])
Bug 1317155 tracks intermittent xpcshell test timeouts. Nearly all of the recent failures occur in test_pipe.js on linux64/debug.
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=40761877#L2247
[task 2016-12-18T22:46:51.369651Z] 22:46:51 INFO - TEST-START | xpcom/tests/unit/test_pipe.js
[task 2016-12-18T23:03:31.421937Z] 23:03:31 INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/home/worker/workspace/build/venv/bin/python', '-u', '/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py', '--total-chunks', '10', '--this-chunk', '10', '--symbols-path=/home/worker/workspace/build/symbols', '--test-plugin-path=/home/worker/workspace/build/application/firefox/plugins', '--log-raw=/home/worker/workspace/build/blobber_upload_dir/xpcshell_raw.log', '--log-errorsummary=/home/worker/workspace/build/blobber_upload_dir/xpcshell_errorsummary.log', '--utility-path=tests/bin', '--xpcshell=/home/worker/workspace/build/application/firefox/xpcshell', '--manifest=tests/xpcshell/tests/xpcshell.ini']
[task 2016-12-18T23:03:31.467854Z] 23:03:31 ERROR - timed out after 1000 seconds of no output
When test_pipe.js succeeds on linux64/debug, it completes in about 5 seconds; it seems likely that the test is intermittently hanging.
Comment 1•9 years ago
|
||
Might be related to landing bug 1134372. Does the increase in test_pipe.js failures correlate to 12/03/2016?
Flags: needinfo?(gbrown)
| Reporter | ||
Comment 2•9 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #1)
> Does the increase in test_pipe.js
> failures correlate to 12/03/2016?
No, I don't think so. There were definitely test_pipe.js failures as early as Nov 12. I think there was a slight increase in frequency from Dec 8 until now.
Flags: needinfo?(gbrown)
| Reporter | ||
Comment 3•9 years ago
|
||
I think I erred in blaming test_pipe.js: I forgot that xpcshell runs multiple tests in parallel. Many failures immediately follow the test_pipe.js START message, but there are other tests running too.
There are also some failures which do not show test_pipe.js running. For instance:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=65880357&lineNumber=2220
[task 2017-01-03T16:58:34.798096Z] 16:58:34 INFO - TEST-PASS | xpcom/tests/unit/test_ioutil.js | took 4185ms
[task 2017-01-03T16:58:34.949024Z] 16:58:34 INFO - TEST-START | xpcom/tests/unit/test_nsIMutableArray.js
[task 2017-01-03T16:58:35.502039Z] 16:58:35 INFO - TEST-PASS | xpcom/tests/unit/test_localfile.js | took 4385ms
[task 2017-01-03T16:58:35.680612Z] 16:58:35 INFO - TEST-START | xpcom/tests/unit/test_nsIProcess.js
[task 2017-01-03T16:58:36.629717Z] 16:58:36 INFO - TEST-PASS | xpcom/tests/unit/test_mac_bundle.js | took 4145ms
[task 2017-01-03T16:58:36.800661Z] 16:58:36 INFO - TEST-START | xpcom/tests/unit/test_nsIProcess_stress.js
[task 2017-01-03T17:15:16.854540Z] 17:15:16 INFO - Automation Error: mozprocess timed out after 1000 seconds
Reviewing several failures from bug 1317155, it looks to me like test_nsIMutableArray.js is always running when we hit the timeout.
test_nsIMutableArray.js was modified in bug 1311191, a few weeks before the earliest known failure.
:erahm -- Any thoughts/interest?
Flags: needinfo?(erahm)
Summary: Intermittent timeout running xpcom/tests/unit/test_pipe.js → Intermittent xpcshell timeout following xpcom/tests/unit/test_nsIMutableArray.js
| Reporter | ||
Comment 4•9 years ago
|
||
I've found a few examples where test_nsIMutableArray.js completed before the timeout:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-aurora&job_id=68444593&lineNumber=2273
test_nsIProcess_stress.js is another possibility, but there have been no recent changes to that test.
| Reporter | ||
Comment 5•9 years ago
|
||
Sequential runs show it's test_nsIProcess.js.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=70a0cdcc818193761f54119f7e7b2e698944c45d
Flags: needinfo?(erahm)
See Also: → 1317605
Summary: Intermittent xpcshell timeout following xpcom/tests/unit/test_nsIMutableArray.js → Intermittent xpcshell timeout during xpcom/tests/unit/test_nsIProcess.js
| Reporter | ||
Comment 6•9 years ago
|
||
With sequential and verbose logging I get: https://public-artifacts.taskcluster.net/YgbyHH9-T1SaFk-vVUw5og/0/public/logs/live_backing.log
[task 2017-01-13T20:16:43.806630Z] 20:16:43 INFO - TEST-START | xpcom/tests/unit/test_nsIProcess.js
[task 2017-01-13T20:16:43.810024Z] 20:16:43 INFO - xpcom/tests/unit/test_nsIProcess.js | full command: ['/home/worker/workspace/build/application/firefox/xpcshell', '-g', '/home/worker/workspace/build/application/firefox', '-a', '/home/worker/workspace/build/application/firefox', '-r', '/home/worker/workspace/build/application/firefox/components/httpd.manifest', '-m', '-s', '-e', 'const _HEAD_JS_PATH = "/home/worker/workspace/build/tests/xpcshell/head.js";', '-e', 'const _MOZINFO_JS_PATH = "/tmp/xpc-profile-DyEEAl/mozinfo.json";', '-e', 'const _TESTING_MODULES_DIR = "/home/worker/workspace/build/tests/modules/";', '-f', '/home/worker/workspace/build/tests/xpcshell/head.js', '-p', '/tmp/xpc-plugins-_tMmSS', '-e', 'const _SERVER_ADDR = "localhost"', '-e', u'const _HEAD_FILES = ["/home/worker/workspace/build/tests/xpcshell/tests/xpcom/tests/unit/head_xpcom.js"];', '-e', 'const _TAIL_FILES = [];', '-e', 'const _JSDEBUGGER_PORT = 0;', '-e', u'const _TEST_FILE = ["/home/worker/workspace/build/tests/xpcshell/tests/xpcom/tests/unit/test_nsIProcess.js"];', '-e', u'const _TEST_NAME = "xpcom/tests/unit/test_nsIProcess.js"', '-e', '_execute_test(); quit(0);']
[task 2017-01-13T20:16:43.810421Z] 20:16:43 INFO - xpcom/tests/unit/test_nsIProcess.js | current directory: u'/home/worker/workspace/build/tests/xpcshell/tests/xpcom/tests/unit'
[task 2017-01-13T20:16:43.812892Z] 20:16:43 INFO - xpcom/tests/unit/test_nsIProcess.js | environment: ['XPCOM_DEBUG_BREAK=stack-and-abort', 'LD_LIBRARY_PATH=/home/worker/workspace/build/application/firefox', 'XPCSHELL_TEST_TEMP_DIR=/tmp/xpc-other-KOYPJi', 'MOZHTTP2_PORT=54271', 'XPCSHELL_TEST_PROFILE_DIR=/tmp/xpc-profile-DyEEAl', 'MOZ_CRASHREPORTER=1', 'MOZ_DISABLE_NONLOCAL_CONNECTIONS=1', 'MOZ_CRASHREPORTER_NO_REPORT=1']
[task 2017-01-13T20:16:44.401597Z] 20:16:44 INFO - PROCESS | 13340 | [13340] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /home/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 2861
[task 2017-01-13T20:16:44.983550Z] 20:16:44 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2017-01-13T20:16:44.996228Z] 20:16:44 INFO - TEST-PASS | xpcom/tests/unit/test_nsIProcess.js | test_kill - [test_kill : 23] true == true
[task 2017-01-13T20:16:45.014244Z] 20:16:45 INFO - TEST-PASS | xpcom/tests/unit/test_nsIProcess.js | test_kill - [test_kill : 33] true == true
[task 2017-01-13T20:33:25.135355Z] 20:33:25 INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/home/worker/workspace/build/venv/bin/python', '-u', '/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py', '--total-chunks', '8', '--this-chunk', '8', '--symbols-path=/home/worker/workspace/build/symbols', '--test-plugin-path=/home/worker/workspace/build/application/firefox/plugins', '--log-raw=/home/worker/workspace/build/blobber_upload_dir/xpcshell_raw.log', '--log-errorsummary=/home/worker/workspace/build/blobber_upload_dir/xpcshell_errorsummary.log', '--utility-path=tests/bin', '--sequential', '--xpcshell=/home/worker/workspace/build/application/firefox/xpcshell', '--manifest=tests/xpcshell/tests/xpcshell.ini']
[task 2017-01-13T20:33:25.237115Z] 20:33:25 ERROR - timed out after 1000 seconds of no output
...just like bug 1317605, but caught by taskcluster instead of the harness.
| Reporter | ||
Comment 7•9 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #6)
> ...just like bug 1317605, but caught by taskcluster instead of the harness.
In bug 1317605, proc.kill() fails at this point in the test; in this bug, it seems proc.kill() hangs. I'm trying to reproduce and narrow down the source of the hang on try.
I notice that test_nsIProcess_stress.js has similar code, but puts the kill() in a try/catch and ignores errors.
| Reporter | ||
Comment 8•9 years ago
|
||
In https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c15d2d38301c7f0fd1564db1f6b69c5596a551e, I added logging and reproduced the hang in test_nsIProcess.js 4 times.
https://public-artifacts.taskcluster.net/RbB1nGddQP2f4teQJSq1nQ/0/public/logs/live_backing.log
[task 2017-01-18T00:11:39.946060Z] 00:11:39 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2017-01-18T00:11:39.958644Z] 00:11:39 INFO - TEST-PASS | xpcom/tests/unit/test_nsIProcess.js | test_kill - [test_kill : 23] true == true
[task 2017-01-18T00:11:39.961259Z] 00:11:39 INFO - PROCESS | 13367 | zzz nsProcess::Kill
[task 2017-01-18T00:11:39.963130Z] 00:11:39 INFO - PROCESS | 13367 | zzz no mThread!!
[task 2017-01-18T00:11:39.981014Z] 00:11:39 INFO - TEST-PASS | xpcom/tests/unit/test_nsIProcess.js | test_kill - [test_kill : 33] true == true
[task 2017-01-18T00:28:20.100591Z] 00:28:20 INFO - Automation Error: mozprocess timed out after 1000 seconds
"[test_kill : 33]" is logged just before the test calls process.kill() but there is no "zzz nsProcess::Kill", as though nsProcess::Kill() was not called.
| Reporter | ||
Comment 9•9 years ago
|
||
Nathan - I feel stuck. Can you find someone to look into this? Bug 1317155 is one of the most frequent intermittent test failures now and is almost entirely caused by the apparent hang in test_nsIProcess.js illustrated in comment 8.
Flags: needinfo?(nfroyd)
Comment 10•9 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #9)
> Nathan - I feel stuck. Can you find someone to look into this? Bug 1317155
> is one of the most frequent intermittent test failures now and is almost
> entirely caused by the apparent hang in test_nsIProcess.js illustrated in
> comment 8.
Are you sure stderr is getting flushed before that timeout, either in the C runtime, or on the Python side? It seems very unlikely that nsProcess::Kill *isn't* getting called, but it seems reasonable that we might not be draining stderr somewhere along the way, so we're not seeing the messages.
Flags: needinfo?(nfroyd)
| Reporter | ||
Comment 11•9 years ago
|
||
I'm not sure. We use 'python -u runxpcshelltests.py', but I'm not sure about the xpcshell binary, and I suppose something could be buffered at the mozharness or taskcluster levels.
| Reporter | ||
Comment 12•9 years ago
|
||
I tried to coax additional logging out of the hang, but found no change.
Bug 1317155 test failures are quite frequent and are eliminated if test_nsIProcess.js is skipped; I'm going to skip test_nsIProcess.js on linux.
Comment 13•9 years ago
|
||
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8d9d333eac5e
Skip test_nsIProcess.js on linux, for frequent timeouts; r=me
| Reporter | ||
Updated•9 years ago
|
Keywords: leave-open
Whiteboard: [test disabled]
Comment 14•9 years ago
|
||
| bugherder | ||
| Reporter | ||
Updated•9 years ago
|
Keywords: leave-open
Whiteboard: [test disabled] → [test disabled][checkin-needed-aurora][checkin-needed-beta]
Comment 15•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/2548ffbed0f8
https://hg.mozilla.org/releases/mozilla-beta/rev/c1cef2154b81
status-firefox52:
--- → disabled
status-firefox53:
--- → disabled
status-firefox54:
--- → disabled
Whiteboard: [test disabled][checkin-needed-aurora][checkin-needed-beta] → [test disabled]
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Updated•8 years ago
|
Priority: -- → P3
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•