Closed Bug 1584397 Opened 5 months ago Closed 4 months ago

Intermittent netwerk/test/unit/test_trr.js | xpcshell return code: 0

Categories

(Core :: Networking, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr68 --- unaffected
firefox70 --- unaffected
firefox71 --- fixed
firefox72 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: valentin)

References

Details

(Keywords: intermittent-failure, regression, Whiteboard: [necko-triaged][stockwell disable-recommended])

Attachments

(1 file)

Filed by: shindli [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=268677182&repo=mozilla-central
Full log: https://queue.taskcluster.net/v1/task/UVLthxfcTIujJZXpw828MQ/runs/0/artifacts/public/logs/live_backing.log


[task 2019-09-26T23:34:09.097Z] 23:34:09 INFO - TEST-PASS | toolkit/mozapps/update/tests/unit_service_updater/marRMRFDirFileInUseStageFailurePartialSvc_win.js | took 6769ms
[task 2019-09-26T23:34:09.124Z] 23:34:09 INFO - TEST-START | toolkit/mozapps/update/tests/unit_service_updater/marAppApplyDirLockedStageFailureSvc_win.js
[task 2019-09-26T23:34:12.119Z] 23:34:12 INFO - TEST-PASS | toolkit/mozapps/update/tests/unit_service_updater/marAppApplyDirLockedStageFailureSvc_win.js | took 2992ms
[task 2019-09-26T23:34:12.146Z] 23:34:12 INFO - TEST-START | toolkit/mozapps/update/tests/unit_service_updater/marAppApplyUpdateAppBinInUseStageSuccessSvc_win.js
[task 2019-09-26T23:34:20.889Z] 23:34:20 INFO - TEST-PASS | toolkit/mozapps/update/tests/unit_service_updater/marAppApplyUpdateAppBinInUseStageSuccessSvc_win.js | took 8744ms
[task 2019-09-26T23:34:20.924Z] 23:34:20 INFO - TEST-START | toolkit/mozapps/update/tests/unit_service_updater/marAppApplyUpdateSuccessSvc.js
[task 2019-09-26T23:34:24.819Z] 23:34:24 INFO - TEST-PASS | toolkit/mozapps/update/tests/unit_service_updater/marAppApplyUpdateSuccessSvc.js | took 3899ms
[task 2019-09-26T23:34:24.846Z] 23:34:24 INFO - TEST-START | toolkit/mozapps/update/tests/unit_service_updater/marAppApplyUpdateStageSuccessSvc.js
[task 2019-09-26T23:34:30.565Z] 23:34:30 INFO - TEST-PASS | toolkit/mozapps/update/tests/unit_service_updater/marAppApplyUpdateStageSuccessSvc.js | took 5718ms
[task 2019-09-26T23:34:30.592Z] 23:34:30 INFO - TEST-START | toolkit/xre/test/test_launch_without_hang.js
[task 2019-09-26T23:34:37.959Z] 23:34:37 INFO - TEST-PASS | toolkit/xre/test/test_launch_without_hang.js | took 7370ms
[task 2019-09-26T23:34:37.994Z] 23:34:37 INFO - TEST-START | uriloader/exthandler/tests/unit/test_getTypeFromExtension_ext_to_type_mapping.js
[task 2019-09-26T23:34:38.435Z] 23:34:38 INFO - TEST-PASS | uriloader/exthandler/tests/unit/test_getTypeFromExtension_ext_to_type_mapping.js | took 448ms
[task 2019-09-26T23:34:38.470Z] 23:34:38 INFO - TEST-START | uriloader/exthandler/tests/unit/test_getTypeFromExtension_with_empty_Content_Type.js
[task 2019-09-26T23:34:38.930Z] 23:34:38 INFO - TEST-PASS | uriloader/exthandler/tests/unit/test_getTypeFromExtension_with_empty_Content_Type.js | took 463ms
[task 2019-09-26T23:34:38.957Z] 23:34:38 INFO - TEST-START | uriloader/exthandler/tests/unit/test_badMIMEType.js
[task 2019-09-26T23:34:39.412Z] 23:34:39 INFO - TEST-PASS | uriloader/exthandler/tests/unit/test_badMIMEType.js | took 454ms
[task 2019-09-26T23:34:39.439Z] 23:34:39 INFO - TEST-START | uriloader/exthandler/tests/unit/test_handlerService.js
[task 2019-09-26T23:34:42.322Z] 23:34:42 INFO - TEST-PASS | uriloader/exthandler/tests/unit/test_handlerService.js | took 2883ms
[task 2019-09-26T23:34:42.357Z] 23:34:42 INFO - TEST-START | uriloader/exthandler/tests/unit/test_handlerService_store.js
[task 2019-09-26T23:34:43.457Z] 23:34:43 INFO - TEST-PASS | uriloader/exthandler/tests/unit/test_handlerService_store.js | took 1102ms
[task 2019-09-26T23:34:43.492Z] 23:34:43 INFO - TEST-START | uriloader/exthandler/tests/unit/test_punycodeURIs.js
[task 2019-09-26T23:34:44.953Z] 23:34:44 INFO - TEST-PASS | uriloader/exthandler/tests/unit/test_punycodeURIs.js | took 1466ms
[task 2019-09-26T23:34:44.968Z] 23:34:44 INFO - Retrying tests that failed when run in parallel.
[task 2019-09-26T23:34:44.983Z] 23:34:44 INFO - TEST-START | netwerk/test/unit/test_trr.js
[task 2019-09-26T23:34:46.882Z] 23:34:46 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_trr.js | xpcshell return code: 0
[task 2019-09-26T23:34:46.882Z] 23:34:46 INFO - TEST-INFO took 1897ms
[task 2019-09-26T23:34:46.883Z] 23:34:46 INFO - >>>>>>>
[task 2019-09-26T23:34:46.883Z] 23:34:46 INFO - PID 5748 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2019-09-26T23:34:46.883Z] 23:34:46 INFO - PID 5748 | [5748, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 83
[task 2019-09-26T23:34:46.883Z] 23:34:46 INFO - PID 5748 | [5748, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 83
[task 2019-09-26T23:34:46.884Z] 23:34:46 INFO - PID 5748 | [5748, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 83
[task 2019-09-26T23:34:46.884Z] 23:34:46 INFO - PID 5748 | [5748, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 83
[task 2019-09-26T23:34:46.884Z] 23:34:46 INFO - PID 5748 | [5748, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 83
[task 2019-09-26T23:34:46.884Z] 23:34:46 INFO - PID 5748 | [5748, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file z:/build/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 2632
[task 2019-09-26T23:34:46.884Z] 23:34:46 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2019-09-26T23:34:46.884Z] 23:34:46 INFO - (xpcshell/head.js) | test run_next_test 0 pending (2)

Junior, can this failure and bug 1584743 be related to the new node? I do not think they are related, but you mentioned that we had some test failure related to the new node.

Flags: needinfo?(juhsu)

(a) I can't figure out the rationale of bug 1584743. node doesn't leave any dying message.
I believe she's alive till the end of test chunk.

(b) This bug is related to new node.
The retry trr fails at the test1, which indicates node is dead before the test start.

The right solution should be bug 1547823 IMO.
Please see bug 1547823 comment 17 for details.

Valentin, should Bug 1582515 ease this kind of intermittent?
There's still intermittent sample recently.

[task 2019-09-26T23:12:19.781Z] 23:12:19     INFO -  TEST-FAIL | netwerk/test/unit/test_trr.js | took 38910ms
...
[task 2019-09-26T23:12:20.165Z] 23:12:20     INFO -  netwerk/test/unit/test_trr.js failed or timed out, will retry.
...
[task 2019-09-26T23:34:46.884Z] 23:34:46     INFO -  netwerk/test/unit/test_trr.js | Starting setup
[task 2019-09-26T23:34:46.884Z] 23:34:46     INFO -  (xpcshell/head.js) | test setup pending (2)
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  PID 5748 | start!
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  TEST-PASS | netwerk/test/unit/test_trr.js | setup - [setup : 34] "49320" != null
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  TEST-PASS | netwerk/test/unit/test_trr.js | setup - [setup : 35] "49320" != ""
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  PID 5748 | [5748, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file z:/build/build/src/xpcom/io/nsLocalFileWin.cpp, line 3201
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  PID 5748 | [5748, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file z:/build/build/src/xpcom/io/nsLocalFileWin.cpp, line 3201
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  PID 5748 | [5748, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file z:/build/build/src/xpcom/io/nsLocalFileWin.cpp, line 3201
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  (xpcshell/head.js) | test run_next_test 0 finished (2)
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  (xpcshell/head.js) | test run_next_test 1 pending (2)
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  (xpcshell/head.js) | test setup finished (2)
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  netwerk/test/unit/test_trr.js | Starting test1
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  (xpcshell/head.js) | test test1 pending (2)
[task 2019-09-26T23:34:46.885Z] 23:34:46     INFO -  (xpcshell/head.js) | test run_next_test 1 finished (2)
[task 2019-09-26T23:34:46.886Z] 23:34:46     INFO -  PID 5748 | [5748, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, kKnownEsrVersion) failed with result 0x80004002: file z:/build/build/src/toolkit/components/resistfingerprinting/nsRFPService.cpp, line 661
[task 2019-09-26T23:34:46.886Z] 23:34:46     INFO -  TEST-PASS | netwerk/test/unit/test_trr.js | test1 - [test1 : 114] true == true
[task 2019-09-26T23:34:46.886Z] 23:34:46     INFO -  TEST-PASS | netwerk/test/unit/test_trr.js | test1 - [test1 : 123] 0 == 0
[task 2019-09-26T23:34:46.886Z] 23:34:46  WARNING -  TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_trr.js | test1 - [test1 : 125] "127.0.0.1" == "2.2.2.2"
[task 2019-09-26T23:34:46.886Z] 23:34:46     INFO -  Z:/task_1569538423/build/tests/xpcshell/tests/netwerk/test/unit/test_trr.js:onLookupComplete:125
...
[task 2019-09-26T23:34:54.581Z] 23:34:54     INFO -  Node moz-http2 server shutting down ...
[task 2019-09-26T23:34:54.581Z] 23:34:54     INFO -  Node server moz-http2 already dead -1073741819
Flags: needinfo?(juhsu) → needinfo?(valentin.gosu)
Depends on: 1547823

Addendum:
There's one sample of bug 1584743 is due to node's death.
https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=269454561&revision=31bea2d0f0a68746bc960dff8a2e0399f92f7139

I can't tell for the other linux failures.

It seems the reason we don't get any info from node is that it actually segfaults - similar to here:
https://github.com/serialport/node-serialport/issues/74

It's quite likely it's happening because of the native http2 we're using - the experimental warning here:

[task 2019-10-02T15:52:25.823Z] 15:52:25     INFO -  Node moz-http2 server shutting down ...
[task 2019-10-02T15:52:25.823Z] 15:52:25     INFO -  Node server moz-http2 already dead -1073741819
[task 2019-10-02T15:52:25.823Z] 15:52:25     INFO -  Process stderr
[task 2019-10-02T15:52:25.823Z] 15:52:25     INFO -  (node:8672) ExperimentalWarning: The http2 module is an experimental API.

I think upgrading node has a high chance of fixing this.
Another option would be to use child_process.fork so any operation in the proxy test doesn't crash the current node server. Bug 1582515 doesn't improve things here since everything still runs in the same process.

Flags: needinfo?(valentin.gosu)

I also wonder if using stream.close(); instead of stream.close(http2_internal.constants.NGHTTP2_STREAM_CLOSED); might improve things here - the constants thing was causing some errors locally, when I was running the tests. And considering that code didn't really get exercised before Bug 1582515, the change shouldn't break things (hopefully).

Valentin, are you willing to take this? Feel free to bounce.

Assignee: nobody → valentin.gosu
Priority: P5 → P2
Whiteboard: [necko-triaged]

Over the last 7 days there are 31 failures on this bug.
These happen on: windows7-32-shippable, windows7-32, windows10-64
Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=271963714&repo=autoland&lineNumber=8882

Flags: needinfo?(valentin.gosu)

I have a fix incoming.

Flags: needinfo?(valentin.gosu)
  • Adds a new moz-http2-child.js file which gets spawned into a new process. When calling NodeServer.execute, the code gets passed to the existing moz-http2.js process which then sends it to be evaluated in the child process. Any crash in the child should not be able to kill the main node process.
  • Moves the proxy creation code into test_http2-proxy.js
  • Adds the new NodeServer.fork() and NodeServer.kill() static methods to spawn a new server
  • Makes it easier to isolate a test's behaviour from another's. It also opens the way to moving some of the logic to individual unit tests, like we do for the proxy creation code, rather than keeping it all in moz-http2.js

Can you please review the patch from https://bugzilla.mozilla.org/show_bug.cgi?id=1584397#c12?

Flags: needinfo?(honzab.moz)

Sure, no need to do any additional ni?, I see the request. If you want me to do it with a higher priority, feel free to push me :)

Flags: needinfo?(honzab.moz)

Please note that Valentin (the author) is OOO for the rest of the week. I have already started the review and drafted few comments, but it doesn't make sense to publish when there is no one to update the patch anyway.

Blocks: 1584743
Blocks: 1582030
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/2a40bdf17bea
Make http2 proxy code execute in a separate process r=mayhemer
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/31f84aac7e91
Make http2 proxy code execute in a separate process r=mayhemer
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Flags: needinfo?(valentin.gosu)
See Also: → 1598825
You need to log in before you can comment on or make changes to this bug.