Closed Bug 1498706 Opened 3 years ago Closed 1 year ago

Intermittent browser/components/extensions/test/browser/browser_ext_port_disconnect_on_crash.js | Test timed out -

Categories

(WebExtensions :: Untriaged, defect, P5)

defect

Tracking

(firefox77 fixed)

RESOLVED FIXED
mozilla77
Tracking Status
firefox77 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gsvelto)

References

(Depends on 1 open bug, Regression)

Details

(Keywords: intermittent-failure, Whiteboard: [retriggered][stockwell disable-recommended])

Attachments

(3 files)

Filed by: dvarga [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=205145308&repo=autoland

https://queue.taskcluster.net/v1/task/DxWilEpRRcyzSwdvSV-Ifw/runs/0/artifacts/public/logs/live_backing.log

19:39:50     INFO - TEST-PASS | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_port_disconnect_on_crash.js | port should be disconnected without errors - Expected: null, Actual: null - 
19:39:50     INFO - Buffered messages finished
19:39:50     INFO - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_port_disconnect_on_crash.js | Test timed out - 
19:39:50     INFO - Not taking screenshot here: see the one that was previously logged
19:39:50     INFO - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_port_disconnect_on_crash.js | message queue is empty - Got ["port_disconnected"], expected []
19:39:50     INFO - Stack trace:
19:39:50     INFO - chrome://mochikit/content/browser-test.js:test_is:1295
19:39:50     INFO - chrome://mochikit/content/tests/SimpleTest/ExtensionTestUtils.js:ExtensionTestUtils.loadExtension/<:27
19:39:50     INFO - chrome://mochikit/content/browser-test.js:nextTest:695
19:39:50     INFO - chrome://mochikit/content/browser-test.js:timeoutFn:1190
19:39:50     INFO - setTimeout handler*chrome://mochikit/content/browser-test.js:Tester_execTest:1152
19:39:50     INFO - chrome://mochikit/content/browser-test.js:nextTest/<:986
19:39:50     INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:795
19:39:50     INFO - Not taking screenshot here: see the one that was previously logged
19:39:50     INFO - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_port_disconnect_on_crash.js | Extension left running at test shutdown - 
19:39:50     INFO - Stack trace:
19:39:50     INFO - chrome://mochikit/content/tests/SimpleTest/ExtensionTestUtils.js:ExtensionTestUtils.loadExtension/<:109
19:39:50     INFO - chrome://mochikit/content/browser-test.js:nextTest:695
19:39:50     INFO - chrome://mochikit/content/browser-test.js:timeoutFn:1190
19:39:50     INFO - setTimeout handler*chrome://mochikit/content/browser-test.js:Tester_execTest:1152
19:39:50     INFO - chrome://mochikit/content/browser-test.js:nextTest/<:986
19:39:50     INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:795
19:39:50     INFO - GECKO(4328) | MEMORY STAT | vsize 863MB | vsizeMaxContiguous 422MB | residentFast 345MB | heapAllocated 109MB
19:39:50     INFO - TEST-OK | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_port_disconnect_on_crash.js | took 45072ms
19:39:50     INFO - Not taking screenshot here: see the one that was previously logged
19:39:50     INFO - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_port_disconnect_on_crash.js | Found a tab after previous test timed out: http://example.com/?crashme - 
19:39:50     INFO - checking window state
Summary: Intermittent browser/components/extensions/test/browser/test-oop-extensions/browser_ext_port_disconnect_on_crash.js | Test timed out - → Intermittent browser/components/extensions/test/browser/browser_ext_port_disconnect_on_crash.js | Test timed out -

I don't think that the patch you've linked could have a significant impact on this test, as the test in question does not come close to triggering related code there. Theoretically, my change shouldn't have had any impact here at all, as it deletes dead code, and adds an assertion to the one entry point to that code.

Did a quick scan through the logs, and I noticed that the "expected crash" leak logs never show up for the lagging test, so I'm guessing we're failing to load or get the correct information from the extension, or are failing to load the document which the extension is loading a content script attached to.

It might be useful to include more information in the test so that the progress we've made is more clear..

Flags: needinfo?(nika)
Depends on: 1611913

Shane can you assign someone to take a look?

Flags: needinfo?(mixedpuppy)

Tom, you've made this depend on a bug you're assigned to, can you take this one?

Flags: needinfo?(mixedpuppy) → needinfo?(tomica)

Sure, though I doubt this is caused by bug 1611913, and I don't actually intend to investigate here before bug 1583484, because all the code will be changing.

If this becomes too common in the interim, we can just disable it on windows+qr.

Assignee: nobody → tomica
Depends on: 1583484
Flags: needinfo?(tomica)
Whiteboard: [retriggered][stockwell needswork:owner] → [retriggered][stockwell needswork:owner][comment 72]

In the last 7 days there have been 22 occurrences, all on windows10-64 debug.

Recent failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=298273723&repo=autoland&lineNumber=32255

...
...
task 2020-04-19T11:10:24.218Z] 11:10:24     INFO - GECKO(8076) | [Parent 6328: Main Thread]: I/DocShellAndDOMWindowLeak --DOMWINDOW == 12 (000001536FEEAC00) [pid = 6328] [serial = 609] [outer = 0000000000000000] [url = chrome://mozapps/content/extensions/aboutaddons.html]
[task 2020-04-19T11:10:24.219Z] 11:10:24     INFO - GECKO(8076) | [Parent 6328: Main Thread]: I/DocShellAndDOMWindowLeak --DOMWINDOW == 11 (000001536632A400) [pid = 6328] [serial = 590] [outer = 0000000000000000] [url = chrome://mozapps/content/extensions/aboutaddons.html]
[task 2020-04-19T11:10:24.219Z] 11:10:24     INFO - GECKO(8076) | [Parent 6328: Main Thread]: I/DocShellAndDOMWindowLeak --DOMWINDOW == 10 (000001536FE3B000) [pid = 6328] [serial = 618] [outer = 0000000000000000] [url = about:addons]
[task 2020-04-19T11:11:41.122Z] 11:11:41     INFO - TEST-INFO | started process screenshot
[task 2020-04-19T11:11:41.192Z] 11:11:41     INFO - TEST-INFO | screenshot: exit 0
[task 2020-04-19T11:11:41.193Z] 11:11:41     INFO - Buffered messages logged at 11:10:11
[task 2020-04-19T11:11:41.193Z] 11:11:41     INFO - Entering test bound connect_from_tab_to_bg_and_crash_tab
[task 2020-04-19T11:11:41.193Z] 11:11:41     INFO - Extension loaded
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - Console message: Warning: attempting to write 23080 bytes to preference extensions.webextensions.uuids. This is bad for general performance and memory usage. Such an amount of data should rather be written to an external file. This preference will not be sent to any content processes.
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - TEST-PASS | browser/components/extensions/test/browser/browser_ext_port_disconnect_on_crash.js | expected port - Expected: tab_to_bg, Actual: tab_to_bg - 
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - TEST-PASS | browser/components/extensions/test/browser/browser_ext_port_disconnect_on_crash.js | correct frameId - Expected: 0, Actual: 0 - 
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - Buffered messages finished
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/browser_ext_port_disconnect_on_crash.js | Test timed out - 
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/browser_ext_port_disconnect_on_crash.js | Extension left running at test shutdown - 
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - Stack trace:
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - chrome://mochikit/content/browser-test.js:test_ok:1269
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - chrome://mochikit/content/tests/SimpleTest/ExtensionTestUtils.js:ExtensionTestUtils.loadExtension/<:117
[task 2020-04-19T11:11:41.194Z] 11:11:41     INFO - chrome://mochikit/content/browser-test.js:nextTest:549
[task 2020-04-19T11:11:41.195Z] 11:11:41     INFO - chrome://mochikit/content/browser-test.js:timeoutFn:1160
[task 2020-04-19T11:11:41.195Z] 11:11:41     INFO - setTimeout handler*chrome://mochikit/content/browser-test.js:Tester_execTest:1107
[task 2020-04-19T11:11:41.195Z] 11:11:41     INFO - chrome://mochikit/content/browser-test.js:nextTest/<:904
[task 2020-04-19T11:11:41.195Z] 11:11:41     INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:918
[task 2020-04-19T11:11:41.195Z] 11:11:41     INFO - GECKO(8076) | [Child 7908: Main Thread]: I/DocShellAndDOMWindowLeak --DOCSHELL 0000026E9F8BA800 == 4 [pid = 7908] [id = {6eed2194-85f9-4cc6-a0c1-0cd29c14a483}] [url = moz-extension://6a6210f7-3367-458e-873f-5c599ee47b26/_generated_background_page.html]
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - Console message: [JavaScript Error: "Timeout while waiting for extension child to shutdown: "Generated extension" (ID: {fe8e0031-1fff-4c8f-b57d-e8712d8e3c24}, moz-extension://6a6210f7-3367-458e-873f-5c599ee47b26/)" {file: "resource://gre/modules/Extension.jsm" line: 2611}]
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - shutdown@resource://gre/modules/Extension.jsm:2611:10
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - async*receiveMessage@resource://specialpowers/SpecialPowersParent.jsm:1033:26
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - JSWindowActor query*unload@resource://specialpowers/SpecialPowersChild.jsm:2062:19
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - ExtensionTestUtils.loadExtension/<@chrome://mochikit/content/tests/SimpleTest/ExtensionTestUtils.js:118:23
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - nextTest@chrome://mochikit/content/browser-test.js:549:35
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - async*timeoutFn@chrome://mochikit/content/browser-test.js:1160:18
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - setTimeout handler*Tester_execTest@chrome://mochikit/content/browser-test.js:1107:80
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - nextTest/<@chrome://mochikit/content/browser-test.js:904:14
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:918:23
[task 2020-04-19T11:11:49.133Z] 11:11:49     INFO - 
[task 2020-04-19T11:11:49.438Z] 11:11:49     INFO - GECKO(8076) | [Child 7908: Main Thread]: I/DocShellAndDOMWindowLeak --DOMWINDOW == 9 (0000026E9F896600) [pid = 7908] [serial = 1420] [outer = 0000000000000000] [url = moz-extension://6a6210f7-3367-458e-873f-5c599ee47b26/_generated_background_page.html]
[task 2020-04-19T11:11:53.500Z] 11:11:53     INFO - GECKO(8076) | [Child 7908: Main Thread]: I/DocShellAndDOMWindowLeak --DOMWINDOW == 8 (0000026EA125B400) [pid = 7908] [serial = 1422] [outer = 0000000000000000] [url = moz-extension://6a6210f7-3367-458e-873f-5c599ee47b26/_generated_background_page.html]
[task 2020-04-19T11:12:15.275Z] 11:12:15     INFO - GECKO(8076) | [Parent 6328, Jump List] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012 (NS_ERROR_FILE_NOT_FOUND): file /builds/worker/checkouts/gecko/widget/windows/WinUtils.cpp, line 1348
[task 2020-04-19T11:14:15.314Z] 11:14:15     INFO - GECKO(8076) | [Parent 6328, Jump List] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012 (NS_ERROR_FILE_NOT_FOUND): file /builds/worker/checkouts/gecko/widget/windows/WinUtils.cpp, line 1348
[task 2020-04-19T11:15:07.832Z] 11:15:07     INFO -  JavaScript error: resource://gre/modules/PlacesExpiration.jsm, line 894: TypeError: can't access property "createAsyncStatement", this._db is undefined
[task 2020-04-19T11:16:15.354Z] 11:16:15     INFO - GECKO(8076) | [Parent 6328, Jump List] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012 (NS_ERROR_FILE_NOT_FOUND): file /builds/worker/checkouts/gecko/widget/windows/WinUtils.cpp, line 1348
[task 2020-04-19T11:18:15.399Z] 11:18:15     INFO - GECKO(8076) | [Parent 6328, Jump List] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012 (NS_ERROR_FILE_NOT_FOUND): file /builds/worker/checkouts/gecko/widget/windows/WinUtils.cpp, line 1348
[task 2020-04-19T11:21:50.724Z] 11:21:50     INFO - GECKO(8076) | [Parent 6328, Jump List] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012 (NS_ERROR_FILE_NOT_FOUND): file /builds/worker/checkouts/gecko/widget/windows/WinUtils.cpp, line 1348
[task 2020-04-19T11:23:50.754Z] 11:23:50     INFO - GECKO(8076) | [Parent 6328, Jump List] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012 (NS_ERROR_FILE_NOT_FOUND): file /builds/worker/checkouts/gecko/widget/windows/WinUtils.cpp, line 1348
[task 2020-04-19T11:24:07.998Z] 11:24:07     INFO -  JavaScript error: resource://gre/modules/PlacesExpiration.jsm, line 894: TypeError: can't access property "createAsyncStatement", this._db is undefined
[task 2020-04-19T11:30:00.766Z] 11:30:00     INFO - Buffered messages finished
[task 2020-04-19T11:30:00.766Z] 11:30:00    ERROR - TEST-UNEXPECTED-TIMEOUT | browser/components/extensions/test/browser/browser_ext_port_disconnect_on_crash.js | application timed out after 370 seconds with no output
[task 2020-04-19T11:30:00.766Z] 11:30:00    ERROR - Force-terminating active process(es).
[task 2020-04-19T11:30:00.766Z] 11:30:00     INFO - Determining child pids from psutil...
[task 2020-04-19T11:30:00.770Z] 11:30:00     INFO - [6328]
[task 2020-04-19T11:30:00.770Z] 11:30:00     INFO - ==> process 6328 launched child process 2408 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.0.698680986\373458506" -parentBuildID 20200419093659 -prefsHandle 2156 -prefMapHandle 2148 -prefsLen 1 -prefMapSize 231493 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 2212 gpu)
[task 2020-04-19T11:30:00.771Z] 11:30:00     INFO - ==> process 6328 launched child process 6376 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.6.1146339964\1152202969" -childID 1 -isForBrowser -prefsHandle 2112 -prefMapHandle 2104 -prefsLen 1541 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 2852 tab)
[task 2020-04-19T11:30:00.771Z] 11:30:00     INFO - ==> process 6328 launched child process 7908 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.13.1934267169\1550682123" -childID 2 -isForBrowser -prefsHandle 3128 -prefMapHandle 3124 -prefsLen 1682 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 3144 tab)
[task 2020-04-19T11:30:00.772Z] 11:30:00     INFO - ==> process 6328 launched child process 10744 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.20.901724061\2117109011" -childID 3 -isForBrowser -prefsHandle 5600 -prefMapHandle 5596 -prefsLen 1743 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 5604 tab)
[task 2020-04-19T11:30:00.772Z] 11:30:00     INFO - ==> process 6328 launched child process 6060 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.27.1065885093\385237790" -childID 4 -isForBrowser -prefsHandle 5532 -prefMapHandle 5356 -prefsLen 11301 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 4648 tab)
[task 2020-04-19T11:30:00.772Z] 11:30:00     INFO - ==> process 6328 launched child process 7836 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.34.476187199\2079073301" -childID 5 -isForBrowser -prefsHandle 4428 -prefMapHandle 4368 -prefsLen 11514 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 4536 tab)
[task 2020-04-19T11:30:00.773Z] 11:30:00     INFO - ==> process 6328 launched child process 3848 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.41.89191683\2125991772" -childID 6 -isForBrowser -prefsHandle 3608 -prefMapHandle 4376 -prefsLen 11515 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 3888 tab)
[task 2020-04-19T11:30:00.773Z] 11:30:00     INFO - ==> process 6328 launched child process 9920 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.48.1127504475\956165749" -childID 7 -isForBrowser -prefsHandle 3976 -prefMapHandle 4696 -prefsLen 13008 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 4184 tab)
[task 2020-04-19T11:30:00.774Z] 11:30:00     INFO - ==> process 6328 launched child process 1404 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.55.1201293546\1005746686" -childID 8 -isForBrowser -prefsHandle 3676 -prefMapHandle 3900 -prefsLen 13473 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 3828 tab)
[task 2020-04-19T11:30:00.774Z] 11:30:00     INFO - ==> process 6328 launched child process 6700 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.62.1550096543\490685109" -childID 9 -isForBrowser -prefsHandle 4540 -prefMapHandle 4068 -prefsLen 14302 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 5064 tab)
[task 2020-04-19T11:30:00.775Z] 11:30:00     INFO - ==> process 6328 launched child process 2468 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.69.715687081\802182021" -childID 10 -isForBrowser -prefsHandle 5820 -prefMapHandle 3972 -prefsLen 14768 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 3428 tab)
[task 2020-04-19T11:30:00.775Z] 11:30:00     INFO - ==> process 6328 launched child process 5728 ("Z:\task_1587291963\build\application\firefox\plugin-container.exe" --channel="6328.76.1866714843\1041309691" "C:\Users\task_1587291963\AppData\Local\Temp\tmpa60ppc.mozrunner\plugins\nptest.dll" "C:\Users\task_1587291963\AppData\LocalLow\Mozilla\Temp-{441e3f5e-3b01-4ad9-b706-2d2303ad886c}" "Z:\task_1587291963\AppData\Roaming\Adobe\\" -appdir "Z:\task_1587291963\build\application\firefox\browser" 748E3A83B77782F2 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 3892 plugin)
[task 2020-04-19T11:30:00.775Z] 11:30:00     INFO - ==> process 6328 launched child process 10448 ("Z:\task_1587291963\build\application\firefox\firefox.exe" -contentproc --channel="6328.81.376508988\517073491" -childID 11 -isForBrowser -prefsHandle 10396 -prefMapHandle 9448 -prefsLen 14725 -prefMapSize 231493 -parentBuildID 20200419093659 -appdir "Z:\task_1587291963\build\application\firefox\browser" - 6328 "\\.\pipe\gecko-crash-server-pipe.6328" 10316 tab)
[task 2020-04-19T11:30:00.775Z] 11:30:00     INFO - Found child pids: set([9920, 5728, 7908, 2408, 1404, 6060, 3848, 10448, 10744, 6328, 2468, 7836, 6376, 6700])
[task 2020-04-19T11:30:00.775Z] 11:30:00     INFO - Failed to get child procs
[task 2020-04-19T11:30:00.775Z] 11:30:00     INFO - Killing process: 9920
...
Flags: needinfo?(tomica)
Flags: needinfo?(gsvelto)

Right, this looks related to bug 1434933, since that was about child processes crashing, and this tests explicitly crashes a tab's process.

Flags: needinfo?(tomica)
Regressed by: 1434933

It seems like I've been trading bug 1548654 (which shouldn't happen anymore) for this one. Tomislav, do you mind if I take this bug from you?

Flags: needinfo?(gsvelto)
Whiteboard: [retriggered][comment 72][stockwell unknown] → [retriggered][stockwell needswork:owner]

Gabriele are there any updates here? Failure rate is really high.

Flags: needinfo?(gsvelto)

I'm actively working on it but I cannot reproduce it locally, no matter how hard I try. I'm now running a try push with debug statements to try and figure out what's going on here.

Alright, I figured out what's wrong. When we hit the exception handler twice in the same process - which somehow happens often in this test - the second call does not generate a minidump anymore, but somehow gets stuck in the exception handler. Or rather, it keeps re-entering the exception handler. I assumed that the process would be terminated after the exception handler bailed out but it didn't, this might be due to how breakpad deals with multiple exceptions in this particular scenario. Either way this needs a different fix, I'll have a patch ready by tonight.

Assignee: tomica → gsvelto
Status: NEW → ASSIGNED
Flags: needinfo?(gsvelto)

I've got a fix under test here. With some luck it'll land tomorrow.

My patch mitigates the issue but doesn't seem to solve it. What I figured out is that the issue stems from a different problem than what I anticipated: when we get stuck in this test we're entering the exception handler twice from the same thread. I don't know how that's possible since the process should terminate after we've finished handling the first exception so I'll have to investigate that. I'll land the first fix today to reduce the volume and then I'll try to fix the remaining issue.

Yesterday's fix didn't cut it. What seems to be happening is that when two threads race into the exception handler the second one will prevent the process from being terminated. I'm not sure what's causing it but I think I can fix that by specifically marking ensuring the exception is marked as handled if we enter the exception handler more than once. Try run for this fix here.

Also I figured out why I saw the exception handler being re-entered twice by the same thread: that's what happens if the exception isn't marked as handled. Breakpad will look for another exception handler, won't find it, and will try ours again.

The try run looks good, no crashes or hangs after 12 retries. Time for review, fingers crossed that this is fixed for good.

This changes the way filter callbacks work in Breakpad so that the
Gecko-provided exception handler can instruct breakpad to just consider an
exception handled and terminate the process without further work. This change
is needed because when two threads enter the exception handler at the same
time the second can get stuck. The logic behind this is convoluted: when the
filter callback "accepts" an exception Breakpad will request minidump
generation and then notify Windows exception search mechanism that the
exception was handled. The process will terminate at this point. However if
a second thread enters the exception handler before this happens it will also
try to handle the exception, either by generating a minidump (prior to my fix
for bug 1434933) or by simply ignoring it (after bug 1434933). However under
both conditions Breakpad will consider the exception not handled and inform
Windows exception search logic to look for another handler. But we're already
at the top-level handler so Windows will try it again, and again, and again
... hanging the process instead of terminating it.

With this patch applied the first time we hit the exception handler we'll
request minidump generation and report the exception as handled. If we hit the
exception handler again we will not request minidump generation but we will
also report the exception as handled so that Windows can terminate the process
right away.

Attachment #9143290 - Attachment description: Bug 1498706 - Prevent Breakpad exception handler from hanging a process instead of terminating it r=dmajor → Bug 1498706 - Prevent the Breakpad exception handler from hanging a process instead of terminating it r=dmajor

Changing the stockwell tag since this bug is being worked on.

Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]

(In reply to Gabriele Svelto [:gsvelto] from comment #90)

I've got a fix under test here. With some luck it'll land tomorrow.

Just to set expectations, it's going to take me some time to review the patch. This area of code is very tricky and it always takes me a lot of concentration to think about. I'll want to apply the patch and watch it under a debugger.

Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]

Can we disable the test until the fix is reviewed? Failure rate is very high.

Flags: needinfo?(dmajor)

(In reply to Andreea Pavel [:apavel] from comment #102)

Can we disable the test until the fix is reviewed? Failure rate is very high.

I can't speak for this, it is a question for the module owner.

Flags: needinfo?(dmajor)
Flags: needinfo?(mixedpuppy)
Flags: needinfo?(gsvelto)

The patch is ready and I've already tested it so I think that's the faster way to fix this. That being said I'll leave that decision to the component owner.

Flags: needinfo?(gsvelto)

Scratch my previous comment, I missed David's comment above. Let's disable the test then, I'll re-enable it after the fix lands.

Flags: needinfo?(mixedpuppy)
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/792842055c97
temporarily disable browser_ext_port_disconnect_on_crash.js on win10 64bits debug r=gsvelto,egao
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell disabled]
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7b28d7381c31
Prevent the Breakpad exception handler from hanging a process instead of terminating it r=dmajor
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Target Milestone: mozilla77 → ---

Gabriele, did you forget to enable the test or should we leave it disabled?

Flags: needinfo?(gsvelto)

Yes, I forgot, sorry.

Flags: needinfo?(gsvelto)
Pushed by shindli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2d9bdc474c19
Re enabled browser_ext_port_disconnect_on_crash.js that was previously disabled r=gsvelto
Keywords: leave-open
Whiteboard: [retriggered][stockwell disabled] → [retriggered]
Pushed by shindli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/43d0002aebf8
correct enable condition r=test-fix CLOSED TREE
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77
You need to log in before you can comment on or make changes to this bug.