Closed Bug 1185045 Opened 9 years ago Closed 8 years ago

Intermittent browser_net_timing-division.js | FATAL ERROR: AsyncShutdown timeout in ShutdownLeaks: Wait for cleanup to be finished before checking for leaks Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":" ...

Categories

(DevTools :: Framework, defect, P5)

x86
Linux
defect

Tracking

(e10s+, firefox42 affected)

RESOLVED WORKSFORME
Tracking Status
e10s + ---
firefox42 --- affected

People

(Reporter: RyanVM, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure)

08:33:44 INFO - 4383 INFO TEST-START | browser/devtools/netmonitor/test/browser_net_timing-division.js
08:33:50 INFO - MEMORY STAT | vsize 712MB | residentFast 210MB | heapAllocated 98MB
08:33:50 INFO - 4384 INFO TEST-OK | browser/devtools/netmonitor/test/browser_net_timing-division.js | took 5307ms
08:33:50 INFO - console.error:
08:33:50 INFO - Protocol error (noSuchActor): No such actor for ID: server1.conn80.child1/profilerActor20
08:34:00 INFO - WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":"resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js","lineNumber":1927,"stack":["resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js:Toolbox.prototype.destroy/leakCheckObserver:1927","chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest</<:632","chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:736","null:null:0"]}] Barrier: ShutdownLeaks: Wait for cleanup to be finished before checking for leaks
08:34:51 INFO - FATAL ERROR: AsyncShutdown timeout in ShutdownLeaks: Wait for cleanup to be finished before checking for leaks Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":"resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js","lineNumber":1927,"stack":["resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js:Toolbox.prototype.destroy/leakCheckObserver:1927","chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest</<:632","chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:736","null:null:0"]}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
08:34:51 INFO - [Parent 3382] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1927
08:34:51 INFO - [Parent 3382] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1927
08:34:52 INFO - [Child 3443] ###!!! ABORT: Aborting on channel error.: file /builds/slave/fx-team-lx-pgo-000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1753
08:34:52 INFO - [Child 3443] ###!!! ABORT: Aborting on channel error.: file /builds/slave/fx-team-lx-pgo-000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1753
08:34:52 INFO - TEST-INFO | Main app process: killed by SIGSEGV
08:34:52 INFO - 4385 INFO checking window state
08:34:52 WARNING - TEST-UNEXPECTED-FAIL | browser/devtools/netmonitor/test/browser_net_timing-division.js | application terminated with exit code 11
See Also: → 1240580
This and related bugs all appear to be Windows e10s failures. Sami, any chance you'd be up for taking a look? :)
Blocks: e10s-tests
tracking-e10s: --- → ?
See Also: → 1235148
This is pretty easy to hit on Try on OSX 10.6 with e10s enabled too.
I looked at the logs from older failures and based on the created and destroyed DOMWindows I was able to determine two tests that created DOMWindows which were not destroyed before the suite ended:
* browser_net_pane-collapse.js for some failures
* browser_net_prefs-reload.js for other failures

However, I don't see anything clearly wrong with the tests other than the fact that ALL netmonitor tests directly access the content window via CPOWs and teardown the test by 1) removing the tab synchronously and 2) waiting for the netmonitor to be destroyed. Other devtools tests do this the other way around: destroy the toolbox first and then remove the tab.

Since the netmonitor tests wait for the "destroyed" event [1], it's not the netmonitor that is blocking the devtools shutdown but something else in the toolbox destroyer promise mess [2] that just pushes bunch of promises to an array and hopes them to either resolve or reject at some point in the future.

So in this case one of these promises (not the netmonitor one) hangs somewhere failing to resolve.

[1] https://dxr.mozilla.org/mozilla-central/source/devtools/client/netmonitor/test/head.js#178
[2] https://dxr.mozilla.org/mozilla-central/source/devtools/client/framework/toolbox.js#2033
With some extra debugging I've been able to determine that the cause for the never finishing cleanup is not any particular test; the issue is a bit more generic.

The most likely culprit here is that all the tests forcibly remove the tab without closing the toolbox first. This causes starts a race between the server and the client since both start destruction from their respective ends. The toolbox detects that the tab it lived in was removed and starts the destruction process. Similarly, the tab actor at the server side notices that the tab it's representing has closed and starts to destroy itself. If the stars are perfectly aligned and the timings are spot on, the toolbox destruction blocks infinitely for some undetermined reason.

What I know is that the toolbox destroyer promise never resolves since target.destroy() at [1] never resolves. This in turn is caused by the fact that DebuggerClient.close() never calls the callback function at [2]. The callback is supposed to be called when the transport is closed but it never seem to happen. There's two possible causes for that:
1) The transport is closed but the client misses the notification somehow.
2) The transport is never closed since DC.close() doesn't call cleanup() at [3] which would close the transport. This means that one of the actor clients the debugger client was managing never responds to the detach request and the call to cleanup() never happens since it only happens once all clients have been detached.

Here's a couple of suggestions that might fix this:
1) Find the root cause and fix it. This will also make the duplicates go away but since this code is so fragile something is bound to break in the process.
2) Change the way netmonitor tests teardown themselves and deal with the fallout. Instead of removing the tab directly, destroy the toolbox first and only after that remove he tab. This might break tests that assume the teardown happens the way it does at the moment but at least the possible bustage is localized to netmonitor tests.

Anyway, this is such a huge can of worms that I'll leave this to the engineers that are paid to look into this stuff.

[1] https://dxr.mozilla.org/mozilla-central/source/devtools/client/framework/toolbox.js#2029
[2] https://dxr.mozilla.org/mozilla-central/source/devtools/client/framework/target.js#567
[3] https://dxr.mozilla.org/mozilla-central/source/devtools/shared/client/main.js#408
Blocks: 1280289
Blocks: 1283725
Intermittent e10s test failure
Priority: -- → P5
Nicolas, is this something you'd be interested in looking at? Seems to be a fairly common issue with toolbox-using tests (see deps).
No longer blocks: 1280289, 1283725
Component: Developer Tools: Netmonitor → Developer Tools: Framework
Flags: needinfo?(chevobbe.nicolas)
See Also: → 1280289, 1283725, 1283948
I'll have a look in the next days
Assignee: nobody → chevobbe.nicolas
Flags: needinfo?(chevobbe.nicolas)
Assignee: chevobbe.nicolas → nobody
Looks like this went away around mid-July.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.