Closed Bug 1099370 Opened 5 years ago Closed 5 years ago

Intermittent browser_widget.js | application crashed [@ mozalloc_abort(char const*)] after FATAL ERROR: AsyncShutdown timeout in ShutdownLeaks: Wait for cleanup to be finished before checking for leaks Conditions: [{"name":"DevTools: Wait until toolbox...

Categories

(DevTools :: Inspector, defect)

x86
Linux
defect
Not set

Tracking

(firefox36 unaffected, firefox37 fixed, firefox38 fixed, firefox-esr31 unaffected)

RESOLVED FIXED
Firefox 38
Tracking Status
firefox36 --- unaffected
firefox37 --- fixed
firefox38 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: RyanVM, Assigned: sjakthol)

References

Details

(Keywords: crash, intermittent-failure)

Attachments

(2 files)

11:04:09     INFO -  3279 INFO TEST-START | chrome://mochitests/content/browser/browser/devtools/webide/test/browser_widget.js
11:04:10     INFO -  3280 INFO TEST-PASS | chrome://mochitests/content/browser/browser/devtools/webide/test/browser_widget.js | Found WebIDE button
11:04:10     INFO -  3281 INFO TEST-PASS | chrome://mochitests/content/browser/browser/devtools/webide/test/browser_widget.js | WebIDE button uninstalled
11:04:10     INFO -  3282 INFO Closing WebIDE
11:04:10     INFO -  3283 INFO Console message: [JavaScript Error: "not well-formed" {file: "file:///builds/slave/test/build/tests/mochitest/chrome/browser/devtools/webide/test/addons/simulators.json" line: 1 column: 1 source: "{"}]
11:04:10     INFO -  3284 INFO WebIDE closed
11:04:11     INFO -  3285 INFO TEST-OK | chrome://mochitests/content/browser/browser/devtools/webide/test/browser_widget.js | took 1332ms
11:04:21     INFO -  WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":"resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js","lineNumber":1660,"stack":["resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js:Toolbox.prototype.destroy/leakCheckObserver:1660","chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest</<:548","chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/maybeRunTests/<:683","null:null:0"]}] Barrier: ShutdownLeaks: Wait for cleanup to be finished before checking for leaks
11:05:12     INFO -  FATAL ERROR: AsyncShutdown timeout in ShutdownLeaks: Wait for cleanup to be finished before checking for leaks Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":"resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js","lineNumber":1660,"stack":["resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js:Toolbox.prototype.destroy/leakCheckObserver:1660","chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest</<:548","chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/maybeRunTests/<:683","null:null:0"]}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
11:05:12     INFO -  [Parent 1806] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1660
11:05:12     INFO -  [Parent 1806] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1660
11:05:13     INFO -  TEST-INFO | Main app process: killed by SIGSEGV
11:05:13     INFO -  3286 INFO checking window state
11:05:13     INFO -  3287 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/devtools/webide/test/browser_widget.js | application terminated with exit code 11
11:05:13     INFO -  runtests.py | Application ran for: 0:36:00.869964
11:05:13     INFO -  zombiecheck | Reading PID log: /tmp/tmpOxqJLNpidlog
11:05:13     INFO -  ==> process 1806 launched child process 1846
11:05:13     INFO -  zombiecheck | Checking for orphan process with PID: 1846
11:05:13     INFO -  mozcrash Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/fx-team-linux/1415987176/firefox-36.0a1.en-US.linux-i686.crashreporter-symbols.zip
11:05:39     INFO -  mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/5a45c836-50fc-e333-09a84154-05d656f2.dmp
11:05:39     INFO -  mozcrash Saved app info as /builds/slave/test/build/blobber_upload_dir/5a45c836-50fc-e333-09a84154-05d656f2.extra
11:05:39  WARNING -  PROCESS-CRASH | chrome://mochitests/content/browser/browser/devtools/webide/test/browser_widget.js | application crashed [@ mozalloc_abort(char const*)]
11:05:39     INFO -  Crash dump filename: /tmp/tmpSkYDwW.mozrunner/minidumps/5a45c836-50fc-e333-09a84154-05d656f2.dmp
11:05:39     INFO -  Operating system: Linux
11:05:39     INFO -                    0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686
11:05:39     INFO -  CPU: x86
11:05:39     INFO -       GenuineIntel family 6 model 44 stepping 2
11:05:39     INFO -       1 CPU
11:05:39     INFO -  Crash reason:  SIGSEGV
11:05:39     INFO -  Crash address: 0x0
11:05:39     INFO -  Thread 0 (crashed)
11:05:39     INFO -   0  libmozalloc.so!mozalloc_abort(char const*) [mozalloc_abort.cpp:91b67b9a26bb : 37 + 0x0]
11:05:39     INFO -      eip = 0xb730d8a9   esp = 0xbf9e3730   ebp = 0xbf9e3748   ebx = 0xb73105c0
11:05:39     INFO -      esi = 0xb75f9d9c   edi = 0xbf9e3788   eax = 0x0000000a   ecx = 0xffffffff
11:05:39     INFO -      edx = 0xb75fa8ac   efl = 0x00210282
11:05:39     INFO -      Found by: given as instruction pointer in context
11:05:39     INFO -   1  libxul.so!NS_DebugBreak [nsDebugImpl.cpp:91b67b9a26bb : 469 + 0xe]
11:05:39     INFO -      eip = 0xb367453a   esp = 0xbf9e3750   ebp = 0xbf9e3b98   ebx = 0xb6e58804
11:05:39     INFO -      esi = 0xbf9e3770   edi = 0xbf9e3788
11:05:39     INFO -      Found by: call frame info
11:05:39     INFO -   2  libxul.so!nsDebugImpl::Abort(char const*, int) [nsDebugImpl.cpp:91b67b9a26bb : 150 + 0xd]
11:05:39     INFO -      eip = 0xb36745af   esp = 0xbf9e3ba0   ebp = 0xbf9e3bc8   ebx = 0xb6e58804
11:05:39     INFO -      esi = 0xbf9e3d10   edi = 0xbf9e3cb8
11:05:39     INFO -      Found by: call frame info
11:05:39     INFO -   3  libxul.so!NS_InvokeByIndex + 0x2f
11:05:39     INFO -      eip = 0xb36ba628   esp = 0xbf9e3bd0   ebp = 0xbf9e3bf8   ebx = 0xb6e58804
11:05:39     INFO -      esi = 0xbf9e3d10   edi = 0xbf9e3cb8
11:05:39     INFO -      Found by: call frame info
11:05:39     INFO -   4  libxul.so!XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) [XPCWrappedNative.cpp:91b67b9a26bb : 2395 + 0x15]
11:05:39     INFO -      eip = 0xb3a458dc   esp = 0xbf9e3c00   ebp = 0xbf9e3da8
11:05:39     INFO -      Found by: previous frame's frame pointer
11:05:39     INFO -   5  libxul.so!XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) [XPCWrappedNativeJSOps.cpp:91b67b9a26bb : 1244 + 0x9]
11:05:39     INFO -      eip = 0xb3a47ddc   esp = 0xbf9e3db0   ebp = 0xbf9e3e78   ebx = 0xb6e58804
11:05:39     INFO -      esi = 0xb7260aa0   edi = 0xbf9e3e04
11:05:39     INFO -      Found by: call frame info
11:05:39     INFO -   6  libxul.so!js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) [jscntxtinlines.h:91b67b9a26bb : 231 + 0x1b]
11:05:39     INFO -      eip = 0xb5313c53   esp = 0xbf9e3e80   ebp = 0xbf9e4118   ebx = 0xb6e58804
11:05:39     INFO -      esi = 0xb7260aa0   edi = 0xbf9e3ec8
11:05:39     INFO -      Found by: call frame info
This test does not open a toolbox, so maybe a previous test is bleeding into it?
Paul, that doesn't look good. Can you dispatch someone to look at it?
Flags: needinfo?(paul)
Bug 1053463 seems related with view._elementStyle.rules errors.

I believe this is really a Style Inspector error.  browser_widget just happens to be the last test in the DevTools browser suite at the moment.
Component: Developer Tools: WebIDE → Developer Tools: Style Editor
Flags: needinfo?(paul)
See Also: → 1053463
Component: Developer Tools: Style Editor → Developer Tools: Inspector
Patrick, any guess what the issue is here?  I believe you've done some recent Inspector test improvements, so thought you might have an idea.

Ignore the WebIDE test name, it just happens to be the last test of the suite.
Flags: needinfo?(pbrosset)
My understanding so far is that it sometimes takes longer for the toolbox to close, and that makes the whole thing crash because of the blocker code we added to the toolbox destroy function in bug 1060840.

I don't exactly know what this does, but it looks like it prevents the tab from closing until the toolbox is fully destroy (to avoid leaks).
But in doing this, it just waits for some reasonable amount of time, and declares the thing dead if it takes longer to complete.

Here is the relevant code:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/asyncshutdown/AsyncShutdown.jsm#760

Tim, you worked on bug 1060840, can you confirm my understanding of things so far? Do you agree with me that this shutdown crash can simply be caused by the toolbox taking more time than usual (maybe the machine is slower)?
Flags: needinfo?(pbrosset) → needinfo?(ttaubert)
Bug 1060840 just makes the shutdown leak detector wait until all tabs/toolboxes have been closed. Otherwise it would report those as leaked not knowing that they're still alive. If we see a shutdown crash then it seems like they're just never destroyed. AsyncShutdown stops waiting after 60s and just kills the process. Looks like for some reason the toolbox is never destroyed.
Flags: needinfo?(ttaubert)
(In reply to Tim Taubert [:ttaubert] (away Dec 19th, back Jan 5th) from comment #119)
> Bug 1060840 just makes the shutdown leak detector wait until all
> tabs/toolboxes have been closed. Otherwise it would report those as leaked
> not knowing that they're still alive. If we see a shutdown crash then it
> seems like they're just never destroyed. AsyncShutdown stops waiting after
> 60s and just kills the process. Looks like for some reason the toolbox is
> never destroyed.
Or maybe the toolbox takes a whole lot of time to get destroyed. 60s really seems like an awful lot of time, but what if the machine is running very slow?

Anyway, I've looked at the logs, there are plenty of exceptions in there and uncaught promise rejections, but it's hard to relate any of this to the toolbox not being able to close.
I've also run a small script on the log that calculates the time between 2 lines, and I didn't find anything that was taking more than 60s. There's a couple of occurrences of stuff that take about 20seconds, but they were not about closing the toolbox anyway.

So basically, I can't reproduce, I don't know where it comes from, and I don't know where to look :) 
Yay, best type of bug!

Tim, one more question: is it right to assume that this error message is just displayed here because that's the end of the testsuite, but really, any test in the suite could have caused it, right?
It seems pretty obvious that browser_widget.js isn't to blame here because it doesn't even open the toolbox, but I just want to make sure.
Flags: needinfo?(ttaubert)
(In reply to Patrick Brosset [:pbrosset] [:patrick] from comment #124)
> Tim, one more question: is it right to assume that this error message is
> just displayed here because that's the end of the testsuite, but really, any
> test in the suite could have caused it, right?
> It seems pretty obvious that browser_widget.js isn't to blame here because
> it doesn't even open the toolbox, but I just want to make sure.

Yeah, exactly. Some test running before will have opened a toolbox but never destroyed it properly.

It probably wouldn't be too hard to add some debug output to find out which of the tests creates a toolbox and never destroys it. From there on you could dig deeper and try to find the actual cause, unfortunately try is your best bet to reproduce as it seems.
Flags: needinfo?(ttaubert)
> Or maybe the toolbox takes a whole lot of time to get destroyed. 60s really seems like an awful lot of time, but what if the machine is running very slow?

In that case, crashing will be a mercy for the user. A shutdown that takes more than 60 seconds is clearly unacceptable, even on a slow machine.
I noticed when running canvasdebugger tests that they hang and the AsyncShutdown warning is printed to terminal. After a bit of bisection I was able to narrow the cause down to browser_canvas-frontend-open.js which would hang the shutdown IF another browser_canvas-frontend-* test was also executed.

For example, I can reproduce the AsyncShutdown hang with following command:
./mach mochitest-devtools --e10s browser/devtools/canvasdebugger/test/browser_canvas-frontend-call-stack-03.js browser/devtools/canvasdebugger/test/browser_canvas-frontend-open.js

I don't know if canvasdebugger is solely responsible for these intermittent failures but it sure seems to contribute to the result.
(In reply to Sami Jaktholm from comment #143)
> I noticed when running canvasdebugger tests that they hang and the
> AsyncShutdown warning is printed to terminal. After a bit of bisection I was
> able to narrow the cause down to browser_canvas-frontend-open.js which would
> hang the shutdown IF another browser_canvas-frontend-* test was also
> executed.
> 
> For example, I can reproduce the AsyncShutdown hang with following command:
> ./mach mochitest-devtools --e10s
> browser/devtools/canvasdebugger/test/browser_canvas-frontend-call-stack-03.
> js browser/devtools/canvasdebugger/test/browser_canvas-frontend-open.js
> 
> I don't know if canvasdebugger is solely responsible for these intermittent
> failures but it sure seems to contribute to the result.
Thanks Sami for finding this out. It sure does seem like it contributes to this bug.

Victor, here's a slight change the canvasdebugger test teardown function that makes sure the toolbox is destroyed before closing the tab which seems to avoid the hang.
Attachment #8540601 - Flags: review?(vporof)
Comment on attachment 8540601 [details] [diff] [review]
bug1099370-fix-canvasdebugger-teardown v1.patch

Review of attachment 8540601 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #8540601 - Flags: review?(vporof) → review+
Thanks Victor.
Let's check this in and keep the bug open in case this wasn't the only cause for the toolbox destroy hang.
Keywords: leave-open
If a debugger test times out, this problem seems to be triggered (see below). Here's a patch should fix that case.

It modifies the cleanup function in head.js to
a) destroy all toolboxes and wait for them to be destroyed
b) close all extra tabs to avoid leaks

It also removes calls to removeTab(gBrowser.selectedTab) from some tests that called that in their own cleanup functions. Now that the tabs are destroyed in the global cleanup function those calls would remove last tab and close the browser in the middle of the test suite.

Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2c183694d706

----
When a timeout occurs test harness will execute the cleanup function registered in debugger/test/head.js [1] which destroys the debugger server. Next the harness will remove leaked tabs. The open tab will be closed and the toolbox starts to destroy itself.

At this point I kind of lost track of what's happening. The client destroyer sequence seems to hang in TabTarget._client.close(cleanupAndResolve)[2] which is waiting for the transport to be closed. When the cleanup function called DebuggerServer.destroy it closed the transport on the server side.

The transport is apparently being closed from both ends at the same time (though I have no idea how e10s fits into this mix) which seem to cause the closure callback [3] to be never executed. Thus the destroyer promise does not resolve and the leak check barrier is never removed.

I think this could occur even without a timeout. If a test destroys the tab but doesn't wait for the toolbox to be destroyed before finishing it could trigger similar chain of events so this might also help to reduce the general frequency of this bug.

[1] https://hg.mozilla.org/mozilla-central/annotate/369a8f14ccf8/browser/devtools/debugger/test/head.js#l46
[2] https://hg.mozilla.org/mozilla-central/annotate/369a8f14ccf8/browser/devtools/framework/target.js#l591
[3] https://hg.mozilla.org/mozilla-central/annotate/369a8f14ccf8/browser/devtools/framework/target.js#l577
Attachment #8550719 - Flags: review?(vporof)
Comment on attachment 8550719 [details] [diff] [review]
debugger-automatic-cleanup-after-test.patch

Review of attachment 8550719 [details] [diff] [review]:
-----------------------------------------------------------------

I think ejpbruel should review this instead
Attachment #8550719 - Flags: review?(vporof)
Attachment #8550719 - Flags: review?(ejpbruel)
This is on my radar for tomorrow. Sorry it took so long.
Comment on attachment 8550719 [details] [diff] [review]
debugger-automatic-cleanup-after-test.patch

Review of attachment 8550719 [details] [diff] [review]:
-----------------------------------------------------------------

Patch looks good to me
Attachment #8550719 - Flags: review?(ejpbruel) → review+
Keywords: checkin-needed
Comment 232 seems like a false positive, since it's not the DevTools test suite.

So far, no more occurrences since the last patch.