Closed Bug 900453 Opened 11 years ago Closed 9 years ago

Intermittent test_fullscreen-api.html | application crashed [@ JSContext::restoreFrameChain()] (after "Assertion failure: savedFrameChain_ > 0, at ../../../js/src/vm/Stack.h:1209")

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox39 --- wontfix
firefox40 --- wontfix
firefox41 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- unaffected

People

(Reporter: cbook, Assigned: xidorn)

References

()

Details

(Keywords: crash, intermittent-failure)

Crash Data

Rev4 MacOSX Lion 10.7 mozilla-inbound debug test mochitest-1 on 2013-08-01 03:12:29 PDT for push db6598c7675f

slave: talos-r4-lion-034

Assertion failure: savedFrameChain_ > 0, at ../../../js/src/vm/Stack.h:1208

 PROCESS-CRASH | /tests/content/html/content/test/test_fullscreen-api.html | application crashed [@ JSContext::restoreFrameChain()]
03:38:59     INFO -  Crash dump filename: /var/folders/qd/srwd5f710sj0fcl9z464lkj00000gn/T/tmpBr6RXR/minidumps/6A743EF3-88DC-4664-B757-9A8948F160D1.dmp
03:38:59     INFO -  Operating system: Mac OS X
03:38:59     INFO -                    10.7.2 11C74
03:38:59     INFO -  CPU: amd64
03:38:59     INFO -       family 6 model 23 stepping 10
03:38:59     INFO -       2 CPUs
03:38:59     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
03:38:59     INFO -  Crash address: 0x0
03:38:59     INFO -  Thread 0 (crashed)
03:38:59     INFO -   0  XUL!JSContext::restoreFrameChain() [Stack.h:db6598c7675f : 1208 + 0x0]
03:38:59     INFO -      rbx = 0x00007fff78967630   r12 = 0x0000000000000003
03:38:59     INFO -      r13 = 0x00007fff5fbfd617   r14 = 0x000000011f8b3228
03:38:59     INFO -      r15 = 0x0000000106aff200   rip = 0x00000001037b4743
03:38:59     INFO -      rsp = 0x00007fff5fbfd490   rbp = 0x00007fff5fbfd4b0
03:38:59     INFO -      Found by: given as instruction pointer in context
03:38:59     INFO -   1  XUL!JS_RestoreFrameChain(JSContext*) [jsapi.cpp:db6598c7675f : 5466 + 0x7]
03:38:59     INFO -      rbx = 0x000000011f8b3140   r12 = 0x000000010017c1a8
03:38:59     INFO -      r13 = 0x00007fff5fbfd617   r14 = 0x00007fff5fbfd4c8
03:38:59     INFO -      r15 = 0x000000010017c1b0   rip = 0x00000001037857ae
03:38:59     INFO -      rsp = 0x00007fff5fbfd4c0   rbp = 0x00007fff5fbfd4e0
03:38:59     INFO -      Found by: call frame info
03:38:59     INFO -   2  XUL!XPCJSContextStack::Pop() [XPCJSContextStack.cpp:db6598c7675f : 52 + 0x8]
03:38:59     INFO -      rbx = 0x000000011f8b3140   r12 = 0x000000010017c1a8
03:38:59     INFO -      r13 = 0x00007fff5fbfd617   r14 = 0x0000000000000000
03:38:59     INFO -      r15 = 0x000000010017c1b0   rip = 0x00000001022b48cf
03:38:59     INFO -      rsp = 0x00007fff5fbfd4f0   rbp = 0x00007fff5fbfd510
03:38:59     INFO -      Found by: call frame info
03:38:59     INFO -   3  XUL!nsXPConnect::AfterProcessNextEvent(nsIThreadInternal*, unsigned int) [nsXPConnect.cpp:db6598c7675f : 1334 + 0x4]
03:38:59     INFO -      rbx = 0x000000010017aea0   r12 = 0x0000000000000000
03:38:59     INFO -      r13 = 0x00007fff5fbfd617   r14 = 0x000000010013d410
03:38:59     INFO -      r15 = 0x0000000000000000   rip = 0x000000010230c9dd
03:38:59     INFO -      rsp = 0x00007fff5fbfd520   rbp = 0x00007fff5fbfd530
03:38:59     INFO -      Found by: call frame info
03:38:59     INFO -   4  XUL!_ZThn8_N11nsXPConnect21AfterProcessNextEventEP17nsIThreadInternalj [nsXPConnect.cpp:db6598c7675f : 1197 + 0x8]
03:38:59     INFO -      rbx = 0x000000010013d3d0   r12 = 0x0000000000000000
03:38:59     INFO -      r13 = 0x00007fff5fbfd617   r14 = 0x000000010013d410
03:38:59     INFO -      r15 = 0x0000000000000000   rip = 0x000000010230ce5d
03:38:59     INFO -      rsp = 0x00007fff5fbfd540   rbp = 0x00007fff5fbfd540
03:38:59     INFO -      Found by: call frame info
I added this assert as part of the interpreter stack rewrite. It means JS_RestoreFrameChain is called but there's no matching JS_SaveFrameChain call for the topmost activation Activation::savedFrameChain_ is incremented when we save the frame chain and decremented when we restore it.

Bobby, any idea what's going on here based on the stack trace?
Flags: needinfo?(bobbyholley+bmo)
Yikes, that's bad.

Aside from one well-understood exception in nsJSUtils.cpp, the browser only ever calls JS_SaveFrameChain when pushing JSContexts. And for the most part, we only do _that_ with stack-scoped classes. Some of those classes have annoying potentially-non-LIFO Maybe<> semantics, but we try to catch those with assertions.

The two places we don't use stack-scoped classes are callback situations - one is a callback from JS-Ctypes ( mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.cpp#1221 ), and the other is a callback from XPCOM when we spin the event loop ( http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/nsXPConnect.cpp#1162 ).

We appear to be triggering this assertion in the latter case, though there's no guarantee that the XPCOM callback is the one botching the invariant. Indeed, we have pretty explicit machinery in place there with mEventDepth to make sure we never call Pop() more than we call Push().

Tomcat, can we get the rest of the stack, or a link to the log? How often is this happening? Any correlations?
Flags: needinfo?(bobbyholley+bmo) → needinfo?(cbook)
Log link is in the URL field of the bug.
bholley: as ryan mentioned. 

Also there was/is a minidump file Crash dump filename: /var/folders/qd/srwd5f710sj0fcl9z464lkj00000gn/T/tmpBr6RXR/minidumps/6A743EF3-88DC-4664-B757-9A8948F160D1.dmp on talos-r4-lion-034 

we could try to see if the minidump is already there and for that we would need to ask releng.
Flags: needinfo?(cbook)
My gut feeling is that it has something to do with the complicated interactions between OS code and XPCOM when we use the fullscreen API. A minidump wouldn't be nearly as useful as STR, and those STR would probably be better directed at smichaud, though I can't say for sure. It looks like there are a number of other bugs with the fullscreen tests, and it's disabled on quite a few platforms.
(In reply to Bobby Holley (:bholley) from comment #5)
> My gut feeling is that it has something to do with the complicated
> interactions between OS code and XPCOM when we use the fullscreen API. A
> minidump wouldn't be nearly as useful as STR, and those STR would probably
> be better directed at smichaud, though I can't say for sure. It looks like
> there are a number of other bugs with the fullscreen tests, and it's
> disabled on quite a few platforms.

(In reply to TinderboxPushlog Robot from comment #6)
> philor
> https://tbpl.mozilla.org/php/getParsedLog.php?id=26677373&tree=Mozilla-Aurora
> Rev4 MacOSX Lion 10.7 mozilla-aurora opt test mochitest-1 on 2013-08-17
> 01:48:59
> revision: 5ccb6826afa7
> slave: talos-r4-lion-055
> 
> TEST-UNEXPECTED-FAIL |
> /tests/content/html/content/test/test_fullscreen-api.html | Exited with code
> 1 during test run
> PROCESS-CRASH | /tests/content/html/content/test/test_fullscreen-api.html |
> application crashed [@ JSContext::findVersion() const]
> Return code: 256

The failure in c#6 was also the first job to fail today for our "hdiutil" failures on this slave class, which ends up NOT allowing automation to reboot the system (causing it to fail many many more times) and prior to teh crash the log has messages about inconsistent OS state.

c.f. Bug 898519 and my copy/paste of the crash and some lines of context in its attachment https://bugzilla.mozilla.org/attachment.cgi?id=791741
Different signature, but same issue it appears. This one also led to the slave being stuck in an hdiutil error loop.
https://tbpl.mozilla.org/php/getParsedLog.php?id=26723180&tree=Fx-Team
Summary: Intermittent TEST-UNEXPECTED-FAIL | /tests/content/html/content/test/test_fullscreen-api.html | Exited with code 1 during test run | Assertion failure: savedFrameChain_ > 0, at ../../../js/src/vm/Stack.h:1208| crashed [@ JSContext::restoreFrameChain()] → Intermittent test_fullscreen-api.html | application crashed [@ JSContext::restoreFrameChain()] (after "Assertion failure: savedFrameChain_ > 0, at ../../../js/src/vm/Stack.h:1209")
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #8)
> Different signature, but same issue it appears. This one also led to the
> slave being stuck in an hdiutil error loop.
> https://tbpl.mozilla.org/php/getParsedLog.php?id=26723180&tree=Fx-Team

Same again:
https://tbpl.mozilla.org/php/getParsedLog.php?id=27214461&tree=Mozilla-Inbound
Test disabled on OS X 10.7 and 10.8, for intermittently breaking the OS X slaves (was already disabled on 10.8) meaning they require a reboot:
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/bbd8b29a4261
Whiteboard: [test disabled on OS X 10.7 & 10.8][leave open]
Assignee: general → nobody
We have been using the same system API for Fullscreen API among all OS X versions since bug 1105939. As that API has been using on OS X 10.6 for a long time, and doesn't seem to break 10.6 slaves, I suppose it is unlikely to break 10.7+ slaves now.
Blocks: 1170369
Status: NEW → RESOLVED
Closed: 9 years ago
Depends on: 1105939
Resolution: --- → FIXED
Whiteboard: [test disabled on OS X 10.7 & 10.8][leave open]
Assignee: nobody → quanxunzhen
Target Milestone: --- → mozilla41
You need to log in before you can comment on or make changes to this bug.