Closed Bug 975876 Opened 6 years ago Closed 6 years ago

Intermittent test_TLD.html | application crashed [@ MarkInternal<JSObject>]

Categories

(Core :: JavaScript: GC, defect, critical)

x86
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla30
Tracking Status
firefox28 --- unaffected
firefox29 --- unaffected
firefox30 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: philor, Assigned: billm)

References

Details

(4 keywords)

Attachments

(2 files)

Of course, with the crap logging for these tests, it's pretty much impossible to tell a hang followed by an assertion and crash during the "crash" that's intended to show the stack of the hang from an assertion and crash followed by a hang, so this might be a no-output-timeout, or it might be the crash. I'm picking the crash for the summary for starring purposes, because it tends to wind up less manged in the log.

https://tbpl.mozilla.org/php/getParsedLog.php?id=35123559&tree=B2g-Inbound
b2g_emulator_vm b2g-inbound debug test mochitest-debug-8 on 2014-02-23 13:02:34 PST for push 51634e9e92d5
slave: tst-linux64-spot-157

13:38:09     INFO -  488 INFO TEST-START | /tests/dom/encoding/test/test_TLD.html
13:38:09     INFO -  [Child 708] WARNING: Transparent content with displayports can be expensive.: file ../../../gecko/layout/base/nsDisplayList.cpp, line 1220
13:38:09     INFO -  [Child 708] WARNING: Transparent content with displayports can be expensive.: file ../../../gecko/layout/base/nsDisplayList.cpp, line 1220
13:38:09     INFO -  [Child 708] WARNING: Transparent content with displayports can be expensive.: file ../../../gecko/layout/base/nsDisplayList.cpp, line 1220
13:38:09     INFO -  [Child 708] WARNING: Transparent content with displayports can be expensive.: file ../../../gecko/layout/base/nsDisplayList.cpp, line 1220
13:38:09     INFO -  [Child 708] WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv) && subjPrincipal) failed: file ../../../gecko/docshell/base/nsDocShell.cpp, line 8640
13:38:09     INFO -  [Child 708] WARNING: Transparent content with displayports can be expensive.: file ../../../gecko/layout/base/nsDisplayList.cpp, line 1220
13:38:09     INFO -  [Child 708] WARNING: NS_ENSURE_TRUE(globalObject && globalObject->GetGlobalJSObject()) failed: file ../../../../../gecko/content/html/document/src/nsHTMLContentSink.cpp, line 717
13:38:09     INFO -  [Child 708] WARNING: Subdocument container has no frame: file ../../../gecko/layout/base/nsDocumentViewer.cpp, line 2413
13:38:09     INFO -  [Child 708] WARNING: Transparent content with displayports can be expensive.: file ../../../gecko/layout/base/nsDisplayList.cpp, line 1220
13:38:09     INFO -  [Child 708] WARNING: Transparent content with displayports can be expensive.: file ../../../gecko/layout/base/nsDisplayList.cpp, line 1220
13:38:09     INFO -  [Child 708] WARNING: Subdocument container has no frame: file ../../../gecko/layout/base/nsDocumentViewer.cpp, line 2413
13:38:09     INFO -  [Child 708] WARNING: Subdocument container has no frame: file ../../../gecko/layout/base/nsDocumentViewer.cpp, line 2413
13:38:09     INFO -  [Child 708] WARNING: Subdocument container has no frame: file ../../../gecko/layout/base/nsDocumentViewer.cpp, line 2413
13:38:09     INFO -  [Child 708] WARNING: Subdocument container has no frame: file ../../../gecko/layout/base/nsDocumentViewer.cpp, line 2413
13:38:09     INFO -  489 INFO TEST-INFO | MEMORY STAT vsize after test: 122023936
13:38:09     INFO -  490 INFO TEST-INFO | MEMORY STAT residentFast after test: 64163840
13:38:09     INFO -  491 INFO TEST-INFO | MEMORY STAT heapAllocated after test: 24117108
13:38:09     INFO -  492 INFO TEST-END | /tests/dom/encoding/test/test_TLD.html | finished in 7895ms
13:38:09     INFO -  [Child 708] WARNING: Transparent content with displayports can be expensive.: file ../../../gecko/layout/base/nsDisplayList.cpp, line 1220
13:38:09     INFO -  [Child 708] WARNING: Transparent content with displayports can be expensive.: file ../../../gecko/layout/base/nsDisplayList.cpp, line 1220
13:38:09     INFO -  ############ ErrorPage.js
13:38:09     INFO -  [Parent 653] WARNING: RemoveObserver() called for unregistered observer: file ../../gecko/hal/Hal.cpp, line 204
13:45:40     INFO -  [Parent 653] WARNING: RemoveObserver() called for unregistered B2GRunner TEST-UNEXPECTED-FAIL | /tests/dom/encoding/test/test_TLD.html | application timed out after 450.0 seconds with no output
13:45:40     INFO -  B2GRunner INFO | checking for crashes in '/data/local/tests/profile/minidumps'
13:45:57     INFO -  observer: file ../../gecko/hal/Hal.cpp, line 204
13:45:57     INFO -  [Parent 653] WARNING: RemoveObserver() called for unregistered observer: file ../../gecko/hal/Hal.cpp, line 204
13:45:57     INFO -  [Parent 653] WARNING: RemoveObserver() called for unregistered observer: file ../../gecko/hal/Hal.cpp, line 204
13:45:57     INFO -  [Parent 653] WARNING: RemoveObserver() called for unregistered observer: file ../../gecko/hal/Hal.cpp, line 204
13:45:57     INFO -  [Parent 653] WARNING: NS_ENSURE_TRUE(mRegistered) failed: file ../../../gecko/dom/telephony/ipc/TelephonyParent.cpp, line 84
13:45:57     INFO -  [Parent 653] WARNING: NS_ENSURE_TRUE(mCallback) failed: file ../../../../gecko/content/base/src/nsFrameMessageManager.cpp, line 639
13:45:57     INFO -  ************************************************************
13:45:57     INFO -  * Call to xpconnect wrapped JSObject produced this error:  *
13:45:57     INFO -  [Exception... "Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsIMessageSender.sendAsyncMessage]"  nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)"  location: "JS frame :: chrome://specialpowers/content/SpecialPowersObserver.js :: SpecialPowersObserver.prototype._sendAsyncMessage :: line 96"  data: no]
13:45:57     INFO -  ************************************************************
13:45:57  WARNING -  PROCESS-CRASH | /tests/dom/encoding/test/test_TLD.html | application crashed [@ MarkInternal<JSObject>]
13:45:57     INFO -  Crash dump filename: /tmp/tmp7BMVHX/61f5ac1e-a6b0-9a3a-532e6601-23c40a80.dmp
13:45:57     INFO -  Operating system: Android
13:45:57     INFO -                    0.0.0 Linux 2.6.29-00299-g661373c1 #6 Fri Jan 10 14:42:34 CST 2014 armv7l Android/full/generic:4.0.4.0.4.0.4/OPENMASTER/eng.cltbld.20140223.141950:eng/test-keys
13:45:57     INFO -  CPU: arm
13:45:57     INFO -       0 CPUs
13:45:57     INFO -  Crash reason:  SIGSEGV
13:45:57     INFO -  Crash address: 0x0
13:45:57     INFO -  Thread 0 (crashed)
13:45:57     INFO -   0  libxul.so!MarkInternal<JSObject> [Marking.cpp:51634e9e92d5 : 180 + 0x4]
13:45:57     INFO -       r4 = 0x43eee2b0    r5 = 0x45709940    r6 = 0x44f0f538    r7 = 0x4215d058
13:45:57     INFO -       r8 = 0x429786f4    r9 = 0x43eee000   r10 = 0xbeea73e0    fp = 0x00000000
13:45:57     INFO -       sp = 0xbeea7328    lr = 0x41aa6537    pc = 0x41aa653a
13:45:57     INFO -      Found by: given as instruction pointer in context
13:45:57     INFO -   1  libxul.so!js::gc::MarkObject(JSTracer*, js::BarrieredPtr<JSObject, unsigned int>*, char const*) + 0x11
13:45:57     INFO -       r4 = 0x405cd1e5    r5 = 0x43eee2b0    r6 = 0xbeea737c    r7 = 0x00000000
13:45:57     INFO -       r8 = 0x0000000c    r9 = 0x00000001   r10 = 0xbeea73e0    fp = 0x00000000
13:45:57     INFO -       sp = 0xbeea7350    pc = 0x41aa6843
13:45:57     INFO -      Found by: call frame info
13:45:57     INFO -   2  libxul.so!JS_CallHeapObjectTracer(JSTracer*, JS::Heap<JSObject*>*, char const*) [Tracer.cpp:51634e9e92d5 : 66 + 0x3]
13:45:57     INFO -       r4 = 0x405cd1e5    r5 = 0x43eee2b0    r6 = 0xbeea737c    r7 = 0x00000000
13:45:57     INFO -       r8 = 0x0000000c    r9 = 0x00000001   r10 = 0xbeea73e0    fp = 0x00000000
13:45:57     INFO -       sp = 0xbeea7358    pc = 0x41ab3603
13:45:57     INFO -      Found by: call frame info
13:45:57     INFO -   3  libxul.so!JsGcTracer::Trace(JS::Heap<JSObject*>*, char const*, void*) const [CycleCollectedJSRuntime.cpp:51634e9e92d5 : 748 + 0x5]
13:45:57     INFO -       r4 = 0x405cd1e5    r5 = 0x43eee2b0    r6 = 0xbeea737c    r7 = 0x00000000
13:45:57     INFO -       r8 = 0x0000000c    r9 = 0x00000001   r10 = 0xbeea73e0    fp = 0x00000000
13:45:57     INFO -       sp = 0xbeea7360    pc = 0x405cd1ed
13:45:57     INFO -      Found by: call frame info
13:45:57     INFO -   4  libxul.so!mozilla::nsDOMCameraControl::cycleCollection::Trace(void*, TraceCallbacks const&, void*) [nsWrapperCache.h:51634e9e92d5 : 175 + 0x11]
13:45:57     INFO -       r4 = 0x405cd1e5    r5 = 0x43eee2b0    r6 = 0xbeea737c    r7 = 0x00000000
13:45:57     INFO -       r8 = 0x0000000c    r9 = 0x00000001   r10 = 0xbeea73e0    fp = 0x00000000
13:45:57     INFO -       sp = 0xbeea7368    pc = 0x4101bfa9
13:45:57     INFO -      Found by: call frame info
13:45:57     INFO -   5  libxul.so!TraceJSHolder [CycleCollectedJSRuntime.cpp:51634e9e92d5 : 764 + 0x3]
13:45:57     INFO -       r4 = 0x4101bf81    r5 = 0x44f0f530    r6 = 0x405cc445    r7 = 0x00000000
13:45:57     INFO -       r8 = 0x0000000c    r9 = 0x00000001   r10 = 0xbeea73e0    fp = 0x00000000
13:45:57     INFO -       sp = 0xbeea7378    pc = 0x405cc407
13:45:57     INFO -      Found by: call frame info
...
13:47:10     INFO -  02-23 21:38:07.629   708   708 F MOZ_Assert: Assertion failure: !thing->zone()->scheduledForDestruction, at ../../../gecko/js/src/gc/Marking.cpp:154
Pinging our GC gurus for a frequent B2G mochitest crash.
Flags: needinfo?(terrence)
Flags: needinfo?(sphink)
Flags: needinfo?(jcoppeard)
This comment explains the assertion:

http://dxr.mozilla.org/comm-central/source/mozilla/js/src/jsgc.cpp#3099

So I think this would imply that the live edge being marked is cross compartment and no wrapper exists. Andrew would probably be able to give a better analysis from the browser's side.
Flags: needinfo?(terrence)
Flags: needinfo?(sphink)
Flags: needinfo?(jcoppeard)
Flags: needinfo?(continuation)
It looks like this is all happening inside the GC tracing for mozilla::dom::CameraCapabilities, which was refactored recently in bug 909542. That landed a few days before this started showing up, but it is probably still related.  I'll try looking over the JS API stuff that is doing to try to figure out what might be going wrong, and try to understand what exactly this assertion is...
Assignee: nobody → continuation
Flags: needinfo?(continuation)
You can't tell when this started from looking in the bug, because debug b2g mochitests were hidden for most of the month, and I gave up on looking at hidden jobs partway through the month, so at the time that landed absolutely nobody was looking at any debug b2g failures.
Ah, thanks for the explanation.  I figured there might be something like that going on.  I was surprised to see we were running B2G debug builds at all. ;)
Of course, we were still running them, and in fact by looking at https://tbpl.mozilla.org/?tree=B2g-Inbound&rev=e93e96ed8a5a&jobname=mochitest-debug-8 and then hitting the green down arrow a few times you can see we actually hit this on the push when bug 909542 landed the final time, and while there was enormous mochitest-8 orange prior to that, clear back to when I stopped starring it, it was all other things, not this.
Blocks: 909542
Thanks for tracking that down, philor!
Depends on: 977260
Bill, do you have any idea what the general problem here might be?  The comment Terrence linked to in comment 19 says that maybeAlive should be set if an object from a zone was added during root marking, but here we're hitting this fatal assertion ( Assertion failure: !thing->zone()->scheduledForDestruction, at ../../../gecko/js/src/gc/Marking.cpp:154 ) during root marking, which seems weird to me?  I'm going to try to dig around this code some more to try to figure it out, but if you have any thoughts it would be appreciate.
Flags: needinfo?(wmccloskey)
No longer depends on: 977260
Attached patch dead-zone-fixSplinter Review
I think I know what the problem is.

The system is supposed to work as follows. If you think you're going to be touching "dead zones" (zones that the GC didn't think should live when the most recent collection started), then you put AutoMaybeTouchDeadZones. Any code within that scope that will actually cause an object in a dead zone to be marked is supposed to use AutoMarkInDeadZone. When the scope of AutoMaybeTouchDeadZones ends, it checks if you actually used AutoMarkInDeadZone on a dead zone; if you did, it does a non-incremental GC right then to avoid leaking the dead zone.

The purpose of this assertion is to ensure that any marking of dead zones that happens during AutoMaybeTouchDeadZones happens within the scope of AutoMarkInDeadZone, since we need that code to set the flag that tells AutoMaybeTouchDeadZones to GC. I'm wondering now if it would be easier to just have the marking code directly set the flag. I guess that would be a little slower, and maybe it's also nice to directly call out the places where this sort of marking can happen.

Anyway, I think the problem is that ~AutoMaybeTouchDeadZones needs to unset the gcManipulatingDeadZones flag *before* it does the full GC. Otherwise, if the full GC happens to mark the dead zone, we'll assert.

As an aside, it seems likely that something is going wrong here with this camera code. I don't understand why a once-dead zone should get revived like this, especially so frequently. But I don't have time to look into it, and this patch does fix an actual problem.
Assignee: continuation → wmccloskey
Status: NEW → ASSIGNED
Attachment #8383204 - Flags: review?(luke)
Flags: needinfo?(wmccloskey)
Thanks Bill!

I have no idea why this is ending up reviving a zone.  It seems like it is just a conventional wrapper cache.
Attachment #8383204 - Flags: review?(luke) → review+
Looks like that worked. Anything else need doing here or can we close it out?
Flags: needinfo?(wmccloskey)
Let's close it.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: needinfo?(wmccloskey)
Resolution: --- → FIXED
Component: General → JavaScript: GC
Product: Firefox OS → Core
Target Milestone: --- → mozilla30
Version: unspecified → Trunk
You need to log in before you can comment on or make changes to this bug.