Closed Bug 437449 Opened 17 years ago Closed 15 years ago

top crash [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)][@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)]

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.2
Tracking Status
status1.9.2 --- beta5-fixed
status1.9.1 --- wanted

People

(Reporter: morac, Assigned: peterv)

References

()

Details

(Keywords: crash, regression, topcrash, Whiteboard: [crashkill][sg:critical?])

Crash Data

Attachments

(3 files, 2 obsolete files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008051206 Firefox/3.0 Firefox 3.0 RC1 crashed once when closing a window yesterday. I didn't think too much about it until I looked up the crash on crash-stats and found about 1200 identical crashes (all Windows users). http://crash-stats.mozilla.com/report/index/fa1025bd-3192-11dd-a04d-001cc45a2ce4 http://crash-stats.mozilla.com/report/list?range_unit=weeks&version=Firefox%3A3.0&range_value=2&signature=nsCycleCollector%3A%3AMarkRoots(GCGraphBuilder%26) I couldn't find any other bug reports on this (save bug 383763 which doesn't appear to be related), so I figured I'd report it. Reproducible: Couldn't Reproduce
Version: unspecified → Trunk
Attached file WinDbg trace
I didn't close a page. My Firefox crashed after restarting my RC browser to update to 3.0.1 after i restarted it twice due to crashes that might have been caused by OkCupid pages (with OkC's IM OFF, AFAIK). crash-stats.mozilla.com is taking too long, so i'll just add my crash trace as it also says MarkRoots: 0:001> kp ChildEBP RetAddr 0012baf4 605e558c xul!nsCycleCollector::MarkRoots(class GCGraphBuilder * builder = 0x0012bb08)+0x42d [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\base\nscyclecollector.cpp @ 1513] 0012bb5c 60587d82 xul!nsCycleCollector::BeginCollection(void)+0x84 [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\base\nscyclecollector.cpp @ 2379] 0012bb70 60139c25 xul!XPCCycleCollectGCCallback(struct JSContext * cx = 0x00d040b0, JSGCStatus status = 20151969 (No matching enumerant))+0xe2 [e:\fx19rel\winnt_5.2_depend\mozilla\js\src\xpconnect\src\nsxpconnect.cpp @ 440] 0012bc30 60101e40 js3250!js_GC(struct JSContext * cx = 0x00d06ae0, JSGCInvocationKind gckind = GC_NORMAL (0))+0x2b5 [e:\fx19rel\winnt_5.2_depend\mozilla\js\src\jsgc.c @ 3239] 0012bc44 60610e08 js3250!JS_GC(struct JSContext * cx = 0x605fd185)+0x30 [e:\fx19rel\winnt_5.2_depend\mozilla\js\src\jsapi.c @ 2469] 0012bd0c 605fd185 xul!nsXPConnect::Collect(void)+0x80 [e:\fx19rel\winnt_5.2_depend\mozilla\js\src\xpconnect\src\nsxpconnect.cpp @ 530] 0012fbbc 6064d13f xul!nsCycleCollector::Collect(unsigned int aTryCollections = 1)+0x87 [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\base\nscyclecollector.cpp @ 2250] 0012fbc4 6064d11e xul!nsCycleCollector_collect(void)+0x11 [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\base\nscyclecollector.cpp @ 2899] 0012fbc8 6075e091 xul!nsJSContext::CC(void)+0x2d [e:\fx19rel\winnt_5.2_depend\mozilla\dom\src\base\nsjsenvironment.cpp @ 3346] 0012fbf0 605478d7 xul!nsJSContext::MaybeCC+0x132e9e 0012fc14 605dc8ae xul!nsObserverService::NotifyObservers(class nsISupports * aSubject = 0x40c00019, char * aTopic = 0x0000000c "--- memory read error at address 0x0000000c ---", wchar_t * someData = 0x0000003b "--- memory read error at address 0x0000003b ---")+0x137 [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\ds\nsobserverservice.cpp @ 181] 00000000 00000000 xul!nsUITimerCallback::Notify(class nsITimer * aTimer = <Memory access error>)+0x75 [e:\fx19rel\winnt_5.2_depend\mozilla\content\events\src\nseventstatemanager.cpp @ 212]
If you can still reproduce this, can you please post some crash IDs (from about:crashes)? The ones in comment 0 are no longer valid.
Keywords: crash
Summary: Crash in nsCycleCollector::MarkRoots on window close → Crash in [@ nsCycleCollector::MarkRoots] on window close
This crash seems to be a topcrash, though I didn't comment to that effect before. It still exists in 3.0.6 (currently #13). This also might still exist on the 1.9.1 branch (#48 for beta 2), though I don't see it in 3.1b3pre (could be the type of user?). Does the WinDbg trace from comment 1 help determine what's going on here?
Flags: wanted1.9.0.x?
Keywords: topcrash
Flags: wanted1.9.0.x? → wanted1.9.0.x+
Summary: Crash in [@ nsCycleCollector::MarkRoots] on window close → top crash [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)]
I'm actually just going to confirm this as a topcrasher (currently #10). Who can I get to look at it? (Is this even the right component?) Breakpad reports the following stack, taken from bp-38b3417f-bf27-4660-8ad6-3c29f2090625 (though I expect the WinDbg trace to be more helpful): Frame Module Signature Source 0 xul.dll nsCycleCollector::MarkRoots(GCGraphBuilder&) mozilla/xpcom/base/nsCycleCollector.cpp:1513 1 xul.dll nsCycleCollector::BeginCollection() mozilla/xpcom/base/nsCycleCollector.cpp:2368 2 xul.dll XPCCycleCollectGCCallback mozilla/js/src/xpconnect/src/nsXPConnect.cpp:440 3 js3250.dll js_GC mozilla/js/src/jsgc.c:3229 4 js3250.dll JS_GC mozilla/js/src/jsapi.c:2469 5 xul.dll nsXPConnect::Collect() mozilla/js/src/xpconnect/src/nsXPConnect.cpp:529 6 xul.dll nsCycleCollector::Collect(unsigned int) mozilla/xpcom/base/nsCycleCollector.cpp:2250 7 xul.dll nsCycleCollector_collect() mozilla/xpcom/base/nsCycleCollector.cpp:2898 8 xul.dll nsJSContext::CC() mozilla/dom/src/base/nsJSEnvironment.cpp:3346 9 xul.dll xul.dll@0x2d41fb
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking1.9.0.13?
Peterv: Can you take a look at the crash reporter and WinDbg stacks above and see if there's any problem we can fix here?
Assignee: nobody → peterv
Flags: wanted1.9.1.x+
Flags: blocking1.9.1.1?
I just got this again today in the release version of Firefox 3.5. I was testing doing removeChild and appendChild with user created javascript elements. http://crash-stats.mozilla.com/report/index/50326ee5-4120-4836-ac22-03a1b2090709
(In reply to comment #5) > Peterv: Can you take a look at the crash reporter and WinDbg stacks above and > see if there's any problem we can fix here? There's not much we can do with just the stack traces, at least I don't see anything obviously wrong. Maybe dbaron does?
Flags: blocking1.9.0.13?
Flags: blocking1.9.1.1?
dbaron: Any thoughts here?
Event happened July 29, 2009 I opened chats and a link to a game, maybe 10 tabs on 1 window at a time. Always done that so, i dont think its coz i have too many tabs open? I thought the problem was my laptop(just bought it), then i turned on my desktop, same thing happened when i tried to go to the same websites that made my laptop crash. Can anybody help me with this? Frame Module Signature [Expand] Source 0 xul.dll nsCycleCollector::MarkRoots xpcom/base/nsCycleCollector.cpp:1535 1 xul.dll nsCycleCollector::BeginCollection xpcom/base/nsCycleCollector.cpp:2421 2 xul.dll XPCCycleCollectGCCallback js/src/xpconnect/src/nsXPConnect.cpp:390 3 js3250.dll js_GC js/src/jsgc.cpp:3500 4 js3250.dll JS_GC js/src/jsapi.cpp:2458 5 xul.dll nsXPConnect::Collect js/src/xpconnect/src/nsXPConnect.cpp:477 6 xul.dll nsCycleCollector::Collect xpcom/base/nsCycleCollector.cpp:2340 7 xul.dll nsCycleCollector_collect xpcom/base/nsCycleCollector.cpp:2999 8 xul.dll nsJSContext::CC dom/src/base/nsJSEnvironment.cpp:3461 9 xul.dll nsUserActivityObserver::Observe dom/src/base/nsJSEnvironment.cpp:266
Flags: wanted1.9.1.x+
This might be the same bug 521750, but we won't know until we fix it.
Depends on: 521750
Whiteboard: [crashkill]
The current crash list also shows crashes on OS X with this signature.
OS: Windows XP → All
Hardware: x86 → All
(In reply to comment #11) > This might be the same bug 521750, but we won't know until we fix it. No, it's pretty clearly different from bug 521750 based on the data, but I've been assuming for the past month or so that it was the same as bug 500105.
Never nominated, but marking blocking1.9.2- to explicitly mark [CrashKill] bugs as either blocking or not. If we can get a patch before RC, we should really consider taking it.
blocking2.0: --- → ?
Flags: blocking1.9.2-
I looked at 6 MarkRoots crashes for Firefox 3.6b3. Here's a summary of what I saw in all 6. The first and fourth are particularly interesting and seem to point to at least a significant fraction of these crashes (and perhaps more of these, and other cycle collection crashes) being related to XPCWrappedNatives. http://crash-stats.mozilla.com/report/index/0102e581-38bb-498d-9c9b-185952091121 The top 3 frames of the stack (all inlined within MarkRoots!) are actually: canonicalize GCGraphBuilder::NoteXPCOMChild XPCWrappedNative::cycleCollection::Traverse The XPCWrappedNative (EDI) is 0x18127070. It's mIdentity (ESI) is 0x0b00b470. mIdentity's vtable pointer (ECX) is 0x00000000. Thus we crash. http://crash-stats.mozilla.com/report/index/2a379c96-320e-4ef4-83d0-a35642091121 This one is really just in the very beginning of GCGraphBuilder::Traverse inlined into MarkRoots. queue::mNext ([ESP+0x60]) is queue::mBlockEnd ([ESP+0x64]) is builder (EBX) is 0x0012BB78 builder.mCurrPI ([EBX + 0x40]) is pi/aPtrInfo (EBP) is 0x05F6737C (from EBP) builder.mEdgeBuilder.mCurrent and mCurrPI->mFirstChild ([EBX+0x14], EAX, [EBP+0x10]) is 0x05C0C850 (from EAX) aPtrInfo->mParticipant is ESI, 0x51A5D134 that smells like a bad address, and we crash dereferencing it (trying to load its vtable pointer in order to call its Traverse method) http://crash-stats.mozilla.com/report/index/3c376f19-e853-4ab4-b921-660202091121 Looks just like the previous, except we got one instruction further: aPtrInfo->mParticipant is ESI, 0x69239408 aPtrInfo->mParticipant's vtable pointer (ECX) is 0x00000002 and we crash dereferencing that. http://crash-stats.mozilla.com/report/index/546343e5-afe5-4dd4-8fea-d16af2091121 This one looks just like the first one. The top of the stack again looks like: canonicalize GCGraphBuilder::NoteXPCOMChild XPCWrappedNative::cycleCollection::Traverse The XPCWrappedNative (EDI) is 0x0B608B80 The wrapped native's mIdentity (ESI) is 0x0692EFB0 mIdentity's vtable pointer (ECX) is 0x00000000. http://crash-stats.mozilla.com/report/index/81f10540-46a9-48aa-9312-e02c52091121 This one looks like the second and third (more like the second). builder (EBX) is 0x0012BB20 builder.mCurrPI ([EBX + 0x40]) is pi/aPtrInfo (EBP) is 0x0FD6203C (from EBP) builder.mEdgeBuilder.mCurrent and mCurrPI->mFirstChild ([EBX+0x14], EAX, [EBP+0x10]) is 0x0E19921C (from EAX) aPtrInfo->mParticipant is ESI, 0x34A995F8 and we crash dereferencing that http://crash-stats.mozilla.com/report/index/c38ff6d0-2079-41d3-9381-a86b82091121 This one looks like second, third, and fifth (more like the third). builder (EBX) is 0x0013BB78 builder.mCurrPI ([EBX + 0x40]) is pi/aPtrInfo (EBP) is 0x0B4617B4 (from EBP) builder.mEdgeBuilder.mCurrent and mCurrPI->mFirstChild ([EBX+0x14], EAX, [EBP+0x10]) is 0x0B52ADB0 (from EAX) aPtrInfo->mParticipant is ESI, 0x10A9B5DC aPtrInfo->mParticipant's vtable (ECX) is 0x00000000
I am using Windows XP with Firefox 3.6b3. I am able to consistently produce crashes that have "MarkRoots" near the top of the stack trace. Perhaps these crashes are relevant to this bug. (If you think the fault lies with my add-ons, please let me know and I will submit this information elsewhere.) Steps to reproduce: -------- 1) Create a new profile in Firefox 3.6b3 2) Disable all extensions and plugins 3) Install Nightly Tester Tools and restart Firefox 4) Install Greasemonkey 0.8.3RC1 from here: http://userscripts.org/topics/40309 The Greasemonkey release notes say that the 3.6 compatibility flag has been set on this build, but this does not seem to be the case. So use the Nightly Tester Tools to force compatibility. 5) Install the Greasemonkey script "Facebook Purity" (v2.2b, 2009-11-20) from here: http://steeev.freehostia.com/facebook/facebook.purity.user.js 6) Login to your Facebook homepage. Wait about 10 to 20 seconds for your full newsfeed to appear. At this point you should be able to see that Facebook Purity has loaded, because it will say something like this at the top of your newsfeed: "FB Purity hid: 1 app [ Show ] 6 extra [ Show ]" 7) Click the "Logout" button to logout of Facebook, then wait about 20 seconds without clicking anything else or switching to another application. 8) If Firefox hasn't crashed yet, log back into Facebook and repeat the above steps. It should crash within a minute or so. -------- Here are a few crash reports that I produced using the above steps: http://crash-stats.mozilla.com/report/index/bp-ac4d9630-7678-4f70-8930-4e02c2091126 (I noticed that on 11/17, another user made the following comment on the above crash report: "Just installed the new userscript facebook purity from bit.ly/fbpurity and visited facebook.com") http://crash-stats.mozilla.com/report/index/bp-ce6b5897-6d45-471e-bd56-a81162091126 http://crash-stats.mozilla.com/report/index/d4ba7f22-5c82-4042-97b7-0b9b32091126 http://crash-stats.mozilla.com/report/index/ea21e8c6-912f-4473-8327-554532091126 (Hmm, this one is different; it doesn't have "MarkRoots" in it.) If I revert to the previous version of the Greasemonkey script "Facebook Purity" (v2.1b, 2009-11-11), the crashing problem goes away: http://steeev.freehostia.com/facebook/facebook.purity.user.js.old20 I am attaching a diff from the old version of the script (which doesn't crash) to the new version of the script (which crashes). The diff is called facebook_purity_diff.txt.
Thanks you so much for those steps to reproduce, I've managed to catch the crash in a debugger. Something is very wrong here, I'm seeing a bunch of "###!!! ASSERTION: Clearing a preserved wrapper!: '!PreservingWrapper()'", coming through XPC_WN_NoHelper_Finalize. The scriptable helper looks like nsContentListSH, no idea yet why we are in a *_NoHelper_* finalize hook.
Status: NEW → ASSIGNED
The *_NoHelper_* is fine, we don't have Finalize hooks anymore for DOM objects (except Window). I think we're missing tracing code for the cached wrapper on nsContentList. It relies on nsBaseContentList for cycle collection, but that doesn't have a cached wrapper.
Attached patch v1 (obsolete) — Splinter Review
This fixes the bug for me.
What I think is happening is that the greasemonkey script gets a NodeList with nodes from a content document. The NodeList is in a security wrapper. The script tries to get list[0], so we end up in XPCWrapper::ResolveNativeProperty. ResolveNativeProperty calls MaybePreserveWrapper, which ends up in nsDOMClassInfo::PreserveNodeWrapper and we preserve the wrapper (setting a flag in the wrapper cache and adding the native object to the preserved wrapper hash). For objects with preserved wrappers we absolutely need their cycle collection code to traverse to and trace the preserved wrapper, or we will not mark them. So we do not mark the wrapper, which ends up being finalized, clearing the wrapper cache and the flag, but the native object is still in the hash of preserved wrappers (the cycle collection code didn't remove it). The native object dies, and it doesn't know it needs to remove itself from the hash. At some point we enumerate through the hash and try to use the stale pointer. The patch doesn't do this yet, but to be completely safe we also need to remove the objects from the hash in their destructor. However, I wonder if we really want XPConnect to preserve wrappers for objects that we don't preserve wrappers for in the DOM code itself. The DOM code only preserves wrappers for nodes and XHR, seems like the security wrappers should really do the same.
Did you want to request review? It would be good to try to get this in to 1.9.2 RC.
Attached patch v2 (obsolete) — Splinter Review
I actually would like to take this one instead. This only ever preserves wrappers for nodes and XHR, both from DOMCI and from the XPCWrapper code. I removed the tracing of the cached wrapper from nsDOMCSSAttributeDeclaration and nsComputedDOMStyle, if they never preserve their wrapper then we'll never trace/traverse it anyway. I added debugging code to nsContentUtils to make sure that if we try to preserve a wrapper then the cycle collector code for its native traverses to and traces the wrapper. In a build with just the debugging code we fail the assertion when we try to preserve the wrapper for a nsContentList, with XPCWrapper::ResolveNativeProperty on the stack. With the rest of the patch applied the assertion doesn't fail and we don't crash. I'm trying to make a mochitest for this, but it's not done and it'll have to wait till tomorrow.
Attachment #414750 - Attachment is obsolete: true
Attachment #414768 - Flags: review?(jst)
I think that this should be blocking, now that we understand what's going on.
Flags: blocking1.9.2- → blocking1.9.2?
Summary: top crash [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)] → top crash [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)][@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)]
I find it hard to disagree!
Flags: blocking1.9.2? → blocking1.9.2+
Flags: wanted1.9.2?
Group: core-security
Priority: -- → P2
Whiteboard: [crashkill] → [crashkill][sg:critical?]
Target Milestone: --- → mozilla1.9.2
Attachment #414768 - Flags: review?(jst) → review+
I've checked this in on trunk and branch, but now there's a bunch of orange. I've found an issue with the patch, so I checked in a fix for that on trunk (http://hg.mozilla.org/mozilla-central/rev/71e77f546ddf). There's still orange, so I've tried backing out from branch, but that didn't resolve the orange on branch either. Not sure what's going on. I'm going to back out from trunk too and try again later.
(In reply to comment #27) > I've checked this in on trunk and branch, but now there's a bunch of orange. > I've found an issue with the patch, so I checked in a fix for that on trunk > (http://hg.mozilla.org/mozilla-central/rev/71e77f546ddf). There's still orange, > so I've tried backing out from branch, but that didn't resolve the orange on > branch either. Not sure what's going on. I'm going to back out from trunk too > and try again later. The orange was because the test server's cert expired at 9:45AM this morning, and should be fixed now. See bug 531590.
http://hg.mozilla.org/mozilla-central/rev/ed6b195bae6d http://hg.mozilla.org/releases/mozilla-1.9.2/rev/45101239c881 Thanks to dbaron's swift action this should be in today's nightlies. We need to take this on the 1.9.1 branch too. 1.9.0 is unaffected (didn't have cached wrappers). I'll hold off on checking in a testcase for now.
Status: ASSIGNED → RESOLVED
blocking1.9.1: --- → ?
blocking2.0: ? → ---
Closed: 15 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Attached patch v2Splinter Review
Patch that was checked in.
Attachment #414768 - Attachment is obsolete: true
Attachment #414958 - Flags: review+
Attachment #414958 - Flags: approval1.9.1.7?
Comment on attachment 414958 [details] [diff] [review] v2 This doesn't apply on 1.9.1 (which doesn't have slim wrappers).
Attachment #414958 - Flags: approval1.9.1.7?
(In reply to comment #31) > This doesn't apply on 1.9.1 (which doesn't have slim wrappers). ... and so the 1.9.1 crash must be a completely different crash with the same signature. I think I'll file a separate bug for the 1.9.1 crash (which might also still happen on 1.9.2 and trunk?). Testcase for this bug does cause leaks on 1.9.1, I'll also file a separate bug for that.
blocking1.9.1: ? → ---
Flags: wanted1.9.0.x+ → wanted1.9.0.x-
Doesn't FileReader also preserve wrappers the way that XHR does? I think the intent was for them to work the same.
FileReader uses the same DOMCI helper as XHR.
(In reply to comment #32) > I think I'll file a separate bug for the 1.9.1 crash (which might > also still happen on 1.9.2 and trunk?). Bug 531961. > Testcase for this bug does cause leaks > on 1.9.1, I'll also file a separate bug for that. Bug 531964.
Crash Signature: [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)] [@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)]
Group: core-security
Issue is Resolved - removing QA-Wanted Keywords - QA-Wanted query clean-up task
Crash Signature: [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)] [@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)] → [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)] [@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: