top crash [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)][@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)]

RESOLVED FIXED in mozilla1.9.2

Status

()

defect
P2
critical
RESOLVED FIXED
11 years ago
5 years ago

People

(Reporter: morac, Assigned: peterv)

Tracking

({crash, regression, topcrash})

Trunk
mozilla1.9.2
Points:
---
Bug Flags:
blocking1.9.2 +
wanted1.9.0.x -
in-testsuite ?

Firefox Tracking Flags

(status1.9.2 beta5-fixed, status1.9.1 wanted)

Details

(Whiteboard: [crashkill][sg:critical?], crash signature, )

Attachments

(3 attachments, 2 obsolete attachments)

Reporter

Description

11 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008051206 Firefox/3.0

Firefox 3.0 RC1 crashed once when closing a window yesterday. I didn't think too much about it until I looked up the crash on crash-stats and found about 1200 identical crashes (all Windows users).

http://crash-stats.mozilla.com/report/index/fa1025bd-3192-11dd-a04d-001cc45a2ce4

http://crash-stats.mozilla.com/report/list?range_unit=weeks&version=Firefox%3A3.0&range_value=2&signature=nsCycleCollector%3A%3AMarkRoots(GCGraphBuilder%26)

I couldn't find any other bug reports on this (save bug 383763 which doesn't appear to be related), so I figured I'd report it.

Reproducible: Couldn't Reproduce
Version: unspecified → Trunk

Comment 1

11 years ago
Posted file WinDbg trace
I didn't close a page. My Firefox crashed after restarting my RC browser to update to 3.0.1 after i restarted it twice due to crashes that might have been caused by OkCupid pages (with OkC's IM OFF, AFAIK).
crash-stats.mozilla.com is taking too long, so i'll just add my crash trace as it also says MarkRoots:

0:001> kp
ChildEBP RetAddr  
0012baf4 605e558c xul!nsCycleCollector::MarkRoots(class GCGraphBuilder * builder = 0x0012bb08)+0x42d [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\base\nscyclecollector.cpp @ 1513]
0012bb5c 60587d82 xul!nsCycleCollector::BeginCollection(void)+0x84 [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\base\nscyclecollector.cpp @ 2379]
0012bb70 60139c25 xul!XPCCycleCollectGCCallback(struct JSContext * cx = 0x00d040b0, JSGCStatus status = 20151969 (No matching enumerant))+0xe2 [e:\fx19rel\winnt_5.2_depend\mozilla\js\src\xpconnect\src\nsxpconnect.cpp @ 440]
0012bc30 60101e40 js3250!js_GC(struct JSContext * cx = 0x00d06ae0, JSGCInvocationKind gckind = GC_NORMAL (0))+0x2b5 [e:\fx19rel\winnt_5.2_depend\mozilla\js\src\jsgc.c @ 3239]
0012bc44 60610e08 js3250!JS_GC(struct JSContext * cx = 0x605fd185)+0x30 [e:\fx19rel\winnt_5.2_depend\mozilla\js\src\jsapi.c @ 2469]
0012bd0c 605fd185 xul!nsXPConnect::Collect(void)+0x80 [e:\fx19rel\winnt_5.2_depend\mozilla\js\src\xpconnect\src\nsxpconnect.cpp @ 530]
0012fbbc 6064d13f xul!nsCycleCollector::Collect(unsigned int aTryCollections = 1)+0x87 [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\base\nscyclecollector.cpp @ 2250]
0012fbc4 6064d11e xul!nsCycleCollector_collect(void)+0x11 [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\base\nscyclecollector.cpp @ 2899]
0012fbc8 6075e091 xul!nsJSContext::CC(void)+0x2d [e:\fx19rel\winnt_5.2_depend\mozilla\dom\src\base\nsjsenvironment.cpp @ 3346]
0012fbf0 605478d7 xul!nsJSContext::MaybeCC+0x132e9e
0012fc14 605dc8ae xul!nsObserverService::NotifyObservers(class nsISupports * aSubject = 0x40c00019, char * aTopic = 0x0000000c "--- memory read error at address 0x0000000c ---", wchar_t * someData = 0x0000003b "--- memory read error at address 0x0000003b ---")+0x137 [e:\fx19rel\winnt_5.2_depend\mozilla\xpcom\ds\nsobserverservice.cpp @ 181]
00000000 00000000 xul!nsUITimerCallback::Notify(class nsITimer * aTimer = <Memory access error>)+0x75 [e:\fx19rel\winnt_5.2_depend\mozilla\content\events\src\nseventstatemanager.cpp @ 212]
If you can still reproduce this, can you please post some crash IDs (from about:crashes)? The ones in comment 0 are no longer valid.
Keywords: crash
Summary: Crash in nsCycleCollector::MarkRoots on window close → Crash in [@ nsCycleCollector::MarkRoots] on window close
This crash seems to be a topcrash, though I didn't comment to that effect before. It still exists in 3.0.6 (currently #13). This also might still exist on the 1.9.1 branch (#48 for beta 2), though I don't see it in 3.1b3pre (could be the type of user?).

Does the WinDbg trace from comment 1 help determine what's going on here?
Flags: wanted1.9.0.x?
Keywords: topcrash
Flags: wanted1.9.0.x? → wanted1.9.0.x+
Summary: Crash in [@ nsCycleCollector::MarkRoots] on window close → top crash [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)]
I'm actually just going to confirm this as a topcrasher (currently #10). Who can I get to look at it? (Is this even the right component?)

Breakpad reports the following stack, taken from bp-38b3417f-bf27-4660-8ad6-3c29f2090625 (though I expect the WinDbg trace to be more helpful):

Frame  	Module  	Signature  	Source
0 	xul.dll 	nsCycleCollector::MarkRoots(GCGraphBuilder&) 	mozilla/xpcom/base/nsCycleCollector.cpp:1513
1 	xul.dll 	nsCycleCollector::BeginCollection() 	mozilla/xpcom/base/nsCycleCollector.cpp:2368
2 	xul.dll 	XPCCycleCollectGCCallback 	mozilla/js/src/xpconnect/src/nsXPConnect.cpp:440
3 	js3250.dll 	js_GC 	mozilla/js/src/jsgc.c:3229
4 	js3250.dll 	JS_GC 	mozilla/js/src/jsapi.c:2469
5 	xul.dll 	nsXPConnect::Collect() 	mozilla/js/src/xpconnect/src/nsXPConnect.cpp:529
6 	xul.dll 	nsCycleCollector::Collect(unsigned int) 	mozilla/xpcom/base/nsCycleCollector.cpp:2250
7 	xul.dll 	nsCycleCollector_collect() 	mozilla/xpcom/base/nsCycleCollector.cpp:2898
8 	xul.dll 	nsJSContext::CC() 	mozilla/dom/src/base/nsJSEnvironment.cpp:3346
9 	xul.dll 	xul.dll@0x2d41fb
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking1.9.0.13?
Peterv: Can you take a look at the crash reporter and WinDbg stacks above and see if there's any problem we can fix here?
Assignee: nobody → peterv
Flags: wanted1.9.1.x+
Flags: blocking1.9.1.1?
Reporter

Comment 6

10 years ago
I just got this again today in the release version of Firefox 3.5.  I was testing doing removeChild and appendChild with user created javascript elements. 

http://crash-stats.mozilla.com/report/index/50326ee5-4120-4836-ac22-03a1b2090709
Assignee

Comment 7

10 years ago
(In reply to comment #5)
> Peterv: Can you take a look at the crash reporter and WinDbg stacks above and
> see if there's any problem we can fix here?

There's not much we can do with just the stack traces, at least I don't see anything obviously wrong. Maybe dbaron does?
Flags: blocking1.9.0.13?
Flags: blocking1.9.1.1?
dbaron: Any thoughts here?

Comment 9

10 years ago
Event happened July 29, 2009
I opened chats and a link to a game, maybe 10 tabs on 1 window at a time.
Always done that so, i dont think its coz i have too many tabs open?
I thought the problem was my laptop(just bought it), then i turned on my desktop, same thing happened when i tried to go to the same websites that made my laptop crash. Can anybody help me with this?

Frame  	Module  	Signature [Expand]  	Source
0 	xul.dll 	nsCycleCollector::MarkRoots 	xpcom/base/nsCycleCollector.cpp:1535
1 	xul.dll 	nsCycleCollector::BeginCollection 	xpcom/base/nsCycleCollector.cpp:2421
2 	xul.dll 	XPCCycleCollectGCCallback 	js/src/xpconnect/src/nsXPConnect.cpp:390
3 	js3250.dll 	js_GC 	js/src/jsgc.cpp:3500
4 	js3250.dll 	JS_GC 	js/src/jsapi.cpp:2458
5 	xul.dll 	nsXPConnect::Collect 	js/src/xpconnect/src/nsXPConnect.cpp:477
6 	xul.dll 	nsCycleCollector::Collect 	xpcom/base/nsCycleCollector.cpp:2340
7 	xul.dll 	nsCycleCollector_collect 	xpcom/base/nsCycleCollector.cpp:2999
8 	xul.dll 	nsJSContext::CC 	dom/src/base/nsJSEnvironment.cpp:3461
9 	xul.dll 	nsUserActivityObserver::Observe 	dom/src/base/nsJSEnvironment.cpp:266
Flags: wanted1.9.1.x+
This might be the same bug 521750, but we won't know until we fix it.
Depends on: 521750
Whiteboard: [crashkill]
The current crash list also shows crashes on OS X with this signature.
OS: Windows XP → All
Hardware: x86 → All
(In reply to comment #11)
> This might be the same bug 521750, but we won't know until we fix it.

No, it's pretty clearly different from bug 521750 based on the data, but I've been assuming for the past month or so that it was the same as bug 500105.
Never nominated, but marking blocking1.9.2- to explicitly mark [CrashKill] bugs as either blocking or not.  If we can get a patch before RC, we should really consider taking it.
blocking2.0: --- → ?
Flags: blocking1.9.2-
I looked at 6 MarkRoots crashes for Firefox 3.6b3.  Here's a summary of what I saw in all 6.  The first and fourth are particularly interesting and seem to point to at least a significant fraction of these crashes (and perhaps more of these, and other cycle collection crashes) being related to XPCWrappedNatives.



http://crash-stats.mozilla.com/report/index/0102e581-38bb-498d-9c9b-185952091121

The top 3 frames of the stack (all inlined within MarkRoots!) are actually:

canonicalize
GCGraphBuilder::NoteXPCOMChild
XPCWrappedNative::cycleCollection::Traverse

The XPCWrappedNative (EDI) is 0x18127070.
It's mIdentity (ESI) is 0x0b00b470.
mIdentity's vtable pointer (ECX) is 0x00000000.

Thus we crash.


http://crash-stats.mozilla.com/report/index/2a379c96-320e-4ef4-83d0-a35642091121

This one is really just in the very beginning of GCGraphBuilder::Traverse inlined into MarkRoots.

queue::mNext ([ESP+0x60]) is
queue::mBlockEnd ([ESP+0x64]) is 
builder (EBX) is 0x0012BB78
builder.mCurrPI ([EBX + 0x40]) is pi/aPtrInfo (EBP) is 0x05F6737C (from EBP)
builder.mEdgeBuilder.mCurrent and mCurrPI->mFirstChild ([EBX+0x14], EAX, [EBP+0x10]) is 0x05C0C850 (from EAX)
aPtrInfo->mParticipant is ESI, 0x51A5D134
that smells like a bad address, and we crash dereferencing it (trying to load its vtable pointer in order to call its Traverse method)


http://crash-stats.mozilla.com/report/index/3c376f19-e853-4ab4-b921-660202091121

Looks just like the previous, except we got one instruction further:
aPtrInfo->mParticipant is ESI, 0x69239408
aPtrInfo->mParticipant's vtable pointer (ECX) is 0x00000002
and we crash dereferencing that.



http://crash-stats.mozilla.com/report/index/546343e5-afe5-4dd4-8fea-d16af2091121

This one looks just like the first one.  The top of the stack again looks like:
canonicalize
GCGraphBuilder::NoteXPCOMChild
XPCWrappedNative::cycleCollection::Traverse

The XPCWrappedNative (EDI) is 0x0B608B80
The wrapped native's mIdentity (ESI) is 0x0692EFB0
mIdentity's vtable pointer (ECX) is 0x00000000.


http://crash-stats.mozilla.com/report/index/81f10540-46a9-48aa-9312-e02c52091121

This one looks like the second and third (more like the second).

builder (EBX) is 0x0012BB20
builder.mCurrPI ([EBX + 0x40]) is pi/aPtrInfo (EBP) is 0x0FD6203C (from EBP)
builder.mEdgeBuilder.mCurrent and mCurrPI->mFirstChild ([EBX+0x14], EAX, [EBP+0x10]) is 0x0E19921C (from EAX)
aPtrInfo->mParticipant is ESI, 0x34A995F8
and we crash dereferencing that


http://crash-stats.mozilla.com/report/index/c38ff6d0-2079-41d3-9381-a86b82091121

This one looks like second, third, and fifth (more like the third).

builder (EBX) is 0x0013BB78
builder.mCurrPI ([EBX + 0x40]) is pi/aPtrInfo (EBP) is 0x0B4617B4 (from EBP)
builder.mEdgeBuilder.mCurrent and mCurrPI->mFirstChild ([EBX+0x14], EAX, [EBP+0x10]) is 0x0B52ADB0 (from EAX)
aPtrInfo->mParticipant is ESI, 0x10A9B5DC
aPtrInfo->mParticipant's vtable (ECX) is 0x00000000

Comment 16

10 years ago
I am using Windows XP with Firefox 3.6b3. I am able to consistently produce crashes that have "MarkRoots" near the top of the stack trace. Perhaps these crashes are relevant to this bug. (If you think the fault lies with my add-ons, please let me know and I will submit this information elsewhere.)

Steps to reproduce:

--------

1) Create a new profile in Firefox 3.6b3

2) Disable all extensions and plugins

3) Install Nightly Tester Tools and restart Firefox

4) Install Greasemonkey 0.8.3RC1 from here:
http://userscripts.org/topics/40309

The Greasemonkey release notes say that the 3.6 compatibility flag has been set on this build, but this does not seem to be the case. So use the Nightly Tester Tools to force compatibility.

5) Install the Greasemonkey script "Facebook Purity" (v2.2b, 2009-11-20) from here:
http://steeev.freehostia.com/facebook/facebook.purity.user.js

6) Login to your Facebook homepage. Wait about 10 to 20 seconds for your full newsfeed to appear. At this point you should be able to see that Facebook Purity has loaded, because it will say something like this at the top of your newsfeed: "FB Purity hid:  1  app [ Show  ]  6 extra [ Show ]"

7) Click the "Logout" button to logout of Facebook, then wait about 20 seconds without clicking anything else or switching to another application.

8) If Firefox hasn't crashed yet, log back into Facebook and repeat the above steps. It should crash within a minute or so.

--------

Here are a few crash reports that I produced using the above steps:

http://crash-stats.mozilla.com/report/index/bp-ac4d9630-7678-4f70-8930-4e02c2091126
(I noticed that on 11/17, another user made the following comment on the above crash report: "Just installed the new userscript facebook purity from bit.ly/fbpurity and visited facebook.com")

http://crash-stats.mozilla.com/report/index/bp-ce6b5897-6d45-471e-bd56-a81162091126

http://crash-stats.mozilla.com/report/index/d4ba7f22-5c82-4042-97b7-0b9b32091126

http://crash-stats.mozilla.com/report/index/ea21e8c6-912f-4473-8327-554532091126
(Hmm, this one is different; it doesn't have "MarkRoots" in it.)

If I revert to the previous version of the Greasemonkey script "Facebook Purity" (v2.1b, 2009-11-11), the crashing problem goes away:
http://steeev.freehostia.com/facebook/facebook.purity.user.js.old20

I am attaching a diff from the old version of the script (which doesn't crash) to the new version of the script (which crashes). The diff is called facebook_purity_diff.txt.
Assignee

Comment 18

10 years ago
Thanks you so much for those steps to reproduce, I've managed to catch the crash in a debugger.

Something is very wrong here, I'm seeing a bunch of "###!!! ASSERTION: Clearing a preserved wrapper!: '!PreservingWrapper()'", coming through XPC_WN_NoHelper_Finalize. The scriptable helper looks like nsContentListSH, no idea yet why we are in a *_NoHelper_* finalize hook.
Status: NEW → ASSIGNED
Assignee

Comment 19

10 years ago
The *_NoHelper_* is fine, we don't have Finalize hooks anymore for DOM objects (except Window).

I think we're missing tracing code for the cached wrapper on nsContentList. It relies on nsBaseContentList for cycle collection, but that doesn't have a cached wrapper.
Assignee

Comment 20

10 years ago
Posted patch v1 (obsolete) — Splinter Review
This fixes the bug for me.
Assignee

Comment 21

10 years ago
What I think is happening is that the greasemonkey script gets a NodeList with nodes from a content document. The NodeList is in a security wrapper. The script tries to get list[0], so we end up in XPCWrapper::ResolveNativeProperty. ResolveNativeProperty calls MaybePreserveWrapper, which ends up in nsDOMClassInfo::PreserveNodeWrapper and we preserve the wrapper (setting a flag in the wrapper cache and adding the native object to the preserved wrapper hash). For objects with preserved wrappers we absolutely need their cycle collection code to traverse to and trace the preserved wrapper, or we will not mark them. So we do not mark the wrapper, which ends up being finalized, clearing the wrapper cache and the flag, but the native object is still in the hash of preserved wrappers (the cycle collection code didn't remove it). The native object dies, and it doesn't know it needs to remove itself from the hash. At some point we enumerate through the hash and try to use the stale pointer.

The patch doesn't do this yet, but to be completely safe we also need to remove the objects from the hash in their destructor. However, I wonder if we really want XPConnect to preserve wrappers for objects that we don't preserve wrappers for in the DOM code itself. The DOM code only preserves wrappers for nodes and XHR, seems like the security wrappers should really do the same.
Did you want to request review?  It would be good to try to get this in to 1.9.2 RC.
Assignee

Comment 23

10 years ago
Posted patch v2 (obsolete) — Splinter Review
I actually would like to take this one instead. This only ever preserves wrappers for nodes and XHR, both from DOMCI and from the XPCWrapper code. I removed the tracing of the cached wrapper from nsDOMCSSAttributeDeclaration and nsComputedDOMStyle, if they never preserve their wrapper then we'll never trace/traverse it anyway. I added debugging code to nsContentUtils to make sure that if we try to preserve a wrapper then the cycle collector code for its native traverses to and traces the wrapper. In a build with just the debugging code we fail the assertion when we try to preserve the wrapper for a nsContentList, with XPCWrapper::ResolveNativeProperty on the stack. With the rest of the patch applied the assertion doesn't fail and we don't crash.

I'm trying to make a mochitest for this, but it's not done and it'll have to wait till tomorrow.
Attachment #414750 - Attachment is obsolete: true
Attachment #414768 - Flags: review?(jst)
Assignee

Comment 24

10 years ago
I think that this should be blocking, now that we understand what's going on.
Flags: blocking1.9.2- → blocking1.9.2?
Assignee

Updated

10 years ago
Summary: top crash [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)] → top crash [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)][@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)]
I find it hard to disagree!
Flags: blocking1.9.2? → blocking1.9.2+
Assignee

Updated

10 years ago
Flags: wanted1.9.2?
Group: core-security
Priority: -- → P2
Whiteboard: [crashkill] → [crashkill][sg:critical?]
Target Milestone: --- → mozilla1.9.2
Attachment #414768 - Flags: review?(jst) → review+
Assignee

Comment 27

10 years ago
I've checked this in on trunk and branch, but now there's a bunch of orange. I've found an issue with the patch, so I checked in a fix for that on trunk (http://hg.mozilla.org/mozilla-central/rev/71e77f546ddf). There's still orange, so I've tried backing out from branch, but that didn't resolve the orange on branch either. Not sure what's going on. I'm going to back out from trunk too and try again later.
(In reply to comment #27)
> I've checked this in on trunk and branch, but now there's a bunch of orange.
> I've found an issue with the patch, so I checked in a fix for that on trunk
> (http://hg.mozilla.org/mozilla-central/rev/71e77f546ddf). There's still orange,
> so I've tried backing out from branch, but that didn't resolve the orange on
> branch either. Not sure what's going on. I'm going to back out from trunk too
> and try again later.

The orange was because the test server's cert expired at 9:45AM this morning, and should be fixed now.  See bug 531590.
Assignee

Comment 29

10 years ago
http://hg.mozilla.org/mozilla-central/rev/ed6b195bae6d
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/45101239c881

Thanks to dbaron's swift action this should be in today's nightlies.

We need to take this on the 1.9.1 branch too. 1.9.0 is unaffected (didn't have cached wrappers). I'll hold off on checking in a testcase for now.
Status: ASSIGNED → RESOLVED
blocking1.9.1: --- → ?
blocking2.0: ? → ---
Closed: 10 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Assignee

Comment 30

10 years ago
Posted patch v2Splinter Review
Patch that was checked in.
Attachment #414768 - Attachment is obsolete: true
Attachment #414958 - Flags: review+
Attachment #414958 - Flags: approval1.9.1.7?
Assignee

Comment 31

10 years ago
Comment on attachment 414958 [details] [diff] [review]
v2

This doesn't apply on 1.9.1 (which doesn't have slim wrappers).
Attachment #414958 - Flags: approval1.9.1.7?
Assignee

Comment 32

10 years ago
(In reply to comment #31)
> This doesn't apply on 1.9.1 (which doesn't have slim wrappers).

... and so the 1.9.1 crash must be a completely different crash with the same signature. I think I'll file a separate bug for the 1.9.1 crash (which might also still happen on 1.9.2 and trunk?). Testcase for this bug does cause leaks on 1.9.1, I'll also file a separate bug for that.
blocking1.9.1: ? → ---
Flags: wanted1.9.0.x+ → wanted1.9.0.x-
Doesn't FileReader also preserve wrappers the way that XHR does? I think the intent was for them to work the same.
Assignee

Comment 34

10 years ago
FileReader uses the same DOMCI helper as XHR.
Assignee

Comment 35

10 years ago
(In reply to comment #32)
> I think I'll file a separate bug for the 1.9.1 crash (which might
> also still happen on 1.9.2 and trunk?).

Bug 531961.

> Testcase for this bug does cause leaks
> on 1.9.1, I'll also file a separate bug for that.

Bug 531964.
Crash Signature: [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)] [@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)]
Group: core-security
Issue is Resolved - removing QA-Wanted Keywords - QA-Wanted query clean-up task
Crash Signature: [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)] [@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)] → [@ nsCycleCollector::MarkRoots(GCGraphBuilder&)] [@ nsBaseContentList::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)]
You need to log in before you can comment on or make changes to this bug.