Closed Bug 815141 Opened 12 years ago Closed 6 years ago

Thunderbird crash in CanonicalizeXPCOMParticipant in cycle collector, via general memory corruption. Users have randomish crash signatures. (Sometimes bad memory.)

Categories

(Thunderbird :: General, defect)

17 Branch
x86
All
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: Usul, Unassigned)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [tbird crash][comment 8])

Crash Data

This bug was filed from the Socorro interface and is report bp-d095da64-059a-434c-aebf-17af32121121.
============================================================= 
0 	xul.dll 	CanonicalizeXPCOMParticipant 	xpcom/base/nsCycleCollector.cpp:705
1 	xul.dll 	GCGraphBuilder::NoteXPCOMChild 	xpcom/base/nsCycleCollector.cpp:1869
2 	xul.dll 	XPCWrappedNative::NoteTearoffs 	js/xpconnect/src/XPCWrappedNative.cpp:118
3 	xul.dll 	XPCWrappedNative::cycleCollection::TraverseImpl 	js/xpconnect/src/XPCWrappedNative.cpp:98
4 	xul.dll 	GCGraphBuilder::Traverse 	xpcom/base/nsCycleCollector.cpp:1782
5 	xul.dll 	nsCycleCollector::MarkRoots 	xpcom/base/nsCycleCollector.cpp:2101
6 	xul.dll 	nsCycleCollectorRunner::Run 	xpcom/base/nsCycleCollector.cpp:3112
7 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:624
8 	xul.dll 	NS_ProcessNextEvent_P 	objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp:220
9 	xul.dll 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:257
10 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:395
11 	ntdll.dll 	_SEH_epilog 	
12 	msvcr100.dll 	_threadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:292
13 	kernel32.dll 	BaseThreadStart
Can you reproduce this? Is there anything in particular you were doing when this crashed?
Summary: crash in CanonicalizeXPCOMParticipant → [Thunderbird] crash in CanonicalizeXPCOMParticipant
Yeah, those look different. This one is on an XPCWrappedNative. Though it is weird that almost all of those crashes are on "Ubuntu SMP". I'm not sure I've noticed that before, but maybe it is standard...
Like most cycle collector crashes, the ultimate problem here is almost certainly going to be the underlying object that the CC is examining getting mangled somehow elsewhere.
The earliest examples on crash-stats are:
bp-66c10b0e-5860-4048-9cbd-a41f32120911 TB17 Build ID 20120909030218
bp-9d9b1653-1813-4b96-bbc8-446592120918 TB17 Build ID 20120911042003
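
When hunting for the earliest reports, the IDs themselves can be sorted by date. The following is a minimal sketch, not a Socorro tool: it assumes that Mozilla build IDs are `YYYYMMDDHHMMSS` timestamps and that the last six digits of a crash ID encode the submission date as `YYMMDD`. Neither convention is stated in this bug, and the function names are hypothetical.

```python
from datetime import datetime


def build_id_to_datetime(build_id):
    """Parse a build ID such as '20120909030218' (assumed YYYYMMDDHHMMSS)."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


def crash_id_to_date(crash_id):
    """Read the submission date from the last six digits (assumed YYMMDD)."""
    return datetime.strptime(crash_id[-6:], "%y%m%d").date()


def earliest_first(crash_ids):
    """Sort crash IDs by the date encoded in their suffix, oldest first."""
    return sorted(crash_ids, key=crash_id_to_date)
```

Under those assumptions, the two reports listed above were submitted two and seven days after their respective builds.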

Most crash comments are of the type "was closing thunderbird"
OS: Windows NT → All
Whiteboard: [regression:TB17]
That function was added in bug 750570, which landed in 17, but they would have just shown up as something else before that, like canonicalize, NoteXPCOMRoot, or NoteXPCOMChild.
#19 crash for TB17.  
A high percentage of comments mention doing shutdown.

(In reply to Andrew McCreight [:mccr8] from comment #6)
> That function was added in bug 750570, which landed in 17, but they would
> have just shown up as something else before that, like canonicalize,
> NoteXPCOMRoot, or NoteXPCOMChild.

TB17 still has the likes of
GCGraphBuilder::NoteXPCOMChild | XPCWrappedNative::cycleCollection::TraverseImpl 
bp-b02ee916-54b8-44aa-8b89-eb19f2121128
Whiteboard: [regression:TB17] → [tbird topcrash]
Sure, you can crash elsewhere, but I think the function CanonicalizeXPCOMParticipant replaced the function canonicalize, and the latter was much smaller and tended to get inlined, so it wouldn't show up in crash stacks.

Anyways, what you want to do is look at these stacks for the part that looks like:
  PlaceholderTxn::cycleCollection::TraverseImpl
The class |PlaceholderTxn| will likely vary. If you see one class that is very common in these stacks, then odds are what is happening is that objects of that class are garbage for some reason.

If it is a bunch of different classes, then it may just be some kind of general memory corruption problem. But as I said, the only actionable ones here are going to be where you see the same class over and over again.
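
The triage described above can be sketched as a small tally script. This is a minimal sketch under stated assumptions, not an existing crash-stats feature: it assumes each stack is available as plain text with one frame per line, as pasted in this bug, and the function names are hypothetical. Frames are listed innermost first, so the sketch takes the last matching `cycleCollection::Traverse` frame, on the assumption that the outermost one names the most-derived class of the object being traversed.

```python
import re
from collections import Counter

# Captures the class in frames such as
#   XPCWrappedNative::cycleCollection::TraverseImpl
#   nsXULElement::cycleCollection::Traverse(void*, ...)
TRAVERSE_RE = re.compile(r"([\w:]+?)::cycleCollection::Traverse(?:Impl)?\b")


def traversed_class(stack_text):
    """Return the class of the outermost cycleCollection frame, or None."""
    found = None
    for line in stack_text.splitlines():
        match = TRAVERSE_RE.search(line)
        if match:
            # Keep overwriting: later lines are further out in the stack.
            found = match.group(1)
    return found


def tally_traversed_classes(stacks):
    """Count how often each class appears across a collection of stacks."""
    counts = Counter()
    for stack in stacks:
        cls = traversed_class(stack)
        if cls is not None:
            counts[cls] += 1
    return counts
```

Calling `tally_traversed_classes(stacks).most_common(5)` would then show whether one class dominates (potentially actionable) or the crashes are spread across many classes (more consistent with general memory corruption).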
It's a bit early in TB17.0.2 cycle to say this will stick/continue, but CanonicalizeXPCOMParticipant has dropped significantly to #26 compared to TB17.0 #17

cc: Paenglab because there may be correlation to the multiple window issue Bug 814630
I crashed on Sunday when resuming vista laptop from 3 days of sleep bp-be0fb109-96c2-45f1-8e41-9897e2130121
0	xul.dll	CanonicalizeXPCOMParticipant	xpcom/base/nsCycleCollector.cpp:720
1	xul.dll	GCGraphBuilder::NoteXPCOMChild	xpcom/base/nsCycleCollector.cpp:1959
2	xul.dll	nsINode::nsSlots::Traverse	content/base/src/nsINode.cpp:127
3	xul.dll	nsINode::Traverse	content/base/src/nsINode.cpp:1195
4	xul.dll	mozilla::dom::FragmentOrElement::cycleCollection::TraverseImpl	content/base/src/FragmentOrElement.cpp:1612
5	xul.dll	nsXULElement::cycleCollection::TraverseImpl	content/xul/content/src/nsXULElement.cpp:306
6	xul.dll	GCGraphBuilder::Traverse	xpcom/base/nsCycleCollector.cpp:1874
7	xul.dll	nsCycleCollector::MarkRoots	xpcom/base/nsCycleCollector.cpp:2186
Andrew, when did igc ship?


bp-51ac7ff5-4502-4c5d-a15c-afe5e2130204

(In reply to Wayne Mery (:wsmwk) from comment #9)
> It's a bit early in TB17.0.2 cycle to say this will stick/continue, but
> CanonicalizeXPCOMParticipant has dropped significantly to #26 compared to
> TB17.0 #17

It didn't stick. 
#11 on the TB17.0.2 list
 
> cc: Paenglab because there may be correlation to the multiple window issue
> Bug 814630
Flags: needinfo?(continuation)
Incremental GC is bug 641025, which says Firefox 16.  It doesn't seem too likely that that is related to this, though.
Flags: needinfo?(continuation)
Firefox crashed a couple of seconds after I killed plugin-container which was sucking up 100% of CPU (probably some youtube video ignoring HTML5 vid settings in some tab after I reluctantly whitelisted google's stuff temporarily in noscript).

https://crash-stats.mozilla.com/report/index/6eb80a7c-88b2-4415-a237-087242130430

Linked here.

Mentioning since the bug seemed short on ideas.
I've seen this crash with TB 22.0b1 on Linux.
ID: bp-9b6d8c3e-2f46-45a8-8b2c-194f82130531
Signature: CanonicalizeXPCOMParticipant
The crash isn't reproducible.
#67 crash for TB24.0.1 so not topcrash
Whiteboard: [tbird topcrash] → [tbird crash]
[Tracking Requested - why for this release]:
Crash also on FF 36.04
Flags: needinfo?(vseerror)
#2 crash for 38.0b1 - but that will probably move over the next week.

one user reports "I was deleting multiple email addresses, some cases 3-4 repeated addresses, with new beta program" 
bp-59e0df7c-aa03-4d6a-988a-30cb72150404 Crash Address 	0x5a5a5a5a
 0 	xul.dll	CanonicalizeXPCOMParticipant	xpcom/base/nsCycleCollector.cpp
1 	xul.dll	ChildFinder::NoteXPCOMChild(nsISupports*)	xpcom/base/nsCycleCollector.cpp
2 	xul.dll	XPCWrappedNative::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)	js/xpconnect/src/XPCWrappedNative.cpp
3 	xul.dll	RemoveSkippableVisitor::Visit(nsPurpleBuffer&, nsPurpleBufferEntry*)	xpcom/base/nsCycleCollector.cpp
4 	xul.dll	nsPurpleBuffer::Block::VisitEntries<RemoveSkippableVisitor>(nsPurpleBuffer&, RemoveSkippableVisitor&)	xpcom/base/nsCycleCollector.cpp
5 	xul.dll	nsPurpleBuffer::VisitEntries<RemoveSkippableVisitor>(RemoveSkippableVisitor&)	xpcom/base/nsCycleCollector.cpp
6 	xul.dll	nsPurpleBuffer::RemoveSkippable(nsCycleCollector*, bool, bool, void (*)(void))	xpcom/base/nsCycleCollector.cpp
7 	xul.dll	nsCycleCollector::ForgetSkippable(bool, bool)	xpcom/base/nsCycleCollector.cpp
8 	xul.dll	nsCycleCollector_forgetSkippable(bool, bool)	xpcom/base/nsCycleCollector.cpp 

(mkdante reported FF crash as bug 1151352, so clearing tracking request.)
Flags: needinfo?(vseerror)
 The crash occurred when I reviewed the Filters list. I am currently looking as to where I can find the filter protocol as file.   bp-13e6112c-2c01-4e1a-a0c9-4fc4c2150407

 T Bird continually crashes while I am culling my address book ???  bp-87fc4a69-2042-47c1-909c-c6d242150410

 I believe I was deleting a message when crashed... but can't be 100% certain. bp-9a260926-7777-445b-b57f-c83fe2150409
Whiteboard: [tbird crash] → [tbird topcrash]
A regression window request from 2012 doesn't seem very likely to help at this point. Still an active crash from the looks of it, though. Interestingly, there are recent reports on Android w/ Firefox 41, but the recent desktop reports all seem to be in Gecko 38 and under.
At its current ranking, with no testcase, this isn't worthy of topcrash nor of much investigation: #38 crash for 38.5.0, unlike the 38.0 beta ranking of #2. No Mac crashes, only Windows and Linux. No crashes for current versions of Firefox, except on Linux, where they are rare: about 15 crashes in a week for FF 43.0, e.g. bp-96827360-17d1-4657-9bb4-77e452160108

Thunderbird users' crashes are 90% one-offs, and the user has no other crash signatures reported. So I doubt there's much hope of getting something reproducible. If something reproducible is to be found, I think it will mostly come from contacting Linux users, who are 2% of crashes but, oddly enough, 15% of the users who provided their email address.
Summary: [Thunderbird] crash in CanonicalizeXPCOMParticipant → Thunderbird crash in CanonicalizeXPCOMParticipant cycle collector
Whiteboard: [tbird topcrash] → [tbird crash][comment 8]
Component: XPCOM → General
Product: Core → Thunderbird
Version: 17 Branch → 17
See Also: → 795821
See Also: → 849585, 835798
I just experienced it with tb 51.0b2: bp-9300ad1c-9557-47d4-93cc-c52c62170121
(In reply to Sylvestre Ledru [:sylvestre] from comment #24)
> I just experienced it with tb 51.0b2: bp-9300ad1c-9557-47d4-93cc-c52c62170121

Is that the right crash report? The signature is [@ base.apk@0x226eacc ].
Still present in 52 cycle, unfortunately I have no other useful info to add: https://crash-stats.mozilla.com/report/index/6d8b2fa0-df3f-4efd-835d-9e3db2170225
I crashed 52.4.0 on laptop resume bp-944feb4a-8494-44be-9356-519da0171022
And signature is #33 crash for 52.4.0.

But perhaps good news,  signature is a blip for post 52.x development versions - not even in top 300 in the ranking.
Only 10 crashes in past 3 months:
bp-dc11fcb1-eb60-4b7d-baed-99ccd0170808	55.0b2
bp-09a0b8fe-be76-4fc1-8b04-f0fd60170804	55.0b2
bp-8e17a58f-20b5-4fe9-b403-6e8780170729	55.0b2
bp-8aca1381-9286-4e61-920e-79ca00170728	55.0b2
bp-e101f8ce-3cdb-4c87-9197-0e11a0170727	55.0b2
bp-3ac5c6be-84a8-4a58-8987-3b2970170724	55.0b2
bp-0a19d686-288c-4aad-a976-fe9130170724	55.0b2
bp-ac8df29b-8368-40c0-aba1-3d25d0170907	54.0a2
bp-584e285d-3069-4d56-b704-865550170804	54.0a2
bp-51296078-0558-4b93-991e-cbb3b0170915	53.0b2

(Or perhaps for beta versions we have a different signature for these crashes)
#27 crash for 52.8.0.
Overall, 49% of crashes are win7 - which seems higher than overall population of win7 users 


(In reply to Andrew McCreight [:mccr8] from comment #8)
> ...
> Anyways, what you want to do is look at these stacks for the part that looks
> like:
>   PlaceholderTxn::cycleCollection::TraverseImpl
> The class |PlaceholderTxn| will likely vary. If you see one class that is
> very common in these stacks, then odds are what is happening is that objects
> of that class are garbage for some reason.
> 
> If it is a bunch of different classes, then it may just be some kind of
> general memory corruption problem. But as I said, the only actionable ones
> here are going to be where you see the same class over and over again.

In the last couple of years I've had a few users report that replacing memory solved their issue.
See Also: → 1414382
Summary: Thunderbird crash in CanonicalizeXPCOMParticipant cycle collector → Thunderbird crash in CanonicalizeXPCOMParticipant in cycle collector, via general memory corruption. Users have randomish crash signatures. (Sometimes bad memory.)
Blocks: GCCrashes
See Also: → 1288952, 1216776, 719114, 1255903
In version 60 there is not a single crash with this signature. So perhaps it has morphed, but I'm not sure it's bug 1414382 because that bug doesn't have near the crash rate of this bug.  

I have one person (s collins) say their crashing has stopped in version 60.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME