Closed Bug 564704 Opened 11 years ago Closed 8 years ago

crash [@ FinalizeObject]

Categories

(Thunderbird :: General, defect)

Hardware: x86
OS: Windows Vista
Type: defect
Priority: Not set
Severity: critical

Tracking

(blocking-thunderbird3.1 -)

RESOLVED WORKSFORME
Tracking Status
blocking-thunderbird3.1 --- -

People

(Reporter: wsmwk, Unassigned)

Details

(Keywords: crash, topcrash-)

Crash Data

crash [@ FinalizeObject]
Presumably FinalizeObject is merely a symptom?

#3 crash for 3.1b2. Whether this will eventually be a topcrash for v3.1 is anyone's guess. Or it could end up in the top 50, like js_FinalizeObject (bug 518303).

Observations: FinalizeObject exists only on trunk (not on v3.0) and so far shows only for Windows. js_FinalizeObject exists only on v3.0 and shows for all OSes; js_FinalizeObject is bug 518303.

I didn't thoroughly check whether this is a duplicate of 518303. (One crash reporter I emailed for 518303 said he no longer sees this. And I have a PM from bp-9882e2b0-3017-4690-ab2b-779ed2100420: he crashed only once.)

the earliest crash found is bp-17912b19-32b7-444c-9222-31a4f2100211 3.1b1pre 20100211033153 (gmhyman)
0	js3250.dll	FinalizeObject	 js/src/jsgc.cpp:3197
1	js3250.dll	js_GC	js/src/jsgc.cpp:3622
2	js3250.dll	JS_GC	js/src/jsapi.cpp:2439
3	thunderbird.exe	nsXPConnect::Collect	js/src/xpconnect/src/nsXPConnect.cpp:477
4	xpcom_core.dll	nsCycleCollector::Collect	xpcom/base/nsCycleCollector.cpp:2434
5	xpcom_core.dll	nsCycleCollector_collect	xpcom/base/nsCycleCollector.cpp:3129
6	thunderbird.exe	nsJSContext::CC	dom/base/nsJSEnvironment.cpp:3578
7	thunderbird.exe	nsJSContext::IntervalCC	dom/base/nsJSEnvironment.cpp:3666
8	thunderbird.exe	nsUserActivityObserver::Observe	dom/base/nsJSEnvironment.cpp:268
9	xpcom_core.dll	nsObserverList::NotifyObservers	xpcom/ds/nsObserverList.cpp:130
10	xpcom_core.dll	nsObserverService::NotifyObservers	xpcom/ds/nsObserverService.cpp:182
11	thunderbird.exe	nsUITimerCallback::Notify	content/events/src/nsEventStateManager.cpp:276
12	xpcom_core.dll	nsTimerImpl::Fire	xpcom/threads/nsTimerImpl.cpp:430
13	xpcom_core.dll	nsTimerEvent::Run	xpcom/threads/nsTimerImpl.cpp:519
14	xpcom_core.dll	nsThread::ProcessNextEvent	xpcom/threads/nsThread.cpp:527
15	xpcom_core.dll	NS_ProcessNextEvent_P	objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp:250
16	thunderbird.exe	nsBaseAppShell::Run	widget/src/xpwidgets/nsBaseAppShell.cpp:170
17	thunderbird.exe	nsAppStartup::Run	toolkit/components/startup/src/nsAppStartup.cpp:182
18	thunderbird.exe	XRE_main	toolkit/xre/nsAppRunner.cpp:3506 


bp-bfec4ac7-487b-456f-b15f-efcd82100324 3.1b1, safe mode (rcolmegna)
who says "yes, I can replicate the crash.  But I haven't detected which mail produce the TB-crash. So I can only replicate it on my PC+IMAP srv." 
0	js3250.dll	FinalizeObject	 js/src/jsgc.cpp:3197
1	js3250.dll	js_GC	js/src/jsgc.cpp:3622
2	js3250.dll	NewGCThing<JSString>	js/src/jsgc.cpp:1858
3	js3250.dll	JS_NewExternalString	js/src/jsapi.cpp:2626
4	thunderbird.exe	XPCStringConvert::ReadableToJSVal	js/src/xpconnect/src/xpcstring.cpp:108
5	thunderbird.exe	XPCConvert::NativeData2JS	js/src/xpconnect/src/xpcconvert.cpp:333
6	thunderbird.exe	XPCConvert::NativeData2JS	js/src/xpconnect/src/xpcprivate.h:2974
7	thunderbird.exe	XPCWrappedNative::CallMethod	js/src/xpconnect/src/xpcwrappednative.cpp:2809
8	thunderbird.exe	XPC_WN_CallMethod	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1740
9	js3250.dll	js_Invoke	js/src/jsinterp.cpp:1360
10	js3250.dll	js_Interpret	js/src/jsops.cpp:2240
11	js3250.dll	js_Invoke	js/src/jsinterp.cpp:1368
12	js3250.dll	js_fun_call	js/src/jsfun.cpp:1955
13	js3250.dll	js_Interpret	js/src/jsops.cpp:2208
14	js3250.dll	js_Invoke	js/src/jsinterp.cpp:1368
15	thunderbird.exe	nsXPCWrappedJSClass::CallMethod	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1696
16	thunderbird.exe	nsXPCWrappedJS::CallMethod	js/src/xpconnect/src/xpcwrappedjs.cpp:570
17	xpcom_core.dll	PrepareAndDispatch	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:114
18	xpcom_core.dll	SharedStub	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
19	thunderbird.exe	mozilla::storage::`anonymous namespace'::CallbackResultNotifier::Run	storage/src/mozStorageAsyncStatementExecution.cpp:100
20	xpcom_core.dll	nsThread::ProcessNextEvent	xpcom/threads/nsThread.cpp:527 

bp-dc18e1b0-8baf-4043-8e70-0da952100421 (melvin) ~same stack as above

bp-6992c2bc-788b-471d-a5ef-c25cf2100414 3.1b2pre (aaron) 
0	js3250.dll	FinalizeObject	 js/src/jsgc.cpp:3197
1	js3250.dll	js_GC	js/src/jsgc.cpp:3622
2	js3250.dll	JS_GC	js/src/jsapi.cpp:2439
3	thunderbird.exe	nsXPConnect::Collect	js/src/xpconnect/src/nsXPConnect.cpp:478
4	xpcom_core.dll	nsCycleCollector::Collect	xpcom/base/nsCycleCollector.cpp:2434
5	xpcom_core.dll	nsCycleCollector_collect	xpcom/base/nsCycleCollector.cpp:3129
6	thunderbird.exe	nsJSContext::CC	dom/base/nsJSEnvironment.cpp:3613
7	thunderbird.exe	nsJSContext::IntervalCC	dom/base/nsJSEnvironment.cpp:3701
8	thunderbird.exe	nsUserActivityObserver::Observe	dom/base/nsJSEnvironment.cpp:289
9	xpcom_core.dll	nsObserverList::NotifyObservers	xpcom/ds/nsObserverList.cpp:130
10	xpcom_core.dll	nsObserverService::NotifyObservers	xpcom/ds/nsObserverService.cpp:182
this needs diagnosis for v3.1. Unknown whether it should block
blocking-thunderbird3.1: --- → ?
I submitted a crash report that led to the creation of this entry. I only experienced the problem once, and have not experienced it using the latest nightly builds over the past couple of weeks. Then again, I don't know if I'm doing what I did to make it crash in the first place. :)
bp-0d7e73c6-f82d-4dee-bb3b-236172100510
bp-6343171e-1040-4b62-930c-c9c482100510
bp-0784f9f7-e7ae-4e09-aa9f-0c7932100510
bp-0e45b185-2446-4674-9e6b-4102e2100510

These crash reports were all from this morning, while I was saving attachments. I am not having a crash on every download after installing the latest build. I saved three or four attachments before the first crash and then had a couple more successful saves before the next one. I am going to reboot, start Thunderbird in safe mode, and test it there.
bp-971c65b3-c3bb-4710-82d9-a6d4a2100510

This crash occurred on the second attachment I saved while running in Safe Mode.
Though I don't have deep knowledge here, it appears to me that bug 518303 is something different. 

Interestingly, the Breakpad reports for both the first and last full stacks in comment 0 have mail-tweak installed.  I wonder if that could be related...

Melvin, the crashes in comment 3 and comment 4 appear to be something different from the ones in comment 0 that this bug was filed about.  Would you be willing to file a separate bug for that?

Igor, it was suggested that if anyone had seen stacks like the ones in comment 0 and knew of existing bugs in this area, it would likely be you.  Do they look at all familiar?  Do you have any suggestions on debugging this further, given that all we have are Breakpad reports and not actual steps to reproduce?
Unfortunately, as with other stack traces pointing into the GC, it is hard to tell what has caused the crash. The stacks from comment 0 look like a missing rooting of a GC thing. The GC has collected it during one of the previous GC cycles, but something still assumes that it is alive. The thing could be stored in some of the xpconnect objects that the cycle collector discovered to be garbage. But even that is not 100% certain.
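To make the missing-rooting failure mode concrete, here is a toy sketch in Python. It is not SpiderMonkey's collector or API; `ToyHeap`, `native_cache`, and the integer handles are all hypothetical names invented for illustration. It only shows how a reference the collector does not know about ends up pointing at a freed thing after a collection.

```python
class ToyHeap:
    """A tiny mark-and-sweep heap for illustration; NOT SpiderMonkey's GC."""

    def __init__(self):
        self.objects = {}    # handle -> list of handles this object references
        self.roots = set()   # handles the collector treats as always reachable
        self._next_handle = 1

    def alloc(self, refs=()):
        handle = self._next_handle
        self._next_handle += 1
        self.objects[handle] = list(refs)
        return handle

    def collect(self):
        # Mark: walk everything reachable from the root set.
        marked = set()
        pending = list(self.roots)
        while pending:
            handle = pending.pop()
            if handle in marked or handle not in self.objects:
                continue
            marked.add(handle)
            pending.extend(self.objects[handle])
        # Sweep: free everything the mark phase did not reach.
        for handle in list(self.objects):
            if handle not in marked:
                del self.objects[handle]

    def is_live(self, handle):
        return handle in self.objects


heap = ToyHeap()
rooted = heap.alloc()
heap.roots.add(rooted)

# Hypothetical native-side cache that remembers a handle but never roots it.
unrooted = heap.alloc()
native_cache = {"thing": unrooted}

heap.collect()
# The cache still holds the handle, but the heap has already freed it.
dangling = not heap.is_live(native_cache["thing"])
```

In the toy, the stale handle is harmless because lookups are checked. In native code the equivalent is a raw pointer into memory that may since have been reused, which would be consistent with a crash that only surfaces in a later GC.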
Probably a core dump might be useful - one would have to trap the crash in gdb using a debug Thunderbird. See bug 546611 comment 7.
(In reply to comment #7)
> Probably a core dump might be useful - one would have to trap the crash in gdb
> using a debug Thunderbird. See bug 546611 comment 7.

Whoops that's locked, but asuth mentions "I used check-interactive with gdb
attached and then used gdb's generate-core-file."

If this only occurs on Windows, then I think Visual Studio's debugger or WinDbg might be the only options.
There are either some potentially long buffer overwrites happening or the heap has been completely trashed at the point of the crash.  Specifically, because the JS objects have their own arenas, for us to see random string data overwriting a JS object, a really long string had to overwrite beyond the end of its buffer or people are just writing randomly into memory.

OBJ_IS_NATIVE dereferences obj->map, which is the first pointer in the JSObject, and given the line of code in question, we can expect that the crash address represents the contents of that first pointer.  So here we go: these are all the non-0, non-0xffffff pointers from the 3.1b2 FinalizeObject crashes, rendered as ASCII:

'4.12' '----' 'Dtex' 'by u' 'llow' '0Sq1' '-Id:' 's: a'
'////' 't"> ' 's:Z=' 'ress' '\x80/_/' ' of ' '\x88fla' '\x00o\x00b'
'$R\x00J' ' ha\\' '=0)(' 'AISO' '6.00' '\x00\x00\x00o' '\x90d w' '\x90\x005d'
'\x90ose' '\x84l s' '\x98y: ' '\x84in\x10' '\x8c \x10@' 'ent=' '\x88e. ' 'd"\x00"'
'\x00\x00\x00o' ' SMT' '\x98ddq' 'DOur' '****' 'ook ' '\x00#\x06A' 'sion'
'ecip' 'X7 [' 'xmm+' 'pan>' 'ssag' 'ENIO' 'late' 'cynt'
'\x90t=3' 'lns:' '\x88n=3' 'ttp-'

Using:
http://hg.mozilla.org/users/bugmail_asutherland.org/tb-test-help/file/04e6e929f83d/ptrs_as_ascii.py
against
http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=FinalizeObject&version=Thunderbird:3.1b2
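
For readers who can't reach the script, a minimal sketch of the idea follows. It is not the linked ptrs_as_ascii.py; the function names are hypothetical, and it assumes 32-bit little-endian (x86) pointers, which matches the platform in these reports. The sample values are reconstructed from the strings listed above, not copied from the crash reports.

```python
import struct


def pointer_as_ascii(value):
    """Render a 32-bit value as the four bytes it occupies in x86 memory."""
    raw = struct.pack("<I", value & 0xFFFFFFFF)  # little-endian byte order
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else "\\x%02x" % b
        for b in raw
    )


def looks_like_text(value, min_printable=3):
    """Heuristic: mostly printable bytes suggests string data, not an address."""
    raw = struct.pack("<I", value & 0xFFFFFFFF)
    return sum(0x20 <= b < 0x7F for b in raw) >= min_printable


# Sample values reconstructed from entries in the list above.
for candidate in (0x78657444, 0x32312E34, 0x544D5320):
    print(hex(candidate), repr(pointer_as_ascii(candidate)))
```

The `min_printable` threshold is an arbitrary choice for this sketch; a value with three or four printable bytes is much more likely to be a fragment of message text than a valid heap pointer.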


Unfortunately, that pointer value is basically the only useful piece of information in the crash report.  I fired up Valgrind with an appropriately configured build and ran for a while, providing pretty extensive coverage of normal usage, but it would appear the code is generally solid.  The (unsurprising) implication is that anything bad we're doing is not on the common path.  This suggests static analysis as the easiest path to investigate things further.  We should consider taking advantage of Coverity's open source loss leader ( http://scan.coverity.com/devfaq.html ) if we want to more decisively root out this type of problem.
Given a combination of the likely difficulty of fixing and the way our topcrash stats are currently looking, we believe we wouldn't block on this given what we know now.  That said, we could certainly consider it for a subsequent 3.1 release, particularly if we found steps to reproduce.

Jcranmer, would you care to see if the static analysis tools offer any extra insight here?
blocking-thunderbird3.1: ? → -
This is probably the blocker bienvenu fixed that was marked security.  Nothing to look into here.  Sorry for not speaking up about the correlation earlier.  This is probably the case for most of our recent crashers.
(In reply to comment #11)
> This is probably the blocker bienvenu fixed that was marked security. 

bug#?

no longer a topcrash. but also far from gone. 
#71 for v3.1. 
#39 for v3.1.1

bp-43b07412-8506-4a0b-9974-4e0852100723  v3.1.1
0	js3250.dll	FinalizeObject	 js/src/jsgc.cpp:3189
1	js3250.dll	js_GC	js/src/jsgc.cpp:3622
2	js3250.dll	JS_GC	js/src/jsapi.cpp:2439
3	thunderbird.exe	nsXPCComponents_Utils::ForceGC	js/src/xpconnect/src/xpccomponents.cpp:3698
4	xpcom_core.dll	NS_InvokeByIndex_P	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:102
5	thunderbird.exe	XPCWrappedNative::CallMethod	js/src/xpconnect/src/xpcwrappednative.cpp:2722
6	thunderbird.exe	XPC_WN_CallMethod	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1740
7	js3250.dll	js_Invoke	js/src/jsinterp.cpp:1360
8	js3250.dll	js_Interpret	js/src/jsops.cpp:2240
9	js3250.dll	SendToGenerator	js/src/jsiter.cpp:865
10	js3250.dll	generator_op	js/src/jsiter.cpp:978
11	js3250.dll	generator_next	js/src/jsiter.cpp:993
12	js3250.dll	js_Interpret	js/src/jsops.cpp:2208
13	js3250.dll	js_Invoke	js/src/jsinterp.cpp:1368
14	js3250.dll	js_fun_apply	js/src/jsfun.cpp:2046
15	js3250.dll	js_Interpret	js/src/jsops.cpp:2208
16	js3250.dll	js_Invoke	js/src/jsinterp.cpp:1368
17	thunderbird.exe	nsXPCWrappedJSClass::CallMethod	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1696
18	thunderbird.exe	nsXPCWrappedJS::CallMethod	js/src/xpconnect/src/xpcwrappedjs.cpp:570
Keywords: topcrash → topcrash-
(In reply to comment #12)
> no longer a topcrash. but also far from gone. 
> #71 for v3.1. 
> #39 for v3.1.1

How is being in the top 100 not a topcrash? (And in the top 40, no less.)
(In reply to comment #13)
> (In reply to comment #12)
> > no longer a topcrash. but also far from gone. 
> > #71 for v3.1. 
> > #39 for v3.1.1
> 
> How is being in the top 100 not a topcrash? (And in the top 40, no less.)

For the past 1-2 years we have considered topcrash to be very focused, as in the top 20 or so crashes. The focus derives from these facts: 

a) there are few people triaging crash bugs and crash-stats
b) there are few people looking for crashes to fix
c) # crashes per sig drops significantly when you get out of the top 10-20 signatures

And so in seeking maximum benefit for users, there has been a preference to focus on bugs that have the most crashes, which of course changes with each point release.

ludo and I recently discussed this, and we may be altering the approach soon. And it may get altered further still with the upgrading of the masses of users who are still on v2. Your thoughts on the subject would be helpful. Please post them in mdat and include [tb-qa] in the subject.
Certainly not proof this is gone, but there have been no FinalizeObject crashes on trunk for the last 6 months.
Crash Signature: [@ FinalizeObject]
all crashes are v3.1
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Version: Trunk → 3.1