Closed Bug 1281610 Opened 8 years ago Closed 7 years ago

Crash in arena_dalloc | je_free | js::ctypes::CData::Finalize with McAfee

Categories

(Thunderbird :: General, defect)

Hardware: x86
OS: Windows 7
Type: defect
Priority: Not set
Severity: critical

Tracking

(thunderbird_esr52? affected)

RESOLVED INVALID

People

(Reporter: wsmwk, Unassigned)

References

Details

(Keywords: crash, topcrash-thunderbird, Whiteboard: [antivirus:McAfee])

Crash Data

Attachments

(3 files)

Topcrash for Thunderbird 47.0b1 and 47.0b2 [1]. But like bug 1131180, this correlates to McAfee.
I've notified several users, most of whom have multiple crashes.

This bug was filed from the Socorro interface and is report bp-5aab938b-b77e-450c-af5b-0d3862160613.
=============================================================
 0 	mozglue.dll	arena_dalloc	memory/mozjemalloc/jemalloc.c:4729
1 	mozglue.dll	je_free	memory/mozjemalloc/jemalloc.c:6479
2 	xul.dll	js::ctypes::CData::Finalize	js/src/ctypes/CTypes.cpp:7166
3 	xul.dll	js::gc::Arena::finalize<JSObject>(js::FreeOp*, js::gc::AllocKind, unsigned int)	js/src/jsgc.cpp:519
4 	xul.dll	FinalizeTypedArenas<JSObject>	js/src/jsgc.cpp:577
5 	xul.dll	js::gc::ArenaLists::forceFinalizeNow(js::FreeOp*, js::gc::AllocKind, js::gc::ArenaLists::KeepArenasEnum, js::gc::Arena**)	js/src/jsgc.cpp:2920
6 	xul.dll	js::gc::ArenaLists::queueForegroundObjectsForSweep(js::FreeOp*)	js/src/jsgc.cpp:3041
7 	xul.dll	js::gc::GCRuntime::beginSweepingZoneGroup()	js/src/jsgc.cpp:5261
8 	xul.dll	js::gc::GCRuntime::sweepPhase(js::SliceBudget&)	js/src/jsgc.cpp:5535 

[1] 
https://crash-stats.mozilla.com/topcrashers/?product=Thunderbird&_facets_size=50&days=7&process_type=browser&platform=None&version=47.0b2&date_range_type=report
https://crash-stats.mozilla.com/topcrashers/?product=Thunderbird&_facets_size=50&days=7&process_type=browser&platform=None&version=47.0b1&date_range_type=report
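
The signature in this bug's title matches the top three frames of the stack above. A minimal sketch of how such a signature could be assembled follows, assuming a simplified rule: generic allocator frames are treated as "prefix" frames and the signature continues until the first frame that is not in that set. The `PREFIX_FRAMES` set and the `build_signature` helper are hypothetical names for illustration; the actual Socorro rules are more involved.

```python
# Hypothetical prefix set: frames too generic to identify a crash by themselves.
PREFIX_FRAMES = {"arena_dalloc", "je_free"}


def build_signature(frames, prefix_frames=PREFIX_FRAMES):
    """Join frame names with ' | ', stopping after the first non-prefix frame."""
    parts = []
    for name in frames:
        parts.append(name)
        if name not in prefix_frames:
            break
    return " | ".join(parts)


# Function names taken from frames 0-4 of the stack in this report.
frames = [
    "arena_dalloc",
    "je_free",
    "js::ctypes::CData::Finalize",
    "js::gc::Arena::finalize<JSObject>(js::FreeOp*, js::gc::AllocKind, unsigned int)",
    "FinalizeTypedArenas<JSObject>",
]

signature = build_signature(frames)
print(signature)
```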

----

Very likely a few signatures from this list correlate totally to McAfee: https://crash-stats.mozilla.com/search/?product=Thunderbird&date=%3E2016-06-01&addons=~msktbird%40mcafee.com&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

I suspect one is https://crash-stats.mozilla.com/signature/?product=Thunderbird&date=%3E2016-06-01&addons=~msktbird%40mcafee.com&signature=shutdownhang%20%7C%20WaitForMultipleObjectsEx%20%7C%20MsgWaitForMultipleObjectsEx%20%7C%20CCliModalLoop%3A%3ABlockFn&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#comments
Typically we just close these McAfee-related bug reports as INVALID.

Some data: 1/5 of crashes are from just 9 users. http://archive.mozilla.org/pub/thunderbird/releases/47.0b2/win32/en-US/Thunderbird%20Setup%2047.0b2.exe shows up clean in VirusTotal. Almost no beta users replied to me. Only one wrote, to say removing McAfee solved the crash. So this is almost certainly caused entirely by McAfee's Thunderbird add-on, which is not out of character with our past experience with McAfee. Based on crash rate and user uptake, I estimate the affected beta user population at no more than 5%.
Some feedback is coming in from the crashing beta users.
Carole: "Wayne I installed the release version and enabled the McAfee again and so far it hasn't crashed, if it crashes again I will let you know."
As a further indication that 45.x might be OK with the McAfee add-on, Katherine (Kathy) reports:
"I've just installed the release version, without subtracting any add-ons, and all is fine - so far..... "
This of course does not take into account the patches that are in 47.0b -- but we don't have a 45.2 with those patches to test.
See Also: → 1290689
If addon signing is enabled in Thunderbird as well or if Thunderbird uses the same addon blocklisting methods as Firefox, then this should be fixed (like bug 1131180).
As expected, McAfee has reared its head in a big way in 52.0, being the #1 crash. See the comments at https://crash-stats.mozilla.com/signature/?product=Thunderbird&version=52.0&signature=arena_dalloc%20%7C%20je_free%20%7C%20js%3A%3Actypes%3A%3ACData%3A%3AFinalize&date=%3E%3D2017-04-02T12%3A28%3A18.000Z&date=%3C2017-04-05T12%3A28%3A18.000Z#comments

Approximately 1k crashes per hour and an average of 10 crashes per install: https://crash-stats.mozilla.com/topcrashers/?product=Thunderbird&version=52.0&days=1

I don't have time to deal with McAfee devs right now, but we can't have this crash and also have Thunderbird updates enabled, so we must sort this out in the next couple of days, either by
a) a Thunderbird fix or workaround, or
b) blocklisting them.

So updates will be disabled in 3.5 hours. Meanwhile, the time will not be wasted as we sift through other crash reports and support requests.
Blocks: TB52found
Let's blocklist the addon and be done with it
Urgent question...

The majority of crashes obviously require the McAfee addon. However, does anything in these crash reports suggest our code is even partly to blame? Or is it impossible to know without involving McAfee?

The vast majority are startup crashes (perhaps just from getting mail). And they are ALL during GC. [1]
TB52.0 examples:
bp-1689deae-36dd-467e-86c8-93ff52170104
bp-e75f3e60-c13c-4079-a20f-a6d4f2170104
bp-ea6030cb-cf94-410c-9bcb-dff8c2170104
bp-69852b5d-26b2-4826-8119-41b542170103

I'm going to mention bp-8b893be9-2645-41b6-8f1d-082722170104 because it doesn't involve McAfee but rather nsDragService. Perhaps the example adds some insight to what might be happening in the McAfee examples.

[1] See "Is GC" column of https://crash-stats.mozilla.com/topcrashers/?product=Thunderbird&version=52.0&days=3   Note, this isn't proof that the problem isn't entirely on McAfee.
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(m_kato)
(sigh, too early in morning)

c/impossible to know without involving McAfee?/* McAfee developers?/
c/isn't proof that the problem isn't entirely on McAfee./isn't proof that the problem isn't entirely on US/Thunderbird./

The crash doesn't happen for version 45.x. So I asked the questions above because I'd prefer to hear from a developer who actually inspected the crash details that our code is not part of the cause, before proceeding to block the addon. It could be that McAfee simply need to change their code for version 52.x, or (which just occurred to me this morning) rearchitect what they are doing to not be hurt by GC. McAfee worked on this in the fall but we never came to a solution - which I now suspect may have been because this appears to only happen during GC!

Note, the full name of the addon is "McAfee Anti-Spam Thunderbird Extension" by "McAfee Anti-Spam". Aka Anti-Spam toolbar  http://download.mcafee.com/products/webhelp/4/1033/GUID-38309231-AF64-4C5A-8FE0-9B6832E613C5.html
I suggest blocking the addon. It doesn't matter if the fault is ours or theirs, if it causes crashes for users it should be blocked until the crash is fixed.
Depends on: 1354240
As it appears to be a call through ctypes, I imagine there is nothing we could do to fix that crash.
Flags: needinfo?(mkmelin+mozilla)
An addon (McAfee?) is creating a COM object via the CoCreateInstance API. But the COM server (I think it is created in another process) doesn't respond... So we have no workaround...

Does anyone contact to McAfee for this issue?
Flags: needinfo?(m_kato)
> Does anyone contact to McAfee for this issue?

I had extensive contact with McAfee developers and support about this last fall. So no worries about getting contacts - they know about it, and the blocklist request is pending in bug 1354240.
Depends on: 1354912
Attached image exception1.JPG
Divya from McAfee reports the following...

While upgrading TB from 45.6 to latest version, we found a crash on startup. So we tried to debug MSK code by attaching Thunderbird application. While debugging TB, we got a code break in mozglue.dll; I have attached call stack for same and exception message while debugging. [1]

In call stack, call starts from kernel32.dll and ends at mozglue.dll. And in this call stack, I couldn't find any frame from MSK. we will continue our investigation on this but meanwhile I would request you to ask your developers also to look into it. 

[1]
mozglue.dll!arena_dalloc(void * ptr, unsigned int offset) Line 4714
mozglue.dll!je_free(void * ptr) Line 6393
xul.dll!js::ctypes::CData::Finalize(JSFreeOp * fop, JSObject * obj) Line 7653
xul.dll!js::gc::Arena::finalize<JSObject>(js::FreeOp * fop, js::gc::AllocKind thingKind, unsigned int thingSize) Line 457
xul.dll!FinalizeTypedArenas<JSObject>(js::FreeOp * fop, js::gc::Arena * * src, js::gc::SortedArenaList & dest, js::gc::AllocKind thingKind, js::SliceBudget & budget, js::gc::ArenaLists::KeepArenasEnum keepArenas) Line 518
xul.dll!js::gc::ArenaLists::forceFinalizeNow(js::FreeOp * fop, js::gc::AllocKind thingKind, js::gc::ArenaLists::KeepArenasEnum keepArenas, js::gc::Arena * * empty) Line 2732
xul.dll!js::gc::ArenaLists::queueForegroundObjectsForSweep(js::FreeOp * fop) Line 2853
xul.dll!js::gc::GCRuntime::beginSweepingZoneGroup(js::AutoLockForExclusiveAccess & lock) Line 5183
xul.dll!js::gc::GCRuntime::sweepPhase(js::SliceBudget & sliceBudget, js::AutoLockForExclusiveAccess & lock) Line 5450
xul.dll!js::gc::GCRuntime::incrementalCollectSlice(js::SliceBudget & budget, JS::gcreason::Reason reason, js::AutoLockForExclusiveAccess & lock) Line 5965
xul.dll!js::gc::GCRuntime::gcCycle(bool nonincrementalByAPI, js::SliceBudget & budget, JS::gcreason::Reason reason) Line 6249
xul.dll!js::gc::GCRuntime::collect(bool nonincrementalByAPI, js::SliceBudget budget, JS::gcreason::Reason reason) Line 6370
xul.dll!js::gc::GCRuntime::notifyDidPaint() Line 6517
xul.dll!nsXPConnect::NotifyDidPaint() Line 1070
xul.dll!nsRefreshDriver::Tick(__int64 aNowEpoch, mozilla::TimeStamp aNowTime) Line 2050
xul.dll!mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver * driver, __int64 jsnow, mozilla::TimeStamp now) Line 327
xul.dll!mozilla::RefreshDriverTimer::TickRefreshDrivers(__int64 aJsNow, mozilla::TimeStamp aNow, nsTArray<RefPtr<nsRefreshDriver> > & aDrivers) Line 297
xul.dll!mozilla::RefreshDriverTimer::Tick(__int64 jsnow, mozilla::TimeStamp now) Line 319
xul.dll!mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers(mozilla::TimeStamp aTimeStamp) Line 664
xul.dll!mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::TimeStamp aVsyncTimestamp) Line 585
xul.dll!mozilla::detail::RunnableMethodImpl<void (__thiscall mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp),1,0,mozilla::TimeStamp>::Run() Line 813
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1216
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 361
xul.dll!nsXULWindow::ShowModal() Line 408
xul.dll!nsContentTreeOwner::ShowAsModal() Line 509
xul.dll!nsWindowWatcher::OpenWindowInternal(mozIDOMWindowProxy * aParent, const char * aUrl, const char * aName, const char * aFeatures, bool aCalledFromJS, bool aDialog, bool aNavigate, nsIArray * aArgv, bool aIsPopupSpam, bool aForceNoOpener, nsIDocShellLoadInfo * aLoadInfo, mozIDOMWindowProxy * * aResult) Line 1324
xul.dll!nsWindowWatcher::OpenWindow(mozIDOMWindowProxy * aParent, const char * aUrl, const char * aName, const char * aFeatures, nsISupports * aArguments, mozIDOMWindowProxy * * aResult) Line 353
xul.dll!_NS_InvokeByIndex() Line 57
xul.dll!XPCWrappedNative::CallMethod(XPCCallContext & ccx, XPCWrappedNative::CallMode mode) Line 1344
xul.dll!XPC_WN_CallMethod(JSContext * cx, unsigned int argc, JS::Value * vp) Line 999
xul.dll!js::InternalCallOrConstruct(JSContext * cx, const JS::CallArgs & args, js::MaybeConstruct construct) Line 459
xul.dll!InternalCall(JSContext * cx, const js::AnyInvokeArgs & args) Line 504
xul.dll!Interpret(JSContext * cx, js::RunState & state) Line 2922
xul.dll!js::RunScript(JSContext * cx, js::RunState & state) Line 405
xul.dll!js::InternalCallOrConstruct(JSContext * cx, const JS::CallArgs & args, js::MaybeConstruct construct) Line 480
xul.dll!InternalCall(JSContext * cx, const js::AnyInvokeArgs & args) Line 504
xul.dll!js::Call(JSContext * cx, JS::Handle<JS::Value> fval, JS::Handle<JS::Value> thisv, const js::AnyInvokeArgs & args, JS::MutableHandle<JS::Value> rval) Line 523
xul.dll!JS_CallFunctionValue(JSContext * cx, JS::Handle<JSObject *> obj, JS::Handle<JS::Value> fval, const JS::HandleValueArray & args, JS::MutableHandle<JS::Value> rval) Line 2769
xul.dll!nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS * wrapper, unsigned short methodIndex, const XPTMethodDescriptor * info_, nsXPTCMiniVariant * nativeParams) Line 1213
xul.dll!nsXPCWrappedJS::CallMethod(unsigned short methodIndex, const XPTMethodDescriptor * info, nsXPTCMiniVariant * params) Line 613
xul.dll!PrepareAndDispatch(nsXPTCStubBase * self, unsigned int methodIndex, unsigned int * args, unsigned int * stackBytesToPop) Line 85
xul.dll!SharedStub() Line 113
xul.dll!nsPop3Protocol::Error(const char * err_code, const char16_t * * params, unsigned int length) Line 1320
xul.dll!nsPop3Protocol::RetrResponse(nsIInputStream * inputStream, unsigned int length) Line 3223
xul.dll!nsPop3Protocol::ProcessProtocolState(nsIURI * url, nsIInputStream * aInputStream, unsigned __int64 sourceOffset, unsigned int aLength) Line 3934
xul.dll!nsMsgProtocol::OnDataAvailable(nsIRequest * request, nsISupports * ctxt, nsIInputStream * inStr, unsigned __int64 sourceOffset, unsigned int count) Line 297
xul.dll!nsInputStreamPump::OnStateTransfer() Line 603
xul.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream * stream) Line 430
xul.dll!nsInputStreamReadyEvent::Run() Line 97
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1216
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 361
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 96
xul.dll!MessageLoop::RunHandler() Line 226
xul.dll!MessageLoop::Run() Line 206
xul.dll!nsBaseAppShell::Run() Line 158
xul.dll!nsAppShell::Run() Line 264
xul.dll!nsAppStartup::Run() Line 284
xul.dll!XREMain::XRE_mainRun() Line 4488
xul.dll!XREMain::XRE_main(int argc, char * * argv, const nsXREAppData * aAppData) Line 4621
xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData, unsigned int aFlags) Line 4712
thunderbird.exe!do_main(int argc, char * * argv, char * * envp, nsIFile * xreDirectory) Line 245
thunderbird.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 378
thunderbird.exe!wmain(int argc, wchar_t * * argv) Line 118
[External Code]
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
Flags: needinfo?(m_kato)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #13)
> Created attachment 8856502 [details]
> exception1.JPG
> 
> Divya from McAfee reports the following...
> 
> While upgrading TB from 45.6 to latest version, we found a crash on startup.
> So we tried to debug MSK code by attaching Thunderbird application. While
> debugging TB, we got a code break in mozglue.dll; I have attached call stack
> for same and exception message while debugging. [1]
> 
> In call stack, call starts from kernel32.dll and ends at mozglue.dll. And in
> this call stack, I couldn't find any frame from MSK. we will continue our
> investigation on this but meanwhile I would request you to ask your
> developers also to look into it. 

This crash means that js-ctypes data (used with an external third-party DLL) is corrupted, which is why the crash occurs during GC. I think the most likely root cause is the addon that uses js-ctypes. The binary code that is called via js-ctypes might corrupt memory that was allocated by js-ctypes.
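
To illustrate why this kind of corruption tends to surface only later, at free or finalization time, here is a minimal, memory-safe simulation using Python's `ctypes`. It is not js-ctypes and not the add-on's code. The 16-byte `arena`, the `METADATA` marker standing in for allocator bookkeeping, and the `native_write` helper are all assumptions made for the sketch.

```python
import ctypes

# Stand-in for allocator bookkeeping stored right after an 8-byte block.
METADATA = b"METADATA"

# One 16-byte arena: bytes 0-7 are the block handed out, bytes 8-15 the bookkeeping.
arena = bytearray(8) + bytearray(METADATA)

# A ctypes view of only the first 8 bytes, as a caller would pass to native code.
block = (ctypes.c_char * 8).from_buffer(arena)


def native_write(dest, payload):
    """Simulate native code that copies without checking the destination size."""
    ctypes.memmove(dest, payload, len(payload))


# 12 bytes into an 8-byte block: the call itself succeeds without any error.
native_write(block, b"A" * 12)

# The damage is only visible when the bookkeeping is inspected later,
# which is the role free() plays during GC finalization in the real crash.
metadata_intact = bytes(arena[8:16]) == METADATA
print("metadata intact:", metadata_intact)
```

The overrun stays inside the 16-byte `arena`, so the sketch never touches memory it does not own. In a real process the bytes after a heap block belong to the allocator or to another object, which is why the failure shows up in `arena_dalloc` rather than in the code that did the write.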
Flags: needinfo?(m_kato)
Indeed, crash-stats tells us 100% of crashes are during GC (comment 7)
Today McAfee supplied a potential fix in .jar file for testing
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #16)
> Today McAfee supplied a potential fix in .jar file for testing

"Today" was 7 days ago. Since then 4 of 7 users reported favorable results with McAfee's new jar file. But I haven't gotten a response from McAfee as to what their next step is. So I'm forced to make some decisions (below) without insight into their plans.

Since the soft block of the addon in bug 1354912, the crash rate has dropped. But it clearly hasn't come close to killing the crash, per the crashes reported in the past week [1], and it is still ranked the #1 crash [2]. (And we can't know how much is a result of the block vs. users adjusting.)

Some fuzzy math... TB52.0.1 ADI is ~600k for the past week (18% of crashes). Let's say 550k are Windows. I've seen estimates that McAfee has 6% of the AV market, which nets to 33k users of the 550k ADI. Topcrash reports [2] indicate an average of ~1,500 installs crashing. So the math nets to roughly 5% of Thunderbird McAfee installs crashing.
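
The estimate can be reproduced in a few lines. This is only a sketch of the arithmetic in the comment, using its rough inputs (550k Windows ADI, a 6% McAfee market share, ~1,500 crashing installs); the variable names are illustrative.

```python
# Rough inputs quoted in the comment; all of them are estimates.
windows_adi = 550_000          # assumed Windows share of the ~600k ADI
mcafee_share_percent = 6       # estimated McAfee share of the AV market
crashing_installs = 1_500      # average installs crashing per topcrash reports

# Integer arithmetic keeps the 33k figure exact.
mcafee_installs = windows_adi * mcafee_share_percent // 100
crash_fraction = crashing_installs / mcafee_installs

print(f"Estimated McAfee installs: {mcafee_installs:,}")
print(f"Share of McAfee installs crashing: {crash_fraction:.1%}")
```

The result is about 4.5%, which the comment rounds to roughly 5%.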

For that amount of user pain caused by the startup crash I think we must go to a hard block before totally unleashing TB52 updates - even though it means disabling the addon for an estimated 25k addon users who are not crashing. (And even though it's not clear how much of an impact a hard block of the addon will have on the crash rate)  

Quick thoughts?

[1] https://crash-stats.mozilla.com/signature/?signature=arena_dalloc%20%7C%20je_free%20%7C%20js%3A%3Actypes%3A%3ACData%3A%3AFinalize&date=%3E%3D2017-04-18T11%3A03%3A00.000Z&date=%3C2017-04-25T11%3A03%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
[2] https://crash-stats.mozilla.com/topcrashers/?product=Thunderbird&version=52.0.1
Flags: needinfo?(unicorn.consulting)
The block in bug 1354240 went into production on April 7. I think on April 6 we throttled updates, which correlates to the huge drop in the graph.  After April 7 I'd say the rate dropped by maybe 50% - how much of that can be credited to the blocklist is unclear.
Depends on: 1359469
Flags: needinfo?(unicorn.consulting)
Crash rank has been steady at #6, even for 52.1.1, which ships with the hard block.

McAfee have indicated the updated anti-spam add-on will be version 3.0. No ETA.
Despite the hard block of bug 1359469 we still have 80 crashes per day, distributed by OS as follows:
OS	Crashes	Share
Windows 10	475	88.0%
Windows 7	45	8.3%
Windows 8.1	20	3.7%
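
As a quick consistency check, the percentages can be recomputed from the counts. This sketch assumes the three rows are the complete breakdown. The counts sum to 540, which at about 80 crashes per day would correspond to roughly a week of reports. The comment does not state the reporting window, so that is an inference.

```python
# Crash counts per OS as listed in the comment.
counts = {"Windows 10": 475, "Windows 7": 45, "Windows 8.1": 20}

total = sum(counts.values())

# Share of each OS as a percentage, rounded to one decimal place.
shares = {os_name: round(100 * n / total, 1) for os_name, n in counts.items()}

for os_name, n in counts.items():
    print(f"{os_name}\t{n}\t{shares[os_name]}%")
print(f"Total\t{total}")
```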

No dates, but per McAfee on June 1, "New Release of McAfee will be live in mid-June, which will have the fix. But there is still a possibility where, user wont update their products to latest version. That's why, we are also releasing patch in previous version."
Signature is gone. Probably a combination of the blocklist and McAfee shipping an update of the addon to version 3.

But the signature for bug 1290689 still exists, because the version 3 addon still crashes
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID