Closed Bug 1250424 Opened 8 years ago Closed 8 years ago

crash in mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort (with abort message chrome://browser/content/sanitize.js) and Firefox no longer responding on startup

Categories

(Firefox :: General, defect)

46 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 47
Tracking Status
firefox45 --- unaffected
firefox46 blocking fixed
firefox47 + fixed

People

(Reporter: mak, Assigned: mak)

References

Details

(Keywords: crash, regression)

Crash Data

This will keep tracking work to solve the timeout aborts in Nightly 47 and Aurora 46.

+++ This bug was initially created as a clone of Bug #1248489 +++

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-a464ca53-513a-40a9-9393-7c5462160215.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp
1 	xul.dll 	NS_DebugBreak 	xpcom/base/nsDebugImpl.cpp
2 	xul.dll 	nsDebugImpl::Abort(char const*, int) 	xpcom/base/nsDebugImpl.cpp
3 	xul.dll 	NS_InvokeByIndex 	xpcom/reflect/xptcall/md/win32/xptcinvoke.cpp
4 	xul.dll 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp
5 	xul.dll 	XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp
6 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
7 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp
8 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp
9 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
10 	xul.dll 	js::CallOrConstructBoundFunction(JSContext*, unsigned int, JS::Value*) 	js/src/jsfun.cpp
11 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
12 	xul.dll 	js::fun_call(JSContext*, unsigned int, JS::Value*) 	js/src/jsfun.cpp
13 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
14 	xul.dll 	js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) 	js/src/vm/Interpreter.cpp
15 	xul.dll 	js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) 	js/src/proxy/DirectProxyHandler.cpp
16 	xul.dll 	js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) 	js/src/proxy/CrossCompartmentWrapper.cpp
17 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
18 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp
19 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp
20 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
21 	xul.dll 	js::CallOrConstructBoundFunction(JSContext*, unsigned int, JS::Value*) 	js/src/jsfun.cpp
22 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
23 	xul.dll 	js::CallOrConstructBoundFunction(JSContext*, unsigned int, JS::Value*) 	js/src/jsfun.cpp
24 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
25 	xul.dll 	js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) 	js/src/vm/Interpreter.cpp
26 	xul.dll 	nsXPCComponents_Utils::CallFunctionWithAsyncStack(JS::Handle<JS::Value>, nsIStackFrame*, nsAString_internal const&, JSContext*, JS::MutableHandle<JS::Value>) 	js/xpconnect/src/XPCComponents.cpp
27 	xul.dll 	NS_InvokeByIndex 	xpcom/reflect/xptcall/md/win32/xptcinvoke.cpp
28 	xul.dll 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp
29 	xul.dll 	XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp
30 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
31 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp
32 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp
33 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
34 	xul.dll 	JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) 	js/src/jsapi.cpp
35 	xul.dll 	mozilla::dom::AnyCallback::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&) 	obj-firefox/dom/bindings/PromiseBinding.cpp
36 	xul.dll 	mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>) 	dom/promise/PromiseCallback.cpp

AbortMessage 	[11276] ###!!! ABORT: file chrome://browser/content/sanitize.js, line 98

AsyncShutdownTimeout 	{"phase":"Places Database shutdown","conditions":[{"name":"sanitize.js: Sanitize","state":{"progress":{"cache":"cleared","cookies":"cleared","history":"ready","formdata":"ready","downloads":"ready","sessions":"ready"}},"filename":"chrome://browser/content/sanitize.js","lineNumber":98,"stack":["chrome://browser/content/sanitize.js:Sanitizer.prototype.sanitize<:98","resource://gre/modules/Task.jsm:TaskImpl_run:314","resource://gre/modules/Task.jsm:TaskImpl:275","resource://gre/modules/Task.jsm:createAsyncFunction/asyncFunction:249","chrome://browser/content/sanitize.js:Sanitizer.onStartup<:832","resource://gre/modules/Task.jsm:TaskImpl_run:314","resource://gre/modules/Task.jsm:TaskImpl:275","resource://gre/modules/Task.jsm:createAsyncFunction/asyncFunction:249","resource://app/components/nsBrowserGlue.js:BG__finalUIStartup:759","resource://app/components/nsBrowserGlue.js:BG_observe:283"]}]}

the signature is #11 top crasher on release & this crash variant (abort in sanitize.js, line 98) makes up 2/3 of those crashes. it is regressing in firefox 44.

multiple user comments describe that they tried to delete the firefox history and firefox was subsequently stuck in a "not responding" mode after a restart constantly:
https://crash-stats.mozilla.com/search/?signature=%3Dmozalloc_abort+|+NS_DebugBreak+|+nsDebugImpl%3A%3AAbort&abort_message=98&user_comments=~history&_facets=signature&_facets=version&_facets=user_comments&_facets=uptime&_facets=platform_pretty_version&_facets=abort_message&_facets=async_shutdown_timeout&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-user_comments

downgrading to 43 can get you out of the freezing condition on startup again apparently according to https://support.mozilla.org/questions/1110002.
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Depends on: 1250362
Tracking, marking this as a blocker for 46 release as well based on the issues in 45 beta 8.
Depends on: 1251469
Depends on: 1252089
mak, any luck here? Are we looking at a possible backout for 46 as we did in 45?
Flags: needinfo?(mak77)
no, I'm waiting to uplift the dependencies I already asked approval for.
Flags: needinfo?(mak77)
Ah, I see. I was caught up for the weekend -- I usually let things have a day on m-c before uplift but I'll approve these now to let you get quicker feedback on aurora. Thanks.
Thanks for landing fixes.

Follow up reports:

Firefox Developer Edition (en-GB) 46.0a2 (2016-02-19) 32 bit.
Could not Restart or Shutdown, see:
https://bugzilla.mozilla.org/show_bug.cgi?id=1244650#c36 [Aurora]

Nightly (x64 for Windows en-GB), was similar to Aurora.
Restarts and Shutdown of Nightly also stopped on 2016-02-19, see:
https://bugzilla.mozilla.org/show_bug.cgi?id=1244650#c37 [Nightly]

1. Nightly

Name           Firefox 64bit for Windows
Version        47.0a1
Build ID       20160224030246
Update Channel nightly
User Agent     Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0
OS             Windows_NT 6.1 x86-64

Could Restart and Shutdown normally.
However bug 976940 was then Reopened
https://bugzilla.mozilla.org/show_bug.cgi?id=976940#c10 and backed out.

Then on 2016-02-26 some bugs were relanded.
Since then my Nightly 47.0a1 [64 bit for Windows] (en-GB), Build ID 20160227030205,
is able to Shutdown and also Restart (with all my Extensions and without changing
my pref.js - i.e. back to 'how things were on 18 Feb 2016').

DJ-Leith
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #5)
> ... get quicker feedback on aurora.

2. Aurora - good news here as well.

Name           Firefox 32 bit for Windows
Version        46.0a2
Build ID       20160302004006	
Update Channel aurora
User Agent     Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0

All the uplifts that I need for Aurora have now landed.
All my Aurora Profiles (I have many) are able to Shutdown and also Restart (with all
my Extensions and without changing my pref.js - i.e. back to 'how things were on 18 Feb 2016').

Note, as I said in
https://bugzilla.mozilla.org/show_bug.cgi?id=1244650#c36
and in
https://bugzilla.mozilla.org/show_bug.cgi?id=1244650#c47

A. I'm not trying to clear Browser History on these Profiles.
B. I'm NOT using e10s, I have NoScript (and other Extensions), on all these Profiles.

So, I may not be a 'typical user'.  I had MANY crashes, I only reported a couple.

Additional point in passing I think I noticed 'slightly slower Firefox Startup'
in the 16-18th Feb 2016.  I speculate that this might have been 'sanitize on Startup'.
I don't have measurements (i.e. evidence).

I'm very happy that you fixed bug 1249786.
I also think bug 1250363 was a good move.

DJ-Leith
(In reply to DJ-Leith from comment #7)
> A. I'm not trying to clear Browser History on these Profiles.
> B. I'm NOT using e10s, I have NoScript (and other Extensions), on all these
> Profiles.

e10s has no implications here, it's all chrome code.
Your case was likely affected by a disabled form history.

Thank you for reporting back.
This is by far no more a top crash in latest builds.

bug 1252089 did its job, this is the most recent shutdown crash we got on 20160302030209 (the first build properly reporting abort metadata):
Sanitize","state":{"progress":{"cache":"cleared","cookies":"blocking","history":"cleared","formdata":"cleared","downloads":"cleared","sessions":"cleared"}}
So we aborted while we were clearing cookies.
Note that cookies in this case include plugins, that seems compatible with bug 1249333 (the only left dependency), for which we can't do much, maybe we could just workaround by setting a 10s timeout on plugins cleanup, but we should try to avoid capping plugins sanitize telemetry at the same timeout.
Depends on: 1253204
Crash Signature: [@ mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort] → [@ mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort] [@ mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | NS_InvokeByIndex ] [@ mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | XPTC__InvokebyIndex ] [@ mozalloc_abort | Abort | NS_DebugBreak…
I'm looking at https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=46.0a2&days=3  to see the top crashes for the last 3 days in aurora, and signatures similar to these still look high up in the list. (the top 2 crashes for sure)

If I look by reports and sort on buildid, I still see some reports from 20160303004038 on aurora,  but just a couple from 0302.  I have a hard time telling with this data, Kairo can you confirm that the crash rate for these signatures have dropped?  We may need to have a couple more days of data to tell what is going on.
Flags: needinfo?(kairo)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #10)
> I'm looking at
> https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=46.
> 0a2&days=3  to see the top crashes for the last 3 days in aurora, and
> signatures similar to these still look high up in the list. (the top 2
> crashes for sure)

The first meaningful aurora version should be 0303, cause the last critical dependency (bug 1250363) landed on 02.
For nightly, I could only find 1 report, that should be solved by bug 1253204. I'll now land bug 1252089 in aurora, so we get proper metadata.
https://crash-stats.mozilla.com/search/?product=Firefox&version=46.0a2&process_type=browser&process_type=content&build_id=%3E20160302000000&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature is a search for top crashes in builds since this has landed, and there seem to be some crashes of the mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | NS_InvokeByIndex but not in a place high enough up to be concerning any more.
Flags: needinfo?(kairo)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #13)
> is a search for top
> crashes in builds since this has landed, and there seem to be some crashes
> of the mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort |
> NS_InvokeByIndex but not in a place high enough up to be concerning any more.

Note that some of those are not this bug, it depends on the AsyncShutdownTimeout annotation. Here we are only handling the ones regarding sanitize... There are other crashes that should be (and I know some are) tracked apart.
Thanks! Should we call this fixed then?
we'll call this fixed once all the dependencies are.
At that point all the current crash stacks pointing to sanitize.js will be handled.

Any other aborts due to other reasons (I see some related to media graphs, telemetry, session restore...) should be filed and handled apart, depending on their asyncShutdownTimeout metadata, cauise they are unrelated with sanitize on shutdown.
No longer depends on: 1249333
Fixed by dependencies.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updating the status flags as it seems all deps are fixed in 46 too.
In 46.0b1 data, the "mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | NS_InvokeByIndex" signature is 4.9% of all crashes, and the largest group of AsyncShutdownTimeout annotations on it are still about places sanitize.
does 46.0b1 include bug 1253204 already?
According to http://yoric.github.io/are-we-shutting-down-yet-/?version=Firefox+46.0b1#m about 30% of those crashes are Places sanitize (it's hard to estimate the occurrences in the long tail on that dashboard). Bug 1255745 is almost 60% though and therefore the majority of those shutdown crashes.
(In reply to Marco Bonardo [::mak] from comment #20)
> does 46.0b1 include bug 1253204 already?

No, that's probably why this is still around there.
Once we verify the fix in bug 1253204 (for 46 beta 2) we can mark this as fixed.
Actually.... in theory it should be still marked fixed.  just not "verified"
In the 47.0a2 top crashers, this one is going down, it shows that it's down 5 spots already since 3-10-16.
Those have different reasons and should be filed and handled apart, a single bug for "Sanitize.js: Sanitize on shutdown with state:(none)" would probably be fine. One of the problems, or maybe THE problem, is that we have 2 blockers when we could just do with one (sanitize is already blocking, we should just add to progress an isShutdown property).
I'll file a bug for that. Regardless, those should be by far less frequent than what we fixed here.

By looking at the global async shutdown crashes, bug 1255977 and bug 1255737 are the most frequent aborts atm.
Flags: needinfo?(mak77)
Target Milestone: --- → Firefox 47
Version: unspecified → 46 Branch
You need to log in before you can comment on or make changes to this bug.