Closed Bug 1057312 Opened 5 years ago Closed 5 years ago

AsyncShutdownTimeout "AddonManager: shutting down providers"

Categories

(Toolkit :: Add-ons Manager, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
firefox32 --- unaffected
firefox33 + fixed
firefox34 + fixed
firefox35 + fixed

People

(Reporter: Yoric, Assigned: Irving)

References

()

Details

(Keywords: crash, topcrash-win)

Crash Data

Note: to make this debuggable, I believe that the AddonManager should maintain as state a list of the providers that haven't finished shutting down yet.
David, do you want to file a separate blocking bug about specifically adding state for this? That's pretty simple and something I can get on the next iteration.
I have a TODO to convert AddonManager to use the ShutdownBlocker mechanism, rather than its home-grown version from before ..Blocker existed; a bug filed for that would be good.
As a side-note, now that the blocker bug has been fixed, the dashboard now lets us look at specific crashes without the noise of Experiments.jsm.

http://yoric.github.io/are-we-shutting-down-yet-/?signature=~AddonManager%3A+shutting+down+providers#
[Tracking Requested - why for this release]:
Digging through http://yoric.github.io/are-we-shutting-down-yet-/ tells me:
32 is not affected.
33: This is 94% of 1000 samples (~4091 total crashes over 5 days)
34 and 35 have little data there but see at least some of this.

The 33 data means that this is 5-6% of all our crash volume in 33 Beta.

From bug 1071702 comment #1
Looking at the code, I guess we actually are somewhere in the area of http://mxr.mozilla.org/mozilla-beta/source/toolkit/mozapps/extensions/AddonManager.jsm#745 and that means that bug 1012466, which landed in 33, is somewhat connected.

Georg, anything you can help here as you did that other patch? Or Irving?
Flags: needinfo?(irving)
Flags: needinfo?(georg.fritzsche)
Keywords: crash, topcrash-win
Duplicate of this bug: 1071702
Blocks: 944873, 1071792
I did a quick review of how we're setting up the asynchronous shutdown, and I think we're handling synchronously thrown exceptions OK.

PreviousExperimentsProvider.shutdown() is synchronous only, so I'd rule that out. I can dive into code review of our other providers to see if I can spot anything, but the most likely candidate is still the main Experiments provider - unfortunately the implementation of shutdown barrier recently added to AddonManager doesn't report that level of detail back to AsyncShutdown. We want bug 1059674 to fix that.


Robert, can we correlate these crashes with add-ons, just in case it's not our own fault? I think Greasemonkey registers an add-on provider, and there may be others.
Flags: needinfo?(irving) → needinfo?(kairo)
(In reply to :Irving Reid from comment #7)
> Robert, can we correlate these crashes with add-ons, just in case it's not
> our own fault?

Our correlation scripts say there's no interesting add-on correlation there and module correlations just show normal Windows DLLs so also nothing interesting. Other stats also do not point to anything special, so it looks like it's just our code.

That said, we have an issue there that affects 5-6% of all our users with 33 and higher and we need something done until Thursday of next week, when we build the last beta for this cycle.

Can analysis of the crash minidumps help you at all? Or can we do anything to avoid the misbehavior here?
Flags: needinfo?(kairo)
With Experiments.jsm issues, we're still expecting this to not affect users past beta.
We are looking at things in bug 1064470 there - the log limit was raised, so we should look at the data again.
Flags: needinfo?(georg.fritzsche)
My only idea currently to collect more info would be to get someone experiencing the crash to run the browser from a terminal and get the standard out / standard error messages as the browser shuts down. Whatever is going wrong is likely to be showing up there and in the browser console, but we lose the console contents when AsyncShutdown crashes.

Aside from that, we could land a quick fix for bug 1059674 that would at least give us more specific information about where the failure is, we could do more code review of the providers to see if we spot other failure paths, and/or (most radically) we could turn off the crash-on-timeout behaviour in AsyncShutdown until this is sorted out; that trades shutdown-time crashes for possible unnoticed data corruption in whichever provider is having trouble (as I said, most likely Experiments.jsm).
So, I'm confused, did the experiments shutdown crashes just move to this or is this something different and new?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #11)
> So, I'm confused, did the experiments shutdown crashes just move to this or
> is this something different and new?

We are not entirely sure (bug 1059674 was filed to help us find this out), but we suspect that yes, it's a different symptom of the experiments shutdown crashes.
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | nsDebugImpl::Abort(char const*, int)]
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #12)
> We are not entirely sure (bug 1059674 was filed to help us find this out)

Hrm, then we'd need that one on beta ASAP to find out if that's true or if we need to do immediate work here. And even if it's true, we really need to get that fixed as it skews our beta data and we don't know what we really need to focus on or if overall volume is bad.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #13)
> (In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment
> #12)
> > We are not entirely sure (bug 1059674 was filed to help us find this out)
> 
> Hrm, then we'd need that one on beta ASAP to find out if that's true or if
> we need to do immediate work here. And even if it's true, we really need to
> get that fixed as it skews our beta data and we don't know what we really
> need to focus on or if overall volume is bad.

I have a patch for bug 105974 up on Try right now; unfortunately it turned out a bit bigger than I would have liked, so it would be a bit risky to uplift aggressively.

I'm reading through the Experiments shutdown code right now to see if I spot anything that might be causing problems.
That is too late for 33. We won't take a chance to break release with a beta-only fix.

However, I agree with Kairo in comment #13 that we really want to see that bug fixed.
Depends on: 1074135
Depends on: 1075625
Bug 1059674 was backed out for mochitest bustage. Assuming that it can be cleaned up, we can give this bug another go on 34 beta.
I should have set ni on Irving in comment 16.

Irving - I have assigned this bug to you as you seem to be the one working on it. Feel free to unassign yourself if that's incorrect.
Assignee: nobody → irving
Flags: needinfo?(irving)
(In reply to Sylvestre Ledru [:sylvestre] from comment #15)
> That is too late for 33. We won't take a chance to break release with a
> beta-only fix.

Bad news. AsyncShutdown is now >7% of all our 33.0 RELEASE crashes according to https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/33.0?days=7 and 93% of that is the issue here as http://yoric.github.io/are-we-shutting-down-yet-/?version=Firefox+33.0# shows. This looks very bad in crash stats and affects very many users if we roll out 33 to all users as-is.

Can we urgently get something done to not give >7% of our Firefox release session a crash prompt after shutdown?
David, don't hesitate to help if you can here.
Flags: needinfo?(dteller)
Ouch. Reviewing the most recent 20 reports i see for "addon manager shutdown timeout", they are all:
{"phase":"profile-before-change","conditions":[{"name":"AddonManager: shutting down providers","state":"(none)","filename":"resource://gre/modules/AddonManager.jsm","lineNumber":744,"stack":["resource://gre/modules/AddonManager.jsm:AMI_startup:744","resource://gre/modules/AddonManager.jsm:AMP_startup:2322","resource://gre/components/addonManager.js:AMC_observe:55","null:null:0"]}]}

I'm not sure if we can pull more info on this from somewhere?
According to my http://yoric.github.io/are-we-shutting-down-yet-/?signature=~AddonManager%3A+shutting+down+providers#, we do not have crashes post 33. I seem to remember that Irving has landed a patch on 34 to make this happen, so my understanding is that this patch works.
Flags: needinfo?(dteller)
Partial fixes that should remove the shutdown crashes have landed and been uplifted (bug 1059674, bug 1074135, bug 1081702). I'm working on addressing the underlying issues in bug 1075625 and on this bug.
Flags: needinfo?(irving)
Status: NEW → ASSIGNED
I have updated AWSDY to display the range of build dates in which crashes happen, this should help us find out if the uplift was good.
(by "good", I mean "the right uplift", of course)
We haven't encountered any crash since October 11st build, so I conclude that this bug is fixed.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
David, 33, 34 & 35 too? thanks
From the link in 23 I take it the answer is yes. ni Yoric to confirm.
Firefox 34 and 35 haven't been affected in the past few weeks.
Firefox 33 hasn't been affected since the October 11th build.

So yes, confirming.
Flags: needinfo?(dteller)
You need to log in before you can comment on or make changes to this bug.