Closed Bug 1273475 Opened 4 years ago Closed 4 years ago

Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsNSSActivityState::restrictActivityToCurrentThread

Categories

(Core :: Security: PSM, defect, P1, critical)

defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox48 --- wontfix
firefox49 --- fixed
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: mcmanus, Assigned: keeler)

References

Details

(Keywords: crash, Whiteboard: [psm-assigned])

Crash Data

Attachments

(4 files)

Whiteboard: [psm-backlog]
(Updating summary so that hopefully this bug shows up on crash-stats.)
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsNSSActivityState::restrictActivityToCurrentThread]
Keywords: crash
Summary: restrictActivityToCurrentThread shutdown hang → Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsNSSActivityState::restrictActivityToCurrentThread
See Also: → 1275161
Duplicate of this bug: 1275161
Crash volume for signature 'shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsNSSActivityState::restrictActivityToCurrentThread':
 - nightly (version 50): 60 crashes from 2016-06-06.
 - aurora  (version 49): 579 crashes from 2016-06-07.
 - beta    (version 48): 5339 crashes from 2016-06-06.
 - release (version 47): 0 crashes from 2016-05-31.
 - esr     (version 45): 0 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       6      12       8       7       6       8       5
 - aurora       72      85      86      71      85      78      66
 - beta        907     767     746     699     713     628     555
 - release       0       0       0       0       0       0       0
 - esr           0       0       0       0       0       0       0

Affected platform: Windows
That looks like it needs attention..
Flags: needinfo?(dkeeler)
Well, bad news is we introduced a deadlock. Good news is I'm fairly sure I've identified what it is and that the patch I'm working on fixes it. I'll have that ready tomorrow.
Assignee: nobody → dkeeler
Flags: needinfo?(dkeeler)
Priority: -- → P1
Whiteboard: [psm-backlog] → [psm-assigned]
This fixes two issues:
1. nsNSSShutDownList::evaporateAllNSSResources could deadlock by acquiring
sListLock and then the singleton's mNSSActivityStateLock in
nsNSSActivityState::restrictActivityToCurrentThread.

2. Calling UnloadLoadableRoots before
nsNSSShutDownList::evaporateAllNSSResources could result in removing modules
that were still in use, causing assertion failures and potential crashes.

Review commit: https://reviewboard.mozilla.org/r/69044/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/69044/
Attachment #8777512 - Flags: review?(ttaubert)
Attachment #8777512 - Flags: review?(jjones)
Comment on attachment 8777512 [details]
bug 1273475 - fix deadlock and potential crash when PSM shuts down NSS

https://reviewboard.mozilla.org/r/69044/#review66200

I guess this would uplift into Aurora. Since it won't get a full cycle in Nightly, perhaps this should use MOZ_RELEASE_ASSERT so we collect destructor-wrong-thread crashes in the Beta cycle too, instead of hanging. I'd love to get Tim's feedback on that.

Otherwise, r+, and please carry the r+ through if we're moving to RELEASE_ASSERT.
Attachment #8777512 - Flags: review?(jjones) → review+
Patrick - is the 5000 figure for 48 up there in Comment 3 high? Like, should we expect to need to uplift this all the way to firefox48?
Flags: needinfo?(mcmanus)
(In reply to J.C. Jones [:jcj] from comment #8)
> Patrick - is the 5000 figure for 48 up there in Comment 3 high? Like, should
> we expect to need to uplift this all the way to firefox48?

its certainly possible as a ride along.. sylvestre would be the person I would ask.
Flags: needinfo?(mcmanus) → needinfo?(sledru)
Yes, probably. Could you fill the uplift request to Aurora, beta and release?  Thanks
Flags: needinfo?(sledru)
Attachment #8777512 - Flags: review?(ttaubert) → review+
Comment on attachment 8777512 [details]
bug 1273475 - fix deadlock and potential crash when PSM shuts down NSS

https://reviewboard.mozilla.org/r/69044/#review66608

LGTM
(In reply to J.C. Jones [:jcj] from comment #7)
> I guess this would uplift into Aurora. Since it won't get a full cycle in
> Nightly, perhaps this should use MOZ_RELEASE_ASSERT so we collect
> destructor-wrong-thread crashes in the Beta cycle too, instead of hanging.
> I'd love to get Tim's feedback on that.

Yeah, that might indeed be a good idea to find misbehaving code earlier.
The volume is not very big on release. As it is a shutdown issue, I won't take that as a ride along.
OK, having trouble getting my follow-on patch to add-in to Keeler's MozReview (it's opening a whole new review w/ his patch included), so let's get the first patch landed and I'll add the follow-on after.
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/990954b29382
fix deadlock and potential crash when PSM shuts down NSS. r=ttaubert, r=jcj
Keywords: checkin-needed
Comment on attachment 8778981 [details]
bug 1273475 - use release assertions for PSM->NSS shutdown

https://reviewboard.mozilla.org/r/70054/#review67618
Attachment #8778981 - Flags: review?(ttaubert) → review+
Marking checkin-needed for the release assertions patch. Leaving it open, still.

The try run is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=028ec8bac2c4
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f7e33d3604f5
use release assertions for PSM->NSS shutdown. r=ttaubert
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Comment on attachment 8777512 [details]
bug 1273475 - fix deadlock and potential crash when PSM shuts down NSS

Approval Request Comment
[Feature/regressing bug #]: bug 1235634 appears to have made this worse, although I think this issue has been present for a long time
[User impact if declined]: small potential for a shutdown crash and/or hang
[Describe test coverage new/current, TreeHerder]: has a test
[Risks and why]: low - the worst that will happen is we traded one kind of crash for another kind of crash
[String/UUID change made/needed]: none
Attachment #8777512 - Flags: approval-mozilla-beta?
Attachment #8777512 - Flags: approval-mozilla-aurora?
I don't see any crashes with this signature for the last week on beta. I checked back to 2016-08-01 and still don't see anything (on any channel). 

David, do you still think we need to uplift this? Maybe some other uplift shifted the crash signature. I do remember this being a high volume signature a while back.
Flags: needinfo?(dkeeler)
Yes, looks like there are a lot of shutdown hangs where nsNSSActivityState::restrictActivityToCurrentThread is on the stack: https://crash-stats.mozilla.com/search/?product=Firefox&proto_signature=~nsNSSActivityState%3A%3ArestrictActivityToCurrentThread&_sort=-date&_facets=signature&_facets=version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature.
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsNSSActivityState::restrictActivityToCurrentThread] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsNSSActivityState::restrictActivityToCurrentThread] [@ shutdownhang | mozilla::CondVar::Wait | nsNSSActivityState::restrictActivityToCurrentThread]…
Comment on attachment 8777512 [details]
bug 1273475 - fix deadlock and potential crash when PSM shuts down NSS

Let's uplift this, as we are still getting a few hundred crashes per beta release per week. It should land for beta 8. 
Thanks David and Marco!
Attachment #8777512 - Flags: approval-mozilla-beta?
Attachment #8777512 - Flags: approval-mozilla-beta+
Attachment #8777512 - Flags: approval-mozilla-aurora?
Attachment #8777512 - Flags: approval-mozilla-aurora+
Keywords: checkin-needed
Severity: normal → critical
Target Milestone: --- → mozilla51
This hit conflicts for me uplifting it to aurora, so it likely doesn't apply to beta. Could we get rebased patches for this?
Flags: needinfo?(dkeeler)
I applied the following:

https://hg.mozilla.org/mozilla-central/rev/990954b29382
https://hg.mozilla.org/mozilla-central/rev/f7e33d3604f5

on top of https://hg.mozilla.org/releases/mozilla-aurora/rev/285149724c55 without conflicts. Is that the right base? What were the conflicts you encountered?

In terms of beta, there definitely are conflicts. I had forgotten that bug 1284946 hasn't gotten as far as beta yet. I'll investigate making a branch patch for that (and if it will even be a good idea to do so).
Flags: needinfo?(dkeeler) → needinfo?(wkocher)
Ok - the beta conflicts were nowhere near as considerable as I thought they were going to be. I think we're fine to move forward with uplifting this to beta.
Keywords: checkin-needed
Flags: needinfo?(wkocher)
Duplicate of this bug: 615328
You need to log in before you can comment on or make changes to this bug.