Closed
Bug 1375709
Opened 7 years ago
Closed 7 years ago
yet another PSM/NSS shutdown deadlock
Categories
(Core :: Security: PSM, defect, P1)
Core
Security: PSM
Tracking
()
RESOLVED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox56 | --- | fixed |
People
(Reporter: keeler, Assigned: keeler)
References
Details
(Whiteboard: [psm-assigned])
Attachments
(1 file)
59 bytes,
text/x-review-board-request
|
ttaubert
:
review+
Cykesiopka
:
review+
jcristau
:
approval-mozilla-beta-
|
Details |
Consider the following: * The main thread is in nsNSSComponent::ShutdownNSS and calls nsNSSShutDownList::evaporateAllNSSResourcesAndShutDown(), which: a). restricts activity to the main thread and b). then attempts to (re-)acquire sListLock * Meanwhile, another thread creates an nsNSSShutDownPreventionLock, which: c). runs nsNSSShutDownList::enterActivityState(), which acquires sListLock, and d). calls singleton->mActivityState.enter(), which will block if there's a thread activity restriction If the order of execution is a, c, d, b, (or b then d) we have a deadlock.
Comment hidden (mozreview-request) |
Comment 4•7 years ago
|
||
mozreview-review |
Comment on attachment 8880917 [details] bug 1375709 - avoid deadlock when shutting down NSS https://reviewboard.mozilla.org/r/152284/#review159116 ::: security/manager/ssl/nsNSSShutDown.cpp:190 (Diff revision 1) > + { > + StaticMutexAutoLock lock(sListLock); > + delete singleton; // ~nsNSSShutDownList sets singleton to nullptr > + } Between here and the `while (true)` part above, another thread could create a new `nsNSSShutDownObject` and thus add it to `mObjects` before we acquire `sListLock`. In that case we would leak the object, right? We could guard against that by not only checking `sInShutdown` in `nsNSSShutDownList::construct()` before creating the singleton, but by just returning `nullptr` at the top if `sInShutdown == true`. Like: ``` bool nsNSSShutDownList::construct(const StaticMutexAutoLock& /*proofOfLock*/) { if (sInShutdown) { return nullptr; } if (!singleton && XRE_IsParentProcess()) { singleton = new nsNSSShutDownList(); } return !!singleton; } ``` Wouldn't that also fix the deadlock you're describing here? It would turn `nsNSSShutDownList::enterActivityState()` into a no-op after `nullptr` was returned instead of the singleton.
Attachment #8880917 -
Flags: review?(ttaubert) → review-
Comment 5•7 years ago
|
||
mozreview-review |
Comment on attachment 8880917 [details] bug 1375709 - avoid deadlock when shutting down NSS https://reviewboard.mozilla.org/r/152284/#review159974 Apologies for the long delay - I've been busy and paging this stuff back into my brain has taken a while. Anyways, I think what ttaubert suggested makes sense - although AFAICT just doing that isn't sufficient to protect both hash tables. So I guess my suggestion is to make the construct() change in addition to what you already have, and see if that works.
Attachment #8880917 -
Flags: review?(cykesiopka.bmo)
Assignee | ||
Comment 6•7 years ago
|
||
Thanks for the reviews. Unfortunately I found another issue in this shutdown machinery that I don't think will be fixed by this (and it's not even clear how to fix it), so I'm thinking it might be time to redesign this mechanism rather than just spot-fixing it.
Assignee | ||
Comment 7•7 years ago
|
||
Well, good news/bad news: I figured out how to reliably reproduce this, so I'm fairly confident the approach in comment 4 works (the bad news is that I figured out how to reliably reproduce this, so users are probably encountering this frustratingly often). (In reply to :Cykesiopka from comment #5) ... > Anyways, I think what ttaubert suggested makes sense - although AFAICT just > doing that isn't sufficient to protect both hash tables. > So I guess my suggestion is to make the construct() change in addition to > what you already have, and see if that works. From what I can tell, with the new patch (coming shortly), all operations on the tables will be protected by sListLock, although please do double-check my reasoning.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 10•7 years ago
|
||
Found an issue with the initial implementation of the new approach: if acquiring an nsNSSShutDownPreventionLock turns into a no-op during shutdown, when that lock is destructed, it attempts to leave the NSS activity state. Since it never entered it, the activity counter is decremented when it shouldn't be, and NSS can be shut down while other threads are still using it. This was a fairly easy fix - just have nsNSSShutDownPreventionLock keep track of if it successfully entered the activity state. If y'all could have another look, that would be great!
Flags: needinfo?(ttaubert)
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(cykesiopka.bmo)
Comment 11•7 years ago
|
||
mozreview-review |
Comment on attachment 8880917 [details] bug 1375709 - avoid deadlock when shutting down NSS https://reviewboard.mozilla.org/r/152284/#review162220 LGTM, and good catch with the activity state counter. I'm only confused by the removal of the `StateMutexAutoUnlock`. ::: security/manager/ssl/nsNSSShutDown.cpp (Diff revision 3) > - { > - StaticMutexAutoUnlock unlock(sListLock); > - entry->obj->shutdown(nsNSSShutDownObject::ShutdownCalledFrom::List); > + entry->obj->shutdown(nsNSSShutDownObject::ShutdownCalledFrom::List); > - } > iter.Remove(); Is this change intentional? I *think* we can do this because with `ShutdownCalledFrom::List` we always end up calling `virtualDestroyNSSReference()`... Except if we deadlock if some code ends up somehow trying to acquire `sListLock`? Not sure what the expectations here exactly are. Also the whole comment above the block seems obsolete, no code should be able to change `singleton->mObjects` if they can't get `sListLock`, because we never release it while looping, right?
Updated•7 years ago
|
Flags: needinfo?(ttaubert)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 13•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8880917 [details] bug 1375709 - avoid deadlock when shutting down NSS https://reviewboard.mozilla.org/r/152284/#review162220 > Is this change intentional? I *think* we can do this because with `ShutdownCalledFrom::List` we always end up calling `virtualDestroyNSSReference()`... Except if we deadlock if some code ends up somehow trying to acquire `sListLock`? Not sure what the expectations here exactly are. Also the whole comment above the block seems obsolete, no code should be able to change `singleton->mObjects` if they can't get `sListLock`, because we never release it while looping, right? Yes - that was intentional. `virtualDestroyNSSReference()` should only release NSS resources and never cause new nsNSSShutDownObjects to be created. I double-checked this against the current codebase. Unfortunately we don't really have a way of enforcing this going forward, but if we successfully migrate to not shutting down NSS at all, this whole issue becomes moot. I forgot to update the comment, though, and in any case since we're not concurrently modifying the list we can do the simple, obvious thing and just iterate once.
Comment 14•7 years ago
|
||
mozreview-review |
Comment on attachment 8880917 [details] bug 1375709 - avoid deadlock when shutting down NSS https://reviewboard.mozilla.org/r/152284/#review162406 ::: security/manager/ssl/nsNSSShutDown.cpp:172 (Diff revisions 3 - 4) > return NS_ERROR_FAILURE; > } > } > > MOZ_LOG(gPIPNSSLog, LogLevel::Debug, ("now evaporating NSS resources")); > - > + for (auto iter = singleton->mObjects.Iter(); !iter.Done(); iter.Next()) { Is it possible that in the block above where we unlock `sListLock`, someone calls `ShutdownNSS()` and thus might delete `singleton`? I don't think so because we require `evaporateAllNSSResourcesAndShutDown()` to be called on the main thread, just wanted to double-check and make sure you thought about this :)
Attachment #8880917 -
Flags: review?(ttaubert) → review+
Assignee | ||
Comment 15•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8880917 [details] bug 1375709 - avoid deadlock when shutting down NSS https://reviewboard.mozilla.org/r/152284/#review162406 > Is it possible that in the block above where we unlock `sListLock`, someone calls `ShutdownNSS()` and thus might delete `singleton`? > > I don't think so because we require `evaporateAllNSSResourcesAndShutDown()` to be called on the main thread, just wanted to double-check and make sure you thought about this :) Well, it's safe right now because of the main thread enforcement and again no `virtualDestroyNSSReference()` implementation can re-enter `ShutdownNSS()` due to just calling NSS resource deallocation functions, but I could probably add a check to `sInShutdown` near the beginning to prevent that.
Comment 16•7 years ago
|
||
(In reply to David Keeler [:keeler] (use needinfo?) from comment #15) > > Is it possible that in the block above where we unlock `sListLock`, someone calls `ShutdownNSS()` and thus might delete `singleton`? > > > > I don't think so because we require `evaporateAllNSSResourcesAndShutDown()` to be called on the main thread, just wanted to double-check and make sure you thought about this :) > > Well, it's safe right now because of the main thread enforcement and again > no `virtualDestroyNSSReference()` implementation can re-enter > `ShutdownNSS()` due to just calling NSS resource deallocation functions, but > I could probably add a check to `sInShutdown` near the beginning to prevent > that. Ahh, right. Nevermind then and thanks for reiterating :)
Comment 17•7 years ago
|
||
mozreview-review |
Comment on attachment 8880917 [details] bug 1375709 - avoid deadlock when shutting down NSS https://reviewboard.mozilla.org/r/152284/#review162590 Yup, seems reasonable to me. Nice work - I imagine this was a pain to get right.
Attachment #8880917 -
Flags: review?(cykesiopka.bmo) → review+
Updated•7 years ago
|
Flags: needinfo?(cykesiopka.bmo)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 19•7 years ago
|
||
Thanks! Here's try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa0788ff5687
Comment 20•7 years ago
|
||
Pushed by dkeeler@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0bb5555fa027 avoid deadlock when shutting down NSS r=Cykesiopka,ttaubert
Comment 21•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0bb5555fa027
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Assignee | ||
Comment 22•7 years ago
|
||
Comment on attachment 8880917 [details] bug 1375709 - avoid deadlock when shutting down NSS Approval Request Comment [Feature/Bug causing the regression]: probably bug 1235634, but also long-standing issue in this area of code [User impact if declined]: shutdown hangs [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: yes - crash-stats seem to indicate this has fixed a frequent shutdown hang [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: low to medium [Why is the change risky/not risky?]: this is in a complicated and hard-to-reason-about area of code, but this is a fairly self-contained fix, and it has baked a bit on Nightly with no apparent issues [String changes made/needed]: none I realize this is a bit late in the cycle, but I think this will address the shutdown hangs in bug 1375726.
Attachment #8880917 -
Flags: approval-mozilla-beta?
Comment 23•7 years ago
|
||
Comment on attachment 8880917 [details] bug 1375709 - avoid deadlock when shutting down NSS With a single beta build before release candidates I think I'll sit this one out for 55.
Attachment #8880917 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
You need to log in
before you can comment on or make changes to this bug.
Description
•