Closed Bug 1375709 Opened 7 years ago Closed 7 years ago

yet another PSM/NSS shutdown deadlock

Categories

(Core :: Security: PSM, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: keeler, Assigned: keeler)

References

Details

(Whiteboard: [psm-assigned])

Attachments

(1 file)

Consider the following:

* The main thread is in nsNSSComponent::ShutdownNSS and calls nsNSSShutDownList::evaporateAllNSSResourcesAndShutDown(), which:
  a). restricts activity to the main thread and
  b). then attempts to (re-)acquire sListLock

* Meanwhile, another thread creates an nsNSSShutDownPreventionLock, which:
  c). runs nsNSSShutDownList::enterActivityState(), which acquires sListLock, and
  d). calls singleton->mActivityState.enter(), which will block if there's a thread activity restriction

If the order of execution is a, c, d, b, (or b then d) we have a deadlock.
Looks like bug 1273475 wasn't a sufficient fix.
Blocks: 1273475
Comment on attachment 8880917 [details]
bug 1375709 - avoid deadlock when shutting down NSS

https://reviewboard.mozilla.org/r/152284/#review159116

::: security/manager/ssl/nsNSSShutDown.cpp:190
(Diff revision 1)
> +  {
> +    StaticMutexAutoLock lock(sListLock);
> +    delete singleton; // ~nsNSSShutDownList sets singleton to nullptr
> +  }

Between here and the `while (true)` part above, another thread could create a new `nsNSSShutDownObject` and thus add it to `mObjects` before we acquire `sListLock`. In that case we would leak the object, right?

We could guard against that by not only checking `sInShutdown` in `nsNSSShutDownList::construct()` before creating the singleton, but by just returning `nullptr` at the top if `sInShutdown == true`. Like:

```
bool nsNSSShutDownList::construct(const StaticMutexAutoLock& /*proofOfLock*/)
{
  if (sInShutdown) {
    return nullptr;
  }

  if (!singleton && XRE_IsParentProcess()) {
    singleton = new nsNSSShutDownList();
  }

  return !!singleton;
}
```

Wouldn't that also fix the deadlock you're describing here? It would turn `nsNSSShutDownList::enterActivityState()` into a no-op after `nullptr` was returned instead of the singleton.
Attachment #8880917 - Flags: review?(ttaubert) → review-
Comment on attachment 8880917 [details]
bug 1375709 - avoid deadlock when shutting down NSS

https://reviewboard.mozilla.org/r/152284/#review159974

Apologies for the long delay - I've been busy and paging this stuff back into my brain has taken a while.

Anyways, I think what ttaubert suggested makes sense - although AFAICT just doing that isn't sufficient to protect both hash tables.
So I guess my suggestion is to make the construct() change in addition to what you already have, and see if that works.
Attachment #8880917 - Flags: review?(cykesiopka.bmo)
Thanks for the reviews. Unfortunately I found another issue in this shutdown machinery that I don't think will be fixed by this (and it's not even clear how to fix it), so I'm thinking it might be time to redesign this mechanism rather than just spot-fixing it.
Well, good news/bad news: I figured out how to reliably reproduce this, so I'm fairly confident the approach in comment 4 works (the bad news is that I figured out how to reliably reproduce this, so users are probably encountering this frustratingly often).

(In reply to :Cykesiopka from comment #5)
...
> Anyways, I think what ttaubert suggested makes sense - although AFAICT just
> doing that isn't sufficient to protect both hash tables.
> So I guess my suggestion is to make the construct() change in addition to
> what you already have, and see if that works.

From what I can tell, with the new patch (coming shortly), all operations on the tables will be protected by sListLock, although please do double-check my reasoning.
Found an issue with the initial implementation of the new approach: if acquiring an nsNSSShutDownPreventionLock turns into a no-op during shutdown, when that lock is destructed, it attempts to leave the NSS activity state. Since it never entered it, the activity counter is decremented when it shouldn't be, and NSS can be shut down while other threads are still using it. This was a fairly easy fix - just have nsNSSShutDownPreventionLock keep track of if it successfully entered the activity state. If y'all could have another look, that would be great!
Flags: needinfo?(ttaubert)
Flags: needinfo?(cykesiopka.bmo)
Comment on attachment 8880917 [details]
bug 1375709 - avoid deadlock when shutting down NSS

https://reviewboard.mozilla.org/r/152284/#review162220

LGTM, and good catch with the activity state counter. I'm only confused by the removal of the `StateMutexAutoUnlock`.

::: security/manager/ssl/nsNSSShutDown.cpp
(Diff revision 3)
> -    {
> -      StaticMutexAutoUnlock unlock(sListLock);
> -      entry->obj->shutdown(nsNSSShutDownObject::ShutdownCalledFrom::List);
> +    entry->obj->shutdown(nsNSSShutDownObject::ShutdownCalledFrom::List);
> -    }
>      iter.Remove();

Is this change intentional? I *think* we can do this because with `ShutdownCalledFrom::List` we always end up calling `virtualDestroyNSSReference()`... Except if we deadlock if some code ends up somehow trying to acquire `sListLock`? Not sure what the expectations here exactly are. Also the whole comment above the block seems obsolete, no code should be able to change `singleton->mObjects` if they can't get `sListLock`, because we never release it while looping, right?
Flags: needinfo?(ttaubert)
Comment on attachment 8880917 [details]
bug 1375709 - avoid deadlock when shutting down NSS

https://reviewboard.mozilla.org/r/152284/#review162220

> Is this change intentional? I *think* we can do this because with `ShutdownCalledFrom::List` we always end up calling `virtualDestroyNSSReference()`... Except if we deadlock if some code ends up somehow trying to acquire `sListLock`? Not sure what the expectations here exactly are. Also the whole comment above the block seems obsolete, no code should be able to change `singleton->mObjects` if they can't get `sListLock`, because we never release it while looping, right?

Yes - that was intentional. `virtualDestroyNSSReference()` should only release NSS resources and never cause new nsNSSShutDownObjects to be created. I double-checked this against the current codebase. Unfortunately we don't really have a way of enforcing this going forward, but if we successfully migrate to not shutting down NSS at all, this whole issue becomes moot.
I forgot to update the comment, though, and in any case since we're not concurrently modifying the list we can do the simple, obvious thing and just iterate once.
Comment on attachment 8880917 [details]
bug 1375709 - avoid deadlock when shutting down NSS

https://reviewboard.mozilla.org/r/152284/#review162406

::: security/manager/ssl/nsNSSShutDown.cpp:172
(Diff revisions 3 - 4)
>        return NS_ERROR_FAILURE;
>      }
>    }
>  
>    MOZ_LOG(gPIPNSSLog, LogLevel::Debug, ("now evaporating NSS resources"));
> -
> +  for (auto iter = singleton->mObjects.Iter(); !iter.Done(); iter.Next()) {

Is it possible that in the block above where we unlock `sListLock`, someone calls `ShutdownNSS()` and thus might delete `singleton`?

I don't think so because we require `evaporateAllNSSResourcesAndShutDown()` to be called on the main thread, just wanted to double-check and make sure you thought about this :)
Attachment #8880917 - Flags: review?(ttaubert) → review+
Comment on attachment 8880917 [details]
bug 1375709 - avoid deadlock when shutting down NSS

https://reviewboard.mozilla.org/r/152284/#review162406

> Is it possible that in the block above where we unlock `sListLock`, someone calls `ShutdownNSS()` and thus might delete `singleton`?
> 
> I don't think so because we require `evaporateAllNSSResourcesAndShutDown()` to be called on the main thread, just wanted to double-check and make sure you thought about this :)

Well, it's safe right now because of the main thread enforcement and again no `virtualDestroyNSSReference()` implementation can re-enter `ShutdownNSS()` due to just calling NSS resource deallocation functions, but I could probably add a check to `sInShutdown` near the beginning to prevent that.
(In reply to David Keeler [:keeler] (use needinfo?) from comment #15)
> > Is it possible that in the block above where we unlock `sListLock`, someone calls `ShutdownNSS()` and thus might delete `singleton`?
> > 
> > I don't think so because we require `evaporateAllNSSResourcesAndShutDown()` to be called on the main thread, just wanted to double-check and make sure you thought about this :)
> 
> Well, it's safe right now because of the main thread enforcement and again
> no `virtualDestroyNSSReference()` implementation can re-enter
> `ShutdownNSS()` due to just calling NSS resource deallocation functions, but
> I could probably add a check to `sInShutdown` near the beginning to prevent
> that.

Ahh, right. Nevermind then and thanks for reiterating :)
Comment on attachment 8880917 [details]
bug 1375709 - avoid deadlock when shutting down NSS

https://reviewboard.mozilla.org/r/152284/#review162590

Yup, seems reasonable to me.
Nice work - I imagine this was a pain to get right.
Attachment #8880917 - Flags: review?(cykesiopka.bmo) → review+
Flags: needinfo?(cykesiopka.bmo)
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0bb5555fa027
avoid deadlock when shutting down NSS r=Cykesiopka,ttaubert
https://hg.mozilla.org/mozilla-central/rev/0bb5555fa027
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment on attachment 8880917 [details]
bug 1375709 - avoid deadlock when shutting down NSS

Approval Request Comment
[Feature/Bug causing the regression]: probably bug 1235634, but also long-standing issue in this area of code
[User impact if declined]: shutdown hangs
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes - crash-stats seem to indicate this has fixed a frequent shutdown hang
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: low to medium
[Why is the change risky/not risky?]: this is in a complicated and hard-to-reason-about area of code, but this is a fairly self-contained fix, and it has baked a bit on Nightly with no apparent issues
[String changes made/needed]: none

I realize this is a bit late in the cycle, but I think this will address the shutdown hangs in bug 1375726.
Attachment #8880917 - Flags: approval-mozilla-beta?
Comment on attachment 8880917 [details]
bug 1375709 - avoid deadlock when shutting down NSS

With a single beta build before release candidates I think I'll sit this one out for 55.
Attachment #8880917 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: