Open Bug 706955 Opened 13 years ago Updated 2 months ago

Cleanup cert verification thread pool initiation and shutdown after bug 674147 landing

Categories

(Core :: Security: PSM, defect, P5)

defect

Tracking

()

People

(Reporter: mayhemer, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [psm-cleanup])

Attachments

(2 files, 2 obsolete files)

Bug 674147 introduces a pool of certificate verification.

The initiation is made by call to directly linked PSM functions from nsSocketTransportService::Run() method.

That is not a clean way of linking modules.

Potential solutions:
- my suggestion is to init the thread pool in PSM (nsNSSCompoment::Init/Shutdown maybe) by sending an event to the socket thread
- other way is to introduce a callback directly called from nsSocketTransportService::Run() during start and end to let other modules hook it and handle its initiations there

I personally prefer a little bit more the former.
I didn't expect to ever need to access this service off the main thread. However, a fix for bug 660749 would probably be better-off (performance-wise) if it starts an server cert verification job directly from the cache thread, instead of through an event posted to the socket transport thread.
Probably true, just let you know that PSM inits in nsHttpHandler::NewProxiedChannel, so with the first suggested initiation code you will get the thread pool up on time.  FTP, BTW, is not doing this initiation.

You may use the PSM mutex to sync access to the thread pool global.  Or you can play on assumption that PSM lives longer then consumers of the pool (not quit verified, but worth to check on).

Also there could be a third simple solution: do just a lazy init.  There is no magic around thread pools, you can do that from any thread.  Having and interface to get it from "@mozilla.org/psm;1" is IMO also OK.
Blocks: 674147, 712363
No longer depends on: 674147
Assignee: nobody → bsmith
Certificate verification is strongly dependent on the profile, but not strongly dependent on the network. So, I startup/shutdown the certificate verification threads in response to gaining/losing the profile.

Previously, the code in nsNSSComponent created/deleted background threads in response to the network being torn down or started up. However, while certificate validation works better when there is a network (because then we can fetch OCSP), certificate validation works, to a certain extent, even without the network. However, I am eager to hear conuter-arguments as I may very well be overlooking something here.
Attachment #583638 - Flags: review?(kaie)
Attachment #583638 - Flags: review?(kaie) → review?(honzab.moz)
Attachment #583638 - Flags: review?(honzab.moz) → review?(kaie)
I could potentially do the review on Sunday.

Is there a strong reason for this bug to block bug 712363 except you have to re-merge?  I don't see the relation being strong here my self.
Comment on attachment 583638 [details] [diff] [review]
Cleanup cert verification thread pool initiation and shutdown

Review of attachment 583638 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab with the comment addressed or vetoed with a good explanation.

Kai will have comments him self soon, so leaving r? for now.

::: security/manager/ssl/src/nsNSSComponent.cpp
@@ -402,5 @@
>  nsNSSComponent::createBackgroundThreads()
>  {
>    NS_ASSERTION(mCertVerificationThread == nsnull,
>                 "Cert verification thread already created.");
> -

I'd leave this blank line, personally.

@@ +2538,5 @@
>  void
>  nsNSSComponent::DoProfileChangeTeardown(nsISupports* aSubject)
>  {
> +  /* XXX this doesn't work well, since nothing expects null pointers */
> +  deleteBackgroundThreads();

Hmm.. not sure how this will work in applications allowing profile switch (someone recently told me there is none anymore now).

This IMO could be done in xpcom-shutdown better.

Then you don't need to separate the code of ShutdownSSLServerCertVerification() function.
Honza, I'm worried that waiting until XPCOM shutdown is too late.

The description of "profile-change-teardown" clearly says:
  "All async activity must be stopped in this phase."

This means I agree with the flow of the proposed patch.


(a)

Cosmetical request, please move
+  /* XXX this doesn't work well, since nothing expects null pointers */
+  deleteBackgroundThreads();

to this position:
   else if (nsCRT::strcmp(aTopic, PROFILE_CHANGE_TEARDOWN_TOPIC) == 0) {
     PR_LOG(gPIPNSSLog, PR_LOG_DEBUG, ("in PSM code, receiving change-teardown\n"));
   -->
     DoProfileChangeTeardown(aSubject);

This will place the calls to createBackgroundThreads and deleteBackgroundThreads closer to each other, and makes this correlation easier to grasp.


(b)

Optional:
I understand you're simply moving the comment "since nothing expects null pointers",
but that comment isn't clear.
If you understand what this means, could you please clarify the term "nothing" in the comment?


(c)

-  createBackgroundThreads();
-  if (!mCertVerificationThread)
-  {
-    PR_LOG(gPIPNSSLog, PR_LOG_DEBUG, ("NSS init, could not create threads\n"));
-
-    DeregisterObservers();
-    mPIPNSSBundle = nsnull;
-    return NS_ERROR_OUT_OF_MEMORY;
-  }


Startup of the thread could still fail, which will cause us to destroy the thread object, and InitializeSSLServerCertVerificationThreads also checks for failures.

Why don't you keep an error check in place like this, just to be safe?


  if (!mCertVerificationThread || !gCertVerificationThreadPool)
  {
    PR_LOG(gPIPNSSLog, PR_LOG_DEBUG, ("NSS init, at least one thread failed\n"));

    deleteBackgroundThreads();

    DeregisterObservers();
    mPIPNSSBundle = nsnull;
    return NS_ERROR_OUT_OF_MEMORY;
  }


(d)

+void ShutdownSSLServerCertVerification()
+{
+  delete gMutex;
+  gMutex = nsnull;
+}

You could add a comment saying:
  // TODO: reset gCreateMutexCallOnce if XPCOM might restart

(glandium said, we currently don't do, but in theory we could.)
Comment on attachment 583638 [details] [diff] [review]
Cleanup cert verification thread pool initiation and shutdown

r- because I believe removing the error checking code isn't the best approach and should be kept.

If you address all my suggestions, r=kaie
Attachment #583638 - Flags: review?(kaie) → review-
(In reply to Honza Bambas (:mayhemer) from comment #5)
> Hmm.. not sure how this will work in applications allowing profile switch
> (someone recently told me there is none anymore now).

bent, bsmedberg, and others have told me that profile switch is no longer supported and many parts of Gecko have had their profile switch support removed already. (In fact, IIRC, I got an r- from somebody on a patch for trying to add logic to support it somewhere.)

> This IMO could be done in xpcom-shutdown better.

Kai is right, this code has to happen at profile-change-teardown.

(In reply to Kai Engert (:kaie) from comment #6)
> Cosmetical request, please move
> +  /* XXX this doesn't work well, since nothing expects null pointers */
> +  deleteBackgroundThreads();
> 
> to this position:
>    else if (nsCRT::strcmp(aTopic, PROFILE_CHANGE_TEARDOWN_TOPIC) == 0) {
>      PR_LOG(gPIPNSSLog, PR_LOG_DEBUG, ("in PSM code, receiving
> change-teardown\n"));
>    -->
>      DoProfileChangeTeardown(aSubject);

I do not think we should make this change, because DoProfileChangeTeardown is called in TWO places, not just this one place.

> Optional:
> I understand you're simply moving the comment "since nothing expects null
> pointers",
> but that comment isn't clear.
> If you understand what this means, could you please clarify the term
> "nothing" in the comment?

I actually do not understand the comment myself, so I am just going to leave it as-is.

> -  createBackgroundThreads();
> -  if (!mCertVerificationThread)
> 
> Startup of the thread could still fail, which will cause us to destroy the
> thread object, and InitializeSSLServerCertVerificationThreads also checks
> for failures.

> Why don't you keep an error check in place like this, just to be safe?

My view is that mCertVerificationThread is a non-critical component and that we should tolerate its failure gracefully, since in common operation it isn't used at all. Failure to create the SSL cert verification threads is more serious issue, but at least in theory, Firefox should still work fine for non-SSL connections in this state, so I don't consider it fatal either.
 
I suggest that we change the logging so that it can log more specific errors (i.e. whether the SSL cert verification thread failed, or whether the mCertVerificationThread failed), and carry on. Do you agree?
(In reply to Honza Bambas (:mayhemer) from comment #4)
> I could potentially do the review on Sunday.
> 
> Is there a strong reason for this bug to block bug 712363 except you have to
> re-merge?  I don't see the relation being strong here my self.

Originally, this patch was required when we were trying to support asynchronous cert verification for connections not managed by the socket transport thread. However, now, we do not use asynchronous cert verification for those connections, so this bug does not block bug 712363 any longer. Thanks for pointing this out.
No longer blocks: 712363
Comment on attachment 593767 [details] [diff] [review]
Cleanup cert verification thread pool initiation and shutdown, updated after landing of #712363

Oops, playing with hg bzexport... with that updated diff, m-c packages fine on OpenBSD (see 669050)
Attachment #593767 - Attachment description: Cleanup cert verification thread pool initiation and shutdown, updated after landing of # → Cleanup cert verification thread pool initiation and shutdown, updated after landing of #712363
Comment on attachment 593768 [details] [diff] [review]
Cleanup cert verification thread pool initiation and shutdown, updated after landing of #

obsolete duplicate patch..
Attachment #593768 - Attachment is obsolete: true
Comment on attachment 593767 [details] [diff] [review]
Cleanup cert verification thread pool initiation and shutdown, updated after landing of #712363

Note that this patch applies to both m-c and m-a, now that #712363 has been backported to m-a.
(In reply to Landry Breuil from comment #14)
> Comment on attachment 593767 [details] [diff] [review]
> Cleanup cert verification thread pool initiation and shutdown, updated after
> landing of #712363
> 
> Note that this patch applies to both m-c and m-a, now that #712363 has been
> backported to m-a.

Landry no review requests ?
Well, no since it's originally bsmith's patch, only ported to apply on m-c, and kaie's review/comments still stands against that patch.. and there's been no reply to comment 8.

Patch is still needed and used on OpenBSD to workaround #669050, but i don't know what's needed now to get some progress..
Any progress here?
I will update the patch as part of my work on bug 660749. I want to make sure that what I do here also works for what I need in bug 660749.
What's the status now ? i had to update the patch after #703834 landed...
Interestingly, it seems that this patch is not needed anymore on OpenBSD in aurora and central now... maybe some other change to cert threads made the problem i was seeing in #669050 go away (or hide somewhere else ?)
Assignee: brian → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: