Last Comment Bug 468736 - PSM nsPSMBackgroundThread::requestExit is racy and not threadsafe
: PSM nsPSMBackgroundThread::requestExit is racy and not threadsafe
Status: RESOLVED FIXED
[inbound]
: hang
Product: Core Graveyard
Classification: Graveyard
Component: Security: UI (show other bugs)
: Trunk
: All All
: -- critical with 5 votes (vote)
: mozilla7
Assigned To: Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
:
:
Mentors:
: 428749 430260 471080 738012 (view as bug list)
Depends on: 429304 653161 669050
Blocks: cmd-q 651665
  Show dependency treegraph
 
Reported: 2008-12-09 16:29 PST by timeless
Modified: 2016-09-27 13:03 PDT (History)
29 users (show)
benjamin: blocking1.9.1-
benjamin: wanted1.9.1+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---
-
-
.x+


Attachments
profiled stack trace from camino (11.56 KB, text/plain)
2008-12-09 16:29 PST, timeless
no flags Details
Patch v1 [i think this might address the race i described] (19.14 KB, patch)
2009-04-06 18:59 PDT, timeless
honzab.moz: review-
Details | Diff | Splinter Review
another possible sample of this hang (133.03 KB, text/plain)
2009-06-02 10:13 PDT, Oscar van Vliet
no flags Details
A sample of a hang from Firefox 3.5 final (144.85 KB, text/plain)
2009-07-01 21:01 PDT, Randy Oo
no flags Details
whitespace-ignore version of patch v1 (18.50 KB, patch)
2010-04-19 04:51 PDT, Kai Engert (:kaie) (on vacation)
no flags Details | Diff | Splinter Review
smaller whitespace-ignore vresion of patch v1 (10.65 KB, patch)
2010-04-19 04:57 PDT, Kai Engert (:kaie) (on vacation)
no flags Details | Diff | Splinter Review
Updated patch (10.85 KB, patch)
2011-04-21 09:26 PDT, :Ehsan Akhgari
honzab.moz: review+
Details | Diff | Splinter Review
Wait for 25ms or until notified by background thread (10.05 KB, patch)
2011-06-08 16:15 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Splinter Review
Wait for 25ms or until notified by background thread, fixed (13.27 KB, patch)
2011-06-09 11:42 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Splinter Review
Have background threads send exit notification to requestExit in the main thread (18.51 KB, patch)
2011-06-22 15:53 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
honzab.moz: review+
Details | Diff | Splinter Review
v2 Have background threads send exit notification to requestExit in the main thread (18.38 KB, patch)
2011-06-24 11:41 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
brian: review+
Details | Diff | Splinter Review

Description timeless 2008-12-09 16:29:39 PST
Created attachment 352221 [details]
profiled stack trace from camino

ok, so the main thread is here:
http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/security/manager/ssl/src/nsPSMBackgroundThread.cpp?rev=23dd66069c44&mark=88-97#82
- i think this code needs a stack object. 

it looks like a race

http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/security/manager/ssl/src/nsNSSIOLayer.cpp?rev=23dd66069c44&mark=341-375#334
- is the other thread - the second thread enters passes 340 and then gets dangerous. the main thread enters and does stuff, but it's too late.
Comment 1 timeless 2008-12-12 02:23:39 PST
*** Bug 468234 has been marked as a duplicate of this bug. ***
Comment 2 Mike Beltzner [:beltzner, not reading bugmail] 2008-12-15 12:22:06 PST
This causes unpredictable hangs (see bug 468234) and is thusly nominated for blocking.
Comment 3 Benjamin Smedberg [:bsmedberg] 2008-12-16 12:29:09 PST
I don't think this is prevalent enough to be a blocker. Definitely wanted.
Comment 4 Ilya Sherman 2008-12-24 18:10:28 PST
*** Bug 471080 has been marked as a duplicate of this bug. ***
Comment 5 Stewart Campbell 2009-01-04 14:35:39 PST
I can reproduce this almost everytime I use Camino (Version 2.0b1 (1.9.0.6pre 2008121820)) (See Duplicate bug 471080). It seems to be more prevalent when I use GMail before I try to quit.
Comment 6 Honza Bambas (:mayhemer) 2009-03-20 12:51:13 PDT
*** Bug 484367 has been marked as a duplicate of this bug. ***
Comment 7 Honza Bambas (:mayhemer) 2009-03-20 12:54:30 PDT
(In reply to comment #6)
> *** Bug 484367 has been marked as a duplicate of this bug. ***

nsPSMBackgroundThread::requestExit is innocent here. See bug 484367, it happened to me right after start, that bug also contains some local variables dump on the dead thread, hope it might move us forward a bit. It seems to me that this is some event queue problem, but it's a guess. I could reproduce just ones.
Comment 8 timeless 2009-04-06 18:59:14 PDT
Created attachment 371365 [details] [diff] [review]
Patch v1 [i think this might address the race i described]

could someone who hits but try running w/ this? (you can tryserver it)
Comment 9 Honza Bambas (:mayhemer) 2009-04-14 10:49:51 PDT
Comment on attachment 371365 [details] [diff] [review]
Patch v1 [i think this might address the race i described]

In your new while cycle in nsPSMBackgroundThread::requestExit you should access mExitState under the lock as you modify it on the other thread.

Why did you remove NS_ASSERTION(mThreadHandle, "Could not create nsPSMBackgroundThread\n"); ?

Nit: if (ssl_thread_singleton->mExitState): I'd rather see if (ssl_thread_singleton->mExitState != ePSMThreadRunning), just to make more clear what the condition is about

Please don't include the only-formatting changes in nsNSSComponent, it's unrelated to this bug.

Otherwise it looks good, we just have to take care to test shutdown well, there might be races introduced.
Comment 10 Oscar van Vliet 2009-06-02 10:13:09 PDT
Created attachment 381101 [details]
another possible sample of this hang

I have had another hang after it was suggested in bug 466148 that my crashes are due to this bug. I attached the sample.

If it is the same bug, I would be happy to test out builds with fixes, as Smokey said it was difficult to reproduce.
Comment 11 Chris Lawson (gone) 2009-06-07 21:06:12 PDT
*** Bug 466148 has been marked as a duplicate of this bug. ***
Comment 12 Randy Oo 2009-07-01 21:01:36 PDT
Created attachment 386436 [details]
A sample of a hang from Firefox 3.5 final

Oscar van Vliet wrote:
> Smokey said it was difficult to reproduce.

I really don't understand this. My bug, comment #1 (which existed before this one was reported, yet somehow got closed as a dupe of this one?!?) has a testcase that triggers the problem every single time, and a website that I visit frequently brings Firefox 3.5 to its knees EVERY SINGLE TIME, without fail. I can't (and won't) upgrade from 3.0.11 until this bug is resolved.

I'd just like to point out that there may be information in the comments under my bug that could prove helpful for someone working on this one (if anyone), and also that I can reproduce the problem consistently, so I'd be happy to help with testing patches or capture samples from the crashing, if it would be helpful.
Comment 13 Ilya Sherman 2009-07-02 00:21:21 PDT
(In reply to comment #12)
> Created an attachment (id=386436) [details]
> A sample of a hang from Firefox 3.5 final

AFAICT, this sample does not have nsPSMBackgroundThread::requestExit in it, and so is a different bug.
Comment 14 Kai Engert (:kaie) (on vacation) 2010-04-19 04:51:04 PDT
Created attachment 439892 [details] [diff] [review]
whitespace-ignore version of patch v1

timeless, maybe it's just me, but I find it very difficult to separate "bug fix meat" from "white space changes" when looking through a patch, especially when trying to understand fixes for tricky code.

This is timeless' patch, ignoring all whitespace changes (which doesn't work for the places where you split code into multiple lines).
Comment 15 Kai Engert (:kaie) (on vacation) 2010-04-19 04:57:50 PDT
Created attachment 439893 [details] [diff] [review]
smaller whitespace-ignore vresion of patch v1

I've removed additional unnecessary chunks from the patch.
Comment 16 Kai Engert (:kaie) (on vacation) 2010-04-19 05:01:52 PDT
Comment on attachment 371365 [details] [diff] [review]
Patch v1 [i think this might address the race i described]

clearing review requests, because Honza already gave r-

waiting for answers to Honza's comments and a new patch
Comment 17 :Ehsan Akhgari 2010-12-03 13:04:44 PST
So, what's needed to make progress on this bug?  Just addressing the comments in comment 9?
Comment 18 Honza Bambas (:mayhemer) 2010-12-03 13:09:51 PST
(In reply to comment #17)
> So, what's needed to make progress on this bug?  Just addressing the comments
> in comment 9?

Probably yes.  Also some kind of automated testing for this would be excelent, but probably unachievable.
Comment 19 :Ehsan Akhgari 2010-12-03 13:26:26 PST
OK, then.  If this is deemed to be blocking, I can pick it up and work on it.  If not, I'm afraid that I can take a look at it post-2.0.
Comment 20 chris hofmann 2010-12-15 19:41:50 PST
looks like this one might have missed the grand re-triage.  have we decided to block on firefox 4, or not?
Comment 21 Benjamin Smedberg [:bsmedberg] 2010-12-15 20:00:25 PST
Not a regression from 3.6, not a blocker.
Comment 22 Myk Melez [:myk] [@mykmelez] 2011-04-20 16:52:51 PDT
It looks like this might be causing bug 651665.

Ehsan: any chance you can pick this up now that we're post-2.0?
Comment 23 :Ehsan Akhgari 2011-04-21 09:26:00 PDT
Created attachment 527557 [details] [diff] [review]
Updated patch

This patch updates the no-whitespace-change version to apply cleanly on trunk, and then addresses review comments in comment 9.

I can't reproduce the problem, so I don't really know how to test this.  Running the browser and going to SSL sites doesn't seem to break anything obvious.  I've posted this to the try server as well.
Comment 24 Myk Melez [:myk] [@mykmelez] 2011-04-21 10:29:28 PDT
One indication that your patch resolves bug 651665 is that the Add-on SDK tests pass on Linux Debug with your push to try server:

http://tbpl.mozilla.org/?tree=MozillaTry&noignore=1&pusher=eakhgari@mozilla.com

Compare with this random push of yours to mozilla-central:

http://tbpl.mozilla.org/?noignore=1&pusher=eakhgari@mozilla.com

That isn't determinative, since it sometimes happens anyway (f.e. in that random push of yours, one of the four debug builds that have run the tests have passed them).  But most of the time the tests fail on debug builds when tested by the test automation, so it's a good sign.

It'd be useful to have Linux64 Debug, OS X Debug, and OS X64 Debug results for that try server push as well.
Comment 25 Myk Melez [:myk] [@mykmelez] 2011-04-21 13:04:04 PDT
(In reply to comment #24)
> It'd be useful to have Linux64 Debug, OS X Debug, and OS X64 Debug results for
> that try server push as well.

The results are now in, and all those builds passed the SDK tests, so it looks like your patch does indeed fix the problem.
Comment 26 :Ehsan Akhgari 2011-04-21 13:36:16 PDT
And in general try server was extremely happy with my push: <http://tbpl.mozilla.org/?tree=MozillaTry&rev=b53e241153e5>.

(The two windows reftest failures are caused by another patch in my queue, and the OS X opt Moth orange is intermittent.)
Comment 27 Myk Melez [:myk] [@mykmelez] 2011-04-21 14:30:54 PDT
(In reply to comment #25)
> The results are now in, and all those builds passed the SDK tests, so it looks
> like your patch does indeed fix the problem.

To make extra sure, I re-ran all the SDK tests on the four Mac/Linux debug builds, and they all passed the second time as well:

http://tbpl.mozilla.org/?tree=MozillaTry&rev=b53e241153e5&noignore=1

We don't have test results on Windows debug builds because the SDK tests aren't being run on those yet (bug 651806).
Comment 28 Honza Bambas (:mayhemer) 2011-04-26 09:35:02 PDT
Comment on attachment 527557 [details] [diff] [review]
Updated patch

>+++ b/security/manager/ssl/src/nsCertVerificationThread.cpp

>+      while (!mExitState && (0 == verification_thread_singleton->mJobQ.GetSize())) {

mExitState == ePSMThreadRunning please.

>+      if (mExitState)
>         break;

mExitState != ePSMThreadRunning please.

-1917,21 +1915,29 @@ nsNSSComponent::Init()
>   mSSLThread = new nsSSLThread();
>-  if (mSSLThread)
>-    mSSLThread->startThread();
>+  if (mSSLThread) {
>+    if (NS_FAILED(mSSLThread->startThread())) {
>+      delete mSSLThread;
>+      mSSLThread = nsnull;
>+    }
>+  }
>   mCertVerificationThread = new nsCertVerificationThread();
>-  if (mCertVerificationThread)
>-    mCertVerificationThread->startThread();
>+  if (mCertVerificationThread) {
>+    if (NS_FAILED(mCertVerificationThread->startThread())) {
>+      delete mCertVerificationThread;
>+      mCertVerificationThread = nsnull;
>+    }
>+  }
> 

Do the same in nsNSSComponent::DoProfileChangeNetRestore or rather remove this change (I prefer the letter).  I don't think we have non-null checks on all places in the code.

> void nsPSMBackgroundThread::requestExit()
>-    if (mExitRequested)
>+    if (mExitState)
>       return;

mExitState != ePSMThreadRunning please.

>+  while (GetExitStateThreadSafe() < ePSMThreadStopped) {
>+    NS_ProcessNextEvent(nsnull, PR_FALSE);
>+    if (GetExitStateThreadSafe() < ePSMThreadStopped)
>+      PR_Sleep(PR_SecondsToInterval(5));
>+  }

We should add a comment here that we cannot rely on posting/processing events since there is no event loop that late when this code gets called.

> PRBool nsSSLThread::exitRequested()
>+  return ssl_thread_singleton->mExitState;

mExitState != ePSMThreadRunning please.

r=honzab with these fixes
Comment 30 :Ehsan Akhgari 2011-04-27 10:01:32 PDT
I've backed this out on the suspicion of being the culprit for the 400% Tshutdown regression:

http://hg.mozilla.org/mozilla-central/rev/da816b446ffa

If this is the cause, we'll need a new owner for this bug, as I won't have enough time to diagnose the regression.  If it's not the cause, I'll reland it.
Comment 31 :Ehsan Akhgari 2011-04-27 13:30:36 PDT
The backout fixed the regression.  Somebody more familiar with the code here should pick it up and investigate the regression.
Comment 32 Myk Melez [:myk] [@mykmelez] 2011-04-27 13:34:40 PDT
Brian & Honza: would one of you be able to pick this up?
Comment 33 Boris Zbarsky [:bz] (still a bit busy) 2011-04-27 13:45:46 PDT
To be clear, this regressed Tp4_shutdown.
Comment 34 Honza Bambas (:mayhemer) 2011-04-27 14:07:38 PDT
I will do that.
Comment 35 Jesse Ruderman 2011-06-05 19:14:48 PDT
*** Bug 428749 has been marked as a duplicate of this bug. ***
Comment 36 Jesse Ruderman 2011-06-05 19:15:11 PDT
*** Bug 430260 has been marked as a duplicate of this bug. ***
Comment 37 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-06-06 11:13:48 PDT
What's blocking this from moving forward?  I'm hitting this about every other time I shut down a Nightly ...
Comment 38 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-06 11:22:51 PDT
I do not think the Tshutdown regression is worse than this hang; the hang is actually a worse-than-400% shutdown time regression and lots of people are running into this. The patch can and should be improved. I will help Honza with it.
Comment 39 :Ehsan Akhgari 2011-06-06 19:43:03 PDT
(In reply to comment #38)
> I do not think the Tshutdown regression is worse than this hang

I beg to differ.  The Tshutdown regression might affect all of our users, but the hang definitely only affects some of them.

That being said, I also definitely think that this could be improved, but I didn't know enough about the code in question to do that myself.
Comment 40 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-07 11:06:28 PDT
We know this old bug is happening much more frequently in nightly. Do we know why? Is this occurring more frequently on FF5 beta too? If so, then we should try to address it for FF5. 

Did patches related to SYN retry and/or pipelining cause this to happen more frequently? I know SYN retry itself doesn't cause this, because I have seen it even when the SYN retry timeout is zero (which disables the actual retry). But, I wouldn't be surprised if the associated changes to support SYN retry had some effect in increasing the frequency of occurrence.
Comment 41 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-07 11:16:45 PDT
Comment on attachment 527557 [details] [diff] [review]
Updated patch

Review of attachment 527557 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/src/nsPSMBackgroundThread.cpp
@@ +94,5 @@
>    }
> +  while (GetExitStateThreadSafe() < ePSMThreadStopped) {
> +    NS_ProcessNextEvent(nsnull, PR_FALSE);
> +    if (GetExitStateThreadSafe() < ePSMThreadStopped)
> +      PR_Sleep(PR_SecondsToInterval(5));

Why not PR_Sleep(PR_INTERVAL_NO_WAIT) or a lower-than-5-seconds sleep interval here? Five seconds seems ridiculously long.
Comment 42 Honza Bambas (:mayhemer) 2011-06-07 11:28:29 PDT
(In reply to comment #41)
> Why not PR_Sleep(PR_INTERVAL_NO_WAIT) or a lower-than-5-seconds sleep
> interval here? Five seconds seems ridiculously long.

Because otherwise we would have 100% load on that thread w/o any context switches.  It would lead to hang of UI thread as well.

(In reply to comment #40)
> Did patches related to SYN retry and/or pipelining cause this to happen more
> frequently?

I strongly believe there is no relation.  I was experiencing this bug also pretty often year or longer ago.  Probably some pattern makes this appear more often, therefor people experience it often when working on a bug and repeating the same behavior.

Brian, if you have time to investigate the regression, then please do so.  If not, I can get to this maybe at the end of this week.
Comment 43 Honza Bambas (:mayhemer) 2011-06-07 11:39:52 PDT
We potentially could:
- if there is an event queue (e.g. during profile change) then just process messages with canWait = true and do not PR_Sleep
- if there is no event queue (usual case during shutdown) then we could use a monitor or cond var bound to the existing thread mutex notified when the state changes

I would just need to figure out how to check whether there's still the event queue.
Comment 44 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-07 11:59:39 PDT
(In reply to comment #42)
> Because otherwise we would have 100% load on that thread w/o
> any context switches.  It would lead to hang of UI thread as
> well.

It would still yield. I can see why it might be better to sleep a little, but less than five seconds. I am going to investigate trying a smaller interval first.

> I strongly believe there is no relation.  I was experiencing this bug also
> pretty often year or longer ago.  Probably some pattern makes this appear
> more often, therefor people experience it often when working on a bug and
> repeating the same behavior.

There are 4 people (including me) sitting within 10 meters of me that are regularly running into this for the last four weeks, so I think something has changed recently that makes this occur much more frequently. Maybe it's even IPV6 related, because I think IPV6 was turned on our office recently.

> Brian, if you have time to investigate the regression, then please do so. 
> If not, I can get to this maybe at the end of this week.

I am working on it now.
Comment 45 Honza Bambas (:mayhemer) 2011-06-07 12:07:10 PDT
> PR_SecondsToInterval

Oh!  Right, not 5 SECONDS, but rather MILLISECONDS...  I completely overlooked that...
Comment 46 Patrick McManus [:mcmanus] 2011-06-07 12:21:48 PDT
(In reply to comment #45)
> > PR_SecondsToInterval
> 
> Oh!  Right, not 5 SECONDS, but rather MILLISECONDS...  I completely
> overlooked that...

:)

as a mostly unimportant aside, I would make it 15ms (still a massive drop from 5000) instead of 5ms as a lot of the older NT derivations had a 15ms tick by default.. if you specify less it sometimes just ends up busy waiting anyhow for you.
Comment 47 Asa Dotzler [:asa] 2011-06-07 14:58:19 PDT
too late for us to do anything in 5.
Comment 48 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-08 16:15:19 PDT
Created attachment 538139 [details] [diff] [review]
Wait for 25ms or until notified by background thread

I made every change of mExitState cause a notification on mCond; requestExit will then wait until mExitState changes, while processing an event every 25ms.
Comment 49 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-08 16:16:07 PDT
I have not figured out yet why this is happening more often recently for many of us.
Comment 50 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-08 16:27:10 PDT
(In reply to comment #49)
> I have not figured out yet why this is happening more often recently for
> many of us.

Oh, you missed the news: http://www.nasa.gov/mission_pages/sunearth/news/News060711-blast.html
Comment 51 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-09 11:42:28 PDT
Created attachment 538307 [details] [diff] [review]
Wait for 25ms or until notified by background thread, fixed

The previous patch wasn't folded into the patch I based it on, so it was really an interdiff against the patch that got backed out. Here is the folded patch.
Comment 52 Honza Bambas (:mayhemer) 2011-06-09 13:14:48 PDT
Comment on attachment 538307 [details] [diff] [review]
Wait for 25ms or until notified by background thread, fixed

Review of attachment 538307 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the updated patch, Brian.

But, I was wrong about one major thing.  I was convinced that during shutdown there is no event queue anymore at the point of requestExit being called.  But there is.  We shut the main thread loop later in ShutdownXPCOM function.

Based on that and fact that nsNSSComponent is thanks nsPSMInitPanic only instantiated on the main thread I propose:
- remove the code that is waiting for the cond var with 25ms delays until the thread sets the state to 'stopped'
- let the thread we want to stop set the state to 'stopped' and then post just an empty event to the main thread after it has finished
- main thread waiting for shutdown will do:

  while (GetExitState(threadLock) != ePSMThreadStopped) {
     threadLock.unlock();
     NS_ProcessPendingEvents(do_GetCurrentThread(), PR_IntervalToMilliseconds(100));
     threadLock.lock();
  }

  .. or what ever form of this code

- the timeout should be there just for safety, but I will r+ patch even w/o it

This gives us no delays and fluent dispatch of all events pending on the main thread.

Time will prove if processing events from this place won't cause any trouble with reentering.

Dropping the r flag since the patch will be adjusted anyway.

General comments:

::: security/manager/ssl/src/nsCertVerificationThread.cpp
@@ +132,5 @@
>  
>      {
>        MutexAutoLock threadLock(verification_thread_singleton->mMutex);
> +
> +      while (GetExitState(threadLock) == ePSMThreadRunning &&

Repeating this on and on is unreadable and tends to mistakes.  Maybe change this to bool IsRunning() method?  Also could have IsStopped() method for the check in the request exiting thread.

::: security/manager/ssl/src/nsNSSComponent.cpp
@@ +432,5 @@
> +    delete mCertVerificationThread;
> +    mCertVerificationThread = nsnull;
> +  }
> +}
> +

Let's make these two functions members of nsNSSComponent, no need to separate them.  Also have the two threads as arguments suggests these are one-purpose functions.

Nit: prefixing arguments with 'm' is generally quit unusual.

::: security/manager/ssl/src/nsPSMBackgroundThread.cpp
@@ +77,5 @@
> +  return mExitState;
> +}
> +
> +nsPSMBackgroundThread::ExitState
> +nsPSMBackgroundThread::GetExitStateWithoutLock() const

Maybe rather GetExitStateUnlocked?

Kai: why exactly we need access to this w/o a lock?  exitRequest seems not to be called under the thread's lock, or it is? 

http://mxr.mozilla.org/mozilla-central/search?string=exitRequested

@@ +109,5 @@
> +      if (GetExitState(threadLock) == ePSMThreadStopped) {
> +        break;
> +      }
> +    }
> +    NS_ProcessNextEvent(nsnull, PR_FALSE);

NS_ProcessPendingEvents(do_GetCurrentThread()) is better on this place.  It handles all events that might get posted, not just the first one.

::: security/manager/ssl/src/nsPSMBackgroundThread.h
@@ +67,5 @@
> +    ePSMThreadStopRequested = 1,
> +    ePSMThreadStopped = 2
> +  };
> +
> +  ExitState GetExitState(::mozilla::MutexAutoLock const & proofOfLock) const;

Little bit knotty this 'proofOfLock' but let's take it.

@@ +84,5 @@
>    nsresult startThread();
>    void requestExit();
> +
> +private:
> +  PRUint32 GetExitStateThreadSafe();

You don't need this anymore.
Comment 53 Johnathan Nightingale [:johnath] 2011-06-09 14:17:51 PDT
Minusing this for tracking - would like to see an approval request for this to land on aurora once we have a safe, reviewed patch baking on central.
Comment 54 David :Bienvenu 2011-06-22 09:24:08 PDT
I used to be able to easily reproduce this on Windows with Thunderbird on my desktop, but not so much on my windows laptop for some reason. But FTR, if I got IMAP autosync going on SSL connection and shutdown in the middle, I would generally hit this hang.
Comment 55 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-22 15:53:31 PDT
Created attachment 541211 [details] [diff] [review]
Have background threads send exit notification to requestExit in the main thread

Here is the fix that we talked before. The background thread will post a message to the main thread, as the main thread spins the event loop in requestExit waiting for it.
Comment 56 Honza Bambas (:mayhemer) 2011-06-23 09:17:46 PDT
Comment on attachment 541211 [details] [diff] [review]
Have background threads send exit notification to requestExit in the main thread

Review of attachment 541211 [details] [diff] [review]:
-----------------------------------------------------------------

Looks nice.

r+ after following comments gets addressed, manly the last one:

::: security/manager/ssl/src/nsNSSComponent.cpp
@@ +397,2 @@
>  {
> +  if (mSSLThread) {

Why not left in the following form?

if ()
{
}

You are against the file style.

@@ +409,5 @@
> +
> +void
> +nsNSSComponent::createBackgroundThreads()
> +{
> +  NS_ASSERTION(mSSLThread == nsnull, "SSL  thread already created.");

New line after this line please.  Only one space between SSL and thread.

@@ +419,5 @@
> +    return;
> +  }
> +
> +  NS_ASSERTION(mCertVerificationThread == nsnull,
> +               "Cert verification thread already created.");

New line as well here.  Maybe put this at the top of the method to get assertion/warning even we fail to create the ssl thread?

::: security/manager/ssl/src/nsPSMBackgroundThread.cpp
@@ +73,5 @@
>  
> +PRBool
> +nsPSMBackgroundThread::exitRequested(const MutexAutoLock & /*proofOfLock*/) const
> +{
> +  return mExitState != ePSMThreadRunning;

Maybe let this method return exitRequestedNoLock() ?  exitRequestedNoLock then could be explicitly inline.

@@ +81,5 @@
> +nsPSMBackgroundThread::postStoppedEventToMainThread(
> +    MutexAutoLock const & /*proofOfLock*/)
> +{
> +  NS_ASSERTION(PR_GetCurrentThread() == mThreadHandle,
> +               "Background thread stopped from another thread");

New line after this line please.

::: security/manager/ssl/src/nsPSMBackgroundThread.h
@@ +64,4 @@
>    mozilla::CondVar mCond;
>  
> +  PRBool exitRequested(::mozilla::MutexAutoLock const & proofOfLock) const;
> +  PRBool exitRequestedNoLock() const { return mExitState; }

return mExitState != ePSMThreadRunning;

@@ +81,5 @@
>    nsresult startThread();
>    void requestExit();
> +
> +private:
> +  PRUint32 GetExitStateThreadSafe();

What is this used for?

::: security/manager/ssl/src/nsSSLThread.cpp
@@ +1143,3 @@
>  {
> +  return !ssl_thread_singleton ||  
> +         ssl_thread_singleton->exitRequestedNoLock();

Why exactly are you changing this code, I mean the result value here?

This code has been introduced in bug 420187.  I understand the code looks wrong (we should return PR_TRUE for !ssl_thread_singleton).  Not sure if it really is wrong or not.

However, I would change the method's result in a different bug, not here, where we solve a quit different issue.
Comment 57 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-24 11:41:44 PDT
Created attachment 541736 [details] [diff] [review]
v2 Have background threads send exit notification to requestExit in the main thread
Comment 58 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-06-24 11:46:33 PDT
Comment on attachment 541736 [details] [diff] [review]
v2 Have background threads send exit notification to requestExit in the main thread

Carrying forward r+. Review comments addressed. Bug 666999 was filed for the bug in nsSSLThread::stoppedOrStopping.
Comment 59 Marco Bonardo [::mak] 2011-06-25 03:32:29 PDT
http://hg.mozilla.org/mozilla-central/rev/1d2879d39b2a
Comment 60 Stuart Morgan 2012-03-22 00:02:16 PDT
*** Bug 738012 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.