Closed
Bug 378241
Opened 18 years ago
Closed 18 years ago
Changes from bug 107491 lead to tons of thread-safety asserts
Categories
(Core :: Networking, defect, P1)
Core
Networking
Tracking
()
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: benjamin)
References
Details
Attachments
(5 files, 1 obsolete file)
|
8.82 KB,
patch
|
Details | Diff | Splinter Review | |
|
8.39 KB,
patch
|
Details | Diff | Splinter Review | |
|
21.85 KB,
patch
|
Details | Diff | Splinter Review | |
|
5.29 KB,
patch
|
benjamin
:
superreview+
|
Details | Diff | Splinter Review |
|
5.33 KB,
patch
|
rrelyea
:
review+
wtc
:
review+
KaiE
:
superreview+
|
Details | Diff | Splinter Review |
STEPS TO REPRODUCE:
1) Compile Seamonkey (don't know how to trigger this code in Firefox, though I'm sure there's a way... in any case, the app is not relevant here).
2) Type "84582" in the URL bar
3) Hit enter.
EXPECTED RESULTS: Error page
ACTUAL RESULT: Tons of thread-safety asserts.
For the first assert:
###!!! ASSERTION: Using observer service off the main thread!: 'Error', file ../../../mozilla/xpcom/ds/nsObserverService.cpp, line 130
the issue is that nsSocketTransport::InitiateSocket ends up calling ErrorAccordingToNSPR from a non-main thread (perfectly OK). But ErrorAccordingToNSPR now gets the nsINSSErrorService, which leads to PSM component init. And that code assumes it's on the main thread. In particular, nsNSSComponent::RegisterObservers is called by nsNSSComponent::Init.
Then this same code calls nsNSSComponent::GetPIPNSSBundleString, which touches the chrome registry from off-thread, and leads to:
###!!! ASSERTION: wrong thread: 'NS_IsMainThread()', file ../../../../mozilla/netwerk/base/src/nsIOService.cpp, line 488
as we try to open channels on a background thread to load stringbundles. And so forth.
The only approach I see here offhand is to create an nsIRunnable impl that makes sure PSM init happens and to sync-proxy that Run() method to the main thread...
Flags: blocking1.9?
| Assignee | ||
Comment 1•18 years ago
|
||
The solution is to use a sync-proxy to get the service.
Assignee: kengert → nobody
Component: Security: PSM → Networking
Flags: blocking1.9? → blocking1.9+
OS: Linux → All
QA Contact: psm → networking
Hardware: PC → All
| Assignee | ||
Updated•18 years ago
|
Assignee: nobody → dcamp
Updated•18 years ago
|
Priority: -- → P1
| Assignee | ||
Comment 3•18 years ago
|
||
Attachment #288687 -
Flags: review?(cbiesinger)
Comment 4•18 years ago
|
||
Benjamin, thanks a lot for working on this bug and your idea with the runnable.
Question regarding
nsresult temprv = NS_DispatchToMainThread();
If temprv indicates whether the dispatching, starting the runner worked, then I'm ok with adding the NS_WARNING.
GetXPCOMFromNSSError gets called without knowing whether it's an NSS error.
If GetXPCOMFromNSSError returns a failure (indicating it's unable to convert), that's not a failure for function ErrorAccordingToNSPR. It's just supposed to mean "ok, it's not an NSS error, so let's use the default error code set in ErrorAccordingToNSPR".
So instead of the current
try to convert, if it worked, then set rv to the conversion result
you do
try to convert, it the conversion fails, the converter will update
rv to NS_ERROR_FAILURE. This duplicates ErrorAccordingToNSPR default
error code in the runnable.
Maybe that's not a big deal, but if we wanted to stay closer to the current behavior and keep control over the default value in ErrorAccordingToNSPR, we could change the patch to only update rv if the runnable indicates a successful conversion.
Comment 5•18 years ago
|
||
Benjamin, are you ok with this modified patch?
| Assignee | ||
Comment 6•18 years ago
|
||
How about this? It avoids the extra member and IMO is a little easier to read.
Attachment #288998 -
Flags: review?(kengert)
Comment 7•18 years ago
|
||
Yes, I'm now happy with NSSErrorRunnable and ErrorAccordingToNSPR.
I went on to review your other changes in the file, your attempt to move the call to ErrorAccordingToNSPR out of the locks.
I think in nsSocketInputStream::Available you want to revert the if test.
I think you want
if (n < 0) ###
mCondition = ErrorAccordingToNSPR(code);
But I'm worried, because you'll update mCondition without holding the lock.
I don't know if mCondition is supposed to be protected by the lock, but it shoulds like it should?
This all brings me to a completely different proposal.
I think GetXPCOMFromNSSError is really a trivial function.
While it needs to have access to NSS headers, and therefore must reside in PSM, it doesn't need access to any state (besides the parameters).
I understand the problem comes from running through the NSS init service (which registers observers).
I think it should be fine to implement GetXPCOMFromNSSError without triggering NSS init.
At least the following functions can be decoupled from nsNSSComponent:
nsINSSErrorsService::isNSSErrorCode
nsINSSErrorsService::getXPCOMFromNSSError
nsINSSErrorsService::getErrorClass
We could also decouple
nsINSSErrorsService::getErrorMessage
although that has some more dependencies, but it should be ok to have another string bundle object.
Let me try something.
Comment 8•18 years ago
|
||
This no longer requires init of NSS/PSM component when calling ErrorAccordingToNSPR.
Maybe the original test case still triggers the assertion, we'll have to try that.
If it does, the assertions might also get triggered by docshell when it attempts to obtain an error message... Not sure if that code is reached in your test case. If it does, and we must still isolate it, I think the problem is now much more simple. We could use Benjamin's approach only for getting the error string in docshell. This would not require changing the scope of locks. I hope.
| Assignee | ||
Comment 9•18 years ago
|
||
The rule (though it may be unwritten) is that you must not call any XPCOM method while holding a lock. If this were just a direct call into the NSS error provider I might not worry about it... but in this case we're calling into the component manager first, which does locking which could potentially deadlock.
If we want a solution like this I think we at least need to cache the NSS error provider so that we don't have to enter the component manager. :-(
Comment 10•18 years ago
|
||
Just wanted to check in on this - we are comming up on M10/b2 and it would be great to wrap this up before then...
| Assignee | ||
Comment 11•18 years ago
|
||
Comment on attachment 288687 [details] [diff] [review]
Move ErrorAccordingToNSPR out of locks and sync-proxy to the main thread, rev. 1
I don't know whether rev. 3 is acceptable or not... it sets the status outside of the lock, which may not be acceptable. If that's the case, I think we're going to have to cache the NSS error reporter object somewhere.
Attachment #288687 -
Attachment is obsolete: true
Attachment #288687 -
Flags: review?(cbiesinger)
Comment 12•18 years ago
|
||
re mCondition. I quick look at the code shows that there are tons of places mCondition is set and reverenced without lock protection. (OnSocketReady() RecoverFromError(), OnSocketEvent() OnMsgInputClosed(), OnMsgOutputClosed()).
The only race I see is the potential for a partial setting of mCondition (that is the write to mCondition isn't atomic). That could be handled by using a local to read ErrorAccordingToNSPR(), and protecting the write with a lock or an PR_AtomicWrite() call. This is only useful if all the reference calls are protected by locks or atomicRead calls as well, however.
| Assignee | ||
Comment 13•18 years ago
|
||
32-bit writes are atomic on all processors that I'm aware of, so I think we're safe in that regard.
| Assignee | ||
Updated•18 years ago
|
Attachment #288998 -
Flags: review?(dcamp)
Comment 14•18 years ago
|
||
I want to take a step back.
I think we are over-engineering here.
This whole problem only exists because mozilla/netwerk is unaware of 4 numeric constants, which are defined in mozilla/security/nss:
#define SEC_ERROR_BASE (-0x2000)
#define SEC_ERROR_LIMIT (SEC_ERROR_BASE + 1000)
#define SSL_ERROR_BASE (-0x3000)
#define SSL_ERROR_LIMIT (SSL_ERROR_BASE + 1000)
In my very first proposed patch, I duplicated the numerical constants into function ErrorAccordingToNSPR. That was seen as "not elegant".
In a second step I had proposed to use a mechanism at the NSPR level, where we could register the error code ranges used by NSS/libSSL. But that was seen as unwanted, because NSPR and NSS are separate...
I think if the lack of 4 numeric constants leads to threading, locking, and order-of-init problems, I think we have gone too far.
Let's find a way to make mozilla/netwerk or mozilla/nsprpub aware of the range of error codes that are used by NSS and libSSL, and we're done.
Comment 15•18 years ago
|
||
(In reply to comment #14)
>
> Let's find a way to make mozilla/netwerk or mozilla/nsprpub aware of the range
> of error codes that are used by NSS and libSSL, and we're done.
... without using XPCOM, compile time knowledge ...
Comment 16•18 years ago
|
||
We could hard code the constants into mozilla/netwerk/base/public/nsINSSErrorsService.idl
We could make mozilla/netwerk use these constants directly.
In order to ensure the duplicated constants will not get out of synch between netwerk and security, we could do a run time check. When PSM inits, it could compare the values available in nsINSSErrorsService.idl with the values defined by NSS.
If PSM discovers a difference, it will assert.
This will assure that we update the netwerk code should the values used by NSS ever change.
| Assignee | ||
Comment 17•18 years ago
|
||
Comment on attachment 288998 [details] [diff] [review]
Proposed rev. 3
That sounds like a great solution to me... can you make a patch or should I?
Attachment #288998 -
Flags: review?(kengert)
Attachment #288998 -
Flags: review?(dcamp)
Comment 18•18 years ago
|
||
Bob, Benjamin, what do you think?
Attachment #291440 -
Flags: superreview?(benjamin)
Attachment #291440 -
Flags: review?(rrelyea)
| Assignee | ||
Comment 19•18 years ago
|
||
Comment on attachment 291440 [details] [diff] [review]
Patch v5
I think you should use PR_STATIC_ASSERT instead of NS_ASSERTION... sr=me with that change http://mxr.mozilla.org/mozilla/source/nsprpub/pr/include/prlog.h#259
Attachment #291440 -
Flags: superreview?(benjamin) → superreview+
Comment 20•18 years ago
|
||
Comment on attachment 291440 [details] [diff] [review]
Patch v5
+isNSSErrorCode(PRErrorCode code)
uppercase start letter please
and can you linewrap the lines after the && operators?
+#define NSS_NSPRCODE_TO_XPCOMCODE(nspr_code) \
how about making this an inline function?
Comment 21•18 years ago
|
||
Addressed comments from Benjamin and Biesi.
Carrying forward sr=benjamin
Attachment #291445 -
Flags: superreview+
Attachment #291445 -
Flags: review?(rrelyea)
Comment 22•18 years ago
|
||
Comment on attachment 291445 [details] [diff] [review]
Patch v6
r=wtc.
It seems better to do the PR_STATIC ASSERT the first thing in
nsNSSComponent::InitializeNSS.
Attachment #291445 -
Flags: review+
Comment 23•18 years ago
|
||
Comment on attachment 291445 [details] [diff] [review]
Patch v6
r+, with some notes:
NOTE: These error limits are compile time in both cases. Basically it means PSM and XPCOM match, which is fine. If it were a real runtime check with NSS, I would have to give an r-.
The primary reason for non-approval of a true run-time check would be PSM should be able to run with new versions of NSS installed. On some platforms NSS is updated independently of mozilla, it shouldn't fail to start simply because a newer version of NSS is installed.
Since translation of the NSS errors into strings is done by PSM, the important value here isn't what errors NSS can return, but which NSS errors PSM understands, so a compile type check in PSM is sufficient, thus the r+.
bob
Attachment #291445 -
Flags: review?(rrelyea) → review+
Comment 24•18 years ago
|
||
Is this going to land by tonight in time for the B2 freeze?
Comment 25•18 years ago
|
||
Comment on attachment 291445 [details] [diff] [review]
Patch v6
Bob, just to clarify -- SEC_ERROR_LIMIT means the upper end of the
range of error codes that NSS has reserved. It doesn't mean the
current maximum NSS SEC error code. So SEC_ERROR_LIMIT is unlikely
to change. Right now SEC_ERROR_LIMIT equals SEC_ERROR_BASE + 1000.
Comment 27•18 years ago
|
||
The 4 constants shown in comment 14 were defined prior to 1996, and have
never changed since then. I think you could copy them into whatever other
file (idl?) needs to know them. I wouldn't expend too much effort trying
to detect a change in those values. I predict they will never change in
the lifetime of NSS.
Comment 28•18 years ago
|
||
Comment on attachment 291440 [details] [diff] [review]
Patch v5
remove obsolte review request.
Attachment #291440 -
Flags: review?(rrelyea)
You need to log in
before you can comment on or make changes to this bug.
Description
•