Closed Bug 378241 Opened 18 years ago Closed 18 years ago

Changes from bug 107491 lead to tons of thread-safety asserts

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: benjamin)

References

Details

Attachments

(5 files, 1 obsolete file)

Move ErrorAccordingToNSPR out of locks and sync-proxy to the main thread, rev. 1 18 years ago Benjamin Smedberg 8.36 KB, patch		Details \| Diff \| Splinter Review
proposed rev. 2 18 years ago Kai Engert [:KaiE:] 8.82 KB, patch		Details \| Diff \| Splinter Review
Proposed rev. 3 18 years ago Benjamin Smedberg 8.39 KB, patch		Details \| Diff \| Splinter Review
alternative attempt v4 18 years ago Kai Engert [:KaiE:] 21.85 KB, patch		Details \| Diff \| Splinter Review
Patch v5 18 years ago Kai Engert [:KaiE:] 5.29 KB, patch	benjamin : superreview+	Details \| Diff \| Splinter Review
Patch v6 18 years ago Kai Engert [:KaiE:] 5.33 KB, patch	rrelyea : review+ wtc : review+ KaiE : superreview+	Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

18 years ago

STEPS TO REPRODUCE: 1) Compile Seamonkey (don't know how to trigger this code in Firefox, though I'm sure there's a way... in any case, the app is not relevant here). 2) Type "84582" in the URL bar 3) Hit enter. EXPECTED RESULTS: Error page ACTUAL RESULT: Tons of thread-safety asserts. For the first assert: ###!!! ASSERTION: Using observer service off the main thread!: 'Error', file ../../../mozilla/xpcom/ds/nsObserverService.cpp, line 130 the issue is that nsSocketTransport::InitiateSocket ends up calling ErrorAccordingToNSPR from a non-main thread (perfectly OK). But ErrorAccordingToNSPR now gets the nsINSSErrorService, which leads to PSM component init. And that code assumes it's on the main thread. In particular, nsNSSComponent::RegisterObservers is called by nsNSSComponent::Init. Then this same code calls nsNSSComponent::GetPIPNSSBundleString, which touches the chrome registry from off-thread, and leads to: ###!!! ASSERTION: wrong thread: 'NS_IsMainThread()', file ../../../../mozilla/netwerk/base/src/nsIOService.cpp, line 488 as we try to open channels on a background thread to load stringbundles. And so forth. The only approach I see here offhand is to create an nsIRunnable impl that makes sure PSM init happens and to sync-proxy that Run() method to the main thread...

Flags: blocking1.9?

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

18 years ago

Blocks: 107491

Benjamin Smedberg

Assignee

Comment 1

•

18 years ago

The solution is to use a sync-proxy to get the service.

Assignee: kengert → nobody

Component: Security: PSM → Networking

Flags: blocking1.9? → blocking1.9+

OS: Linux → All

QA Contact: psm → networking

Hardware: PC → All

Benjamin Smedberg

Assignee

Updated

•

18 years ago

Assignee: nobody → dcamp

Robert Sayre

Updated

•

18 years ago

Priority: -- → P1

Mike Schroepfer

Comment 2

•

18 years ago

Benjamin can you help out here?

Assignee: dcamp → benjamin

Benjamin Smedberg

Assignee

Comment 3

•

18 years ago

Attached patch Move ErrorAccordingToNSPR out of locks and sync-proxy to the main thread, rev. 1 (obsolete) — Details — Splinter Review

Attachment #288687 - Flags: review?(cbiesinger)

Kai Engert [:KaiE:]

Comment 4

•

18 years ago

Benjamin, thanks a lot for working on this bug and your idea with the runnable. Question regarding nsresult temprv = NS_DispatchToMainThread(); If temprv indicates whether the dispatching, starting the runner worked, then I'm ok with adding the NS_WARNING. GetXPCOMFromNSSError gets called without knowing whether it's an NSS error. If GetXPCOMFromNSSError returns a failure (indicating it's unable to convert), that's not a failure for function ErrorAccordingToNSPR. It's just supposed to mean "ok, it's not an NSS error, so let's use the default error code set in ErrorAccordingToNSPR". So instead of the current try to convert, if it worked, then set rv to the conversion result you do try to convert, it the conversion fails, the converter will update rv to NS_ERROR_FAILURE. This duplicates ErrorAccordingToNSPR default error code in the runnable. Maybe that's not a big deal, but if we wanted to stay closer to the current behavior and keep control over the default value in ErrorAccordingToNSPR, we could change the patch to only update rv if the runnable indicates a successful conversion.

Kai Engert [:KaiE:]

Comment 5

•

18 years ago

Attached patch proposed rev. 2 — Details — Splinter Review

Benjamin, are you ok with this modified patch?

Benjamin Smedberg

Assignee

Comment 6

•

18 years ago

Attached patch Proposed rev. 3 — Details — Splinter Review

How about this? It avoids the extra member and IMO is a little easier to read.

Attachment #288998 - Flags: review?(kengert)

Kai Engert [:KaiE:]

Comment 7

•

18 years ago

Yes, I'm now happy with NSSErrorRunnable and ErrorAccordingToNSPR. I went on to review your other changes in the file, your attempt to move the call to ErrorAccordingToNSPR out of the locks. I think in nsSocketInputStream::Available you want to revert the if test. I think you want if (n < 0) ### mCondition = ErrorAccordingToNSPR(code); But I'm worried, because you'll update mCondition without holding the lock. I don't know if mCondition is supposed to be protected by the lock, but it shoulds like it should? This all brings me to a completely different proposal. I think GetXPCOMFromNSSError is really a trivial function. While it needs to have access to NSS headers, and therefore must reside in PSM, it doesn't need access to any state (besides the parameters). I understand the problem comes from running through the NSS init service (which registers observers). I think it should be fine to implement GetXPCOMFromNSSError without triggering NSS init. At least the following functions can be decoupled from nsNSSComponent: nsINSSErrorsService::isNSSErrorCode nsINSSErrorsService::getXPCOMFromNSSError nsINSSErrorsService::getErrorClass We could also decouple nsINSSErrorsService::getErrorMessage although that has some more dependencies, but it should be ok to have another string bundle object. Let me try something.

Kai Engert [:KaiE:]

Comment 8

•

18 years ago

Attached patch alternative attempt v4 — Details — Splinter Review

This no longer requires init of NSS/PSM component when calling ErrorAccordingToNSPR. Maybe the original test case still triggers the assertion, we'll have to try that. If it does, the assertions might also get triggered by docshell when it attempts to obtain an error message... Not sure if that code is reached in your test case. If it does, and we must still isolate it, I think the problem is now much more simple. We could use Benjamin's approach only for getting the error string in docshell. This would not require changing the scope of locks. I hope.

Benjamin Smedberg

Assignee

Comment 9

•

18 years ago

The rule (though it may be unwritten) is that you must not call any XPCOM method while holding a lock. If this were just a direct call into the NSS error provider I might not worry about it... but in this case we're calling into the component manager first, which does locking which could potentially deadlock. If we want a solution like this I think we at least need to cache the NSS error provider so that we don't have to enter the component manager. :-(

Mike Schroepfer

Comment 10

•

18 years ago

Just wanted to check in on this - we are comming up on M10/b2 and it would be great to wrap this up before then...

Benjamin Smedberg

Assignee

Comment 11

•

18 years ago

Comment on attachment 288687 [details] [diff] [review] Move ErrorAccordingToNSPR out of locks and sync-proxy to the main thread, rev. 1 I don't know whether rev. 3 is acceptable or not... it sets the status outside of the lock, which may not be acceptable. If that's the case, I think we're going to have to cache the NSS error reporter object somewhere.

Attachment #288687 - Attachment is obsolete: true

Attachment #288687 - Flags: review?(cbiesinger)

Robert Relyea

Comment 12

•

18 years ago

re mCondition. I quick look at the code shows that there are tons of places mCondition is set and reverenced without lock protection. (OnSocketReady() RecoverFromError(), OnSocketEvent() OnMsgInputClosed(), OnMsgOutputClosed()). The only race I see is the potential for a partial setting of mCondition (that is the write to mCondition isn't atomic). That could be handled by using a local to read ErrorAccordingToNSPR(), and protecting the write with a lock or an PR_AtomicWrite() call. This is only useful if all the reference calls are protected by locks or atomicRead calls as well, however.

Benjamin Smedberg

Assignee

Comment 13

•

18 years ago

32-bit writes are atomic on all processors that I'm aware of, so I think we're safe in that regard.

Benjamin Smedberg

Assignee

Updated

•

18 years ago

Attachment #288998 - Flags: review?(dcamp)

Kai Engert [:KaiE:]

Comment 14

•

18 years ago

I want to take a step back. I think we are over-engineering here. This whole problem only exists because mozilla/netwerk is unaware of 4 numeric constants, which are defined in mozilla/security/nss: #define SEC_ERROR_BASE (-0x2000) #define SEC_ERROR_LIMIT (SEC_ERROR_BASE + 1000) #define SSL_ERROR_BASE (-0x3000) #define SSL_ERROR_LIMIT (SSL_ERROR_BASE + 1000) In my very first proposed patch, I duplicated the numerical constants into function ErrorAccordingToNSPR. That was seen as "not elegant". In a second step I had proposed to use a mechanism at the NSPR level, where we could register the error code ranges used by NSS/libSSL. But that was seen as unwanted, because NSPR and NSS are separate... I think if the lack of 4 numeric constants leads to threading, locking, and order-of-init problems, I think we have gone too far. Let's find a way to make mozilla/netwerk or mozilla/nsprpub aware of the range of error codes that are used by NSS and libSSL, and we're done.

Kai Engert [:KaiE:]

Comment 15

•

18 years ago

(In reply to comment #14) > > Let's find a way to make mozilla/netwerk or mozilla/nsprpub aware of the range > of error codes that are used by NSS and libSSL, and we're done. ... without using XPCOM, compile time knowledge ...

Kai Engert [:KaiE:]

Comment 16

•

18 years ago

We could hard code the constants into mozilla/netwerk/base/public/nsINSSErrorsService.idl We could make mozilla/netwerk use these constants directly. In order to ensure the duplicated constants will not get out of synch between netwerk and security, we could do a run time check. When PSM inits, it could compare the values available in nsINSSErrorsService.idl with the values defined by NSS. If PSM discovers a difference, it will assert. This will assure that we update the netwerk code should the values used by NSS ever change.

Benjamin Smedberg

Assignee

Comment 17

•

18 years ago

Comment on attachment 288998 [details] [diff] [review] Proposed rev. 3 That sounds like a great solution to me... can you make a patch or should I?

Attachment #288998 - Flags: review?(kengert)

Attachment #288998 - Flags: review?(dcamp)

Kai Engert [:KaiE:]

Comment 18

•

18 years ago

Attached patch Patch v5 — Details — Splinter Review

Bob, Benjamin, what do you think?

Attachment #291440 - Flags: superreview?(benjamin)

Attachment #291440 - Flags: review?(rrelyea)

Benjamin Smedberg

Assignee

Comment 19

•

18 years ago

Comment on attachment 291440 [details] [diff] [review] Patch v5 I think you should use PR_STATIC_ASSERT instead of NS_ASSERTION... sr=me with that change http://mxr.mozilla.org/mozilla/source/nsprpub/pr/include/prlog.h#259

Attachment #291440 - Flags: superreview?(benjamin) → superreview+

Christian :Biesinger (don't email me, ping me on IRC)

Comment 20

•

18 years ago

Comment on attachment 291440 [details] [diff] [review] Patch v5 +isNSSErrorCode(PRErrorCode code) uppercase start letter please and can you linewrap the lines after the && operators? +#define NSS_NSPRCODE_TO_XPCOMCODE(nspr_code) \ how about making this an inline function?

Kai Engert [:KaiE:]

Comment 21

•

18 years ago

Attached patch Patch v6 — Details — Splinter Review

Addressed comments from Benjamin and Biesi. Carrying forward sr=benjamin

Attachment #291445 - Flags: superreview+

Attachment #291445 - Flags: review?(rrelyea)

Wan-Teh Chang

Comment 22

•

18 years ago

Comment on attachment 291445 [details] [diff] [review] Patch v6 r=wtc. It seems better to do the PR_STATIC ASSERT the first thing in nsNSSComponent::InitializeNSS.

Attachment #291445 - Flags: review+

Robert Relyea

Comment 23

•

18 years ago

Comment on attachment 291445 [details] [diff] [review] Patch v6 r+, with some notes: NOTE: These error limits are compile time in both cases. Basically it means PSM and XPCOM match, which is fine. If it were a real runtime check with NSS, I would have to give an r-. The primary reason for non-approval of a true run-time check would be PSM should be able to run with new versions of NSS installed. On some platforms NSS is updated independently of mozilla, it shouldn't fail to start simply because a newer version of NSS is installed. Since translation of the NSS errors into strings is done by PSM, the important value here isn't what errors NSS can return, but which NSS errors PSM understands, so a compile type check in PSM is sufficient, thus the r+. bob

Attachment #291445 - Flags: review?(rrelyea) → review+

Damon Sicore (:damons)

Comment 24

•

18 years ago

Is this going to land by tonight in time for the B2 freeze?

Wan-Teh Chang

Comment 25

•

18 years ago

Comment on attachment 291445 [details] [diff] [review] Patch v6 Bob, just to clarify -- SEC_ERROR_LIMIT means the upper end of the range of error codes that NSS has reserved. It doesn't mean the current maximum NSS SEC error code. So SEC_ERROR_LIMIT is unlikely to change. Right now SEC_ERROR_LIMIT equals SEC_ERROR_BASE + 1000.

Kai Engert [:KaiE:]

Comment 26

•

18 years ago

fixed

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED

Nelson Bolyard (seldom reads bugmail)

Comment 27

•

18 years ago

The 4 constants shown in comment 14 were defined prior to 1996, and have never changed since then. I think you could copy them into whatever other file (idl?) needs to know them. I wouldn't expend too much effort trying to detect a change in those values. I predict they will never change in the lifetime of NSS.

Robert Relyea

Comment 28

•

18 years ago

Comment on attachment 291440 [details] [diff] [review] Patch v5 remove obsolte review request.

Attachment #291440 - Flags: review?(rrelyea)

You need to log in before you can comment on or make changes to this bug.