Closed Bug 412456 Opened 14 years ago Closed 14 years ago

###!!! ASSERTION: nsSecureBrowserUIImpl not thread-safe

Categories

(Core :: Security: PSM, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: cbook, Assigned: KaiE)

Details

(Keywords: assertion, regression)

Attachments

(3 files)

###!!! ASSERTION: nsSecureBrowserUIImpl not thread-safe: '_mOwningThread.GetThad() == PR_GetCurrentThread()', file c:/Debug/mozilla/security/manager/boot/srnsSecureBrowserUIImpl.cpp, line 169

Steps to reproduce:
Created a new Firefox Profile
Installed IE View Extension
Browser Restart to enable this Extension
While Restart and restore of 3 Tabs (2 default Startpages for Trunk Build and AMO IE Homepage (https://addons.mozilla.org/en-US/firefox/addon/35) i was running into this Assertion
I'm getting this on linux whenever I load Bugzilla. I think it's a fairly recent regression, within the last week or two.
OS: Windows XP → All
Hardware: PC → All
Flags: blocking1.9?
Keywords: regression
Based on comment 2 I'd like to understand whether this is a bad issue or not
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
At the very least, this adds to debugging pain when a testcase is attached to bugzilla.
Here's a stack:

#0  nsSecureBrowserUIImpl::AddRef (this=0x1961e450) at ../../../../../source/mozilla/security/manager/boot/src/nsSecureBrowserUIImpl.cpp:167
#1  0x00317c6d in NS_TableDrivenQI (aThis=0x1961e450, entries=0x5e1208, aIID=@0xeddb3c0, aInstancePtr=0xb00a08fc) at nsISupportsImpl.cpp:49
#2  0x005d62db in nsSecureBrowserUIImpl::QueryInterface (this=0x1961e450, aIID=@0xeddb3c0, aInstancePtr=0xb00a08fc) at ../../../../../source/mozilla/security/manager/boot/src/nsSecureBrowserUIImpl.cpp:186
#3  0x0031410d in nsQueryInterface::operator() (this=0xb00a0914, aIID=@0xeddb3c0, answer=0xb00a08fc) at nsCOMPtr.cpp:47
#4  0x0ed7d318 in nsCOMPtr<nsISSLStatusProvider>::assign_from_qi (this=0xb00a0988, qi={mRawPtr = 0x1961e450}, aIID=@0xeddb3c0) at nsCOMPtr.h:1275
#5  0x0ed7d372 in nsCOMPtr<nsISSLStatusProvider>::nsCOMPtr (this=0xb00a0988, qi={mRawPtr = 0x1961e450}) at nsCOMPtr.h:645
#6  0x0ed7d390 in nsCOMPtr<nsISSLStatusProvider>::nsCOMPtr (this=0xb00a0988, qi={mRawPtr = 0x1961e450}) at nsCOMPtr.h:645
#7  0x0ed7bcc0 in nsNSSSocketInfo::SetNotificationCallbacks (this=0x10e4d2b0, aCallbacks=0x10e830d4) at ../../../../../source/mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp:357
#8  0x092eb66e in nsSocketTransport::BuildSocket (this=0x11ad5210, fd=@0xb00a0c50, proxyTransparent=@0xb00a0c4c, usingSSL=@0xb00a0c48) at ../../../../source/mozilla/netwerk/base/src/nsSocketTransport2.cpp:1042
#9  0x092ec04f in nsSocketTransport::InitiateSocket (this=0x11ad5210) at ../../../../source/mozilla/netwerk/base/src/nsSocketTransport2.cpp:1110
#10 0x092ed4f7 in nsSocketTransport::OnSocketEvent (this=0x11ad5210, type=1, status=0, param=0x11a53df0) at ../../../../source/mozilla/netwerk/base/src/nsSocketTransport2.cpp:1426
#11 0x092ef856 in nsSocketEvent::Run (this=0x10ec6330) at ../../../../source/mozilla/netwerk/base/src/nsSocketTransport2.cpp:98
#12 0x00391ca8 in nsThread::ProcessNextEvent (this=0x6342f0, mayWait=1, result=0xb00a0dac) at ../../../source/mozilla/xpcom/threads/nsThread.cpp:510
#13 0x0031dab8 in NS_ProcessNextEvent_P (thread=0x6342f0, mayWait=1) at nsThreadUtils.cpp:227
#14 0x092f1418 in nsSocketTransportService::Run (this=0x867000) at ../../../../source/mozilla/netwerk/base/src/nsSocketTransportService2.cpp:562
#15 0x00391ca8 in nsThread::ProcessNextEvent (this=0x6342f0, mayWait=1, result=0xb00a0ecc) at ../../../source/mozilla/xpcom/threads/nsThread.cpp:510
#16 0x0031dab8 in NS_ProcessNextEvent_P (thread=0x6342f0, mayWait=1) at nsThreadUtils.cpp:227
#17 0x00391eb7 in nsThread::ThreadFunc (arg=0x6342f0) at ../../../source/mozilla/xpcom/threads/nsThread.cpp:254
#18 0x0055f360 in _pt_root (arg=0x635900) at ../../../../../source/mozilla/nsprpub/pr/src/pthreads/ptthread.c:221
#19 0x93142075 in _pthread_start ()
#20 0x93141f32 in thread_start ()

This is on a thread other than the main thread.
Assignee: nobody → kengert
Component: General → Security: PSM
QA Contact: general → psm
This is the third most common assertion on the top sites.
I think it's time to add a lock to nsSecureBrowserUIImpl and declare it as threadsafe.
Attached patch Patch v1Splinter Review
This patch declares the implementation as thread-safe.

The remaining portions of the patch add the locks to really make it so.

I'm using a lock whenever a member variable is read or updated.
I'm trying to keep the lock periods to the minimum, never hold the lock when calling the outside world.
Attachment #306555 - Flags: review?(rrelyea)
Flags: tracking1.9+ → blocking1.9+
Priority: P2 → P1
Comment on attachment 306555 [details] [diff] [review]
Patch v1

r-

In general it looks like you are handling access to the various interal states with locks, however, since the code was not originally thread-safe it makes it very difficult to determine if we have potential race conditions. In reviewing I've found a couple of things which may or may not be issues. Part of the resolution could either be code changes or explanations why this particular race is not an issue...


Issue 1) There appears to be a merge issue from the current trunk. This isn't a race condition (well maybe a check-in race). Places where the mWindow weak reference is made. The current code fetches a new window with a query context. I think that change and this one needs to be in sync (is it possible for the tmpWindow copy of the mWindow weak reference still has the weak reference characteristic of mWindow for instance?).

Issue 2: Are auto locks reentrant? If so, this is not an issue, but if they are not (NSPR PR_Locks are not for instance), then you have a potential deadlock in OnStateChange where you call ResetStateTracking while holding the lock.

Issue 3: Again in OnStateChange, you grap the current mTopLevel, (correctly with the lock), call ObtainEvenSink with it, then set the value holding the lock again. If it is possible for ObtainEventSink to come up with a different answer for temp_ToplevelEventSink in consecutive calls, then you have a race condition setting mToplevelEventSink.

Issue 4: Same type of race in UpdateSecurity state. You call it based on the setting of a global. It in turn modifies some globals. Without knowing what the invariance model is, it's impossible to say whether or not there is a race condition here.

In general the code does a good job of correctly referencing or setting known 'globals' under locks, but it's not clear that these are being set correctly in the race case without a clear definition of what the true invariants are.

Other than Issue 1 you may be able to resolve the other issue simply with documentation.

bob
Attachment #306555 - Flags: review?(rrelyea) → review-
Attached patch Patch v2Splinter Review
(In reply to comment #9)
> Issue 1) There appears to be a merge issue from the current trunk. This isn't a
> race condition (well maybe a check-in race). Places where the mWindow weak
> reference is made. The current code fetches a new window with a query context.
> I think that change and this one needs to be in sync (is it possible for the
> tmpWindow copy of the mWindow weak reference still has the weak reference
> characteristic of mWindow for instance?).

Indeed, this change was made very recently.
I'm attaching a new patch, which has the changes merged, but none of your other requests (yet) addressed.

If we dereference while holding the lock, it's fine to copy a reference to nsCOMPtr for our use. We'll use that object temporary for completing our current activity, and release the reference afterwards. If this causes the object to be freed, it's all good. Our next attempt to dereference the weak reference will do the right thing, whatever happened in between.

In other words, our tmpWindow is not a weak reference, it's a strong reference, and it must be that way.
Bugzilla's interdiff is insufficient again. If you'd like to see the difference between patch v1 and patch v2, download both files, and use tkdiff. (FYI: On Fedora Linux it's contained in the tkcvs package.)
(In reply to comment #9)
> Issue 2: Are auto locks reentrant? If so, this is not an issue, but if they are
> not (NSPR PR_Locks are not for instance), then you have a potential deadlock in
> OnStateChange where you call ResetStateTracking while holding the lock.

The underlying lock is a PRMonitor*.
I'm not using nsAutoLock, I'm using nsAutoMonitor.

nsAutoMonitor calls PR_EnterMonitor on contruction/scope entry and calls PR_ExitMonitor on destruction/scope exit, so we are reentrant and the code location you found is safe.


> Issue 3: Again in OnStateChange, you grap the current mTopLevel, (correctly
> with the lock), call ObtainEvenSink with it, then set the value holding the
> lock again. If it is possible for ObtainEventSink to come up with a different
> answer for temp_ToplevelEventSink in consecutive calls, then you have a race
> condition setting mToplevelEventSink.

I don't think there is a race.
I don't think we'll ever see parallel calls to OnStateChange on the same object.

Actually, the sink only ever gets set once for each instance of nsSecureBrowserUIImpl. We need to know the pointer to "our" window (that this tracker is associated to), and it won't change.

So even if there were a race, it wouldn't matter.

However, your question made me thinking. Should we ensure it's not necessary that OnStateChange is reentrant? Well, it has never been.

But I added a mechanism to detect unexpected reentrance.
Search for nsAutoAtomic and you'll understand.
In my testing this was never hit.

But this adds a bit of overhead.
I think we should avoid it.
(If you agree, I'll attach a follow up patch with nsAutoAtomic and the assertion and variable m...ReentranceDetection removed.)


> Issue 4: Same type of race in UpdateSecurity state. You call it based on the
> setting of a global. It in turn modifies some globals. Without knowing what the
> invariance model is, it's impossible to say whether or not there is a race
> condition here.

I reviewed the code, only the following two events can eventually result in a call to UpdateSecurityState:
- OnStateChange notification
- OnLocationChange notification

I used the above nsAutoAtomic mechanism to detect any races within this group of two functions. I didn't see a race in my testing. This is what I expected. The notifications for a given window come in serialized.
Attached patch Patch v3Splinter Review
Attachment #307505 - Flags: review?(rrelyea)
Attachment #307494 - Attachment description: Patch v2 → Patch v2 (proposed for check in)
Comment on attachment 307505 [details] [diff] [review]
Patch v3

Summary of resolutions to review issues.

Issue 1: this looks much safer, particularly given the previous code.

Issue 2: OK, PR_Monitors have the correct reentry semantic.

Issues 3 & 4: The guard you provide properly detects what would otherwise be race issues.

RE the performance issue:

Atomic operations should be realtively fast (certainly compared to acquisition of locks), and adding the check will instantly show up any threading issues rather than have races create unexpected UI issues without any visibility.

I personally think this is worth it, but it also depends on the frequency of calls to these functions and their sensitivity to performance issues.

If this patch produces noticeable performance problems from the user level, then the trade-off is not so good. I have a hard time believing it will, but will accept patch V2 if that is the case.

bob
Attachment #307505 - Flags: review?(rrelyea) → review+
Attachment #307494 - Attachment description: Patch v2 (proposed for check in) → Patch v2
Comment on attachment 307505 [details] [diff] [review]
Patch v3

ok, let's check in this one
Attachment #307505 - Attachment description: Patch v3 (with race testing, performance penalty) → Patch v3
fixed
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I am updating MapInternaltoExternalState() in bug 822367 and am considering removing the static keyword from the method.  It was added here in Patch 1:
https://bugzilla.mozilla.org/attachment.cgi?id=306555&action=diff#mozilla/security/manager/boot/src/nsSecureBrowserUIImpl.h_sec4

Does anyone see an issue with this?
(In reply to Tanvi Vyas [:tanvi] from comment #17)
> I am updating MapInternaltoExternalState() in bug 822367 and am considering
> removing the static keyword from the method.  It was added here in Patch 1:
> https://bugzilla.mozilla.org/attachment.cgi?id=306555&action=diff#mozilla/
> security/manager/boot/src/nsSecureBrowserUIImpl.h_sec4
> 
> Does anyone see an issue with this?

If it seems reasonable to you, then it seems reasonable to me. Just have a PSM peer review your patch for bug 822367.
You need to log in before you can comment on or make changes to this bug.