Open Bug 807250 Opened 12 years ago Updated 1 year ago

SSL data race with ssl_GetPrivate vs. ssl_DefRecv

Categories

(NSS :: Libraries, defect, P2)

Tracking

(Not tracked)

3.16.1

People

(Reporter: posidron, Unassigned)

Details

(Whiteboard: [tsan])

Attachments

(5 files, 1 obsolete file)

Attached file callstack
security/nss/lib/ssl/sslsock.c:230
    ss = (sslSocket *)fd->secret;
    ss->fd = fd;

security/nss/lib/ssl/ssldef.c:58
    PRFileDesc *lower = ss->fd->lower;


Tested with m-c changeset: 111684:e19e170d2f6d
ssl_GetPrivate and ssl_FindSocket both end with an assignment to ss->fd.
We should examine the code to see how we can fix this.
Assignee: nobody → nobody
Component: Security: PSM → Libraries
Product: Core → NSS
Target Milestone: --- → 3.14.1
Version: Trunk → trunk
Target Milestone: 3.14.1 → 3.14.2
Target Milestone: 3.14.2 → 3.14.3
Ryan just reported a similar problem in ssl_FindSocket in
https://code.google.com/p/chromium/issues/detail?id=330360.

I reviewed the code in nss/lib/ssl/sslsock.c. I believe it is
not necessary for ssl_GetPrivate and ssl_FindSocket to set the
ss->fd member because ss-fd should have been set by ssl_PushIOLayer.

This patch replaces the ss-fd assignments by assertions. It also
replaces an unnecessary ssl_FindSocket call in ssl_ImportFD with
an assertion. If I understand the code correctly, these assertions
are not necessary. In a future NSS release, we can remove these
assertions.
Assignee: nobody → wtc
Status: NEW → ASSIGNED
Attachment #8355383 - Flags: review?(ryan.sleevi)
Severity: major → minor
OS: Linux → All
Priority: -- → P2
Hardware: x86 → All
Target Milestone: 3.14.3 → 3.15.5
Comment on attachment 8355383 [details] [diff] [review]
Patch (does not work)

Review of attachment 8355383 [details] [diff] [review]:
-----------------------------------------------------------------

r+ ryan.sleevi
Attachment #8355383 - Flags: review?(ryan.sleevi) → review+
Comment on attachment 8355383 [details] [diff] [review]
Patch (does not work)

Review of attachment 8355383 [details] [diff] [review]:
-----------------------------------------------------------------

Patch checked in: https://hg.mozilla.org/projects/nss/rev/4a1849df5e3e

The three assertions added in this patch can be removed after 2014-06-30.
Attachment #8355383 - Flags: checked-in+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I think we may need to back this out of NSS 3.15.5 because this assertion in ssl_GetPrivate fails in Gecko: PORT_Assert(ss->fd == fd);

>	nss3.dll!ssl_GetPrivate(PRFileDesc * fd) Line 155	C
 	nss3.dll!ssl_Connect(PRFileDesc * fd, const PRNetAddr * sockaddr, unsigned int timeout) Line 1911	C
 	xul.dll!nsSSLIOLayerConnect(PRFileDesc * fd, const PRNetAddr * addr, unsigned int timeout) Line 691	C++
 	nss3.dll!PR_Connect(PRFileDesc * fd, const PRNetAddr * addr, unsigned int timeout) Line 155	C
 	xul.dll!nsSocketTransport::InitiateSocket() Line 1309	C++
 	xul.dll!nsSocketTransport::OnSocketEvent(unsigned int type, tag_nsresult status, nsISupports * param) Line 1680	C++
 	xul.dll!nsSocketEvent::Run() Line 69	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 637	C++
 	xul.dll!NS_ProcessNextEvent(nsIThread * thread, bool mayWait) Line 263	C++
 	xul.dll!nsSocketTransportService::Run() Line 691	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 637	C++
 	xul.dll!NS_ProcessNextEvent(nsIThread * thread, bool mayWait) Line 263	C++
 	xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 301	C++
 	xul.dll!MessageLoop::RunInternal() Line 227	C++
 	xul.dll!MessageLoop::RunHandler() Line 220	C++
 	xul.dll!MessageLoop::Run() Line 194	C++
 	xul.dll!nsThread::ThreadFunc(void * arg) Line 265	C++
 	nss3.dll!_PR_NativeRunThread(void * arg) Line 397	C
 	nss3.dll!pr_root(void * arg) Line 90	C
 	[External Code]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached file minidump
This minidump of the assertion failure was created with VS2013 with a VS2013-generated build. The assertion failure occurred when I ran "mach xpcshell-test security/manager/ssl/tests/unit" and the test where the failure occured was a modified version of test_ev_certs.js. I will try to reproduce again.
Comment on attachment 8355383 [details] [diff] [review]
Patch (does not work)

Backed out: http://hg.mozilla.org/projects/nss/rev/ab1e7334bde6

The test failures in Firefox reproduced repeatedly:
hg clone https://hg.mozilla.org/integration/mozilla-inbound src
cd src
mach build
mach xpcshell-test security/manager/ssl/tests/unit

I used mozilla-inbound revision 9380208b53cf.
Attachment #8355383 - Flags: checked-in+
The assertion also fails when Gecko/NSS are build with VS2010SP1.
Comment on attachment 8355383 [details] [diff] [review]
Patch (does not work)

Thank you for reverting this patch. I can reproduce the assertion failure
by running Firefox and visiting any HTTPS website. The assertion failure is
independent of platforms and compilers. It is caused by PR_PushIOLayer switching
the contents of the PRFileDesc at the top and the PRFileDesc to be pushed
onto the top. Since ss->fd is a back pointer to the PRFileDesc for the SSL
layer, if PR_PushIOLayer is called to push another PRFileDesc layer on top
of SSL, the ss->fd back pointer will become invalid, pointing to the new
top layer. So the lazy setting of ss->fd in ssl_GetPrivate and ssl_FindSocket
is actually necessary to defend against this switching of PRFileDesc contents.
The CVS history shows this code is in revision 1.1, so it is not clear
whether the code was intended to do this. I will add a comment to document
this.

I've experimented with pushing either a dummy layer or the special NSPR
PR_IO_LAYER_HEAD layer (which indicates a "new style" PRFileDesc stack)
to prevent PR_PushIOLayer from switching the contents of PRFileDescs.
But I found existing code in Mozilla that assumes the current behavior
of PR_PushIOLayer on "old style" PRFileDesc stacks, so these approaches
won't work.

The remaining solutions are:

1. Remove the ss->fd back pointer. Modify all the functions that take
"ss" to take either just "fd" ("ss" is just fd->secret with a typecast)
or both "ss" and "fd". This is the best solution but will be a tedious,
bug patch.

2. Add a lock or call-once function to protect the setting and probably
also reading of ss->fd. I haven't thought much about this approach.
Attachment #8355383 - Attachment description: Patch → Patch (does not work)
Attachment #8355383 - Flags: checked-in-
Unfortunately I don't have time to change the many functions that take
"ss" to also take "fd", so I'll add a comment to document the problem.
Attachment #8359929 - Flags: superreview?(brian)
Attachment #8359929 - Flags: review?(ryan.sleevi)
Attachment #8359929 - Attachment is obsolete: true
Attachment #8359929 - Flags: superreview?(brian)
Attachment #8359929 - Flags: review?(ryan.sleevi)
Attachment #8359930 - Flags: superreview?(brian)
Attachment #8359930 - Flags: review?(ryan.sleevi)
Status: REOPENED → ASSIGNED
Target Milestone: 3.15.5 → 3.16
(In reply to Wan-Teh Chang from comment #10)
> Unfortunately I don't have time to change the many functions that take
> "ss" to also take "fd", so I'll add a comment to document the problem.

My understanding is that "correct" change is to remove the "PRFileDesc * fd;" member of sslSocketStr in sslt.h, and then add a PRFileDesc * fd parameter to every place that is currently accessing ss->fd. Unfortunately, there are a lot of SSL_DBG/SSL_TRC calls that access ss->fd which means many functions access ss->fd.

Here's an alternate solution:
1. Replace the "PRFileDesc * fd" member with "void * fd".
2. Add a PRFileDesc * fd argument to all the functions that fail to compile and/or produce warnings because they actually need fd to be a PRFileDesc. This would be all the non-logging calls only.

Does this seem reasonable? We may be able to find somebody to make this change.

It is concerning that Gecko has this data race in the first place though. Patrick, see the stack trace. This seems like a(nother) case of badness caused by networking code accessing sockets on the main thread. Is there another way we can do this in Necko?
Flags: needinfo?(mcmanus)
This patch is a prototype of a solution based on adding a dummy higher
layer that ensures the SSL layer's PRFileDesc has stable contents.

We need to deal with two complications.

1. Some Mozilla code assumes the |fd| argument passed to their libSSL
callback functions is exactly one layer below their own layer. So it
is necessary to pass the dummy higher layer (ss->fd->higher) as the |fd|
argument.

2. When a stack of PRFileDescs is being closed, PR_PopIOLayer will be
called, which will switch the contents of PRFileDescs and invalidate
ss->fd. So ssl_Close cannot call ssl_GetPrivate because the ss->fd == fd
assertion will fail. It needs to calculate |ss| manually and update
ss->fd.
Attachment #8359930 - Flags: review?(ryan.sleevi) → review+
Comment on attachment 8359930 [details] [diff] [review]
[Correct patch] Document why ssl_GetPrivate and ssl_FindSocket need to set ss->fd lazily

Review of attachment 8359930 [details] [diff] [review]:
-----------------------------------------------------------------

Patch checked in: https://hg.mozilla.org/projects/nss/rev/622364c673e8
Attachment #8359930 - Flags: checked-in+
Target Milestone: 3.16 → 3.16.1
(In reply to Brian Smith (:briansmith, was :bsmith; NEEDINFO? for response) from comment #12)
>
> 
> It is concerning that Gecko has this data race in the first place though.
> Patrick, see the stack trace. This seems like a(nother) case of badness
> caused by networking code accessing sockets on the main thread. Is there
> another way we can do this in Necko?

This kind of got lost in my backlog - sorry about that.

It looks both easy to do for the case of Peer and Local Addresses (the socket thread can cache these in read-only data structures outside of PSM).. but it looks kind of moot based on comment 14.

let me know if its still desired.
Flags: needinfo?(mcmanus)
Severity: minor → S4

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: wtc → nobody
Status: ASSIGNED → NEW
You need to log in before you can comment on or make changes to this bug.