Open
Bug 807250
Opened 11 years ago
Updated 3 months ago
SSL data race with ssl_GetPrivate vs. ssl_DefRecv
Categories
(NSS :: Libraries, defect, P2)
NSS
Libraries
Tracking
(Not tracked)
NEW
3.16.1
People
(Reporter: posidron, Unassigned)
Details
(Whiteboard: [tsan])
Attachments
(5 files, 1 obsolete file)
8.14 KB,
text/plain
|
Details | |
1.78 KB,
patch
|
ryan.sleevi
:
review+
wtc
:
checked-in-
|
Details | Diff | Splinter Review |
97.78 KB,
application/octet-stream
|
Details | |
1.49 KB,
patch
|
ryan.sleevi
:
review+
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
8.68 KB,
patch
|
Details | Diff | Splinter Review |
security/nss/lib/ssl/sslsock.c:230 ss = (sslSocket *)fd->secret; ss->fd = fd; security/nss/lib/ssl/ssldef.c:58 PRFileDesc *lower = ss->fd->lower; Tested with m-c changeset: 111684:e19e170d2f6d
Comment 1•11 years ago
|
||
ssl_GetPrivate and ssl_FindSocket both end with an assignment to ss->fd. We should examine the code to see how we can fix this.
Assignee: nobody → nobody
Component: Security: PSM → Libraries
Product: Core → NSS
Target Milestone: --- → 3.14.1
Version: Trunk → trunk
Updated•10 years ago
|
Target Milestone: 3.14.1 → 3.14.2
Updated•10 years ago
|
Target Milestone: 3.14.2 → 3.14.3
Comment 2•9 years ago
|
||
Ryan just reported a similar problem in ssl_FindSocket in https://code.google.com/p/chromium/issues/detail?id=330360. I reviewed the code in nss/lib/ssl/sslsock.c. I believe it is not necessary for ssl_GetPrivate and ssl_FindSocket to set the ss->fd member because ss-fd should have been set by ssl_PushIOLayer. This patch replaces the ss-fd assignments by assertions. It also replaces an unnecessary ssl_FindSocket call in ssl_ImportFD with an assertion. If I understand the code correctly, these assertions are not necessary. In a future NSS release, we can remove these assertions.
Updated•9 years ago
|
Severity: major → minor
OS: Linux → All
Priority: -- → P2
Hardware: x86 → All
Target Milestone: 3.14.3 → 3.15.5
Comment 3•9 years ago
|
||
Comment on attachment 8355383 [details] [diff] [review] Patch (does not work) Review of attachment 8355383 [details] [diff] [review]: ----------------------------------------------------------------- r+ ryan.sleevi
Attachment #8355383 -
Flags: review?(ryan.sleevi) → review+
Comment 4•9 years ago
|
||
Comment on attachment 8355383 [details] [diff] [review] Patch (does not work) Review of attachment 8355383 [details] [diff] [review]: ----------------------------------------------------------------- Patch checked in: https://hg.mozilla.org/projects/nss/rev/4a1849df5e3e The three assertions added in this patch can be removed after 2014-06-30.
Attachment #8355383 -
Flags: checked-in+
Updated•9 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 5•9 years ago
|
||
I think we may need to back this out of NSS 3.15.5 because this assertion in ssl_GetPrivate fails in Gecko: PORT_Assert(ss->fd == fd);
> nss3.dll!ssl_GetPrivate(PRFileDesc * fd) Line 155 C
nss3.dll!ssl_Connect(PRFileDesc * fd, const PRNetAddr * sockaddr, unsigned int timeout) Line 1911 C
xul.dll!nsSSLIOLayerConnect(PRFileDesc * fd, const PRNetAddr * addr, unsigned int timeout) Line 691 C++
nss3.dll!PR_Connect(PRFileDesc * fd, const PRNetAddr * addr, unsigned int timeout) Line 155 C
xul.dll!nsSocketTransport::InitiateSocket() Line 1309 C++
xul.dll!nsSocketTransport::OnSocketEvent(unsigned int type, tag_nsresult status, nsISupports * param) Line 1680 C++
xul.dll!nsSocketEvent::Run() Line 69 C++
xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 637 C++
xul.dll!NS_ProcessNextEvent(nsIThread * thread, bool mayWait) Line 263 C++
xul.dll!nsSocketTransportService::Run() Line 691 C++
xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 637 C++
xul.dll!NS_ProcessNextEvent(nsIThread * thread, bool mayWait) Line 263 C++
xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 301 C++
xul.dll!MessageLoop::RunInternal() Line 227 C++
xul.dll!MessageLoop::RunHandler() Line 220 C++
xul.dll!MessageLoop::Run() Line 194 C++
xul.dll!nsThread::ThreadFunc(void * arg) Line 265 C++
nss3.dll!_PR_NativeRunThread(void * arg) Line 397 C
nss3.dll!pr_root(void * arg) Line 90 C
[External Code]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 6•9 years ago
|
||
This minidump of the assertion failure was created with VS2013 with a VS2013-generated build. The assertion failure occurred when I ran "mach xpcshell-test security/manager/ssl/tests/unit" and the test where the failure occured was a modified version of test_ev_certs.js. I will try to reproduce again.
Comment 7•9 years ago
|
||
Comment on attachment 8355383 [details] [diff] [review] Patch (does not work) Backed out: http://hg.mozilla.org/projects/nss/rev/ab1e7334bde6 The test failures in Firefox reproduced repeatedly: hg clone https://hg.mozilla.org/integration/mozilla-inbound src cd src mach build mach xpcshell-test security/manager/ssl/tests/unit I used mozilla-inbound revision 9380208b53cf.
Attachment #8355383 -
Flags: checked-in+
Comment 8•9 years ago
|
||
The assertion also fails when Gecko/NSS are build with VS2010SP1.
Comment 9•9 years ago
|
||
Comment on attachment 8355383 [details] [diff] [review] Patch (does not work) Thank you for reverting this patch. I can reproduce the assertion failure by running Firefox and visiting any HTTPS website. The assertion failure is independent of platforms and compilers. It is caused by PR_PushIOLayer switching the contents of the PRFileDesc at the top and the PRFileDesc to be pushed onto the top. Since ss->fd is a back pointer to the PRFileDesc for the SSL layer, if PR_PushIOLayer is called to push another PRFileDesc layer on top of SSL, the ss->fd back pointer will become invalid, pointing to the new top layer. So the lazy setting of ss->fd in ssl_GetPrivate and ssl_FindSocket is actually necessary to defend against this switching of PRFileDesc contents. The CVS history shows this code is in revision 1.1, so it is not clear whether the code was intended to do this. I will add a comment to document this. I've experimented with pushing either a dummy layer or the special NSPR PR_IO_LAYER_HEAD layer (which indicates a "new style" PRFileDesc stack) to prevent PR_PushIOLayer from switching the contents of PRFileDescs. But I found existing code in Mozilla that assumes the current behavior of PR_PushIOLayer on "old style" PRFileDesc stacks, so these approaches won't work. The remaining solutions are: 1. Remove the ss->fd back pointer. Modify all the functions that take "ss" to take either just "fd" ("ss" is just fd->secret with a typecast) or both "ss" and "fd". This is the best solution but will be a tedious, bug patch. 2. Add a lock or call-once function to protect the setting and probably also reading of ss->fd. I haven't thought much about this approach.
Attachment #8355383 -
Attachment description: Patch → Patch (does not work)
Attachment #8355383 -
Flags: checked-in-
Comment 10•9 years ago
|
||
Unfortunately I don't have time to change the many functions that take "ss" to also take "fd", so I'll add a comment to document the problem.
Attachment #8359929 -
Flags: superreview?(brian)
Attachment #8359929 -
Flags: review?(ryan.sleevi)
Comment 11•9 years ago
|
||
Attachment #8359929 -
Attachment is obsolete: true
Attachment #8359929 -
Flags: superreview?(brian)
Attachment #8359929 -
Flags: review?(ryan.sleevi)
Attachment #8359930 -
Flags: superreview?(brian)
Attachment #8359930 -
Flags: review?(ryan.sleevi)
Updated•9 years ago
|
Status: REOPENED → ASSIGNED
Target Milestone: 3.15.5 → 3.16
Comment 12•9 years ago
|
||
(In reply to Wan-Teh Chang from comment #10) > Unfortunately I don't have time to change the many functions that take > "ss" to also take "fd", so I'll add a comment to document the problem. My understanding is that "correct" change is to remove the "PRFileDesc * fd;" member of sslSocketStr in sslt.h, and then add a PRFileDesc * fd parameter to every place that is currently accessing ss->fd. Unfortunately, there are a lot of SSL_DBG/SSL_TRC calls that access ss->fd which means many functions access ss->fd. Here's an alternate solution: 1. Replace the "PRFileDesc * fd" member with "void * fd". 2. Add a PRFileDesc * fd argument to all the functions that fail to compile and/or produce warnings because they actually need fd to be a PRFileDesc. This would be all the non-logging calls only. Does this seem reasonable? We may be able to find somebody to make this change. It is concerning that Gecko has this data race in the first place though. Patrick, see the stack trace. This seems like a(nother) case of badness caused by networking code accessing sockets on the main thread. Is there another way we can do this in Necko?
Flags: needinfo?(mcmanus)
Comment 13•9 years ago
|
||
This patch is a prototype of a solution based on adding a dummy higher layer that ensures the SSL layer's PRFileDesc has stable contents. We need to deal with two complications. 1. Some Mozilla code assumes the |fd| argument passed to their libSSL callback functions is exactly one layer below their own layer. So it is necessary to pass the dummy higher layer (ss->fd->higher) as the |fd| argument. 2. When a stack of PRFileDescs is being closed, PR_PopIOLayer will be called, which will switch the contents of PRFileDescs and invalidate ss->fd. So ssl_Close cannot call ssl_GetPrivate because the ss->fd == fd assertion will fail. It needs to calculate |ss| manually and update ss->fd.
Updated•9 years ago
|
Attachment #8359930 -
Flags: review?(ryan.sleevi) → review+
Comment 14•9 years ago
|
||
Comment on attachment 8359930 [details] [diff] [review] [Correct patch] Document why ssl_GetPrivate and ssl_FindSocket need to set ss->fd lazily Review of attachment 8359930 [details] [diff] [review]: ----------------------------------------------------------------- Patch checked in: https://hg.mozilla.org/projects/nss/rev/622364c673e8
Attachment #8359930 -
Flags: checked-in+
Updated•9 years ago
|
Target Milestone: 3.16 → 3.16.1
Comment 15•9 years ago
|
||
(In reply to Brian Smith (:briansmith, was :bsmith; NEEDINFO? for response) from comment #12) > > > It is concerning that Gecko has this data race in the first place though. > Patrick, see the stack trace. This seems like a(nother) case of badness > caused by networking code accessing sockets on the main thread. Is there > another way we can do this in Necko? This kind of got lost in my backlog - sorry about that. It looks both easy to do for the case of Peer and Local Addresses (the socket thread can cache these in read-only data structures outside of PSM).. but it looks kind of moot based on comment 14. let me know if its still desired.
Flags: needinfo?(mcmanus)
Updated•8 years ago
|
Attachment #8359930 -
Flags: superreview?(brian)
Updated•6 months ago
|
Severity: minor → S4
Comment 16•3 months ago
|
||
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Assignee: wtc → nobody
Status: ASSIGNED → NEW
You need to log in
before you can comment on or make changes to this bug.
Description
•