Closed Bug 126289 Opened 24 years ago Closed 24 years ago

SSL client dies in handshake code

Categories

(NSS :: Libraries, defect, P1)

3.3.2
Sun
Solaris
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: julien.pierre, Assigned: wtc)

References

Details

Attachments

(1 file, 1 obsolete file)

This occurred running my httptest client with NSS 3.3.2 with double-handshake client auth . I have the core file. detected a multithreaded program t@64 (l@58) terminated by signal ABRT (Abort) Current function is PR_Assert 495 abort(); (dbx) where current thread: t@64 [1] __sigprocmask(0x0, 0xfdab1210, 0x0, 0x0, 0x0, 0x0), at 0xfec39bf0 [2] _resetsig(0xfec3c510, 0x0, 0x0, 0xfdab1d78, 0xfec4e000, 0x0), at 0xfec2e620 [3] _sigon(0xfdab1d78, 0xfec55990, 0x6, 0xfdab12e4, 0xfdab1d78, 0xfdab1328), at 0xfec2dd10 [4] _thrp_kill(0x0, 0x40, 0x6, 0xfec4e000, 0x40, 0xfebba428), at 0xfec30e84 [5] raise(0x6, 0x0, 0x0, 0xffffffff, 0xfebba394, 0xfee6b6dc), at 0xfeb49b08 [6] abort(0xfebb6000, 0x3f, 0xfebbd99c, 0xfebb9c78, 0x7b, 0xfeb91c94), at 0xfeb35124 =>[7] PR_Assert(s = 0xff316418 "!ssl_HaveXmitBufLock(ss)", file = 0xff316434 "sslsecur.c", ln = 123), line 495 in "prlog.c" [8] ssl_Do1stHandshake(ss = 0xd3dc750), line 123 in "sslsecur.c" [9] ssl_SecureSend(ss = 0xd3dc750, buf = 0xa2e68 "GET / HTTP/1.0\n\n", len = 16, flags = 0), line 1075 in "sslsecur.c" [10] ssl_Send(fd = 0x19c138, buf = 0xa2e68, len = 16, flags = 0, timeout = 360000000U), line 1192 in "sslsock.c" [11] PR_Send(fd = 0x19c138, buf = 0xa2e68, amount = 16, flags = 0, timeout = 360000000U), line 221 in "priometh.c" [12] SunRequest::send(this = 0x1c21c8, sock = 0x19c138), line 308 in "request.cpp" [13] SunEngine::makeRequest(this = 0xfdab1b64, request = CLASS, server = CLASS, ignoresenderror = 1), line 413 in "engine.cpp" [14] SunTest::run(this = 0xd3758), line 454 in "tests.cpp" [15] TestInstance::execute_test(this = 0xd3708), line 152 in "tests.cpp" [16] thread_entry(arg = (nil)), line 1684 in "tests.cpp" [17] _pt_root(arg = 0xf1380), line 214 in "ptthread.c" (dbx)
Priority: -- → P1
Target Milestone: --- → 3.4
Version: 3.4 → 3.3.2
Summary: SSL client dies in handshake code → SSL client dies in handshake code with NSS 3.3.2
The assertion that failed is in a loop, at the top. It is there to ensure that (a) the lock is not held when entering the loop, and (b) that no function called from within the loop returns while holding the lock. It appears that the caller of this function never holds the lock, so the obvious candidate is that some function that was called in this loop returned while holding the lock. I tried to look at the core file to see if I could determine which function was called in the last iteration of this loop. But the core file is enormous. dbx crashes when I try to use it to inspect this core file. So, we probably won't get far with this one. If this is reproducible, one idea is to have a variable in the function that holds the address of the function called by the previous iteration of the loop. That would make it much easier to find the culprit. But even that doesn't help if dbm cannot display the value.
With Nelson's pointer, I searched all the files in nss/lib/ssl, trying to match ssl_ReleaseRecvBufLock with ssl_GetRecvBufLock and ssl_ReleaseXmitBufLock with ssl_GetXmitBufLock. I believe the culprit is the ssl_CreateSecurityInfo function in sslsecur.c, and the likely cause for the missing unlock error was that sslBuffer_Grow() failed, indicating an out-of-memory error. The relevant code in ssl_CreateSecurityInfo is reproduced below: ssl_GetRecvBufLock(ss); if ((gs = ss->gather) == 0) { ss->gather = gs = ssl_NewGather(); if (!gs) { goto loser; } } rv = sslBuffer_Grow(&gs->buf, 4096); if (rv) { goto loser; } ssl_ReleaseRecvBufLock(ss); ssl_GetXmitBufLock(ss); rv = sslBuffer_Grow(&sec->writeBuf, 4096); if (rv) { goto loser; } ssl_ReleaseXmitBufLock(ss); SSL_TRC(5, ("%d: SSL[%d]: security info created", SSL_GETPID(), ss->fd)); return SECSuccess; loser: if (sec) { PORT_Free(sec); ss->sec = sec = (sslSecurityInfo *)0; } if (gs) { ssl_DestroyGather(gs); ss->gather = gs = (sslGather *)0; } return SECFailure; Nelson, does the loop call ssl_CreateSecurityInfo? The loop invokes ss->handshakeCallback and ss->handshake. It's not clear to me what those function pointers may be. How do I verify this theory with the core file?
Status: NEW → ASSIGNED
Each of the 3 "goto loser" statements quoted above leaves a lock locked when the function returns. This function must be fixed to ensure that it never returns while leaving a lock locked. The pointers ss->handshakeCallback and ss->handshake should always point to one of the following functions, (or be null): AlwaysBlock; ssl2_BeginClientHandshake; ssl2_BeginServerHandshake; ssl2_HandleClientHelloMessage; ssl2_HandleClientSessionKeyMessage; ssl2_HandleMessage; ssl2_HandleServerHelloMessage; ssl2_HandleVerifyMessage; ssl_GatherRecord1stHandshake; BTW, I see that the "lock" in question is actually a PRMonitor, so it is possible that the monitor's lock count was already non-zero when the function was entered. The value of ss->handshake would tell us if that is the case.
This patch gets rid of the problem of leaving a lock locked before going to the 'loser' label. However, I suspect that we also need to lock the recvBufLock when destroying ss->gather and lock the xmitBufLock when destroying ss->sec. Alternatively, I am not sure if we really need to lock the recvBufLock when creating ss->gather and ss->gather->buf and lock the xmitBufLock when creating ss->sec->writeBuf. Nelson, do you know?
Just to summarize what I have done and my findings. I inspected all the files in nss/lib/ssl on the trunk of NSS as of this morning (2002-02-25), trying to match ssl_ReleaseRecvBufLock with ssl_GetRecvBufLock and ssl_ReleaseXmitBufLock with ssl_GetXmitBufLock. The only problems I found are in the ssl_CreateSecurityInfo function in sslsecur.c. Then I verified that that function has not changed since NSS 3.3.2, so NSS 3.3.2 has exactly the same problems in that function. The problems are that ssl_CreateSecurityInfo may leave the recvBufLock or xmitBufLock locked if it fails to create ss->gather, ss->gather->buf, or ss->sec->writeBuf. The most likely cause for the failure is an out-of-memory error from the memory allocation routines(PORT_ZAlloc, PORT_Realloc, or PORT_Alloc). I believe this is the cause of the assertion failures reported in this bug and bug 126501.
The RecvBufLock protects the gather buffer (among other things), and yes this lock should be held while destroying the gather buffer. The XmitBufLock protects the sec->writeBuf, among other things. But the only way to get to the writebuf is through the sec structure that this function is creating. If we didn't assign ss->sec = sec until just before we return SECSuccess, then it should not be necessary to use a lock to protect sec->writeBuf at all until ss->sec = sec.
This rather large patch accomplishes several things: 1. The sslSecurityInfo and sslGather structs were formerly allocated separately from the sslSocket struct, and pointers to them were kept in the sslSocket struct. Now, they are part of the sslSocket struct itself, not separtely allocated. This has several effects: a) slightly fewer malloc and free calls for each ssl socket, b) eliminates the need to double check the pointers for these structs to ensure they're not null in every function that accessed their members, c) reduces the levels of pointer indirection required to access their members, substantially reducing the number of memory cycles required in some functions. E.g. ss->sec->ci.elements is now ss->sec.ci.elements. 2) The sslGather struct is conceptually separate from the sslSecurityInfo struct, and is no longer initialized as a side effect of calling ssl_CreateSecurityInfo(). It is now initialized by the new function ssl_InitGather(). 3) Formerly, the sslSecurityInfo struct was allocated when SSL_ImportFD was called, and then SSL_ResetHandshake would discard the old one and allocate a new one. Now, SSL_ResetHandshake calls ssl_ResetSecurityInfo to simply reset the state of that struct to its initial state. 4) many places that formerly did not check for memory allocation failures now do check, and do the right thing when it fails, avoiding leaks. As a side effect, all the functions that handle sslSecurityInfo structs were rewritten. Hopefully, these changes will eliminate the cause of the assertion failure reported in this bug. I have run the QA test script on this code, and will check it in shortly.
Attachment #71360 - Attachment is obsolete: true
Nelson, This should probably go into the 3.4 branch, right ? There is a 3.4 defect : 126501 .
Summary: SSL client dies in handshake code with NSS 3.3.2 → SSL client dies in handshake code
*** Bug 126501 has been marked as a duplicate of this bug. ***
We have not been able to reproduce this assertion failure. I will assume that the assertion failure was caused by the condition described in comment #2 and the following comments in bug 126501 and is therefore fixed by Nelson's patch: http://bugzilla.mozilla.org/show_bug.cgi?id=126501#c3 http://bugzilla.mozilla.org/show_bug.cgi?id=126501#c4
Marked the bug fixed.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: