Closed Bug 311440 Opened 15 years ago Closed 15 years ago

selfserv with ECC enabled crashes in free()

Categories

(NSS :: Libraries, defect, P1)

3.11
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wtc, Assigned: nelson)

Details

(Keywords: crash, regression)

Attachments

(3 files)

I built the NSS tip (NSS 3.11 pre-release) with NSS_ENABLE_ECC=1.

When running all.sh, selfserv crashes in free() because it frees
memory that was not allocated from the heap (this stack trace was
produced with gdb on Red Hat Enterprise Linux 4):

#0  0x0079d7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x001387d5 in raise () from /lib/tls/libc.so.6
#2  0x0013a149 in abort () from /lib/tls/libc.so.6
#3  0x0016c27a in __libc_message () from /lib/tls/libc.so.6
#4  0x00172abf in _int_free () from /lib/tls/libc.so.6
#5  0x00172e3a in free () from /lib/tls/libc.so.6
#6  0x0032ee61 in PR_Free ()
   from
/home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libnspr4.so
#7  0x0067f2d5 in PORT_Free ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libnss3.so
#8  0x00cda221 in ssl3_HandleECDHClientKeyExchange ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libssl3.so
#9  0x00ccb389 in ssl3_HandleHandshakeMessage ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libssl3.so
#10 0x00ccc64c in ssl3_HandleRecord ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libssl3.so
#11 0x00ccd2f4 in ssl3_GatherCompleteHandshake ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libssl3.so
#12 0x00cce891 in ssl_GatherRecord1stHandshake ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libssl3.so
#13 0x00cd2f8c in ssl_Do1stHandshake ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libssl3.so
#14 0x00cd42da in ssl_SecureRecv ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libssl3.so
#15 0x00cd4555 in ssl_SecureRead ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libssl3.so
#16 0x00cd7b5a in ssl_Read ()
   from /home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libssl3.so
#17 0x003283e9 in PR_Read ()
   from
/home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libnspr4.so
#18 0x0804cbb4 in jobLoop ()
#19 0x0804c5e8 in thread_wrapper ()
#20 0x00340221 in _pt_root ()
   from
/home/wtc/nss-ecc/mozilla/dist/Linux2.6_x86_glibc_PTH_OPT.OBJ/lib/libnspr4.so
#21 0x009ea341 in start_thread () from /lib/tls/libpthread.so.0
#22 0x001d7fee in clone () from /lib/tls/libc.so.6

If I remove the PORT_Free call in ssl3_HandleECDHClientKeyExchange,
the crash is gone, but the ECC TLS tests still fail.
The output.log file is too big to be attached to the
bug report, here is the relevant excerpt.
I examined results.html closer and found that the
ECC SSL Cipher Coverage and Client Authentication
tests passed.  It was only the ECC SSL Stress tests
that failed.  So I've updated the summary of this bug
accordingly.
Summary: ECC TLS doesn't work → ECC TLS stress tests failed
I believe this is the right fix.  Nelson, please review.

On Windows with the debug MSVC 7.1 runtime library, the free()
call generates this debug assertion failure:

  Microsoft Visual C++ Debug Library

  Debug Assertion Failed!

  Program: C:\nss-ecc\mozilla\dist\win954.0_dbg.objd\bin\selfserv.exe
  File: dbgheap.c
  Line: 1132

  Expression: _CrtIsValidHeapPointer(pUserData)

  For information on how your program can cause an assertion
  failure, see the Visual C++ documentation on asserts.

  (Press Retry to debug the application)

    Abort	       Retry		    Ignore
Attachment #198759 - Flags: review?(nelson)
Douglas sent me email, which reminded me that the ECC TLS
stress tests are a known bug (bug 238051).  So the only
regression is that selfserv crashes in free().  I updated
the bug's summary again.
Keywords: crash
Priority: -- → P1
Summary: ECC TLS stress tests failed → selfserv with ECC enabled crashes in free()
Target Milestone: --- → 3.11
(In reply to comment #5)
> Douglas sent me email, which reminded me that the ECC TLS
> stress tests are a known bug (bug 238051).  So the only
> regression is that selfserv crashes in free().  I updated
> the bug's summary again.

So do we know if a build with both patches applied
(i.e. patches for bug Ids 311440 and 238051) passes
the stress tests.

I'm bringing this up because, if I remember correctly,
the ec versions of the files in nss/tests/ssl already
account for the lack of session reuse, e.g. if you
look inside ecsslstress.txt, you'll notice that -N
is set for two of the ECC stress tests which disables
session reuse.

You might have discovered a bug that I recall
seeing in the past but only on some platforms and
not very consistently.

vipul
Vipul: you are right.  I misunderstood what bug 238051 is.

I found that the ECC SSL stress test failures exist in the
first NSS release (NSS 3.9) whose all.sh can do ECC tests.
So this is a pre-existing problem, and is different from
bug 238051.  I will open a new bug about it.

I applied the patch from bug 238051 to the NSS tip.  As
expected, the ECC SSL stress tests still fail.
(In reply to comment #7)
> Vipul: you are right.  I misunderstood what bug 238051 is.
> 
> I found that the ECC SSL stress test failures exist in the
> first NSS release (NSS 3.9) whose all.sh can do ECC tests.
> So this is a pre-existing problem, and is different from
> bug 238051.  I will open a new bug about it.
> 
> I applied the patch from bug 238051 to the NSS tip.  As
> expected, the ECC SSL stress tests still fail.

So does this mean we don't yet have a patch (or combination
of patches) that would cause the stress test to pass?

Here's my current understanding of where we stand. Please
correct me if I'm wrong.

  - the current patch for 311440 fixes the selfserv crash
  - ECC session reuse is disabled but the stress tests still fail

If so, do we know what's causing the failure message to be
printed. Could it be that the test script is expecting the incorrect
number of cache hits (because session reuse is disabled)?
I know I had to make this change but it is possible that the
change never made it back to the NSS tip.

vipul
Vipul:

Your understanding is correct:
  - the current patch for 311440 fixes the selfserv crash
  - ECC session reuse is disabled but the stress tests still fail

All the error messages came from strsclnt:
  strsclnt: PR_Write returned error -12268:
  Cannot connect: SSL is disabled.

selfserv did not emit any messages and was killed by our
test script, apparently because it was still waiting for
connections from strsclnt.

The SSL error code -12268 is SSL_ERROR_SSL_DISABLED,
"Cannot connect: SSL is disabled", which means:

  The local socket is configured in such a way that it
  cannot use any of the SSL cipher suites. Possible causes
  include: (a) both SSL2 and SSL3 are disabled, (b) All the
  individual SSL cipher suites are disabled, or (c) the socket
  is configured to handshake as a server, but the certificate
  associated with that socket is inappropriate for the Key
  Exchange Algorithm selected.

Since strsclnt is an SSL client, (c) does not apply.

I will try to continue looking into this next week.
Comment on attachment 198759 [details] [diff] [review]
Patch to fix the crash in free()

This patch appears to be correct.  
I found what appear to be two places in this function that appear to declare a
fatal error without sending an alert to the peer.  Those should be fixed.  But
they are not relevant to the immediate cause of this bug.
Attachment #198759 - Flags: review?(nelson) → review+
Comment on attachment 198759 [details] [diff] [review]
Patch to fix the crash in free()

Patch checked in on the NSS tip (NSS 3.11)

I also examined all the other call sites of
ssl3_ConsumeHandshakeVariable and verified that
none of them free the SECItem's buffer returned
by ssl3_ConsumeHandshakeVariable.

I saw two places in this function marked with
SEND_ALERT.  I believe that's what Nelson meant.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Let me clarify about the two places in ssl3_HandleECDHClientKeyExchange.

The first "SEND_ALERT", in the error path after the call to
ssl3_ConsumeHandshakeVariable, should be removed because
ssl3_ConsumeHandshakeVariable sends the alert itself, so no 
additional alert needs to be sent in that case.

The comment "/* last gasp.  */" appears in the error path following a call
to PK11_PubDeriveWithKDF.  It appears that no alert is sent in this error
path, unless PK11_PubDeriveWithKDF itself sends an alert (I didn't check).
I also didn't check to see if the caller of ssl3_HandleECDHClientKeyExchange
sends an alert in this case.  

The second "SEND_ALERT", in the error path after the call to
ssl3_InitPendingCipherSpec, is the second path in which no alert is sent.
You need to log in before you can comment on or make changes to this bug.