Closed
Bug 342795
Opened 18 years ago
Closed 18 years ago
PR_Read returns -1 but PR_GetError returns 0 when ECDHE keygen fails
Categories
(NSS :: Libraries, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
3.11.5
People
(Reporter: julien.pierre, Assigned: wtc)
References
Details
Attachments
(1 file)
3.49 KB,
patch
|
nelson
:
review+
alvolkov.bgs
:
superreview+
|
Details | Diff | Splinter Review |
In my tree, I just got a core file in selfserv in the "errWarn" function :
241 static const char *
242 errWarn(char * funcString)
243 {
244 PRErrorCode perr = PR_GetError();
245 const char * errString = SECU_Strerror(perr);
246
247 fprintf(stderr, "selfserv: %s returned error %d:\n%s\n",
248 funcString, perr, errString);
249 return errString;
250 }
The problem is that perr is zero, and errString is NULL .
errString is not checked for NULL, and the fprintf crashes while calling strlen(errString).
The stack is :
[1] strlen(0x8068b3e, 0xfe64b220, 0xfec2b090, 0x0), at 0xfeb84acc
[2] _fprintf(0xfec2b090, 0x8068b1c, 0x8080c08, 0x0, 0x0), at 0xfebdcc0d
=>[3] errWarn(funcString = 0x8080c08 "HDX PR_Read"), line 248 in "selfserv.c"
[4] handle_connection(tcp_sock = 0x81862e0, model_sock = 0x8083800, requestCert = 0), line 968 in "selfserv.c"
[5] jobLoop(a = (nil), b = (nil), c = 0), line 518 in "selfserv.c"
[6] thread_wrapper(arg = 0x816d3f8), line 486 in "selfserv.c"
[7] _pt_root(arg = 0x81788b8), line 220 in "ptthread.c"
[8] _thr_setup(0xfe650400), at 0xfebff6f8
[9] _lwp_start(), at 0xfebff9e0
The caller of errWarn is :
960 rv = PR_Read(ssl_sock, pBuf, bufRem - 1);
961 if (rv == 0 ||
962 (rv < 0 && PR_END_OF_FILE_ERROR == PR_GetError())) {
963 if (verbose)
964 errWarn("HDX PR_Read hit EOF");
965 break;
966 }
967 if (rv < 0) {
968 errWarn("HDX PR_Read");
969 goto cleanup;
970 }
It is the second errWarn that is invoked . rv is -1 .
The command-line that caused selfserv to get in this condition is below :
(dbx) p argc
argc = 16
(dbx) p argv[0..15]
argv[0..15] =
[0] = 0x80469bc "selfserv"
[1] = 0x80469c5 "-D"
[2] = 0x80469c8 "-p"
[3] = 0x80469cb "8443"
[4] = 0x80469d0 "-d"
[5] = 0x80469d3 "../ext_server"
[6] = 0x80469e1 "-n"
[7] = 0x80469e4 "monstre.red.iplanet.com"
[8] = 0x80469fc "-e"
[9] = 0x80469ff "monstre.red.iplanet.com-ec"
[10] = 0x8046a1a "-w"
[11] = 0x8046a1d "nss"
[12] = 0x8046a21 "-c"
[13] = 0x8046a24 "ABCDEF:C001:C002:C003:C004:C005:C006:C007:C008:C009:C00A:C00B:C00C:C00D:C00E:C00F:C010:C011:C012:C013:C014cdefgijklmnvyz"
[14] = 0x8046a9d "-i"
[15] = 0x8046aa0 "../tests_pid.26949"
(dbx)
Reporter | ||
Comment 1•18 years ago
|
||
I got 10 cores in ext_client and 10 cores in client all with this same stack - crash in fprintf/strlen due to the NULL error string.
We should fix errWarn to check for the NULL string before calling fprintf .
But we also need to figure out why PR_Read returned -1 and no error.
Assignee | ||
Comment 2•18 years ago
|
||
Julien,
Instead of fixing errWarn to check for NULL, it may be better
to fix SECU_Strerror so that it never returns NULL.
- map 0 to "no error"
- map an error code not in the errStrings table to "unknown error"
Note that an error code not in the errStrings table also makes
SECU_Strerror return NULL.
The first errWarn call should also be fixed:
961 if (rv == 0 ||
962 (rv < 0 && PR_END_OF_FILE_ERROR == PR_GetError())) {
963 if (verbose)
964 errWarn("HDX PR_Read hit EOF");
965 break;
966 }
errWarn calls PR_GetError(). We should only call PR_GetError()
when an NSPR function failed. So for the rv == 0 case, the
return value of PR_GetError() in errWarn is meaningless.
(NSPR function don't need to clear the error code when they
succeed.)
Reporter | ||
Comment 3•18 years ago
|
||
This problem is reproducible by pulling the tip, setting NSS_ENABLE_ECC and NSS_ECC_MORE_THAN_SUITE_B to 1, and running all.sh .
Re: comment 2,
Good catch, Wan-Teh. The rv==0 case should be changed in selfserv to not invoke errWarn .
I'm not as confident about changing SECU_Strerror to never return NULL . I think it's useful to detect a zero error when one is expected. I am tempted to make that an assertion case, if not in SECU_Strerror, then perhaps in errWarn .
Anyway, the main bug here is not about the tools crashing but the PR_Read API returning -1 and not setting any error code.
Reporter | ||
Comment 4•18 years ago
|
||
Here is the test case for the server side :
selfserv -D -p 8443 -d ../../../tests_results/security/monstre.2/ext_server -n monstre.red.iplanet.com -e monstre.red.iplanet.com-ec -w nss -c :C010 -t 1
Here is the test case for the client side :
tstclnt -p 8443 -h monstre.red.iplanet.com -c :C010 -T -B -s -f -d ../../../tests_results/security/monstre.2/client < /export/home/nss/tip2/mozilla/security/nss/tests/ssl/sslreq.dat
I have to run tstclnt *twice* before the server reports the zero error. The first time around, the client reports :
tstclnt: write to SSL socket failed: Encountered end of file.
And the server reports :
selfserv: HDX PR_Read returned error -8092:
Unable to generate public/private key pair.
The second time, the server aborts with the stack previously reported due to the zero error code.
It appears that the server does an ECDHE keygen the first time, and that fails. libssl ends up correctly setting an NSPR error, and then selfserv reports it.
But the second time, libssl sets no error.
Reporter | ||
Comment 5•18 years ago
|
||
The problem is in ssl3ecc.c :
613 status = PR_CallOnceWithArg(&gECDHEKeyPairs[ec_curve].once,
614 ssl3_CreateECDHEphemeralKeyPair,
615 (void *)ec_curve);
616 if (status != PR_SUCCESS) {
617 return SECFailure;
618 }
The first time PR_CallOnceWithArg is called, it invokes ssl3_CreateECDHEphemeralKeyPair, which ends up setting up the NSPR error code.
But the second and subsequent times, this NSPR function just returns PR_FAILURE, without setting any error code. And nothing else in the SSL stack sets it either all the way to the return from PR_Read .
Assignee: nobody → nelson
Reporter | ||
Updated•18 years ago
|
Summary: PR_Read returns -1 but PR_GetError returns 0 → PR_Read returns -1 but PR_GetError returns 0 when ECDHE keygen fails
Reporter | ||
Updated•18 years ago
|
Priority: -- → P2
Version: trunk → 3.11.2
Assignee | ||
Comment 6•18 years ago
|
||
Julien, since PRCallOnceType is a public type, we
can't change it. Otherwise, we can store the error
code in a PRCallOnceType structure. We can consider
storing the error code in the "PRStatus status" field.
But I am worried that some NSPR clients have started
to use the fields of PRCallOnceType directly. (For
example, there are patches for lib/freebl that
zero a PRCallOnceType structure to fix memory leaks.)
Reporter | ||
Comment 7•18 years ago
|
||
I don't think we necessarily have to change the PRCallOnceType structure. The simplest fix is to have the caller of PR_CallOnceWithArg always set the NSPR error whenever there is a failure. But of course, that will lose the details of the reason for the failure.
Another possibility is to set the NSPR error code to zero before the call, and then query the error code after the call, and if it is zero, set the value. This is equivalent to my first proposal except for the first call, during which we would get the detailed error from the actual call.
Comment 8•18 years ago
|
||
IINM, this bug is paired with bug 341708. In that bug, we found that the
client was receiving a server key exchange with a bogus ECDHE key, and then
was aborting the handshake without sending an alert. The implication is that
the server did not detect that it had failed to generate a valid ECDHE key,
and sent the server key exchange message anyway. That needs to be fixed.
I think that bug is essentially the same as this one.
I suspect that the code shown in comment 5 above is returning PR_SUCCESS
in the second call, even though the first call failed, or alternatively,
the function that calls this one ignores the error return. That hypothesis
needs investigation.
Reporter | ||
Comment 9•18 years ago
|
||
Nelson,
These two bugs are unrelated. In bug 341708, the server "succeeds" at doing the keygen, but the key ends up being invalid, and only the client detects that problem. Bug 341708 is paired with bug 341573 - the combination of the -fsimple=2 compiler option and the floating point ECL library cause the bad ECC computations.
On the other hand, this bug occurs when building with NSS_ECC_MORE_THAN_SUITE_B set to 1, and checking out the ecl-curves.h file from the tip as of yesterday, which only contains 3 curves, then running all.sh again with this environment variable set.
I verified through debugging that PR_CallOnceWithArg returns PR_FAILURE both the first and second times. The only difference is that the first time, the NSPR error code is set by the function it invokes, and the second time, the function is not invoked, and no NSPR error code is set.
Reporter | ||
Updated•18 years ago
|
Assignee: nelson → julien.pierre.bugs
Target Milestone: --- → 3.11.4
Assignee | ||
Comment 10•18 years ago
|
||
Although this bug is caused by a design flaw of PR_CallOnce's error
reporting (bug 353899), we can't change PR_CallOnce to report meaningful
error codes. Therefore, this patch is the only way to get a meaningful
error code rather than a fixed error code for all PR_CallOnce failures.
With PR_CallOnce's limitation, call-once functions are responsible for
storing their error codes so that the error codes can be retrieved later.
So I store the error code in the 'error' member of the ECDHEKeyPair structure.
Attachment #246034 -
Flags: superreview?(alexei.volkov)
Attachment #246034 -
Flags: review?(nelson)
Assignee | ||
Comment 11•18 years ago
|
||
Nelson, please set the target milestone for this bug.
For now I left it targeted at a 3.11.x patch release.
Assignee: bugzilla → wtchang
Target Milestone: 3.11.4 → 3.11.5
Assignee | ||
Updated•18 years ago
|
Attachment #246034 -
Flags: superreview?(alexei.volkov) → superreview?(alexei.volkov.bugs)
Comment 12•18 years ago
|
||
Comment on attachment 246034 [details] [diff] [review]
Proposed patch
Patch is good for fixing reported problem.
Attachment #246034 -
Flags: superreview?(alexei.volkov.bugs) → superreview+
Comment 13•18 years ago
|
||
Comment on attachment 246034 [details] [diff] [review]
Proposed patch
If there is no platform on which this change creates a binary incompatibility
then r=nelson. OTOH, if there is no platform on which the change creates a
binary incompatiblity, then why change it?
> typedef struct ECDHEKeyPairStr {
> ssl3KeyPair * pair;
>- PRInt32 flag;
>+ int error; /* error code of the call-once function */
> PRCallOnceType once;
> } ECDHEKeyPair;
Attachment #246034 -
Flags: review?(nelson) → review+
Assignee | ||
Comment 14•18 years ago
|
||
Comment on attachment 246034 [details] [diff] [review]
Proposed patch
Thanks for the code review. Before I answer Nelson's question,
I just wanted to make sure that you still want to fix this bug
in NSS 3.11.x. (Julien set the target milestone to 3.11.4 on
2006-09-22.)
Nelson, since the ECDHEKeyPairStr structure is private (it's
defined inside a .c file), changing the type of a member won't
break binary compatibility on any platform. It turns out that
the original 'PRInt32 flag' member was unused. I renamed it
'error' because I'm storing the error code in it. Then I changed
the type to 'int' because that's the type of error codes in
NSS (used by PORT_GetError and PORT_SetError).
Does this answer address your concern?
Comment 15•18 years ago
|
||
Wan-Teh, thanks for your answer.
Assignee | ||
Comment 16•18 years ago
|
||
I checked in the patch on the NSS trunk (NSS 3.12) and NSS_3_11_BRANCH
(NSS 3.11.5).
Checking in ssl3ecc.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3ecc.c,v <-- ssl3ecc.c
new revision: 1.16; previous revision: 1.15
done
Checking in ssl3ecc.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3ecc.c,v <-- ssl3ecc.c
new revision: 1.3.2.10; previous revision: 1.3.2.9
done
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•