Last Comment Bug 417664 - false positive crl revocation test on ppc/ppc64 NSS_ENABLE_PKIX_VERIFY=1
: false positive crl revocation test on ppc/ppc64 NSS_ENABLE_PKIX_VERIFY=1
Status: RESOLVED FIXED
PKIX NSS312B2
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: trunk
: Other Linux
: P1 critical (vote)
: 3.12
Assigned To: Nelson Bolyard (seldom reads bugmail)
:
:
Mentors:
: 400676 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-02-14 20:52 PST by Kai Engert (:kaie) (on vacation)
Modified: 2008-02-18 01:00 PST (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
patch v1 - correct off-by-one error (1.03 KB, patch)
2008-02-15 12:04 PST, Nelson Bolyard (seldom reads bugmail)
no flags Details | Diff | Splinter Review
patch v2 - also fix other bugs found by inspection (2.90 KB, patch)
2008-02-15 12:41 PST, Nelson Bolyard (seldom reads bugmail)
no flags Details | Diff | Splinter Review
patch v2 - also fix other bugs found by inspection (2.90 KB, patch)
2008-02-15 12:42 PST, Nelson Bolyard (seldom reads bugmail)
no flags Details | Diff | Splinter Review
patch v3 - address an issue identified by Kai (checked in) (3.14 KB, patch)
2008-02-15 14:22 PST, Nelson Bolyard (seldom reads bugmail)
kaie: review+
Details | Diff | Splinter Review

Description Kai Engert (:kaie) (on vacation) 2008-02-14 20:52:19 PST
This bug is seen with both NSS 3.12 beta 1 
and a snapshot of the tip that is about 2-3 days old.

For the very first time I ran the NSS test suite on Linux machines with ppc and pcc64 processors. Fedora ships for those. Example hardware is Mac or PS3.

With both ppc and ppc64 I ran into the same test failures, CRL revocation tests as part of a SSL client auth handshake.

Number of failures: 130
All failures say: "produced a returncode of 0, expected is {1 or 254}"
All failures were during CRL tests.
Tests failures during all stages of tests, with cache or without, etc.

Sample lines:
ssl.sh: #1195: TLS Request don't require client auth (client auth) (cert TestUser40 - revoked) produced a returncode of 0, expected is 254 - FAILED
ssl.sh: #1197: TLS Request don't require client auth (client auth) (cert TestUser42 - revoked) produced a returncode of 0, expected is 254 - FAILED
ssl.sh: #1204: TLS Require client auth (client auth) (cert TestUser40 - revoked) produced a returncode of 0, expected is 254 - FAILED
ssl.sh: #1206: TLS Require client auth (client auth) (cert TestUser42 - revoked) produced a returncode of 0, expected is 254 - FAILED
ssl.sh: #1213: SSL3 Request don't require client auth (client auth) (cert TestUser40 - revoked) produced a returncode of 0, expected is 254 - FAILED
...
ssl.sh: #1502: SSL3 Require client auth on 2nd hs (client auth)(cert TestUser48 - revoked) produced a returncode of 0, expected is 1 - FAILED
ssl.sh: #1503: SSL3 Require client auth on 2nd hs (client auth)(cert TestUser49 - revoked) produced a returncode of 0, expected is 1 - FAILED
ssl.sh: #1504: SSL3 Require client auth on 2nd hs (client auth)(cert TestUser50 - revoked) produced a returncode of 0, expected is 1 - FAILED
ssl.sh: #1506: SSL3 Require client auth on 2nd hs (client auth)(cert TestUser52 - revoked) produced a returncode of 0, expected is 1 - FAILED


I found that all.sh is executing the tests in 4 "rounds".
All failures happen in the 2nd round which enables NSS_ENABLE_PKIX_VERIFY=1

After the test suite completed, I was able to run the following commands to reproduce the false positive:

mkdir test
cd test
NSS_ENABLE_PKIX_VERIFY="1" selfserv -D -p 8555 -d ../pkix/server -n localhost.localdomain -w nss -r -i ./spid &
tstclnt -p 8555 -h localhost.localdomain -f -d ../pkix/client -w nss -n TestUser40  < /home/kaie/nss/head/mozilla/security/nss/tests/ssl/sslreq.dat
echo $?

this returns 0 instead of the expected 254.

It does not matter whether I specify env variable for tstclnt.

When I start selfserv without NSS_ENABLE_PKIX_VERIFY="1",
then the above sequence gives the expected return code 0.
Comment 1 Kai Engert (:kaie) (on vacation) 2008-02-15 10:03:54 PST
I have an isolated test case that can be used without running the test suite.
I made use of the test suite files.

I copied the "client" database directory.
I imported the CRL from the server directory that revokes TestUser40.


On PS3 / ppc64:

# correct:
[kaie@kaieps3 cc2]$ certutil -d . -V -n TestUser40 -u C
certutil: certificate is invalid: Peer's Certificate has been revoked.

# false positive:
[kaie@kaieps3 cc2]$ NSS_ENABLE_PKIX_VERIFY="1" certutil -d . -V -n TestUser40 -u C
certutil: certificate is valid


I compiled the same NSS snapshot on a i386 machine.
I copied the very same db directory to the i386 machine, and I'm excuting the same commands. Here are the results:

# i386

# correct
[kaie@r61fast:~/moz/nss/head/cc2]$ certutil -d . -V -n TestUser40 -u C
certutil: certificate is invalid: Peer's Certificate has been revoked.

# different wording, but at least it indicates the verification failure
[kaie@r61fast:~/moz/nss/head/cc2]$ NSS_ENABLE_PKIX_VERIFY="1" certutil -d . -V -n TestUser40 -u C
certutil: certificate is invalid: This certificate is not valid.


I hope this reduced test case allows us to debug the failure more easily.

Alexei, Julien, do you have ideas where I could set breakpoints to find the difference in behavior?

Comment 2 Kai Engert (:kaie) (on vacation) 2008-02-15 10:06:21 PST
Or is there any logging functionality in libpkix that I could switch on and that would actually provide us with some helpful output?
Comment 3 Kai Engert (:kaie) (on vacation) 2008-02-15 10:25:50 PST
I just tested this on a Mac OS X 10.4 machine with a ppc processor.
I tested the reduced test case (not the full test suite).

It shows the same false positive.

Linux PS3 PPC64 and Mac OS X PPC behave the same.
Comment 4 Kai Engert (:kaie) (on vacation) 2008-02-15 10:26:33 PST
kai-engerts-mac-mini:~/nss/head/cc2 kengert$ certutil -d . -V -n TestUser40 -u C
certutil: certificate is invalid: Peer's Certificate has been revoked.

kai-engerts-mac-mini:~/nss/head/cc2 kengert$ NSS_ENABLE_PKIX_VERIFY="1" certutil -d . -V -n TestUser40 -u C
certutil: certificate is valid
Comment 5 Nelson Bolyard (seldom reads bugmail) 2008-02-15 10:34:01 PST
I'm assigning to Alexei, who is out today.  Kai, your continued help is 
appreciated.
Comment 6 Kai Engert (:kaie) (on vacation) 2008-02-15 10:38:43 PST
I uploaded a tar.gz archive containing that database directory: http://kuix.de/mozilla/bug417664/cc2.tar.gz
Comment 7 Kai Engert (:kaie) (on vacation) 2008-02-15 11:57:21 PST
The bug is due to some broken code in libpkix.
Nelson is currently looking into the right fix.

FYI, the following minimal code behaves differently on i386 and pcc.
On i386 it prints 2147483648.
On ppc it prints 0.


#include <stdio.h>

int main(int argc, char *argv[])
{
int shifter = 0;
unsigned int target = 0;
target |= (1 << (shifter -1));
printf("%u\n", target);
}

Comment 8 Nelson Bolyard (seldom reads bugmail) 2008-02-15 12:04:05 PST
Created attachment 303585 [details] [diff] [review]
patch v1 - correct off-by-one error

This fixes line 732 to match line 439.
Comment 9 Nelson Bolyard (seldom reads bugmail) 2008-02-15 12:12:13 PST
By stepping through the code on two systems, Kai found that the behavior 
of line 732 was different on the two systems.
 732                             allReasonCodes |= (1 << (reasonCode - 1));
with both variables initially zero, the PPC system set allReasonCodes to zero,
the X86 set it to 0x8000000.  The X86 treated the negative value as a circular
right shift, and the PPC apparently treated it as an end-off (non-circular) 
right shift.  Then the code stores this value in another variable

 743                 state->reasonCodeMask |= allReasonCodes;

Examining the code, I found that on line 439, the code computes it as

 439                     state->reasonCodeMask |= 1 << reasonCode;

So, I believe the fix is to change line 732 to match line 439 in this shift
computation.
Comment 10 Nelson Bolyard (seldom reads bugmail) 2008-02-15 12:41:22 PST
Created attachment 303590 [details] [diff] [review]
patch v2 - also fix other bugs found by inspection
Comment 11 Nelson Bolyard (seldom reads bugmail) 2008-02-15 12:42:13 PST
Created attachment 303591 [details] [diff] [review]
patch v2 - also fix other bugs found by inspection

This patch also fixes some other egregious errors in that source file.
Comment 12 Nelson Bolyard (seldom reads bugmail) 2008-02-15 12:44:13 PST
Comment on attachment 303590 [details] [diff] [review]
patch v2 - also fix other bugs found by inspection

Don't know how this got submitted twice.  I got an error first time, but it looks like the patch was attached anyway.
Comment 13 Kai Engert (:kaie) (on vacation) 2008-02-15 13:13:20 PST
Nelson, what should happen if PKIX_PL_CRLEntry_GetCRLEntryReasonCode returns a value greater than the number of defined strings? In other words if reasonCode >= numReasonCodes?

With this patch, unknown codes will cause allReasonCodes to remain unchanged (at 0). If allReasonCodes remains at zero, no revocation will be reported.

Would it be wise to fail on any reasonCode >= 0 ?

(Because of the buggy old code, the upper limit for codes treated as "revoked" was much higher.)
Comment 14 Nelson Bolyard (seldom reads bugmail) 2008-02-15 13:47:59 PST
Yes.  Perhaps this code should say something like this:
    if (reasonCode >= 0) {
        if (reasonCode >= numReasonCodes)
            reasonCode = 0;  /* use "unknown" */
        ...
Comment 15 Nelson Bolyard (seldom reads bugmail) 2008-02-15 14:22:50 PST
Created attachment 303613 [details] [diff] [review]
patch v3 - address an issue identified by Kai (checked in)

Kai observes that if we have a reason code, the cert should be considered
revoked, even if that reason code number is out of range.
Comment 16 Julien Pierre 2008-02-15 16:35:23 PST
I think this may be a dupe of bug 400676 , which is about CRL problems on AIX - an OS that's also PPC-based. The bug has been on my plate for a while but I hadn't been able to investigate yet for various reasons. I will try the patch and see if it resolves the issue there too.
Comment 17 Kai Engert (:kaie) (on vacation) 2008-02-15 19:53:37 PST
Comment on attachment 303591 [details] [diff] [review]
patch v2 - also fix other bugs found by inspection

I ran the test suite on both i386 and ppc64 with this patch and it succeeded!
Comment 18 Kai Engert (:kaie) (on vacation) 2008-02-15 19:54:27 PST
Comment on attachment 303613 [details] [diff] [review]
patch v3 - address an issue identified by Kai (checked in)

r=kengert

This is an improvement over v2, thanks for addressing my concern.
Comment 19 Nelson Bolyard (seldom reads bugmail) 2008-02-15 20:26:24 PST
Comment on attachment 303613 [details] [diff] [review]
patch v3 - address an issue identified by Kai (checked in)

Checking in pkix/top/pkix_defaultcrlchecker.c;
               new revision: 1.7; previous revision: 1.6
Comment 20 Slavomir Katuscak 2008-02-18 01:00:15 PST
*** Bug 400676 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.