Bug 334057 - 160 and 192 bit curves fail on AMD64 Red Hat builds
Status: RESOLVED FIXED
ECC
Product: NSS
Classification: Components
Component: Libraries
Version: 3.11
Platform: x86 Linux
Importance: P1 critical
Target Milestone: 3.11.1
Assigned To: Robert Relyea
Reported: 2006-04-14 15:13 PDT by Slavomir Katuscak
Modified: 2007-05-06 20:49 PDT

Attachments
Add AMD64 Linux support to mpi/target.mk (911 bytes, patch)
2006-04-15 20:08 PDT, Nelson Bolyard (seldom reads bugmail)
hack to find problematic curves (2.15 KB, patch)
2006-04-15 21:48 PDT, Nelson Bolyard (seldom reads bugmail)
Turn off Inline assemble for Linux. (599 bytes, patch)
2006-04-19 16:31 PDT, Robert Relyea
nelson: review+

Description Slavomir Katuscak 2006-04-14 15:13:39 PDT
Many tests using ECDHE + RSA failed. The log of one failure follows:

ssl.sh: running SSL3 ECDHE RSA WITH NULL SHA ----------------------------
tstclnt -p 8444 -h nssamdrhel3.red.iplanet.com -c :C010 -T -B -s \
        -f -d ../client < /share/builds/mccrel3/security/securitytip/builds/20060414.1/wozzeck_Solaris8/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: write to SSL socket failed: Unspecified failure while processing SSL Client Key Exchange handshake.
ssl.sh: SSL3 ECDHE RSA WITH NULL SHA produced a returncode of 254, expected is 0 FAILED

All others look similar.
Comment 1 Nelson Bolyard (seldom reads bugmail) 2006-04-14 18:09:26 PDT
Taking.  
The problem seems to affect all ECDHE_RSA tests on 64-bit RedHat linux.
It is not obviously affecting any other platforms.  
It is almost certainly due to my checkin yesterday. 
I am trying to track this down, but don't have a 64-bit redhat box 
at my immediate disposal.
Comment 2 Julien Pierre 2006-04-14 18:20:50 PDT
Nelson, you can use the box in our lab on which the nightly QA failed.
Comment 3 Nelson Bolyard (seldom reads bugmail) 2006-04-14 20:56:44 PDT
I went to system nssamdrhel3, the system on which the test was run that was
the cause of this bug report.  There I built the trunk of NSS with the gcc 
compiler in /usr/bin/gcc, which is gcc 3.2.3.  
I used NSPR source from the NSPR_4_6_BRANCH.  All built debug.  
The name of the object directory for my build was the same as in the nightly
QA test that failed, namely, Linux2.4_x86_64_glibc_PTH_64_DBG.OBJ
The build finished without errors. 
Then I ran all.sh.  It passed.  All Green.  No red/orange.  

What could be different?  Here are some ideas:

a) the version of NSPR.  I used the 4.6 branch.  I think the nightlies use 
the trunk.  So I will retest with NSPR from the trunk.

b) the compiler.  I have no idea what compiler was actually used for the 
nightlies.  /tools/ns/bin doesn't work any more.  Apparently we have reverted
to building with whatever gcc is locally installed, something we carefully
avoided doing in years past.  We're evidently no longer controlling the 
compiler version carefully.  

I'll let you know what I find after rebuilding with NSPR from the trunk.
Comment 4 Nelson Bolyard (seldom reads bugmail) 2006-04-14 21:13:31 PDT
results with NSPR from the trunk:  all passed.  
Comment 5 Nelson Bolyard (seldom reads bugmail) 2006-04-14 22:50:42 PDT
OK, I was able to reproduce it.  (Had to define NSS_ENABLE_ECC :-/  )
The immediate cause of failure is in strsclnt, on this stack:

#0  ec_GFp_validate_point    at ecl/ecp_aff.c:336
#1  in ECPoint_validate      at ecl/ecl.c:397
#2  in EC_ValidatePublicKey  at ec.c:513
#3  in EC_ValidatePublicKey  at loader.c:1365
#4  in NSC_DeriveKey (hSession=19, pMechanism=0x409fef50,
    hBaseKey=5, pTemplate=0x409feec0, ulAttributeCount=3, phKey=0x5bfe88)
    at pkcs11c.c:5507
#5  in PK11_PubDeriveWithKDF (privKey=0x5c36a0,
    pubKey=0x5bec20, isSender=0, randomA=0x0, randomB=0x0, derive=4176,
    target=887, operation=268, keySize=0, kdf=1, sharedData=0x0, wincx=0x0)
    at pk11skey.c:1731
#6   in ssl3_SendECDHClientKeyExchange  at ssl3ecc.c:290
#7   in ssl3_SendClientKeyExchange      at ssl3con.c:4213
#8   in ssl3_HandleServerHelloDone      at ssl3con.c:5105
#9   in ssl3_HandleHandshakeMessage     at ssl3con.c:7593
#10  in ssl3_HandleHandshake            at ssl3con.c:7680
#11  in ssl3_HandleRecord               at ssl3con.c:7943
#12  in ssl3_GatherCompleteHandshake    at ssl3gthr.c:206
#13  in ssl_GatherRecord1stHandshake    at sslcon.c:1260
#14  in ssl_Do1stHandshake              at sslsecur.c:149
#15  in ssl_SecureSend                  at sslsecur.c:1096
#16  in ssl_Send                        at sslsock.c:1373
#17  in PR_Send   at ../../../../pr/src/io/priometh.c:226
#18 in handle_connection                at strsclnt.c:693
#19 in do_connects                      at strsclnt.c:884

At this line of code:

333     /* check LHS - RHS == 0 */
334     MP_CHECKOK( group->meth->field_sub(&accl, &accr, &accr, group->meth) );
335     if (mp_cmp_z(&accr) != 0) {
336 >           res = MP_NO;
337             goto CLEANUP;
338     }

Now, AFAIK, there have been no changes to freebl in the last few days,
so I don't suspect this is really a freebl bug.  
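
For readers unfamiliar with the check at lines 333-338: it tests whether the received point satisfies the curve equation y^2 = x^3 + a*x + b in the prime field. The following is only a toy sketch of that idea, assuming a tiny prime so that plain 64-bit arithmetic suffices. The names toy_point_on_curve and mulmod are hypothetical and are not part of freebl, which uses mp_int values and the group's field methods instead.

```c
#include <stdint.h>

/* Toy modular multiply; p must be below 2^32 so the product fits in 64 bits. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    return (a % p) * (b % p) % p;
}

/* Returns 1 if (x, y) satisfies y^2 = x^3 + a*x + b (mod p), else 0.
 * Hypothetical helper for illustration only, not the freebl routine. */
int toy_point_on_curve(uint64_t x, uint64_t y,
                       uint64_t a, uint64_t b, uint64_t p)
{
    uint64_t lhs = mulmod(y, y, p);
    uint64_t rhs = (mulmod(mulmod(x, x, p), x, p) + mulmod(a, x, p) + b % p) % p;

    /* Same shape as the failing test: subtract in the field, compare with 0. */
    uint64_t diff = (lhs + p - rhs) % p;
    return diff == 0;
}
```

With the toy curve y^2 = x^3 + 2x + 3 over the integers mod 97, the point (3, 6) passes and (3, 7) does not. A wrong result from the field subtraction, or from any operation feeding it, would make a valid point fail in the same way as an invalid one.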
Comment 6 Nelson Bolyard (seldom reads bugmail) 2006-04-15 13:55:42 PDT
More news and analysis.

This bug was reported by our "nightly QA" tests, which are supposed to run
every night of the year.  So, when the results reported Friday morning from
the previous night's nightly QA test showed that this test failed on a 64-bit
AMD box running RHEL3, our natural conclusion was that this was a new bug 
caused by checkins from the day before.  But then we learned that the 
"nightly" QA doesn't run every night on some boxes.  In fact, it's not clear 
WHEN was the last time that it ran on that box.  It has not run again on that 
box since Thursday night (Friday morning).  So, at this moment, we don't know 
when this error actually began to occur.  I will begin to back out changes 
(in my workarea) and rebuild and retest, to determine when it really began.

Second, so far, this problem is seen only on that AMD RHEL3 box.  It has not
been reported for any other 64-bit platform, nor for ANY 32 bit platform.
Tests on Solaris 10 for AMD64 pass (something I will double check today).
What about the code could be so specific as to only be a problem on that 
combination of CPU and OS?  Is it really a compiler issue?

Third, the failing test is checking that the point in the peer's ephemeral
public key is really on the curve.  There are many potential explanations 
for this, including
 - point really not on the curve, sender sent a bad point (EVERY TIME).
 - wrong curve being checked (two sides not agreeing on the curve)
 - test itself is incorrect, point is right, test is wrong
 - point is being improperly encoded or decoded

Other related possibilities:
 - Did the checkin that enabled ECC hello extensions cause us to start
   using a different curve than before?  IOW, is this a latent bug in
   the code for some curve that was not previously being exercised? 

And I wonder:
- did the ECC code get tested (apart from SSL) for every curve on every
  platform?
- Did we check that the points we create are really on the curve, 
  for every curve on every platform?  

The curve in question seems to be the one identified in SSL's numbered
curve name space as curve 0x11.  Is that one we were not using prior to
the introduction of the curve negotiation code?

I'm going to work on this all weekend, or until I find a clear cause.
I intend to try testing the AMD 64 RHEL client against other servers,
and test other clients against the AMD64 RHEL server, to try to determine
if the fault is in the client code path or in the server code path.  

I'd appreciate some support from our EC experts, because I cannot readily tell 
by inspection if the computations are producing the correct results or not.  
Comment 7 Douglas Stebila 2006-04-15 15:03:28 PDT
(In reply to comment #6)
> Third, The failing test is checking that the point in the peer's ephemeral
> public key is really on the curve.  There are many potential explanations 
> for this, including
>  - point really not on the curve, sender sent a bad point (EVERY TIME).
>  - wrong curve being checked (two sides not agreeing on the curve)
>  - test itself is incorrect, point is right, test is wrong
>  - point is being improperly encoded or decoded

Can you provide the value of &accr at the line of code specified in comment #5, and also other relevant values at that time?  It might help determine whether it's a point arithmetic problem or not.  Can you also verify that the ecp_test suite passes successfully on the platform in question?

> - did the ECC code get tested (apart from SSL) for every curve on every
>   platform?
> - Did we check that the points we create are really on the curve, 
>   for every curve on every platform?  

The underlying ECC code was not tested by me on too many platforms.  I did test 64-bit platforms, but only PowerPC 64-bit and UltraSPARC 64-bit.  Maybe Vipul has since done more testing.

> The curve in question seems to be the one identified in SSL's numbered
> curve name space as curve 0x11.  Is that one we were not using prior to
> the introduction of the curve negotiation code?

0x11 is secp160r2.  That's the curve that's been used for a while, as far as I know, but it shouldn't be used anymore as Internet Explorer won't support it.  There have been wiring changes in ECL that affected secp160r2 in the past couple of months, so that could be the cause of the problem, but without knowing if ecp_test passes, I can't say any more.

Douglas
Comment 8 Nelson Bolyard (seldom reads bugmail) 2006-04-15 15:36:49 PDT
(In reply to comment #7)

Douglas, thanks for your answer.

> Can you provide the value of &accr at the line of code specified in comment 
> #5, and also other relevant values at that time?  It might help determine 
> whether it's a point arithmetic problem or not.  

Will do.

> Can you also verify that the ecp_test
> suite passes successfully on the platform in question.

How do I do that?  Tell me and I'll do it.

> > - did the ECC code get tested (apart from SSL) for every curve on every
> >   platform?
> > - Did we check that the points we create are really on the curve, 
> >   for every curve on every platform?  
> 
> The underlying ECC code was not tested by me on too many platforms.  I did 
> test 64-bit platforms, but only PowerPC 64-bit and UltraSPARC 64-bit.  
> Maybe Vipul has since done more testing.

For RSA and our other algorithms, we have test programs that test just
the raw math/crypto code itself, so that we don't have to rely on SSL testing
as the sole means of finding crypto code problems.  This has the added 
benefit of helping us determine whether a problem is in SSL code or in freebl
code.  But AFAIK, we don't have any ECC arithmetic tests (or test programs)
in our nightly QA test scripts.  We should.  We run our automated QA tests 
on every supported platform before we release.  It's important that we have
clear confirmation on every supported platform (CPU, OS, word-size combination)
before we ship.  


> 0x11 is secp160r2.  That's the curve that's been used for a while, as far as I
> know, but it shouldn't be used anymore as Internet Explorer won't support it. 

We claim to support all the curves in the TLS draft.  We have to test them all.
What IE does or does not run does not define what we do.

> There have been wiring changes in ECL that affected secp160r2 in the past
> couple of months, so that could be the cause of the problem, but without
> knowing if ecp_test passes, I can't say any more.

Does this ecp_test build as part of our regular NSS builds?  Can we readily
add it to the nightly QA test scripts?  (If this is something that is only
part of mpi/Makefile, and not freebl/Makefile, that has to get fixed first.)
Comment 9 Douglas Stebila 2006-04-15 16:26:40 PDT
(In reply to comment #8)
> > Can you also verify that the ecp_test
> > suite passes successfully on the platform in question.
> 
> How do I do that?  Tell me and I'll do it.

cd mozilla/security/nss/lib/freebl/mpi
make libs
cd ../ecl
make tests
./ecp_test

> Does this ecp_test build as part of our regular NSS builds?  Can we readily
> add it to the nightly QA test scripts?  (If this is something that is only
> part of mpi/Makefile, and not freebl/Makefile, that has to get fixed first.)

As you can see above, it's only part of ecl/Makefile and not part of freebl/Makefile.  There are EC related tests in cmd/bltest, but I don't know what part that plays in your nightly QA test scripts.  As for adding the ecp_test (and ec2_test) to your nightly QA builds, the process for doing that would probably be similar to the work that Bob Relyea did in porting the ECC performance tests from ecl/Makefile to the NSS build process.
Comment 10 Nelson Bolyard (seldom reads bugmail) 2006-04-15 17:47:57 PDT
More findings:

a) Problem occurs if either client or server (or both) is 64-bit.
   Only combination that works (on RHEL3 Opteron) is when both are 32-bit.

b) Initial implementation was hard coded to always use secp224r1.
   Then in December, it was changed, hard coded to always use secp256r1.
   Recently it was changed to use any one of the following, depending on 
   other key sizes:
        secp160r2
        secp224r1
        secp256r1
        secp384r1
        secp521r1
   and in the current ssl.sh tests, it is choosing secp160r2

So, I *suspect* that perhaps secp160r2 was not tested on 64-bit builds
until the recent TLS change that made it start using the above set of 
6 curves.

(Note that the logic that chose that set of 6 was found to be basing its
decision on the wrong things (or an insufficient set of inputs), and will
be changing again shortly.  It's important that ALL the curves we implement
work on ALL the platforms.  Shame that our test script doesn't automatically
test all the curves on whatever platform they happen to be built on.)

Finally, I also found that there are still places in NSS's libSSL that 
think that a curve name is a single byte value.  Those will need to be 
fixed.  I really need to get a copy of the FINAL* draft of the TLS/ECC RFC.
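
As an illustration of the single-byte issue mentioned above: in the TLS ECC specification as later published (RFC 4492), a named curve inside ECParameters is a one-byte curve type (3 for named_curve) followed by a two-byte, big-endian curve number. The sketch below assumes that layout. The function name parse_named_curve is hypothetical and does not correspond to any libSSL routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Parses ECParameters of type named_curve:
 *   byte 0     curve_type, must be 3 (named_curve)
 *   bytes 1-2  NamedCurve, 16 bits, big-endian
 * Returns 0 on success and stores the curve number in *curve; -1 otherwise.
 * Hypothetical helper for illustration only. */
int parse_named_curve(const uint8_t *buf, size_t len, uint16_t *curve)
{
    if (buf == NULL || curve == NULL || len < 3)
        return -1;
    if (buf[0] != 3)
        return -1;
    /* Use both bytes; reading only buf[2] would truncate ids above 255. */
    *curve = (uint16_t)(((unsigned)buf[1] << 8) | buf[2]);
    return 0;
}
```

For the bytes 03 00 11 this yields curve 17 (0x11, secp160r2). Code that keeps only the low byte happens to work for every curve numbered below 256, which is why such a bug can stay hidden.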
Comment 11 Nelson Bolyard (seldom reads bugmail) 2006-04-15 18:27:51 PDT
In reply to the request for accl and accr values,  
at line 334 (before line 334 executed) the values were 
(least significant word first, as gdb prints the digit array):

(gdb) x /3xg accl->dp
0x80e00f8:      0xeb57803259ed4c1e      0x6b0a92fb677c0869
0x80e0108:      0x000000007a3698e6
(gdb) x /3xg accr->dp
0x80e19b0:      0x05f51536f1c7c439      0xca2a8b1a1b40361d
0x80e19c0:      0x00000000c4311535
(gdb) print group[0]
$10 = {
  constructed = 0,
  meth = 0x80e0308,
  text = 0x80d8030 "SECP-160R2",
  curvea = { sign = 0, alloc = 64, used = 5, dp = 0x80e35a8},
  curveb = { sign = 0, alloc = 64, used = 5, dp = 0x80e1d58},
  genx = {   sign = 0, alloc = 64, used = 5, dp = 0x80dfee8},
  geny = {   sign = 0, alloc = 64, used = 5, dp = 0x80dfff0},
  order = {  sign = 0, alloc = 64, used = 6, dp = 0x80e0200},
  cofactor = 1,
  point_add = 0x4037da4c <ec_GFp_pt_add_aff>,
  point_sub = 0x4037e020 <ec_GFp_pt_sub_aff>,
  point_dbl = 0x4037e0d2 <ec_GFp_pt_dbl_aff>,
  point_mul = 0x4038570c <ec_GFp_pt_mul_jm_wNAF>,
  base_point_mul = 0,
  points_mul     = 0x4037f5e1 <ec_GFp_pts_mul_jac>,
  validate_point = 0x4037e11f <ec_GFp_validate_point>,
  extra1 = 0x0,
  extra2 = 0x0,
  extra_free = 0
}

At line 335, accr contains:
(gdb) x /3xg accr->dp
0x80e19b0:      0xe5626afa68253458      0xa0e007e14c3bd24c
0x80e19c0:      0x00000000b60583b0
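
The dump above can be cross-checked by hand. The sketch below is not the freebl code. It assumes the three 64-bit words are read least significant first, and that the modulus is the secp160r2 prime 2^160 - 2^32 - 21389. The names sub3, add3, field_sub3 and SECP160R2_P are hypothetical. It computes (accl - accr) mod p in portable C, with no inline assembler.

```c
#include <stdint.h>

#define WORDS 3

/* Assumed modulus: secp160r2 prime, least significant word first. */
const uint64_t SECP160R2_P[WORDS] = {
    0xFFFFFFFEFFFFAC73ULL, 0xFFFFFFFFFFFFFFFFULL, 0x00000000FFFFFFFFULL
};

/* r = a - b over three words; returns the final borrow (0 or 1). */
static uint64_t sub3(const uint64_t a[WORDS], const uint64_t b[WORDS],
                     uint64_t r[WORDS])
{
    uint64_t borrow = 0;
    for (int i = 0; i < WORDS; i++) {
        uint64_t d = a[i] - b[i];
        uint64_t b1 = a[i] < b[i];      /* borrow from the word subtraction */
        uint64_t b2 = d < borrow;       /* borrow from the incoming borrow */
        r[i] = d - borrow;
        borrow = b1 | b2;
    }
    return borrow;
}

/* r = a + b over three words; returns the final carry (0 or 1). */
static uint64_t add3(const uint64_t a[WORDS], const uint64_t b[WORDS],
                     uint64_t r[WORDS])
{
    uint64_t carry = 0;
    for (int i = 0; i < WORDS; i++) {
        uint64_t s = a[i] + b[i];
        uint64_t c1 = s < a[i];         /* carry from the word addition */
        uint64_t s2 = s + carry;
        uint64_t c2 = s2 < s;           /* carry from the incoming carry */
        r[i] = s2;
        carry = c1 | c2;
    }
    return carry;
}

/* r = (a - b) mod p, assuming 0 <= a, b < p. */
void field_sub3(const uint64_t a[WORDS], const uint64_t b[WORDS],
                const uint64_t p[WORDS], uint64_t r[WORDS])
{
    if (sub3(a, b, r))      /* went negative: add the modulus back */
        add3(r, p, r);
}
```

Feeding in the accl and accr words shown before line 334 gives exactly the three words shown for accr at line 335. Under the stated assumptions, the subtraction in this particular run was carried out correctly and the nonzero result reflects a point that really is off the curve as received.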
Comment 12 Nelson Bolyard (seldom reads bugmail) 2006-04-15 19:57:57 PDT
I found that mpi/target.mk has no provision for building 64-bit on linux
at all.  64-bit linux is not a defined target platform.   That makes me
suspect that only 32-bit builds and tests have ever been done before.
Comment 13 Nelson Bolyard (seldom reads bugmail) 2006-04-15 20:08:03 PDT
Created attachment 218569
Add AMD64 Linux support to mpi/target.mk

This patch makes mpi build and run using mpi's own Makefile.
Trouble is, it doesn't use exactly the same set of source files
and -D options as freebl/Makefile.  When I use exactly the same
list of source files as freebl's Makefile, the build won't link.
Comment 14 Nelson Bolyard (seldom reads bugmail) 2006-04-15 20:10:27 PDT
I meant to add, with that patch, the ecp_test output says:

> Testing SECP-160R2 using specific implementation...
> ... okay.

For my next test, I'm going to try substituting a different curve for 
secp160r2 in ssl3ecc.c and see if that makes a difference.
Comment 15 Nelson Bolyard (seldom reads bugmail) 2006-04-15 21:48:34 PDT
Created attachment 218573
hack to find problematic curves

I hacked the server side ECDHE code so that it would pick a different 
curve each time, and go through them all round-robin.  Then I hacked
the client side to report the curve number every time it had a problem
with the ECDHE key from the server.  Then I sorted and counted the 
results.  They looked like this:

     59 Derive Failed with curve 15
     59 Derive Failed with curve 16
     58 Derive Failed with curve 17
     58 Derive Failed with curve 18
     58 Derive Failed with curve 19

Those curves are:

               ec_secp160k1  = 15,
               ec_secp160r1  = 16,
               ec_secp160r2  = 17,
               ec_secp192k1  = 18,
               ec_secp192r1  = 19,

So, I think we have a little problem with those curves.
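
The procedure described in this comment can be sketched roughly as follows. This is not the attached patch. It assumes the TLS named-curve numbers run from 1 to 25, and the names next_curve and tally_failures are hypothetical.

```c
#include <stddef.h>

/* Assumed range of TLS named-curve numbers. */
enum { FIRST_CURVE = 1, LAST_CURVE = 25 };

/* Server side: step to the next curve, wrapping around after the last one. */
int next_curve(int current)
{
    if (current < FIRST_CURVE || current >= LAST_CURVE)
        return FIRST_CURVE;
    return current + 1;
}

/* Client side: counts[id] is incremented once per failure reported for
 * curve id.  counts must have LAST_CURVE + 1 entries; ids outside the
 * known range are skipped. */
void tally_failures(const int *failed_ids, size_t n, unsigned counts[])
{
    for (size_t i = 0; i < n; i++) {
        int id = failed_ids[i];
        if (id >= FIRST_CURVE && id <= LAST_CURVE)
            counts[id]++;
    }
}
```

Cycling through every curve and counting failures per curve number separates "one curve is broken" from "a family of curves is broken", which is what the table above shows for curves 15 through 19.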
Comment 16 Robert Relyea 2006-04-17 15:15:22 PDT
What those curves have in common is their size: they are all 3 64-bit words long.

In freebl/ecl/ecl_gf.c, there is a function GFMethod_consGFp which sets different add and subtract functions based on the size of the curve.

In ec_GFp_add_3, ec_GFp_add_4, ec_GFp_sub_3, and ec_GFp_sub_4, there are some inline assembler functions.

They are under #ifndef MPI_AMD64_ADD. Try turning those ifdefs off and see if the problem goes away. If it goes away for all but the 192r1 curve, then try turning off the inline add code in ecp_192.c (ec_group_set_gfp192).

bob
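
The size observation can be checked with a one-line calculation: the number of machine words needed for a field element is the field size in bits divided by the word size, rounded up. The helper below is a hypothetical illustration, not a freebl function.

```c
#include <stddef.h>

/* Number of digit_bits-wide words needed to hold a field_bits-bit value.
 * Hypothetical helper for illustration only. */
size_t digits_for_bits(size_t field_bits, size_t digit_bits)
{
    return (field_bits + digit_bits - 1) / digit_bits;
}
```

With 64-bit words, 160 and 192 bits both need 3 words, while 224 and 256 bits need 4, 384 bits need 6 and 521 bits need 9. Only the 3-word case matches the five failing curves, which points at the 3-word add and subtract routines rather than at any single curve's parameters.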

Comment 17 Jason Reid 2006-04-18 12:56:33 PDT
Hit this again on 20060418 securitytip run.

nssamdrhel3	Linux2.4_x86_64_glibc_PTH_64_DBG.OBJ	1364 / 1412 Passed
SSL3 ECDHE RSA WITH NULL SHA 			Failed
SSL3 ECDHE RSA WITH RC4 128 SHA 		Failed
SSL3 ECDHE RSA WITH 3DES EDE CBC SHA 		Failed
SSL3 ECDHE RSA WITH AES 128 CBC SHA 		Failed
SSL3 ECDHE RSA WITH AES 256 CBC SHA 		Failed
TLS ECDHE RSA WITH NULL SHA 			Failed
TLS ECDHE RSA WITH RC4 128 SHA 			Failed
TLS ECDHE RSA WITH 3DES EDE CBC SHA 		Failed
TLS ECDHE RSA WITH AES 128 CBC SHA 		Failed
TLS ECDHE RSA WITH AES 256 CBC SHA 		Failed
Stress SSL3 ECDHE-RSA AES 128 CBC with SHA 	Failed
Stress TLS ECDHE-RSA AES 128 CBC with SHA 	Failed
SSL3 ECDHE RSA WITH NULL SHA 			Failed
SSL3 ECDHE RSA WITH RC4 128 SHA 		Failed
SSL3 ECDHE RSA WITH 3DES EDE CBC SHA 		Failed
SSL3 ECDHE RSA WITH AES 128 CBC SHA 		Failed
SSL3 ECDHE RSA WITH AES 256 CBC SHA 		Failed
TLS ECDHE RSA WITH NULL SHA 			Failed
TLS ECDHE RSA WITH RC4 128 SHA 			Failed
TLS ECDHE RSA WITH 3DES EDE CBC SHA 		Failed
TLS ECDHE RSA WITH AES 128 CBC SHA 		Failed
TLS ECDHE RSA WITH AES 256 CBC SHA 		Failed
Stress SSL3 ECDHE-RSA AES 128 CBC with SHA 	Failed
Stress TLS ECDHE-RSA AES 128 CBC with SHA 	Failed
SSL3 ECDHE RSA WITH NULL SHA 			Failed
SSL3 ECDHE RSA WITH RC4 128 SHA 		Failed
SSL3 ECDHE RSA WITH 3DES EDE CBC SHA 		Failed
SSL3 ECDHE RSA WITH AES 128 CBC SHA 		Failed
SSL3 ECDHE RSA WITH AES 256 CBC SHA 		Failed
TLS ECDHE RSA WITH NULL SHA 			Failed
TLS ECDHE RSA WITH RC4 128 SHA 			Failed
TLS ECDHE RSA WITH 3DES EDE CBC SHA 		Failed
TLS ECDHE RSA WITH AES 128 CBC SHA 		Failed
TLS ECDHE RSA WITH AES 256 CBC SHA 		Failed
Stress SSL3 ECDHE-RSA AES 128 CBC with SHA 	Failed
Stress TLS ECDHE-RSA AES 128 CBC with SHA 	Failed
SSL3 ECDHE RSA WITH NULL SHA 			Failed
SSL3 ECDHE RSA WITH RC4 128 SHA 		Failed
SSL3 ECDHE RSA WITH 3DES EDE CBC SHA 		Failed
SSL3 ECDHE RSA WITH AES 128 CBC SHA 		Failed
SSL3 ECDHE RSA WITH AES 256 CBC SHA 		Failed
TLS ECDHE RSA WITH NULL SHA 			Failed
TLS ECDHE RSA WITH RC4 128 SHA 			Failed
TLS ECDHE RSA WITH 3DES EDE CBC SHA 		Failed
TLS ECDHE RSA WITH AES 128 CBC SHA 		Failed
TLS ECDHE RSA WITH AES 256 CBC SHA 		Failed
Stress SSL3 ECDHE-RSA AES 128 CBC with SHA 	Failed
Stress TLS ECDHE-RSA AES 128 CBC with SHA 	Failed

output.log: fragments are like this.
ssl.sh: running SSL3 ECDHE RSA WITH NULL SHA ----------------------------
tstclnt -p 8444 -h nssamdrhel3.red.iplanet.com -c :C010 -T -B -s \
        -f -d ../client < /share/builds/mccrel3/security/securitytip/builds/20060418.1/wozzeck_Solaris8/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: write to SSL socket failed: Unspecified failure while processing SSL Client Key Exchange handshake.
Comment 18 Robert Relyea 2006-04-19 16:29:16 PDT
I've verified that the current code works on:
RHEL4 64 bit optimized Intel
RHEL4 64 bit debug Intel
RHEL4 64 bit debug Opteron

I'm currently running the tests for RHEL4 64 bit optimized Opteron.

At this point it looks like a RHEL3 issue or a local machine tools issue.
My guess is that the gcc compiler on RHEL3 isn't handling the inline assembler correctly. I'll attach a patch shortly that will turn the inline assembler off, but I need someone to test it on a RHEL3 box.

bob
Comment 19 Robert Relyea 2006-04-19 16:31:05 PDT
Created attachment 219081
Turn off Inline assemble for Linux.
Comment 20 Nelson Bolyard (seldom reads bugmail) 2006-04-19 17:35:59 PDT
Thanks, Bob.
So, if this is a compiler problem, what compiler version do we need?
Comment 21 Robert Relyea 2006-04-19 17:38:07 PDT
I'm running gcc 3.4.4 .

Comment 22 Nelson Bolyard (seldom reads bugmail) 2006-04-19 18:18:53 PDT
(In reply to comment #18)
> I've verified that the current code works on:
> RHEL4 64 bit optimized Intel
> RHEL4 64 bit debug Intel
> RHEL4 64 bit debug Opteron

Bob, what is your test method?  How are you testing?  ssl.sh?  or ?
Comment 23 Robert Relyea 2006-04-20 09:27:41 PDT
Yes, I was running all.sh. Hmm, my tests were against the trunk, not the 3.11 branch. I'll go verify the branch as well, though the relevant code is in both.

bob
Comment 24 Alexei Volkov 2006-04-20 20:05:49 PDT
Ran tests on RHEL3 64-bit Opteron, dbg and opt (gcc 3.2.3). The last patch (from Bob) fixes the problem reported in this bug. The problem reported in bug 334522 still exists.
Comment 25 Christophe Ravel 2006-04-21 13:47:39 PDT
We see this failure (before the patch #3 was applied) on the following platforms:

- RHEL 3.0 U3 64/DBG (nssamdrhel3)
   $ cat /etc/redhat-release
   Red Hat Enterprise Linux AS release 3 (Taroon Update 3)
   $ rpm -q gcc
   gcc-3.2.3-39
   $ rpm -q glibc
   glibc-2.3.2-95.24
   glibc-2.3.2-95.24

- RHEL 3.0 U6 64/DBG (attic)
   $ cat /etc/redhat-release
    Red Hat Enterprise Linux AS release 3 (Taroon Update 6)
    $ rpm -q gcc
    gcc-3.2.3-53
    $ rpm -q glibc
    glibc-2.3.2-95.37
    glibc-2.3.2-95.37

- RHEL 4.0 U1 64/DBG (nssamdrhel4)
   $ cat /etc/redhat-release
   Red Hat Enterprise Linux AS release 4 (Nahant Update 1)
   $ rpm -q gcc
   gcc-3.4.3-22.1
   $ rpm -q glibc
   glibc-2.3.4-2.9
   glibc-2.3.4-2.9
Comment 26 Wan-Teh Chang 2006-04-21 14:23:06 PDT
Christophe:

How did you build the binary that you ran on
RHEL 4.0 U1 64/DBG (nssamdrhel4)?  Was that
binary built on RHEL 3 or RHEL 4?

If that binary was built on RHEL 4, then the
failures on RHEL 4.0 U1 64/DBG (nssamdrhel4)
are important new information.
Comment 27 Christophe Ravel 2006-04-24 16:38:30 PDT
The previous tests were run with a build on nssamdrhel3 (RHEL 3.0 U3)

More tests - more results:

* Build on touquet (RHEL 3.0 U7 - the latest and greatest RHEL 3.0)
  gcc-3.2.3-54
  glibc-2.3.2-95.39

  - test on nssamdrhel3 (RHEL 3.0 U3): fail
  - test on touquet (RHEL 3.0 U7): fail

* Build on nssamdrhel4 (RHEL 4.0 U1)
  gcc-3.4.3-22.1
  glibc-2.3.4-2.9

  - test on nssamdrhel4 (RHEL 4.0 U1): fail
Comment 28 Nelson Bolyard (seldom reads bugmail) 2006-04-24 22:52:25 PDT
Comment on attachment 219081
Turn off Inline assemble for Linux.

Committed on trunk
lib/freebl/Makefile; new revision: 1.82; previous revision: 1.81
Comment 29 Christophe Ravel 2006-04-25 20:15:58 PDT
I have tested libfreebl3.so built with gcc 3.4.4-2 and glibc 2.3.4-2.13 provided by Wan-Teh on nssamdrhel4 (RHEL 4.0 U1) with all.sh on 64bit/debug/ECC.
The tests are still failing for ECDHE RSA.

This experiment shows that this bug is not related to the
compiler.
Comment 30 Alexei Volkov 2006-04-25 23:45:22 PDT
Attachment #219081 applied to the 3.11 branch:
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.70.2.8; previous revision: 1.70.2.7
Comment 31 Wan-Teh Chang 2006-05-05 16:33:39 PDT
Someone at Sun might want to manually enable the code
ifdef'ed with MPI_AMD64_ADD one by one to find out
which one causes the test failure.  I would do that,
but we can't reproduce this test failure at Red Hat.

There are 10 MPI_AMD64_ADD ifdef's in
mozilla/security/nss/lib/freebl/ecl/ecl_gf.c, and
7 MPI_AMD64_ADD ifdef's in
mozilla/security/nss/lib/freebl/ecl/ecp_192.c.  Since
the test fails for 160 bit curve as well, most likely
it is caused by one of the 10 ifdef's in
ecl_gf.c.
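
One way to narrow down which routine misbehaves, without going through SSL at all, is a differential test: run the suspect routine and a plain portable version on the same random inputs and count disagreements. The sketch below shows the idea for a 3-word add. All names here are hypothetical, and add3_no_carry is a deliberately broken stand-in for whichever assembler-backed routine is under suspicion.

```c
#include <stdint.h>

typedef uint64_t (*add3_fn)(const uint64_t a[3], const uint64_t b[3],
                            uint64_t r[3]);

/* Reference: r = a + b over three 64-bit words, least significant first.
 * Returns the carry out of the top word. */
uint64_t add3_reference(const uint64_t a[3], const uint64_t b[3], uint64_t r[3])
{
    uint64_t carry = 0;
    for (int i = 0; i < 3; i++) {
        uint64_t s = a[i] + b[i];
        uint64_t c1 = s < a[i];
        r[i] = s + carry;
        uint64_t c2 = r[i] < s;
        carry = c1 | c2;
    }
    return carry;
}

/* Deliberately broken stand-in: drops the carry between words. */
uint64_t add3_no_carry(const uint64_t a[3], const uint64_t b[3], uint64_t r[3])
{
    for (int i = 0; i < 3; i++)
        r[i] = a[i] + b[i];
    return 0;
}

/* Small xorshift generator so runs are repeatable for a given seed. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Returns how many of the trials produced a different sum or carry. */
unsigned count_mismatches(add3_fn candidate, unsigned trials, uint64_t seed)
{
    uint64_t state = seed ? seed : 1;   /* xorshift must not start at 0 */
    unsigned bad = 0;
    for (unsigned t = 0; t < trials; t++) {
        uint64_t a[3], b[3], want[3], got[3];
        for (int i = 0; i < 3; i++) {
            a[i] = xorshift64(&state);
            b[i] = xorshift64(&state);
        }
        uint64_t cw = add3_reference(a, b, want);
        uint64_t cg = candidate(a, b, got);
        if (cw != cg || want[0] != got[0] || want[1] != got[1] ||
            want[2] != got[2])
            bad++;
    }
    return bad;
}
```

Pointing count_mismatches at each assembler-backed routine in turn, with the corresponding ifdef enabled one at a time, would be a faster loop than rebuilding and rerunning all.sh for each of the 10 ifdefs.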
Comment 32 Nelson Bolyard (seldom reads bugmail) 2006-05-18 21:23:08 PDT
Do we still see this?  Or is it fixed now?
Comment 33 Nelson Bolyard (seldom reads bugmail) 2006-05-27 17:03:28 PDT
Marking worksforme.
If anyone sees this again, please reopen this bug.
Comment 34 Wan-Teh Chang 2006-05-28 20:59:34 PDT
The proper resolution for this bug is FIXED.  It's fixed
by the third attachment in this bug.
Comment 35 Nelson Bolyard (seldom reads bugmail) 2007-05-06 20:48:23 PDT
Reopening to correct the resolution per comment 34 and changing the assignee 
to show that Bob fixed it (see comment 19).
Comment 36 Nelson Bolyard (seldom reads bugmail) 2007-05-06 20:49:06 PDT
resolved: fixed.
