I had thought that the recent QA problems were some flakiness in the way we test things, but this no longer appears to be the case, so I'm filing a bug. Here is what I know so far. (I'm substituting "accessor" for "database key" to avoid confusion.)

When certutil generates the DSA key pair, it stores the private key in the database using the public key as the accessor. At that point, the public key never appears to have a leading 0. When pk12util later attempts to retrieve the key, the public key it looks up with occasionally has a leading 0, and when it does, retrieval fails. Adding code that says "if you don't find the key and the public key has a leading 0, try again with an accessor that is the public key without the leading 0" makes retrieval work. However, later on, pk12util fails to import the key.

I know that somewhere deep in my memory is the reason why we sometimes add leading 0's, and it has something to do with when the first byte exceeds a certain value. But I can't for the life of me remember exactly what it is. At any rate, the code needs to be consistent: it must either store the key with a leading 0, or not expect the 0 to be there when it goes looking for the key. Note, of course, that the presence or absence of a leading 0 does not affect the math.
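A minimal C sketch of the lookup-with-fallback workaround described above. The in-memory `KeyDB`/`KeyEntry` types and the functions `find_key_by_accessor` and `find_key_with_fallback` are hypothetical stand-ins for illustration only; they are not the NSS legacy database API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the key database: each entry is indexed
 * by an accessor (the raw public-key bytes). */
typedef struct {
    const unsigned char *accessor;
    size_t accessor_len;
    const char *private_key; /* placeholder for the stored key */
} KeyEntry;

typedef struct {
    const KeyEntry *entries;
    size_t count;
} KeyDB;

/* Exact-match lookup: returns NULL if no entry has this accessor. */
static const char *find_key_by_accessor(const KeyDB *db,
                                        const unsigned char *acc, size_t len)
{
    assert(db != NULL && acc != NULL);
    for (size_t i = 0; i < db->count; i++) {
        if (db->entries[i].accessor_len == len &&
            memcmp(db->entries[i].accessor, acc, len) == 0) {
            return db->entries[i].private_key;
        }
    }
    return NULL;
}

/* The workaround: if the exact lookup fails and the public value
 * starts with a 0x00 pad byte, retry once without that byte. */
static const char *find_key_with_fallback(const KeyDB *db,
                                          const unsigned char *pub, size_t len)
{
    const char *key = find_key_by_accessor(db, pub, len);
    if (key == NULL && len > 1 && pub[0] == 0x00) {
        key = find_key_by_accessor(db, pub + 1, len - 1);
    }
    return key;
}
```

Note that this only patches the read side; the stored accessor and the looked-up accessor can still differ in form, which is the inconsistency the report describes.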
Ian, does this mean your previous fix of reading the password file once and leaving the password in memory is not the right fix? Bob, could you help Ian investigate this bug?
Way back in the early days, we had a bignum library which we had acquired from a vendor. That bignum library was a signed library (it treated the various values as signed values), so all our code was careful to make sure that all our values were positive. PKCS #11 uses unsigned values. There is code in the PKCS #11 wrappers and softoken to convert from signed to unsigned and back. Our bignum library now operates on unsigned values, but our DER decoder and our legacy databases still store the signed values. So how do we turn an unsigned value into a signed value? If the most significant bit is 1, we add a leading 0. (That will explain your 1 in 10 or so failure rate.) bob
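A minimal C sketch of the conversion described above, assuming big-endian byte strings as in a DER INTEGER. The names `unsigned_to_signed` and `signed_to_unsigned` are illustrative only, not the actual PKCS #11 wrapper or softoken functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unsigned -> signed: if the most significant bit is 1, prepend a
 * 0x00 byte so the value is not read as negative. Returns the output
 * length, or 0 if out_cap is too small. */
static size_t unsigned_to_signed(const unsigned char *in, size_t in_len,
                                 unsigned char *out, size_t out_cap)
{
    assert(in != NULL && in_len > 0 && out != NULL);
    size_t pad = (in[0] & 0x80) ? 1 : 0;
    if (in_len + pad > out_cap) {
        return 0;
    }
    if (pad) {
        out[0] = 0x00;
    }
    memcpy(out + pad, in, in_len);
    return in_len + pad;
}

/* Signed -> unsigned: drop a 0x00 byte that exists only to keep the
 * sign bit clear (i.e. the next byte has its high bit set).
 * Returns the output length, or 0 if out_cap is too small. */
static size_t signed_to_unsigned(const unsigned char *in, size_t in_len,
                                 unsigned char *out, size_t out_cap)
{
    assert(in != NULL && in_len > 0 && out != NULL);
    size_t skip = (in_len > 1 && in[0] == 0x00 && (in[1] & 0x80)) ? 1 : 0;
    size_t n = in_len - skip;
    if (n > out_cap) {
        return 0;
    }
    memcpy(out, in + skip, n);
    return n;
}
```

A value whose first byte is 0x9a comes out of `unsigned_to_signed` one byte longer with a leading 0x00, while a value starting with 0x7f is unchanged. If one code path uses the signed form as an accessor and another uses the unsigned form, they agree only when the high bit happens to be clear.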
I still have some questions:
1. I know our RSA code did the same thing. Why are we only having database problems with DSA?
2. How do we fix it? Do we remove the code that prepends the leading 0? I assume that would have wide-ranging consequences. Do you think my workaround is correct? If so, I will start investigating why it failed later on.
The RSA modulus always has the high bit set, so RSA public keys always get the leading 0 and the stored and looked-up forms agree. I don't think that is necessarily true of DSA public keys. bob
I may have jumped the gun when I said my workaround failed. It did, but that may have been due to local environment issues when I ran the test. I have since set QA running, and it has completed successfully over 50 times. However, I forgot to add a log message telling me when the workaround was actually used, so I can't yet verify that it functions correctly. I'm redoing the run now. If it does work, I'll attach the patch showing what I did.
OK, it didn't work. But that is because of *another* bug; see bug 95150. I now believe that this workaround is correct for this case. Patch coming up.
see comment in bug 95150 r=relyea
checked in. waiting for tomorrow's QA before resolving.
Julien, please verify the target milestone and mark this bug fixed.
Sorry, I meant to say Ian, please verify the target milestone and mark this bug fixed.
fixed in 3.2.2.
The fix is not in 3.3 but is in 3.3.1.
Reopening this bug because it has reappeared on the tip (it is the cause of the intermittent QA failures, again about 1 in 10 runs). Bob, there are now four calls to nsslowkey_FindKeyByPublicKey. Either that function needs this patch, or every place that calls it does. In fact, it is needed for at least three functions: http://lxr.mozilla.org/mozilla/ident?i=nsslowkey_FindKeyByPublicKey http://lxr.mozilla.org/mozilla/ident?i=nsslowkey_FindKeyNicknameByPublicKey http://lxr.mozilla.org/mozilla/ident?i=nsslowkey_DeleteKey Can you take care of this?
Yes, I want to take a look at when and how the failures happen. Many of the new calls now use a key read from the database rather than one that was calculated. I think I need to look more closely at the get public value calls. bob
Created attachment 57481: script that reveals the bug by generating DSA certs until failure. You can use this script to find the bug (it's what I used).
OK, I've completed the fix. The problem in this case wasn't the lookup code, but the storage code. I'm now up to 200 reps and counting. bob
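The actual storage-side patch isn't reproduced in this thread. One way to get the consistency asked for in the original report is to reduce every accessor to a single canonical form before it is stored or looked up. The C sketch that follows assumes that canonical form is "no leading 0x00 bytes"; `canonical_accessor` and `same_accessor` are hypothetical helpers, not NSS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reduce a big-endian integer to a canonical form
 * by skipping all leading 0x00 bytes (keeping at least one byte).
 * Returns a pointer into the input and writes the new length. */
static const unsigned char *canonical_accessor(const unsigned char *pub,
                                               size_t len, size_t *out_len)
{
    assert(pub != NULL && len > 0 && out_len != NULL);
    while (len > 1 && pub[0] == 0x00) {
        pub++;
        len--;
    }
    *out_len = len;
    return pub;
}

/* Two accessors name the same key iff their canonical forms match. */
static int same_accessor(const unsigned char *a, size_t a_len,
                         const unsigned char *b, size_t b_len)
{
    size_t ca_len, cb_len;
    const unsigned char *ca = canonical_accessor(a, a_len, &ca_len);
    const unsigned char *cb = canonical_accessor(b, b_len, &cb_len);
    return ca_len == cb_len && memcmp(ca, cb, ca_len) == 0;
}
```

With normalization applied on the storage side as well as the lookup side, callers such as the three lookup functions listed earlier would not each need their own fallback.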
OK the fix is checked in...
Adding myself to the cc list. Wish I'd seen this bug long ago. I'll send comments about this bug and bug 115360 in private email.