I had thought that the recent QA problems were some flutter in the way we test
things, but this no longer appears to be the case. Thus I'm filing a bug. Here
is what I know so far.
(I'm substituing "accessor" for "database key" to avoid confusion)
When certutil generates the DSA key pair, it stores the private key in the
database using the public key as the accessor. At this time, the public key does
not ever appear to have a leading 0.
When pk12util attempts to retrieve the key, it occassionally has a leading 0.
When it does, retrieval fails. Adding code that says "if you don't find the key
and the public key has a leading 0, try with an accessor that is the public key
without the leading 0" causes retrieval to work. However, later on, pk12util
fails to import the key.
I know that somewhere deep in my memory is the reason why we sometimes add
leading 0's, and it has something to do with when the first byte exceeds a
certain value. But I can't for the life of me remember exactly what it is.
At any rate, the code needs to be consistent. It must either store the key with
a leading zero, or not expect it to be there when it goes looking for it.
Note, of course, that the prescence/absence of a leading 0 does not affect the math.
Ian, does this mean your previous fix of reading the password
file once and leaving the password in memory is not the right
Bob, could you help Ian investigate this bug?
Way back in the early days, we had a bignum library which we had aquired from a
vendor. That bignum library was a signed library (it would treat the various
values as signed values), therefore all our code would be care to make sure that
all our values were positive. PKCS #11 uses unsigned values. There is code in
the PKCS #11 wrappers and softoken to convert from signed to unsigned and back.
Now our big num library operates with unsigned values, but our der decoder and
our legacy databases still store the signed values.
So how do we turn an unsigned value into a signed value? signed: if the most
significant bit is '1' then we add a leading '0'. (That will explain your 1 in
10 or so failure rate).
I still have some questions:
1. I know our RSA code did the same thing. Why are we only having database
problems with DSA?
2. How do we fix it? Do we remove the code that prepends the leading 0? I
assume that would have wide-ranging consequences. Do you think my workaround is
correct? If so, I will start investigating why it failed later on.
The RSA Modulus always has the high bit set. I don't think that is necessary
true of DSA public keys.
I may have jumped the gun when I said my workaround failed. It did; but that
may have been local environment issues when I ran the test. I have since set
the QA off and running, and it has completed over 50 times successfully.
However, I forgot to put a log message in to let me know when my workaround was
used to verify that it functions correctly. I'm redoing it now.
If it does work, I'll attach the patch showing what I did.
ok, it didn't work. But that is because of *another* bug. See 95150.
I now believe that this workaround is correct for this case. Thus, patch coming up.
Created attachment 45692 [details] [diff] [review]
see comment in bug 95150
checked in. waiting for tomorrow's QA before resolving.
Julien, please verify the target milestone and
mark this bug fixed.
Sorry, I meant to say
Ian, please verify the target milestone and
mark this bug fixed.
fixed in 3.2.2.
The fix is not in 3.3 but is in 3.3.1.
Reopening bug because it exists again, on the tip (it is the reason for the
intermittent QA failures, about 1 in 10 times again). Bob, there are now four
calls to nsslowkey_FindKeyByPublicKey. Either that function needs this patch,
or every place that calls it does.
Actually, it is needed for at least three functions:
Can you take care of this?
Yes, I want to take a look at when and how the failures happen. Many of the new
calls have a key read from the database now, not calculated. I think I need to
look more closely at the get public value calls.
Created attachment 57481 [details]
script that reveals bug by generating DSA certs until failure
You can use this script to find the bug (it's what I used)
OK, I've completed the fix. The problem in this case wasn't the lookup code, but
the storage code. I'm now up to 200 reps and counting.
OK the fix is checked in...
Adding myself to cc list. Wish I'd seen this bug long ago.
I'll send comments about this bug and bug 115360 in private email