Closed Bug 486304 Opened 16 years ago Closed 16 years ago

cert7.db/cert8.db "corruption" when importing a large certificate (>64K)

Categories

(NSS :: Libraries, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED
3.12.4

People

(Reporter: mozbgz, Assigned: nelson)

Details

(Whiteboard: FIPS)

Attachments

(4 files, 1 obsolete file)

Importing a (relatively) large certificate into cert8.db fails, and will corrupt cert8.db. The attached certificate can be used to reproduce the problem.

With trunk (3.12.x), I get:

C:\>certutil -d mydb -A -t ,, -n foo -i mozchomp.crt
certutil: could not add certificate to token or database: Error adding certificate to database.

... while 3.11.9 just fails silently. In both cases, the cert8.dir subdirectory is created, with a 90564-byte file holding a blob with this certificate (as expected for records larger than DBS_MAX_ENTRY_SIZE).

Trying to list the cert8.db contents afterwards, however, just gives an empty result. And while it's possible to add additional certs to the DB, they silently "disappear" - i.e. the record is apparently added (judging from the hex dump of cert8.db), but it can't be retrieved... effectively making the DB read-only from that point on.

What's most worrisome, however, is the behavior when trying to add such a certificate to a cert8.db with existing entries - it will get corrupted, and you may lose access to some of the existing certificates (in one of my tests almost half of them - depending on where the new entry is added into the Berkeley DB's hash structure, I assume).

The issue seems to be limited to cert entries, as I could not reproduce similar failures when importing CRLs (I used a sample CRL with ~750 KByte). And as a last observation, the problem does not exist with the sqlite backend (NSS_DEFAULT_DB_TYPE=sql), fortunately enough.
Assignee: nobody → rrelyea
Priority: -- → P1
Whiteboard: FIPS
Target Milestone: --- → 3.12.4
I tested this using a tip-of-trunk certutil and a test DB that had 14 certs. I got the error message

> certutil: could not add certificate to token or database: Error adding
> certificate to database.

after which, a certutil -L showed only 7 of the previous 14 certs. :(

I guess the next step is to trace through it.
(BTW, I like the .gif file in the cert. :)
OK, I hunted this down.

The first problem is with the format of an "encoded" cert DB "entry". The length of a certificate is only a 16-bit value there. Even when we write the certificate as a "blob", the blob contains an "encoded DB cert entry", with an explicit length that is only 16 bits.

The problem is seen in two functions in file pcertdb.c.

1) in EncodeDBCertEntry(),

 646	    buf[6] = ( entry->derCert.len >> 8 ) & 0xff;
 647	    buf[7] = entry->derCert.len & 0xff;

2) in DecodeDBCertEntry()

 760	    /* is database entry correct length? */
 761	    entry->derCert.len = ( ( dbentry->data[lenoff] << 8 ) |
 762	                           dbentry->data[lenoff+1] );
 763	    nnlen = ( ( dbentry->data[lenoff+2] << 8 ) | dbentry->data[lenoff+3] );
 764	    if ( ( entry->derCert.len + nnlen + headerlen )
 765	         != dbentry->len) {
 766	        PORT_SetError(SEC_ERROR_BAD_DATABASE);    <== failure is here
 767	        goto loser;
 768	    }

I think I can fix this with a heuristic that will not require any change to the encoded cert entry format, because we also have the overall length of the encoded entry. I think we can reconstruct the correct derCert.len with the information at hand.

The second question/issue is: why does this make the DB behave as if it was corrupted? And is it really corrupted? If not, if the only problem is the incorrect length in the blob, then it should not cause the DB to seem to lose other certs in the DB. This is bad. I will continue my investigation.
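(For illustration only, a minimal sketch of the kind of heuristic described above; the standalone helper and its name are hypothetical, and the real fix lives inside DecodeDBCertEntry(). The idea is that the overall encoded-entry length is trustworthy, so the true cert length can be recovered and cross-checked against the truncated 16-bit field.)

    /* Hypothetical sketch, not the committed patch: recover the real
     * certificate length from the total encoded-entry length when the
     * 16-bit length field has wrapped for certs larger than 64K.
     * Returns 0 if the stored field is inconsistent with that reading. */
    static unsigned int
    reconstruct_cert_len(unsigned int stored16,   /* 16-bit value from the entry */
                         unsigned int entry_len,  /* dbentry->len (full record)  */
                         unsigned int nnlen,      /* encoded nickname length     */
                         unsigned int headerlen)  /* fixed header length         */
    {
        unsigned int real_len = entry_len - nnlen - headerlen;

        /* Accept the reconstruction only if truncating it to 16 bits
         * reproduces exactly what was stored in the entry. */
        if ((real_len & 0xffff) != (stored16 & 0xffff))
            return 0;

        return real_len;
    }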
The reason for the second problem, the appearance that the cert DB has lost entries rather than merely having one bad entry, is in the function nsslowcert_TraverseDBEntries() (and maybe in other similar functions). It stops the traversal if the attempt to handle any one cert fails.

4239	    do {
	        ...
4252	        rv = (* callback)(&dataitem, &keyitem, type, udata);
4253	        if ( rv != SECSuccess ) {
4254	            return(rv);
4255	        }
4256	    }
4257	} while ( certdb_Seq(handle->permCertDB, &key, &data, R_NEXT) == 0 );

Removing lines 4253-4257 makes the DB appear complete and behave in a more normal fashion. I'm not sure what the best way is to deal with that. Maybe it is to merely count the errors, and only return an error result if there was NO successful call to (*callback). Bob, what do you think?
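(As a rough illustration of the "count the errors" idea - not the actual patch, and the counter names below are made up - the loop could keep walking past a bad entry and report failure only when no callback succeeded:)

    /* Hypothetical sketch, not the committed patch: never abort early on a
     * single bad entry; fail only if every callback invocation failed.
     * Identifiers mirror the excerpt above. */
    SECStatus rv = SECSuccess;
    int nCallbacks = 0;     /* how many entries were handed to the callback   */
    int nSucceeded = 0;     /* how many of those calls returned SECSuccess    */

    do {
        /* ... decode the current record into dataitem/keyitem as before ... */
        rv = (*callback)(&dataitem, &keyitem, type, udata);
        nCallbacks++;
        if (rv == SECSuccess) {
            nSucceeded++;
        }
        /* no early return: one undecodable entry no longer hides the
         * entries that follow it in the traversal */
    } while (certdb_Seq(handle->permCertDB, &key, &data, R_NEXT) == 0);

    /* empty traversal or at least one success => SECSuccess */
    return (nCallbacks == 0 || nSucceeded > 0) ? SECSuccess : SECFailure;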
Attached patch proposed patch v1 (obsolete) — Splinter Review
This patch solves both problems described above. Bob, please review for NSS 3.12.4
Assignee: rrelyea → nelson
Status: NEW → ASSIGNED
Attachment #370518 - Flags: review?(rrelyea)
Attachment #370518 - Flags: review?(rrelyea) → review+
Comment on attachment 370518 [details] [diff] [review]
proposed patch v1

r+ rrelyea

So the issue was:
1) we couldn't handle entries bigger than 64K (the resulting entry would always show an error).
2) once we had an error in an entry, we would never find the other entries.

So the database itself is not 'corrupted', but we couldn't read such a database.

This makes more sense to me than the earlier proposition that somehow we weren't going through the 'entry too large' code for the certs, since I was pretty sure that code was a shim dropped into almost the last layer before the db call.
That previous patch didn't really do what I wanted. This one does, I believe. The difference between the two is the value finally returned by nsslowcert_TraverseDBEntries.

The first patch returns SECFailure if ANY call to the callback fails. It returns SECSuccess if either a) no calls to the callback were made, or b) all calls to the callback succeeded.

The second patch returns SECSuccess if a) no calls to the callback were made, or b) ANY call to the callback succeeded. It returns SECFailure only if ALL calls to the callback failed. That's what I originally intended for it to do.

Bob, please decide which of these alternatives is best.
Attachment #370548 - Flags: review?(rrelyea)
Kaspar, do you think anyone cares enough about 3.11.x, going forward, that we should try to carry this fix back to the 3.11 branch?
Thanks for promptly looking into this, Nelson.

(In reply to comment #7)
> Kaspar, do you think anyone cares enough about 3.11.x, going forward,
> that we should try to carry this fix back to the 3.11 branch?

As a Thunderbird 2.x user, I still do, to a certain extent... as do the SeaMonkey 1.x folks, probably ;-) For Firefox, it's less of a concern, given that 2.x was EOL'd last December.
Comment on attachment 370548 [details] [diff] [review]
Alternative patch (checked in)

As I look more closely, in both patches the overloading of 'ret' can be confusing. It's both the return code from dbm and a counter of successes or failures. In the existing code it requires constructing a logic map to convince myself that it does what you expect. That map should probably be in a comment.

Finally, the function currently returns failure if it can't find the first entry, so there is one case - a completely empty database - where it would return a failure (that may not be a problem if we always have some record in a database).

Anyway, I'll r+ if you add a comment explaining that:

1) ret is initialized to zero implicitly by the previous call (before the loop), so if ret is non-zero some callback call succeeded (return true).
2) if ret is zero and no callbacks were made, rv will not change from its initialized value of SECSuccess.
3) if ret is zero and any callbacks were made, we know that all the callbacks failed. rv is set to the last callback value. Since they all failed, the last one also failed.

I'm ok with the semantic.
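(One way to read Bob's three points as the requested comment block - a hypothetical sketch only; the wording and placement actually committed may differ:)

    /* Hypothetical sketch of the comment Bob is asking for, placed just
     * before the final return of nsslowcert_TraverseDBEntries:
     *
     *  - ret is left at zero by the certdb_Seq() call before the loop and
     *    is bumped for each callback that returns SECSuccess, so a
     *    non-zero ret here means at least one callback succeeded and we
     *    return SECSuccess.
     *  - if ret is zero and no callbacks were made at all, rv never
     *    changed from its initialized value of SECSuccess, so an empty
     *    traversal also reports success.
     *  - if ret is zero but callbacks were made, every one of them
     *    failed; rv holds the value of the last (failing) callback, so
     *    SECFailure is returned.
     */
    return (ret ? SECSuccess : rv);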
Attachment #370548 - Flags: review?(rrelyea) → review+
(BTW feel free to cast the comments in your own words, just as long as they contain the appropriate logic). bob
Turns out that this issue has been present ever since NSS 3 was released (which also means that 64K certs are definitely rarely seen in the wild, otherwise I assume that it wouldn't have gone unnoticed that long :-). Similar code for length checks is also found in DecodeDBSubjectEntry - I guess this should be adapted as well, right? Let me know if you would like to see a sample certificate for testing this.
Summary: cert8.db corruption when trying to import a large certificate → cert7.db/cert8.db "corruption" when importing a large certificate (>64K)
Version: trunk → 3.0
Kaspar, if you're volunteering to make a cert with a 64K+ subject name, please do, and please file a separate bug for that. Thanks. However, that bug may get "WONTFIX"ed. Now that we have cert9, we're not really trying to perfect cert8. There won't be a cert8.1 :) But if it's as easy to fix as this one, then I expect it will get fixed about as quickly.
I believe a subject entry doesn't have the subject itself, but a list of issuer/serial numbers indexed by that subject. That means you need to create N certs with the same subject such that the sum of the issuer/serial numbers (encoded) is greater than 64K. If you create an intermediate CA with a 1K subject, then issue 64 different certificates from that intermediate with the same subject, that should do the trick ;-)

bob
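(To make the arithmetic concrete, a toy sketch - the structure below is a simplification for illustration, not the real NSS subject-entry layout, and the 1040-byte figure is only a rough guess for a ~1K issuer name plus serial:)

    #include <stdio.h>

    /* Toy stand-in for a subject record: for each cert sharing the subject,
     * it stores that cert's encoded issuer+serial ("cert key"). */
    struct toy_subject_entry {
        unsigned int ncerts;
        const unsigned int *certKeyLens;  /* encoded issuer+serial length per cert */
    };

    static unsigned int
    toy_encoded_size(const struct toy_subject_entry *e)
    {
        unsigned int total = 0, i;
        for (i = 0; i < e->ncerts; i++)
            total += e->certKeyLens[i] + 2;  /* plus a 16-bit per-key length field */
        return total;
    }

    int main(void)
    {
        /* Bob's recipe: an intermediate CA with a ~1K subject issues 64 certs
         * to the same subject, so each cert key is a bit over 1K. */
        unsigned int lens[64], i;
        for (i = 0; i < 64; i++)
            lens[i] = 1040;

        struct toy_subject_entry e = { 64, lens };
        printf("encoded subject record: %u bytes (16-bit limit is 65535)\n",
               toy_encoded_size(&e));
        return 0;
    }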
A single issuer name larger than 64K will suffice to meet the criterion that "the sum of the issuer/serial numbers (encoded) is greater than 64K." But also, consider that the DB size limitations apply to DB keys as well as to DB "entries".
Well, a certificate with a 64K subject DN signed by attachment 370394 [details] actually does the trick. NSS will write blobs to cert8.dir for both the cert and the subject. With the "alternative patch" applied, I can then list the certificate, but sadly, I can no longer get at it with certutil ;-)

C:\>certutil -L -d mydb

Certificate Nickname                                         Trust Attributes
                                                             SSL,S/MIME,JAR/XPI

mozchomp2                                                    ,,

C:\>certutil -L -d mydb -n mozchomp2
certutil: Could not find: mozchomp2
: security library: bad database.

(In reply to comment #12)
> Please do, and please file a separate bug for that. Thanks.
> However, that bug may get "WONTFIX"ed.

I don't like filing bugs which might get WONTFIXed :-) But anyway, why should it be a separate bug? It's again a 64K+ certificate, just a somewhat more peculiar one... actually one which more closely follows the idea in that (in)famous posting from Bob Jueneman in 1997:

> Therefore, if I choose to include as one of the RDN components of the DN
> (perhaps as a component of a multi-valued RDN for the leaf node) a 1
> gigabyte MPEG movie of me playing with my cat, that is perfectly allowable
> per the present standard, no matter how unwise or brain-dead such an
> approach might be.

(http://www.cs.auckland.ac.nz/~pgut001/pubs/dave.der doesn't really count, the movie is only in an extension ;-)
Please attach mozchomp2 to this bug. I'll at least look at it.
(In reply to comment #16)
> Please attach mozchomp2 to this bug.

Ok, here it is. Doesn't include a 1-gigabyte MPEG movie (but the 90K GIF instead, which you said you like in comment 1 :-).
Attachment #370518 - Attachment is obsolete: true
Comment on attachment 370548 [details] [diff] [review]
Alternative patch (checked in)

I added additional comments as Bob requested and committed it.

pcertdb.c; new revision: 1.8; previous revision: 1.7
Attachment #370548 - Attachment description: Alternative patch → Alternative patch (checked in)
With this patch, I'm able to list the contents of mozchomp2. I also fixed some other 16-bit length problems that I found by code inspection.
Comment on attachment 371634 [details] [diff] [review]
patch part 2, fix a few more length errors, v1 (checked in)

The problem shown with Kaspar's two existing certs is actually a problem with a cert7/cert8 db "nickname" record being too long. This patch also fixes problems with the SMIME record being too long and with the CRL entry being too long. (I'm really stunned that we haven't run into the CRL problem before, because CRLs larger than 64KB are not uncommon.)

Kaspar and/or Bob, please review and/or test.
Attachment #371634 - Flags: superreview?(rrelyea)
Attachment #371634 - Flags: review?(mozbugzilla)
Comment on attachment 371634 [details] [diff] [review]
patch part 2, fix a few more length errors, v1 (checked in)

Note: we are still assuming urls, smimeOptions and optionsDates cannot go over 64K. It's unlikely that they will, and we have no recourse if they do... but we should at least remember the possibility.

bob
Comment on attachment 371634 [details] [diff] [review]
patch part 2, fix a few more length errors, v1 (checked in)

Hmmm, I must have forgotten to set the + flag when I made my comment... r+
Attachment #371634 - Flags: superreview?(rrelyea) → superreview+
re: CRL problem

We did run into the CRL problem -- long ago. The current code has similar functionality to what you added (even a comment that says 'CRL entry is greater than 64 K. Hack to make this continue to work'). Your code is functionally equivalent to the code you replaced, but now that that test is prevalent throughout the code, it's best to keep it in the same form (rather than 2 different versions that do the same thing but don't look the same).

It's also pretty clear that when we ran into this case initially for CRLs, we made the same choice as we have today (calculate the correct length if the length overflows our 16-bit length value).

bob
Comment on attachment 371634 [details] [diff] [review]
patch part 2, fix a few more length errors, v1 (checked in)

Checking in pcertdb.c; new revision: 1.9; previous revision: 1.8
Attachment #371634 - Flags: review?(mozbugzilla)
Attachment #371634 - Attachment description: patch part 2, fix a few more length errors, v1 → patch part 2, fix a few more length errors, v1 (checked in)
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED