Open Bug 1659659 Opened 5 years ago Updated 5 years ago

libsoftoken causing memory leak in the given flow from certcallback pcertdb.c

Categories

(NSS :: Libraries, defect, P5)

Tracking

(Not tracked)

People

(Reporter: raghavb1979, Unassigned)

Details

Attachments

(2 files)

Attached file memleak_0208appsrv.log

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0) Gecko/20100101 Firefox/78.0

Steps to reproduce:

This is PKCS11 flow causing memory leak, PKCS11 certificate FindObjects were initiated from java and causing this memory leak, The full stack trace is shown below.

0x00007f97832ee790  libc-2.17.so  __GI___libc_malloc()+0
0x00007f975130f1d0  libnssutil3.so  PORT_Alloc_Util()+32
0x00007f975035eaf7  libnssdbm3.so  pkcs11_copyStaticData()+23
0x00007f975035ebb6  libnssdbm3.so  DecodeDBCertEntry.isra.30()+150
0x00007f975035ed5a  libnssdbm3.so  certcallback()+154
0x00007f975035f9d4  libnssdbm3.so  nsslowcert_TraverseDBEntries()+308
0x00007f975035fa7f  libnssdbm3.so  nsslowcert_TraversePermCerts()+79
0x00007f9750359959  libnssdbm3.so  lg_searchTokenList.constprop.5()+3641
0x00007f9750359a1a  libnssdbm3.so  lg_FindObjectsInit()+106
0x00007f9750a940a1  libsoftokn3.so  sftkdb_FindObjectsInit()+81
0x00007f9750a7b23e  libsoftokn3.so  sftk_searchDatabase()+62
0x00007f9750a805fe  libsoftokn3.so  NSC_FindObjectsInit()+302
0x00007f9751862003  libj2pkcs11.so     Java_sun_security_pkcs11_wrapper_PKCS11_C_1FindObjectsInit()+147

Actual results:

GNU tools like memleax reporting memory leak in the path listed above. It shows the leak
The below trace shows that 2122 bytes caused leak. over a period of time the leak goes and goes on, we need to control this leak ASAP, customers are waiting on this bug.

CallStack[15]: memory expires with 2122 bytes, backtrace:
0x00007f97832ee790 libc-2.17.so __GI___libc_malloc()+0
0x00007f975130f1d0 libnssutil3.so PORT_Alloc_Util()+32
0x00007f975035eaf7 libnssdbm3.so pkcs11_copyStaticData()+23
0x00007f975035ebb6 libnssdbm3.so DecodeDBCertEntry.isra.30()+150
0x00007f975035ed5a libnssdbm3.so certcallback()+154
0x00007f975035f9d4 libnssdbm3.so nsslowcert_TraverseDBEntries()+308
0x00007f975035fa7f libnssdbm3.so nsslowcert_TraversePermCerts()+79
0x00007f9750359959 libnssdbm3.so lg_searchTokenList.constprop.5()+3641
0x00007f9750359a1a libnssdbm3.so lg_FindObjectsInit()+106
0x00007f9750a940a1 libsoftokn3.so sftkdb_FindObjectsInit()+81
0x00007f9750a7b23e libsoftokn3.so sftk_searchDatabase()+62
0x00007f9750a805fe libsoftokn3.so NSC_FindObjectsInit()+302
0x00007f9751862003 libj2pkcs11.so Java_sun_security_pkcs11_wrapper_PKCS11_C_1FindObjectsInit()+147
0x00007f9771f19fce ??

Expected results:

The memory should have freed properly. The certcallback finally calls the DestroyCertificate static function in pcertdb.c. But when the refCount becomes zero still the static data is not deleted as the arena is still not cleared. In that case when the static data will be released, there is no clue. Kindly check that flow as well. Released patch covered the false path for sure, But the above scenario also needs to be taken care. If we fix this issue we are done I think.

please analyse the log attached from the below attached file mycertdb_clear_1.log which just gives the log from DestroyCertificate and DestroyDBEntry. In the case where the arena not NULL in mycertdb_clear_1.log we can see if we call the DestroyDBEntry in that since arena is not NULL then the free will not happen.

if (arena == NULL) {
certDBEntryCert *certEntry;
if (entry->common.type != certDBEntryTypeCert) {
return;
}
certEntry = (certDBEntryCert *)entry;

    pkcs11_freeStaticData(certEntry->derCert.data, certEntry->derCertSpace);
    pkcs11_freeNickname(certEntry->nickname, certEntry->nicknameSpace);

    nsslowcert_LockFreeList();
    if (entryListCount > MAX_ENTRY_LIST_COUNT) {
        PORT_Free(certEntry);
    } else {
        entryListCount++;
        PORT_Memset(certEntry, 0, sizeof(*certEntry));
        certEntry->next = entryListHead;
        entryListHead = certEntry;
    }
    nsslowcert_UnlockFreeList();
    return;
}

/* Zero out the entry struct, so that any further attempts to use it
 * will cause an exception (e.g. null pointer reference). */
PORT_Memset(&entry->common, 0, sizeof entry->common);
PORT_FreeArena(arena, PR_FALSE);

if arena is not NULL then it simply clears the arena but sets the entry->common to NULL, So in such cases there is no way you will free

pkcs11_freeStaticData(certEntry->derCert.data, certEntry->derCertSpace);
pkcs11_freeNickname(certEntry->nickname, certEntry->nicknameSpace);

because now the entry->common.type is also reset to 0, If if you call this same function for the second time there is now way you will delete this static data and the memory leak is created. Kindly check this flow and let me know.
If you have any 'C' program in this flow it is very easy to analyse too to see this memory leak happening.

The work around I did was like this but the double free is happening

must be one of our certDBEntry from the free list */
if (arena == NULL) {
certDBEntryCert *certEntry;
if (entry->common.type != certDBEntryTypeCert) {
return;
}
certEntry = (certDBEntryCert *)entry;

    pkcs11_freeStaticData(certEntry->derCert.data, certEntry->derCertSpace);
    pkcs11_freeNickname(certEntry->nickname, certEntry->nicknameSpace);

    nsslowcert_LockFreeList();
    if (entryListCount > MAX_ENTRY_LIST_COUNT) {
        PORT_Free(certEntry);
    } else {
        entryListCount++;
        PORT_Memset(certEntry, 0, sizeof(*certEntry));
        certEntry->next = entryListHead;
        entryListHead = certEntry;
    }
    nsslowcert_UnlockFreeList();
    return;
}
else {
          // it is not from freelist so simply destroy it
          certDBEntryCert *certEntry;
          if (entry->common.type == certDBEntryTypeCert) {
             certEntry = (certDBEntryCert *)entry;
             sprintf(buf,"Entry in %s Freeeing arena: %p data: %p nickname: %p Line: %d",__FUNCTION__,arena,certEntry->derCert.data,certEntry->nickname,__LINE__);
             debug_log(3,buf);
             pkcs11_freeStaticData(certEntry->derCert.data, certEntry->derCertSpace);
             pkcs11_freeNickname(certEntry->nickname, certEntry->nicknameSpace);
          }
}
Attached file mycertdb_clear_1.log

This file proves that DestriyCertificate before calling the DestroyDBEntry the arena is still holding a valid value. So the memory within the entry is going for a toss.

Severity: -- → S1
Priority: -- → P1

This bug should be easy to fix from your side as you know the code very well.

Also check the other leaks on PORT_ARENA only related to this flow.

(In reply to Raghavendran Bhiman from comment #4)

Also check the other leaks on PORT_ARENA only related to this flow attached in the memleak_0208appsrv.log
so that your product will also be more stable.

This bug we are facing in in CISCO ISE.

Flags: needinfo?(nobody)

Resetting priority and severity so we don't miss this during triage.

Severity: S1 → --
Flags: needinfo?(nobody)
Priority: P1 → --

Can you take a triage pass at this one, Kevin?

Flags: needinfo?(kjacobs.bugzilla)
Attachment #9170610 - Attachment mime type: application/octet-stream → text/plain
Attachment #9170609 - Attachment mime type: text/x-log → text/plain

I have only taken a high-level look at this, but it appears that the leak occurs in libdbm's mishandling of certificates during traversal. If it really is DBM's fault, we would consider accepting a patch, but we won't fix it ourselves. In NSS version 3.12 (2008) we began shipping a newer database implementation based on SQLite, deprecating DBM, and made it the default in NSS 3.35 in 2018. DBM support has been deprecated for 12 years now, and is unmaintained. All users have been encouraged for more than a decade to migrate to the SQLite format database.

I'll await Kevin's analysis before taking any action, as I do know we have had some PKCS11 traversal leaks (which we would fix), which this could still be, potentially.

This patch is having the problem of double free which we want to solve. for that I need your help. Do you want us to migrate to sqlite type database? do you have any pointer how to compile for migrating to sqlite type dabase to avoid libnssdbm3.so library? Thanks for the understanding.

if you could give the link for migrating the libraries to avoid libnssdbm3.so that will be great.

I concur with J.C.'s assessment - the code at fault is within the deprecated DBM library.

For an existing database, the migration should be automatic on newer versions of NSS. Note that for NSS 3.49+ gyp builds, the --enable-legacydb flag is required in order to compile-in the DBM code (which will be required in order to upgrade an existing database). If you only need to create a new database, things are easier.

Here are two links that you might find helpful:
https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/NSS_3.35_release_notes
https://fedoraproject.org/wiki/Changes/NSSDefaultFileFormatSql

Flags: needinfo?(kjacobs.bugzilla)

nss-3.44.0-4.el7.x86_64 -> we are using this version hopefully this should convert the old dbm databases to new sqlite format automatically.

I wrote a simple patch and built a test package for the issue.
The leak was resolved by always calling pkc11_freeStaticData(),
and setting entry->derCert.data to NULL in DestroyDBEntry(),
regardless of arena being NULL or not, and if it was of type
certDBEntryTypeCert.

The double free happens when derCert.data is initialized in
another source file and other code path :

attribute = lg_FindAttribute(CKA_VALUE, templ, count);
...
derCert.data = (unsigned char *)attribute->pValue;

I am waiting for a reproducer and/or valgrind logs, otherwise cannot do
much more, as I do not understand the code enough to understand the
condition it is not initialized from pkc11_CopyStaticData().

A possible fix could be to increase the size of _certDBEntryCert.derCertSpace
to guarantee the data would always fit (current value is 2048 bytes), but
this could break the abi in unexpected conditions.

Here's an outline for a possible fix:
It looks like pkcs11_copyStaticData should take an arena.

  • In DecodeDBCertEntry, both cases of pkcs11_copyStaticData should take entry->common.arena as a parameter.
  • Probably the same DecodeTrustEntry
  • pk11_copyStaticData should pass the arena to pkcs11_allocStaticData.
  • pkcs11_allocStaticData should allocate the space from the arena if it's passed in.
  • UpdateV7DB should set certEntry.common.arena = NULL before it calls DecodeDBCertEntry

If you are just looking to work around the issue, increase the size of derCertSpace[] in pcertt.h to something bigger than the size of the biggest certs you are encountering. (it's currently 2048 bytes).

The bug is in the NSS glue to dbm, so it's a (low priority) candidate to be fixed. This is an actual bug, the probably hasn't shown up before because it requires certificates stored in the database which are bigger than 2K to trigger. The work around would be to use the sql: database (which is now the default). Customers could do certutil -K -X -d sql:{nss directory} to update their databases to sql. Modern version of nss will access the sql database automatically, older versions can be coerced into using sql by setting the NSS_DEFAULT_DB_TYPE environment variable to sql.

Thanks, Bob.

All-told, your best bet is to work toward migrating to the SQLite database format. We don't have a definite date for the removal of the DBM code, but it will eventually be removed, and that will make migration much more difficult (requiring pre-removal versions, naturally).

Severity: -- → S4
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P5

My command is just an easy way with NSS tools to migrate. Any application that opens the database R/W and authenticates will automatically migrate the database. Now the firefox and TB are using sqlite by default (for over a year), it's quite likely those users have already migrated their databases. We are actually working to remove DBM from Fedora soon (or have already in the latest release). The assumption is Fedora has had sqlite by default for over a year, which means even servers (which normally open databases Read only) will have updated the databases when they installed new certificates). I think the only issue was a kernel tool which needed instructions to manually upgrade the database.

bob

BTW the dbm removal will mark a point where we finally sunset numerous old Netscape databases. The dbm code has update code that can read all the old certx.db and keyx.db and upgrade them to cert8 and key3. Since nothing has written such databases in 2.5 decades, I think that's pretty safe;).

Thank you very much for all your suggestions. I am trying to migrate the same to sql in java 1.8.242 b08 version of jdk.
There was an exception while use nssdb with sql format secmod.db not found exception thrown.
I think this bug we can bypass by creating empty secmod.db
hopefully this should work.
https://stackoverflow.com/questions/11538988/nss-shared-db-not-working-with-sunpkcs11
kindly correct if I am wrong.

(In reply to Raghavendran Bhiman from comment #17)

Thank you very much for all your suggestions. I am trying to migrate the same to sql in java 1.8.242 b08 version of jdk.
There was an exception while use nssdb with sql format secmod.db not found exception thrown.
I think this bug we can bypass by creating empty secmod.db
hopefully this should work.
https://stackoverflow.com/questions/11538988/nss-shared-db-not-working-with-sunpkcs11
kindly correct if I am wrong.

That seems reasonable. NSS does not require secmod.db anymore and does not check for it, but I certainly could imagine the Java wrapper not understanding that.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: