bugzilla.mozilla.org has resumed normal operation. Attachments prior to 2014 will be unavailable for a few days. This is tracked in Bug 1475801.
Please report any other irregularities here.

ssl_gtest crash on shutdown, Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88

RESOLVED FIXED in 3.25

Status

NSS
Test
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: kaie, Assigned: ttaubert)

Tracking

trunk
3.25

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 obsolete attachment)

(Reporter)

Description

3 years ago
I see a crash at the end of the ssl_gtest execution.
  Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88

I traced the changes of variable secmod_PrivateModuleCount, and it's set to "2" during nss init, but during nss shutdown, it only gets decreased once.

I wonder if there's anything wrong with the parameter string, as NSS recurses and attempts to construct a second module.


First:

#1  0x00007ffff764335e in SECMOD_LoadModule (
    modulespec=0x6eac50 "name=\"NSS Internal Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir='' update"..., parent=0x0, recurse=1) at pk11pars.c:994
#2  0x00007ffff76055e9 in nss_InitModules (configdir=0x6e84e8 "/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/", certPrefix=0x47e453 "", keyPrefix=0x47e453 "", 
    secmodName=0x47e449 "secmod.db", updateDir=0x7ffff7715400 "", updCertPrefix=0x7ffff7715400 "", updKeyPrefix=0x7ffff7715400 "", updateID=0x7ffff7715400 "", updateName=0x7ffff7715400 "", configName=0x0, 
    configStrings=0x0, pwRequired=0, readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, optimizeSpace=0, isContextInit=0) at nssinit.c:435


Second:
#1  0x00007ffff764335e in SECMOD_LoadModule (
    modulespec=0x6f1b50 " name=\"NSS Internal PKCS #11 Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir"..., parent=0x6ebf70, recurse=1) at pk11pars.c:994
#2  0x00007ffff7643505 in SECMOD_LoadModule (
    modulespec=0x6eac50 "name=\"NSS Internal Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir='' update"..., parent=0x0, recurse=1) at pk11pars.c:1045
#3  0x00007ffff76055e9 in nss_InitModules (configdir=0x6e84e8 "/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/", certPrefix=0x47e453 "", keyPrefix=0x47e453 "", 
    secmodName=0x47e449 "secmod.db", updateDir=0x7ffff7715400 "", updCertPrefix=0x7ffff7715400 "", updKeyPrefix=0x7ffff7715400 "", updateID=0x7ffff7715400 "", updateName=0x7ffff7715400 "", configName=0x0, 
    configStrings=0x0, pwRequired=0, readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, optimizeSpace=0, isContextInit=0) at nssinit.c:435
This crash doesn't happen outside of running all.sh.  Does that give you any clues?
(Reporter)

Comment 2

3 years ago
(In reply to Martin Thomson [:mt] from comment #1)
> This crash doesn't happen outside of running all.sh.  Does that give you any
> clues?

Yes, it's related to NSS_STRICT_SHUTDOWN=1
which verifies that the application has used NSS correctly wrt reference counting.
(and that NSS doesn't have internal errors wrt reference counting)
The module named "NSS Internal PKCS #11 Module" (the one that is created recursively) is not freed.  That seems like the real bug here.
(Reporter)

Comment 4

3 years ago
I took me a while to analyze this.
It isn't a failure in NSS.

In order to shutdown NSS correctly, it's required that the application destroys all resources that have been obtained from NSS, prior to calling NSS shutdown.

However, ssl_gtests doesn't clean up.

TlsConnectTestBase::TearDown() simply leaks the client_ and server_ objects...

I suspect the original developer might have chosen to do so because of bug 1177430.

If I change TlsConnectTestBase::TearDown() to
+  delete client_;
+  delete server_;

and in addition remove the assertion as suggested in 1177430, then the gtests complete successfully, without crash.


Also, I noticed that UnitTest::GetInstance() uses a static, non-allocated object. At least on some platforms it isn't freed on exit.

-  // When compiled with MSVC 7.1 in optimized mode, destroying the
-  // UnitTest object upon exiting the program messes up the exit code,
-  // causing successful tests to appear failed.  We have to use a
-  // different implementation in this case to bypass the compiler bug.
-  // This implementation makes the compiler happy, at the cost of
-  // leaking the UnitTest object.
-
-  // CodeGear C++Builder insists on a public destructor for the
-  // default implementation.  Use this implementation to keep good OO
-  // design with private destructor.
-
-#if (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
-  static UnitTest* const instance = new UnitTest;
-  return instance;
-#else
-  static UnitTest instance;
-  return &instance;
-#endif  // (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
-}


I had removed the above, and replaced it with code to dynamically allocated the instance, and free it prior to calling NSS shutdown.

I don't know yet if that's still required, or if the fix in TearDown is sufficient. I didn't have time yet to work on that. I will follow up.
(Reporter)

Updated

3 years ago
Depends on: 1177430

Comment 5

3 years ago
Thanks. To be honest, I can't recall if I leaked these intentionally, it was a bug, or Camilo did it intentionally.

Anyway once you fix 1177430 the fix here to the test code seems easy.

Comment 6

3 years ago
(In reply to Kai Engert (:kaie) from comment #4)
> I took me a while to analyze this.
> It isn't a failure in NSS.
> 
> In order to shutdown NSS correctly, it's required that the application
> destroys all resources that have been obtained from NSS, prior to calling
> NSS shutdown.
> 
> However, ssl_gtests doesn't clean up.
> 
> TlsConnectTestBase::TearDown() simply leaks the client_ and server_
> objects...
> 
> I suspect the original developer might have chosen to do so because of bug
> 1177430.
> 
> If I change TlsConnectTestBase::TearDown() to
> +  delete client_;
> +  delete server_;
> 
> and in addition remove the assertion as suggested in 1177430, then the
> gtests complete successfully, without crash.
> 
> 
> Also, I noticed that UnitTest::GetInstance() uses a static, non-allocated
> object. At least on some platforms it isn't freed on exit.
> 
> -  // When compiled with MSVC 7.1 in optimized mode, destroying the
> -  // UnitTest object upon exiting the program messes up the exit code,
> -  // causing successful tests to appear failed.  We have to use a
> -  // different implementation in this case to bypass the compiler bug.
> -  // This implementation makes the compiler happy, at the cost of
> -  // leaking the UnitTest object.
> -
> -  // CodeGear C++Builder insists on a public destructor for the
> -  // default implementation.  Use this implementation to keep good OO
> -  // design with private destructor.
> -
> -#if (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
> -  static UnitTest* const instance = new UnitTest;
> -  return instance;
> -#else
> -  static UnitTest instance;
> -  return &instance;
> -#endif  // (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
> -}
> 
> 
> I had removed the above, and replaced it with code to dynamically allocated
> the instance, and free it prior to calling NSS shutdown.
> 
> I don't know yet if that's still required, or if the fix in TearDown is
> sufficient. I didn't have time yet to work on that. I will follow up.

Kai,

Looking at this again, aren't these objects deleted in the the dtor?

https://dxr.mozilla.org/mozilla-central/source/security/nss/external_tests/ssl_gtest/tls_connect.cc#62
(Reporter)

Comment 7

3 years ago
I think TearDown gets always called, so the dtor simply does "delete null".

Comment 8

3 years ago
Oh, I see what you're talking about. I think this may be an artifact of converting from smart pointers.
(Reporter)

Comment 9

3 years ago
Created attachment 8626527 [details] [diff] [review]
not for review

this is a backup of a patch that works. It needs cleanup and get removed to the minimum required.

(I need to postpone this work for a while, due to other priorities.)
(Assignee)

Updated

2 years ago
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Version: 3.19 → trunk
(Assignee)

Comment 10

2 years ago
Patch at: https://codereview.appspot.com/299800043

mt and/or ekr, please take a look. The clientSigAndHash leak isn't actually registered as it's not refcounted but it's something I discovered while running ssl_gtests with Valgrind.
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
(Assignee)

Updated

2 years ago
Attachment #8626527 - Attachment is obsolete: true
LGTM, provided you can address ekr's comments regarding leaks.  BTW, have you had any luck with address sanitizer?
Flags: needinfo?(martin.thomson)
(Assignee)

Comment 12

2 years ago
(In reply to Martin Thomson [:mt:] from comment #11)
> BTW, have you had any luck with address sanitizer?

Didn't try to compile NSS with ASan yet. I'll add this to my list of things we want for a CI, together with Valgrind builds.
I reviewed this Rietveld
Flags: needinfo?(ekr)
(Assignee)

Comment 14

2 years ago
https://hg.mozilla.org/projects/nss/rev/ca23f9e14d63
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
This fails some tests. Backing out for now.
https://hg.mozilla.org/projects/nss/rev/88e50da58633

> Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88
> ./ssl_gtests.sh: line 106: 90829 Abort trap: 6           (core dumped) ${BINDIR}/ssl_gtest -d "${SSLGTESTDIR}" --gtest_output=xml:"${SSLGTESTREPORT}"
> ssl_gtest.sh: #8: ssl_gtest run successfully  - FAILED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Tim, maybe you can split this and keep the code changes, which are good.
(Assignee)

Comment 17

2 years ago
Pushed the leak fixes:

https://hg.mozilla.org/projects/nss/rev/d5e4ad836426

Unfortunately, the TLS 1.3 code is leaking so we can't set NSS_STRICT_SHUTDOWN=1 until we fixed the leaks that appear when running with NSS_ENABLE_TLS_1_3=1. Will take a look.
Keywords: leave-open
(Assignee)

Updated

2 years ago
Target Milestone: --- → 3.25
(Assignee)

Comment 19

2 years ago
New patch with all the TLS 1.3 leak fixes:

https://codereview.appspot.com/293350043
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
(Assignee)

Comment 20

2 years ago
https://hg.mozilla.org/projects/nss/rev/09378e284035
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago2 years ago
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.