Closed Bug 1168425 Opened 10 years ago Closed 9 years ago

ssl_gtest crash on shutdown, Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88

Categories

(NSS :: Test, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: KaiE, Assigned: ttaubert)

References

Details

Attachments

(1 obsolete file)

I see a crash at the end of the ssl_gtest execution. Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88 I traced the changes of variable secmod_PrivateModuleCount, and it's set to "2" during nss init, but during nss shutdown, it only gets decreased once. I wonder if there's anything wrong with the parameter string, as NSS recurses and attempts to construct a second module. First: #1 0x00007ffff764335e in SECMOD_LoadModule ( modulespec=0x6eac50 "name=\"NSS Internal Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir='' update"..., parent=0x0, recurse=1) at pk11pars.c:994 #2 0x00007ffff76055e9 in nss_InitModules (configdir=0x6e84e8 "/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/", certPrefix=0x47e453 "", keyPrefix=0x47e453 "", secmodName=0x47e449 "secmod.db", updateDir=0x7ffff7715400 "", updCertPrefix=0x7ffff7715400 "", updKeyPrefix=0x7ffff7715400 "", updateID=0x7ffff7715400 "", updateName=0x7ffff7715400 "", configName=0x0, configStrings=0x0, pwRequired=0, readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, optimizeSpace=0, isContextInit=0) at nssinit.c:435 Second: #1 0x00007ffff764335e in SECMOD_LoadModule ( modulespec=0x6f1b50 " name=\"NSS Internal PKCS #11 Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir"..., parent=0x6ebf70, recurse=1) at pk11pars.c:994 #2 0x00007ffff7643505 in SECMOD_LoadModule ( modulespec=0x6eac50 "name=\"NSS Internal Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir='' update"..., parent=0x0, recurse=1) at pk11pars.c:1045 #3 0x00007ffff76055e9 in nss_InitModules (configdir=0x6e84e8 "/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/", certPrefix=0x47e453 "", keyPrefix=0x47e453 "", secmodName=0x47e449 "secmod.db", updateDir=0x7ffff7715400 "", updCertPrefix=0x7ffff7715400 "", updKeyPrefix=0x7ffff7715400 "", updateID=0x7ffff7715400 "", updateName=0x7ffff7715400 "", configName=0x0, configStrings=0x0, pwRequired=0, readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, optimizeSpace=0, isContextInit=0) at nssinit.c:435
This crash doesn't happen outside of running all.sh. Does that give you any clues?
(In reply to Martin Thomson [:mt] from comment #1) > This crash doesn't happen outside of running all.sh. Does that give you any > clues? Yes, it's related to NSS_STRICT_SHUTDOWN=1 which verifies that the application has used NSS correctly wrt reference counting. (and that NSS doesn't have internal errors wrt reference counting)
The module named "NSS Internal PKCS #11 Module" (the one that is created recursively) is not freed. That seems like the real bug here.
I took me a while to analyze this. It isn't a failure in NSS. In order to shutdown NSS correctly, it's required that the application destroys all resources that have been obtained from NSS, prior to calling NSS shutdown. However, ssl_gtests doesn't clean up. TlsConnectTestBase::TearDown() simply leaks the client_ and server_ objects... I suspect the original developer might have chosen to do so because of bug 1177430. If I change TlsConnectTestBase::TearDown() to + delete client_; + delete server_; and in addition remove the assertion as suggested in 1177430, then the gtests complete successfully, without crash. Also, I noticed that UnitTest::GetInstance() uses a static, non-allocated object. At least on some platforms it isn't freed on exit. - // When compiled with MSVC 7.1 in optimized mode, destroying the - // UnitTest object upon exiting the program messes up the exit code, - // causing successful tests to appear failed. We have to use a - // different implementation in this case to bypass the compiler bug. - // This implementation makes the compiler happy, at the cost of - // leaking the UnitTest object. - - // CodeGear C++Builder insists on a public destructor for the - // default implementation. Use this implementation to keep good OO - // design with private destructor. - -#if (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__) - static UnitTest* const instance = new UnitTest; - return instance; -#else - static UnitTest instance; - return &instance; -#endif // (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__) -} I had removed the above, and replaced it with code to dynamically allocated the instance, and free it prior to calling NSS shutdown. I don't know yet if that's still required, or if the fix in TearDown is sufficient. I didn't have time yet to work on that. I will follow up.
Depends on: 1177430
Thanks. To be honest, I can't recall if I leaked these intentionally, it was a bug, or Camilo did it intentionally. Anyway once you fix 1177430 the fix here to the test code seems easy.
(In reply to Kai Engert (:kaie) from comment #4) > I took me a while to analyze this. > It isn't a failure in NSS. > > In order to shutdown NSS correctly, it's required that the application > destroys all resources that have been obtained from NSS, prior to calling > NSS shutdown. > > However, ssl_gtests doesn't clean up. > > TlsConnectTestBase::TearDown() simply leaks the client_ and server_ > objects... > > I suspect the original developer might have chosen to do so because of bug > 1177430. > > If I change TlsConnectTestBase::TearDown() to > + delete client_; > + delete server_; > > and in addition remove the assertion as suggested in 1177430, then the > gtests complete successfully, without crash. > > > Also, I noticed that UnitTest::GetInstance() uses a static, non-allocated > object. At least on some platforms it isn't freed on exit. > > - // When compiled with MSVC 7.1 in optimized mode, destroying the > - // UnitTest object upon exiting the program messes up the exit code, > - // causing successful tests to appear failed. We have to use a > - // different implementation in this case to bypass the compiler bug. > - // This implementation makes the compiler happy, at the cost of > - // leaking the UnitTest object. > - > - // CodeGear C++Builder insists on a public destructor for the > - // default implementation. Use this implementation to keep good OO > - // design with private destructor. > - > -#if (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__) > - static UnitTest* const instance = new UnitTest; > - return instance; > -#else > - static UnitTest instance; > - return &instance; > -#endif // (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__) > -} > > > I had removed the above, and replaced it with code to dynamically allocated > the instance, and free it prior to calling NSS shutdown. > > I don't know yet if that's still required, or if the fix in TearDown is > sufficient. I didn't have time yet to work on that. I will follow up. Kai, Looking at this again, aren't these objects deleted in the the dtor? https://dxr.mozilla.org/mozilla-central/source/security/nss/external_tests/ssl_gtest/tls_connect.cc#62
I think TearDown gets always called, so the dtor simply does "delete null".
Oh, I see what you're talking about. I think this may be an artifact of converting from smart pointers.
Attached patch not for review (obsolete) — Splinter Review
this is a backup of a patch that works. It needs cleanup and get removed to the minimum required. (I need to postpone this work for a while, due to other priorities.)
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Version: 3.19 → trunk
Patch at: https://codereview.appspot.com/299800043 mt and/or ekr, please take a look. The clientSigAndHash leak isn't actually registered as it's not refcounted but it's something I discovered while running ssl_gtests with Valgrind.
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
Attachment #8626527 - Attachment is obsolete: true
LGTM, provided you can address ekr's comments regarding leaks. BTW, have you had any luck with address sanitizer?
Flags: needinfo?(martin.thomson)
(In reply to Martin Thomson [:mt:] from comment #11) > BTW, have you had any luck with address sanitizer? Didn't try to compile NSS with ASan yet. I'll add this to my list of things we want for a CI, together with Valgrind builds.
I reviewed this Rietveld
Flags: needinfo?(ekr)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
This fails some tests. Backing out for now. https://hg.mozilla.org/projects/nss/rev/88e50da58633 > Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88 > ./ssl_gtests.sh: line 106: 90829 Abort trap: 6 (core dumped) ${BINDIR}/ssl_gtest -d "${SSLGTESTDIR}" --gtest_output=xml:"${SSLGTESTREPORT}" > ssl_gtest.sh: #8: ssl_gtest run successfully - FAILED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Tim, maybe you can split this and keep the code changes, which are good.
Pushed the leak fixes: https://hg.mozilla.org/projects/nss/rev/d5e4ad836426 Unfortunately, the TLS 1.3 code is leaking so we can't set NSS_STRICT_SHUTDOWN=1 until we fixed the leaks that appear when running with NSS_ENABLE_TLS_1_3=1. Will take a look.
Keywords: leave-open
Target Milestone: --- → 3.25
New patch with all the TLS 1.3 leak fixes: https://codereview.appspot.com/293350043
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: