Closed
Bug 1168425
Opened 10 years ago
Closed 9 years ago
ssl_gtest crash on shutdown, Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88
Categories
(NSS :: Test, defect)
NSS
Test
Tracking
(Not tracked)
RESOLVED
FIXED
3.25
People
(Reporter: KaiE, Assigned: ttaubert)
References
Details
Attachments
(1 obsolete file)
I see a crash at the end of the ssl_gtest execution.
Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88
I traced the changes of variable secmod_PrivateModuleCount, and it's set to "2" during nss init, but during nss shutdown, it only gets decreased once.
I wonder if there's anything wrong with the parameter string, as NSS recurses and attempts to construct a second module.
First:
#1 0x00007ffff764335e in SECMOD_LoadModule (
modulespec=0x6eac50 "name=\"NSS Internal Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir='' update"..., parent=0x0, recurse=1) at pk11pars.c:994
#2 0x00007ffff76055e9 in nss_InitModules (configdir=0x6e84e8 "/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/", certPrefix=0x47e453 "", keyPrefix=0x47e453 "",
secmodName=0x47e449 "secmod.db", updateDir=0x7ffff7715400 "", updCertPrefix=0x7ffff7715400 "", updKeyPrefix=0x7ffff7715400 "", updateID=0x7ffff7715400 "", updateName=0x7ffff7715400 "", configName=0x0,
configStrings=0x0, pwRequired=0, readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, optimizeSpace=0, isContextInit=0) at nssinit.c:435
Second:
#1 0x00007ffff764335e in SECMOD_LoadModule (
modulespec=0x6f1b50 " name=\"NSS Internal PKCS #11 Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir"..., parent=0x6ebf70, recurse=1) at pk11pars.c:994
#2 0x00007ffff7643505 in SECMOD_LoadModule (
modulespec=0x6eac50 "name=\"NSS Internal Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir='' update"..., parent=0x0, recurse=1) at pk11pars.c:1045
#3 0x00007ffff76055e9 in nss_InitModules (configdir=0x6e84e8 "/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/", certPrefix=0x47e453 "", keyPrefix=0x47e453 "",
secmodName=0x47e449 "secmod.db", updateDir=0x7ffff7715400 "", updCertPrefix=0x7ffff7715400 "", updKeyPrefix=0x7ffff7715400 "", updateID=0x7ffff7715400 "", updateName=0x7ffff7715400 "", configName=0x0,
configStrings=0x0, pwRequired=0, readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, optimizeSpace=0, isContextInit=0) at nssinit.c:435
Comment 1•10 years ago
|
||
This crash doesn't happen outside of running all.sh. Does that give you any clues?
Reporter | ||
Comment 2•10 years ago
|
||
(In reply to Martin Thomson [:mt] from comment #1)
> This crash doesn't happen outside of running all.sh. Does that give you any
> clues?
Yes, it's related to NSS_STRICT_SHUTDOWN=1
which verifies that the application has used NSS correctly wrt reference counting.
(and that NSS doesn't have internal errors wrt reference counting)
Comment 3•10 years ago
|
||
The module named "NSS Internal PKCS #11 Module" (the one that is created recursively) is not freed. That seems like the real bug here.
Reporter | ||
Comment 4•10 years ago
|
||
I took me a while to analyze this.
It isn't a failure in NSS.
In order to shutdown NSS correctly, it's required that the application destroys all resources that have been obtained from NSS, prior to calling NSS shutdown.
However, ssl_gtests doesn't clean up.
TlsConnectTestBase::TearDown() simply leaks the client_ and server_ objects...
I suspect the original developer might have chosen to do so because of bug 1177430.
If I change TlsConnectTestBase::TearDown() to
+ delete client_;
+ delete server_;
and in addition remove the assertion as suggested in 1177430, then the gtests complete successfully, without crash.
Also, I noticed that UnitTest::GetInstance() uses a static, non-allocated object. At least on some platforms it isn't freed on exit.
- // When compiled with MSVC 7.1 in optimized mode, destroying the
- // UnitTest object upon exiting the program messes up the exit code,
- // causing successful tests to appear failed. We have to use a
- // different implementation in this case to bypass the compiler bug.
- // This implementation makes the compiler happy, at the cost of
- // leaking the UnitTest object.
-
- // CodeGear C++Builder insists on a public destructor for the
- // default implementation. Use this implementation to keep good OO
- // design with private destructor.
-
-#if (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
- static UnitTest* const instance = new UnitTest;
- return instance;
-#else
- static UnitTest instance;
- return &instance;
-#endif // (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
-}
I had removed the above, and replaced it with code to dynamically allocated the instance, and free it prior to calling NSS shutdown.
I don't know yet if that's still required, or if the fix in TearDown is sufficient. I didn't have time yet to work on that. I will follow up.
Comment 5•10 years ago
|
||
Thanks. To be honest, I can't recall if I leaked these intentionally, it was a bug, or Camilo did it intentionally.
Anyway once you fix 1177430 the fix here to the test code seems easy.
Comment 6•10 years ago
|
||
(In reply to Kai Engert (:kaie) from comment #4)
> I took me a while to analyze this.
> It isn't a failure in NSS.
>
> In order to shutdown NSS correctly, it's required that the application
> destroys all resources that have been obtained from NSS, prior to calling
> NSS shutdown.
>
> However, ssl_gtests doesn't clean up.
>
> TlsConnectTestBase::TearDown() simply leaks the client_ and server_
> objects...
>
> I suspect the original developer might have chosen to do so because of bug
> 1177430.
>
> If I change TlsConnectTestBase::TearDown() to
> + delete client_;
> + delete server_;
>
> and in addition remove the assertion as suggested in 1177430, then the
> gtests complete successfully, without crash.
>
>
> Also, I noticed that UnitTest::GetInstance() uses a static, non-allocated
> object. At least on some platforms it isn't freed on exit.
>
> - // When compiled with MSVC 7.1 in optimized mode, destroying the
> - // UnitTest object upon exiting the program messes up the exit code,
> - // causing successful tests to appear failed. We have to use a
> - // different implementation in this case to bypass the compiler bug.
> - // This implementation makes the compiler happy, at the cost of
> - // leaking the UnitTest object.
> -
> - // CodeGear C++Builder insists on a public destructor for the
> - // default implementation. Use this implementation to keep good OO
> - // design with private destructor.
> -
> -#if (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
> - static UnitTest* const instance = new UnitTest;
> - return instance;
> -#else
> - static UnitTest instance;
> - return &instance;
> -#endif // (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
> -}
>
>
> I had removed the above, and replaced it with code to dynamically allocated
> the instance, and free it prior to calling NSS shutdown.
>
> I don't know yet if that's still required, or if the fix in TearDown is
> sufficient. I didn't have time yet to work on that. I will follow up.
Kai,
Looking at this again, aren't these objects deleted in the the dtor?
https://dxr.mozilla.org/mozilla-central/source/security/nss/external_tests/ssl_gtest/tls_connect.cc#62
Reporter | ||
Comment 7•10 years ago
|
||
I think TearDown gets always called, so the dtor simply does "delete null".
Comment 8•10 years ago
|
||
Oh, I see what you're talking about. I think this may be an artifact of converting from smart pointers.
Reporter | ||
Comment 9•10 years ago
|
||
this is a backup of a patch that works. It needs cleanup and get removed to the minimum required.
(I need to postpone this work for a while, due to other priorities.)
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Version: 3.19 → trunk
Assignee | ||
Comment 10•9 years ago
|
||
Patch at: https://codereview.appspot.com/299800043
mt and/or ekr, please take a look. The clientSigAndHash leak isn't actually registered as it's not refcounted but it's something I discovered while running ssl_gtests with Valgrind.
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
Assignee | ||
Updated•9 years ago
|
Attachment #8626527 -
Attachment is obsolete: true
Comment 11•9 years ago
|
||
LGTM, provided you can address ekr's comments regarding leaks. BTW, have you had any luck with address sanitizer?
Flags: needinfo?(martin.thomson)
Assignee | ||
Comment 12•9 years ago
|
||
(In reply to Martin Thomson [:mt:] from comment #11)
> BTW, have you had any luck with address sanitizer?
Didn't try to compile NSS with ASan yet. I'll add this to my list of things we want for a CI, together with Valgrind builds.
Assignee | ||
Comment 14•9 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 15•9 years ago
|
||
This fails some tests. Backing out for now.
https://hg.mozilla.org/projects/nss/rev/88e50da58633
> Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88
> ./ssl_gtests.sh: line 106: 90829 Abort trap: 6 (core dumped) ${BINDIR}/ssl_gtest -d "${SSLGTESTDIR}" --gtest_output=xml:"${SSLGTESTREPORT}"
> ssl_gtest.sh: #8: ssl_gtest run successfully - FAILED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 16•9 years ago
|
||
Tim, maybe you can split this and keep the code changes, which are good.
Assignee | ||
Comment 17•9 years ago
|
||
Pushed the leak fixes:
https://hg.mozilla.org/projects/nss/rev/d5e4ad836426
Unfortunately, the TLS 1.3 code is leaking so we can't set NSS_STRICT_SHUTDOWN=1 until we fixed the leaks that appear when running with NSS_ENABLE_TLS_1_3=1. Will take a look.
Keywords: leave-open
Assignee | ||
Comment 18•9 years ago
|
||
Fixing a clang-format failure:
https://hg.mozilla.org/projects/nss/rev/35d368fc3213
Assignee | ||
Updated•9 years ago
|
Target Milestone: --- → 3.25
Assignee | ||
Comment 19•9 years ago
|
||
New patch with all the TLS 1.3 leak fixes:
https://codereview.appspot.com/293350043
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
Assignee | ||
Comment 20•9 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
Keywords: leave-open
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•