Closed Bug 1168425 Opened 9 years ago Closed 8 years ago

ssl_gtest crash on shutdown, Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88

Categories

(NSS :: Test, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: KaiE, Assigned: ttaubert)

References

Details

Attachments

(1 obsolete file)

I see a crash at the end of the ssl_gtest execution.
  Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88

I traced the changes of variable secmod_PrivateModuleCount, and it's set to "2" during nss init, but during nss shutdown, it only gets decreased once.

I wonder if there's anything wrong with the parameter string, as NSS recurses and attempts to construct a second module.


First:

#1  0x00007ffff764335e in SECMOD_LoadModule (
    modulespec=0x6eac50 "name=\"NSS Internal Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir='' update"..., parent=0x0, recurse=1) at pk11pars.c:994
#2  0x00007ffff76055e9 in nss_InitModules (configdir=0x6e84e8 "/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/", certPrefix=0x47e453 "", keyPrefix=0x47e453 "", 
    secmodName=0x47e449 "secmod.db", updateDir=0x7ffff7715400 "", updCertPrefix=0x7ffff7715400 "", updKeyPrefix=0x7ffff7715400 "", updateID=0x7ffff7715400 "", updateName=0x7ffff7715400 "", configName=0x0, 
    configStrings=0x0, pwRequired=0, readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, optimizeSpace=0, isContextInit=0) at nssinit.c:435


Second:
#1  0x00007ffff764335e in SECMOD_LoadModule (
    modulespec=0x6f1b50 " name=\"NSS Internal PKCS #11 Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir"..., parent=0x6ebf70, recurse=1) at pk11pars.c:994
#2  0x00007ffff7643505 in SECMOD_LoadModule (
    modulespec=0x6eac50 "name=\"NSS Internal Module\" parameters=\"configdir='/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly updatedir='' update"..., parent=0x0, recurse=1) at pk11pars.c:1045
#3  0x00007ffff76055e9 in nss_InitModules (configdir=0x6e84e8 "/home/kaie/moz/nss/hg/tests_results/security/localhost.4/ssl_gtests/", certPrefix=0x47e453 "", keyPrefix=0x47e453 "", 
    secmodName=0x47e449 "secmod.db", updateDir=0x7ffff7715400 "", updCertPrefix=0x7ffff7715400 "", updKeyPrefix=0x7ffff7715400 "", updateID=0x7ffff7715400 "", updateName=0x7ffff7715400 "", configName=0x0, 
    configStrings=0x0, pwRequired=0, readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, optimizeSpace=0, isContextInit=0) at nssinit.c:435
This crash doesn't happen outside of running all.sh.  Does that give you any clues?
(In reply to Martin Thomson [:mt] from comment #1)
> This crash doesn't happen outside of running all.sh.  Does that give you any
> clues?

Yes, it's related to NSS_STRICT_SHUTDOWN=1
which verifies that the application has used NSS correctly wrt reference counting.
(and that NSS doesn't have internal errors wrt reference counting)
The module named "NSS Internal PKCS #11 Module" (the one that is created recursively) is not freed.  That seems like the real bug here.
I took me a while to analyze this.
It isn't a failure in NSS.

In order to shutdown NSS correctly, it's required that the application destroys all resources that have been obtained from NSS, prior to calling NSS shutdown.

However, ssl_gtests doesn't clean up.

TlsConnectTestBase::TearDown() simply leaks the client_ and server_ objects...

I suspect the original developer might have chosen to do so because of bug 1177430.

If I change TlsConnectTestBase::TearDown() to
+  delete client_;
+  delete server_;

and in addition remove the assertion as suggested in 1177430, then the gtests complete successfully, without crash.


Also, I noticed that UnitTest::GetInstance() uses a static, non-allocated object. At least on some platforms it isn't freed on exit.

-  // When compiled with MSVC 7.1 in optimized mode, destroying the
-  // UnitTest object upon exiting the program messes up the exit code,
-  // causing successful tests to appear failed.  We have to use a
-  // different implementation in this case to bypass the compiler bug.
-  // This implementation makes the compiler happy, at the cost of
-  // leaking the UnitTest object.
-
-  // CodeGear C++Builder insists on a public destructor for the
-  // default implementation.  Use this implementation to keep good OO
-  // design with private destructor.
-
-#if (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
-  static UnitTest* const instance = new UnitTest;
-  return instance;
-#else
-  static UnitTest instance;
-  return &instance;
-#endif  // (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
-}


I had removed the above, and replaced it with code to dynamically allocated the instance, and free it prior to calling NSS shutdown.

I don't know yet if that's still required, or if the fix in TearDown is sufficient. I didn't have time yet to work on that. I will follow up.
Depends on: 1177430
Thanks. To be honest, I can't recall if I leaked these intentionally, it was a bug, or Camilo did it intentionally.

Anyway once you fix 1177430 the fix here to the test code seems easy.
(In reply to Kai Engert (:kaie) from comment #4)
> I took me a while to analyze this.
> It isn't a failure in NSS.
> 
> In order to shutdown NSS correctly, it's required that the application
> destroys all resources that have been obtained from NSS, prior to calling
> NSS shutdown.
> 
> However, ssl_gtests doesn't clean up.
> 
> TlsConnectTestBase::TearDown() simply leaks the client_ and server_
> objects...
> 
> I suspect the original developer might have chosen to do so because of bug
> 1177430.
> 
> If I change TlsConnectTestBase::TearDown() to
> +  delete client_;
> +  delete server_;
> 
> and in addition remove the assertion as suggested in 1177430, then the
> gtests complete successfully, without crash.
> 
> 
> Also, I noticed that UnitTest::GetInstance() uses a static, non-allocated
> object. At least on some platforms it isn't freed on exit.
> 
> -  // When compiled with MSVC 7.1 in optimized mode, destroying the
> -  // UnitTest object upon exiting the program messes up the exit code,
> -  // causing successful tests to appear failed.  We have to use a
> -  // different implementation in this case to bypass the compiler bug.
> -  // This implementation makes the compiler happy, at the cost of
> -  // leaking the UnitTest object.
> -
> -  // CodeGear C++Builder insists on a public destructor for the
> -  // default implementation.  Use this implementation to keep good OO
> -  // design with private destructor.
> -
> -#if (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
> -  static UnitTest* const instance = new UnitTest;
> -  return instance;
> -#else
> -  static UnitTest instance;
> -  return &instance;
> -#endif  // (_MSC_VER == 1310 && !defined(_DEBUG)) || defined(__BORLANDC__)
> -}
> 
> 
> I had removed the above, and replaced it with code to dynamically allocated
> the instance, and free it prior to calling NSS shutdown.
> 
> I don't know yet if that's still required, or if the fix in TearDown is
> sufficient. I didn't have time yet to work on that. I will follow up.

Kai,

Looking at this again, aren't these objects deleted in the the dtor?

https://dxr.mozilla.org/mozilla-central/source/security/nss/external_tests/ssl_gtest/tls_connect.cc#62
I think TearDown gets always called, so the dtor simply does "delete null".
Oh, I see what you're talking about. I think this may be an artifact of converting from smart pointers.
Attached patch not for review (obsolete) — Splinter Review
this is a backup of a patch that works. It needs cleanup and get removed to the minimum required.

(I need to postpone this work for a while, due to other priorities.)
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Version: 3.19 → trunk
Patch at: https://codereview.appspot.com/299800043

mt and/or ekr, please take a look. The clientSigAndHash leak isn't actually registered as it's not refcounted but it's something I discovered while running ssl_gtests with Valgrind.
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
Attachment #8626527 - Attachment is obsolete: true
LGTM, provided you can address ekr's comments regarding leaks.  BTW, have you had any luck with address sanitizer?
Flags: needinfo?(martin.thomson)
(In reply to Martin Thomson [:mt:] from comment #11)
> BTW, have you had any luck with address sanitizer?

Didn't try to compile NSS with ASan yet. I'll add this to my list of things we want for a CI, together with Valgrind builds.
I reviewed this Rietveld
Flags: needinfo?(ekr)
https://hg.mozilla.org/projects/nss/rev/ca23f9e14d63
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
This fails some tests. Backing out for now.
https://hg.mozilla.org/projects/nss/rev/88e50da58633

> Assertion failure: secmod_PrivateModuleCount == 0, at pk11util.c:88
> ./ssl_gtests.sh: line 106: 90829 Abort trap: 6           (core dumped) ${BINDIR}/ssl_gtest -d "${SSLGTESTDIR}" --gtest_output=xml:"${SSLGTESTREPORT}"
> ssl_gtest.sh: #8: ssl_gtest run successfully  - FAILED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Tim, maybe you can split this and keep the code changes, which are good.
Pushed the leak fixes:

https://hg.mozilla.org/projects/nss/rev/d5e4ad836426

Unfortunately, the TLS 1.3 code is leaking so we can't set NSS_STRICT_SHUTDOWN=1 until we fixed the leaks that appear when running with NSS_ENABLE_TLS_1_3=1. Will take a look.
Keywords: leave-open
Target Milestone: --- → 3.25
New patch with all the TLS 1.3 leak fixes:

https://codereview.appspot.com/293350043
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
https://hg.mozilla.org/projects/nss/rev/09378e284035
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Flags: needinfo?(martin.thomson)
Flags: needinfo?(ekr)
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: