Perma-fail `Pkcs11ModuleTest.ListSlots` on mac taskcluster worker
Categories
(NSS :: Test, defect, P1)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 obsolete file)
Filed by: kjacobs [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=310020895&repo=nss-try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/NdLxKidySbuP15CVG5jL0w/runs/0/artifacts/public/logs/live_backing.log
[ RUN ] Pkcs11ModuleTest.ListSlots
loaded slot: NSS User Private Key and Certificate Services
loaded slot: NSS Internal Cryptographic Services
loaded slot: Test PKCS11 Public Certs Slot
loaded slot: Test PKCS11 Slot 二
loaded slot: NSS Builtin Objects
../../gtests/pk11_gtest/pk11_module_unittest.cc:67: Failure
Value of: std::equal(kSlotsWithToken.begin(), kSlotsWithToken.end(), foundSlots.begin())
Actual: false
Expected: true
[ FAILED ] Pkcs11ModuleTest.ListSlots (1 ms)
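One plausible reading of this failure, sketched under assumptions: the test appears to compare the expected token names against the beginning of the found list. If the found names are sorted first (an assumption; the log prints them in enumeration order), an extra "NSS Builtin Objects" entry sorts ahead of every expected name and shifts the whole comparison. The helper name and the expected list below are illustrative, not copied from pk11_module_unittest.cc.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper mirroring the std::equal call from the failure message.
// Assumption: the found names are sorted before the prefix comparison.
bool ExpectedSlotsArePrefix(const std::vector<std::string>& expected_sorted,
                            std::vector<std::string> found) {
  // Guard so std::equal never reads past the end of `found`.
  if (found.size() < expected_sorted.size()) return false;
  std::sort(found.begin(), found.end());
  // Compares expected_sorted against the first expected_sorted.size()
  // entries of the sorted found list.
  return std::equal(expected_sorted.begin(), expected_sorted.end(),
                    found.begin());
}
```

With the five slot names from the log above, the sorted found list starts with "NSS Builtin Objects", so the first comparison against "NSS Internal Cryptographic Services" already fails.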
This is also breaking the "Certs" task with error:
modutil -add RootCerts -libfile /Users/administrator/worker9/tasks/task_1594915207/dist/Release/lib/libnssckbi.dylib -dbdir /Users/administrator/worker9/tasks/task_1594915207/tests_results/security/localhost.1/CA
WARNING: Performing this operation while the browser is running could cause
corruption of your security databases. If the browser is currently running,
you should exit browser before continuing this operation. Type
'q <enter>' to abort, or <enter> to continue:
ERROR: Failed to add module "RootCerts". Probable cause : "Unknown PKCS #11 error.".
cert.sh: #3: Loading root cert module to CA Cert DB (22) - FAILED
This first appeared in an unrelated try-push [1] and also occurs with 3.54 RTM [2]. The same tests pass locally, however.
[1] https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=efcb4bcb6462352c102ba852035aeb7349f3de08
[2] https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=348ee51f3c41c90b379e5284d8de3e3913da93a5
Comment 1•4 years ago
Kai, these failures are caused by https://hg.mozilla.org/projects/nspr/rev/608f1e672c2e771357d6937716fe25e5be231e46.
Before and after this patch, pk11_gtest always tries to load the Builtins, passing name="./libnssckbi.dylib" to pr_LoadLibraryByPathname. Before the patch (and on Linux and presumably Windows), this returns NULL. With the patch, dlopen actually loads the library and "NSS Builtin Objects" module.
ISTM that we might want to revert NSPR, or at least replace the code with something equivalent. What do you think?
Comment 2•4 years ago
Thanks. Yes, I agree, let's revert. I'm glad that we were able to find a scenario in which the old code was necessary. I'll investigate more tomorrow.
Comment 4•4 years ago
I've analyzed this issue in more detail.
I no longer think we should revert.
At init time, NSS attempts to automatically load the nssckbi shared library from the database path.
It does so by calling nss_FindExternalRoot, which passes the full path to the potential database location to SECMOD_AddNewModule, which will eventually call PR_LoadLibrary.
On most platforms, if the library is missing at that path, the load will fail.
On macOS, the OS will automatically strip the directory prefix, and attempt to load the nssckbi library from any place in the global search path for shared libraries.
Old bug 480730 was an attempt to prevent that, by adding a check that the file exists, prior to loading it.
The file-system-based check for an existing file no longer works for system libraries on macOS 11. Because of that, it was necessary to remove that general check from NSPR.
As a result, if a global library is available, NSS's attempt to load nssckbi from the database directory will succeed, even if that file does not exist there. Consequently, an unexpected library with unexpected contents will be loaded.
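To make the failure mode concrete, here is a simplified model of a loader that falls back to the leaf name when the requested path is missing. It is only an illustration of the behaviour described above, not the actual macOS loader algorithm; the function name, the fallback directory list and the injected `exists` predicate are all hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model: returns the path such a loader would end up opening,
// or an empty string if nothing is found.
std::string ResolveWithLeafFallback(
    const std::string& requested,
    const std::vector<std::string>& fallback_dirs,
    const std::function<bool(const std::string&)>& exists) {
  // The exact path wins when it is present.
  if (exists(requested)) return requested;

  // Otherwise drop the directory prefix and keep only the leaf name.
  const std::size_t slash = requested.find_last_of('/');
  const std::string leaf =
      (slash == std::string::npos) ? requested : requested.substr(slash + 1);

  // Try the leaf name in each fallback directory, in order.
  for (const std::string& dir : fallback_dirs) {
    const std::string candidate = dir + "/" + leaf;
    if (exists(candidate)) return candidate;
  }
  return "";
}
```

In this model, a request for a missing library in the database directory resolves to a same-named library in a fallback directory, which is the "unexpected library" case.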
To fix this issue, I think we should implement NSS' expectation at the NSS code level.
I suggest that nss_FindExternalRoot should be enhanced to check whether the library exists at the candidate path, and to attempt the load only if it does.
I'll submit a patch and a try run.
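A minimal sketch of the proposed check-before-load idea, under assumptions: it uses plain POSIX stat() and an injected loader callback so that it stands alone. The helper names are hypothetical, and this is not the submitted patch; inside NSS the existence test would more naturally go through NSPR (for example PR_Access with PR_ACCESS_EXISTS).

```cpp
#include <functional>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: true only if `path` names an existing regular file
// (stat() follows symlinks, so a symlink to a regular file also counts).
bool RegularFileExists(const std::string& path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode);
}

// Hypothetical wrapper: calls `load` only when the candidate file is present,
// so the platform loader never gets a chance to substitute another library.
bool LoadOnlyIfPresent(const std::string& candidate,
                       const std::function<bool(const std::string&)>& load) {
  if (!RegularFileExists(candidate)) return false;
  return load(candidate);
}
```

The check is applied only to the candidate path in the database directory, so it does not reintroduce a general existence test for system libraries.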
Comment 6•4 years ago
This commit fixed the macOS failures.
https://hg.mozilla.org/projects/nss/rev/ca207655b4b7cb1d3a5e438c1fb9b90d45596da6
Comment 7•4 years ago
As discussed on Matrix with kjacobs, this patch can be backed out, now that we have a backwards compatible fix in NSPR.
He gave r+ to backout in chat.
Comment 8•4 years ago
backed out:
https://hg.mozilla.org/projects/nss/rev/a448fe36e58bf03e102ce7f571082ae5e140a4ff
fix no longer necessary -> wontfix
(assuming build will be green)