Closed Bug 1653310 Opened 4 years ago Closed 4 years ago

Perma-fail `Pkcs11ModuleTest.ListSlots` on mac taskcluster worker

Categories

(NSS :: Test, defect, P1)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 obsolete file)

Filed by: kjacobs [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=310020895&repo=nss-try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/NdLxKidySbuP15CVG5jL0w/runs/0/artifacts/public/logs/live_backing.log


[ RUN      ] Pkcs11ModuleTest.ListSlots
loaded slot: NSS User Private Key and Certificate Services
loaded slot: NSS Internal Cryptographic Services
loaded slot: Test PKCS11 Public Certs Slot
loaded slot: Test PKCS11 Slot 二
loaded slot: NSS Builtin Objects
../../gtests/pk11_gtest/pk11_module_unittest.cc:67: Failure
Value of: std::equal(kSlotsWithToken.begin(), kSlotsWithToken.end(), foundSlots.begin())
  Actual: false
Expected: true
[  FAILED  ] Pkcs11ModuleTest.ListSlots (1 ms)

This is also breaking the "Certs" task with error:

modutil -add RootCerts -libfile /Users/administrator/worker9/tasks/task_1594915207/dist/Release/lib/libnssckbi.dylib -dbdir /Users/administrator/worker9/tasks/task_1594915207/tests_results/security/localhost.1/CA

WARNING: Performing this operation while the browser is running could cause
corruption of your security databases. If the browser is currently running,
you should exit browser before continuing this operation. Type 
'q <enter>' to abort, or <enter> to continue: 
ERROR: Failed to add module "RootCerts". Probable cause : "Unknown PKCS #11 error.".
cert.sh: #3: Loading root cert module to CA Cert DB (22)  - FAILED

This first appeared in an unrelated try-push [1] and also occurs with 3.54 RTM [2]. The same tests pass locally, however.

[1] https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=efcb4bcb6462352c102ba852035aeb7349f3de08
[2] https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=348ee51f3c41c90b379e5284d8de3e3913da93a5

Summary: Perrma-fail `Pkcs11ModuleTest.ListSlots` on mac taskcluster worker → Perma-fail `Pkcs11ModuleTest.ListSlots` on mac taskcluster worker

Kai, these failures are caused by https://hg.mozilla.org/projects/nspr/rev/608f1e672c2e771357d6937716fe25e5be231e46.

Before and after this patch, pk11_gtest always tries to load the Builtins, passing name="./libnssckbi.dylib" to pr_LoadLibraryByPathname. Before the patch (and on Linux and presumably Windows), this returns NULL. With the patch, dlopen actually loads the library and "NSS Builtin Objects" module.

ISTM that we might want to revert NSPR, or at least replace the code with something equivalent. What do you think?

Flags: needinfo?(kaie)
Priority: P5 → P1
Regressed by: 1652956
Has Regression Range: --- → yes

Thanks. Yes, I agree, let's revert. I'm glad that we were able to find a scenario in which the old code was necessary. I'll investigate more tomorrow.

Flags: needinfo?(kaie)

I've analyzed this issue in more detail.
I no longer think we should revert.

At init time, NSS attempts to automatically load the nssckbi shared library from the database path.
It does so by calling nss_FindExternalRoot, which passes the full path to the potential database location to SECMOD_AddNewModule, which will eventually call PR_LoadLibrary.

On most platforms, if the library is missing at that path, the load attempt fails.

On macOS, the OS will automatically strip the directory prefix, and attempt to load the nssckbi library from any place in the global search path for shared libraries.

Old bug 480730 was an attempt to prevent that, by adding a check that the file exists, prior to loading it.

The file-system-based check for an existing file will no longer work for system libraries on macOS 11. Because of that, it was necessary to remove that general check from NSPR.

As a result, if a global library is available, NSS's attempt to load nssckbi from the database directory will succeed even if that file does not exist there. Consequently, an unexpected library with unexpected contents will be loaded.

To fix this issue, I think we should implement NSS' expectation at the NSS code level.

I suggest that nss_FindExternalRoot should be enhanced to check whether the library exists at the candidate path, and attempt to load it only if it does.
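
For illustration only, the check suggested above could look roughly like the following. This is a minimal sketch and not the patch attached to this bug: it uses plain POSIX access() and dlopen() instead of the NSPR/NSS calls, and the helper names candidate_exists and load_if_present are hypothetical.

```c
#include <dlfcn.h>   /* dlopen */
#include <stddef.h>  /* NULL */
#include <unistd.h>  /* access, F_OK */

/* Hypothetical helper (not an NSS function): returns 1 if something
 * exists at exactly `path` in the file system, 0 otherwise. */
static int candidate_exists(const char *path)
{
    return path != NULL && access(path, F_OK) == 0;
}

/* Hypothetical helper (not an NSS function): attempt the load only when
 * the file is present at the candidate path. When the file is missing we
 * return NULL without calling the loader at all, so the platform loader
 * never gets a chance to fall back to a copy found elsewhere. */
static void *load_if_present(const char *path)
{
    if (!candidate_exists(path)) {
        return NULL;
    }
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
}
```

With a guard like this in the caller, a missing libnssckbi in the database directory yields NULL on every platform, which is what pk11_gtest and cert.sh expect. The existence check stays scoped to this one NSS lookup, so NSPR remains free to load macOS 11 system libraries that have no file on disk.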

I'll submit a patch and a try run.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.55

As discussed on Matrix with kjacobs, this patch can be backed out, now that we have a backwards compatible fix in NSPR.

He gave r+ to backout in chat.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

backed out:
https://hg.mozilla.org/projects/nss/rev/a448fe36e58bf03e102ce7f571082ae5e140a4ff

fix no longer necessary -> wontfix
(assuming build will be green)

Status: REOPENED → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Attachment #9164776 - Attachment is obsolete: true