Closed Bug 1674410 Opened 4 years ago Closed 1 year ago

Shutdown of the SSL Cert threadpool hangs

Categories

(Core :: Security: PSM, defect, P3)

Tracking

()

RESOLVED FIXED
114 Branch
Tracking Status
firefox114 --- fixed

People

(Reporter: jstutte, Assigned: keeler)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [psm-backlog])

Crash Data

Attachments

(1 file)

From bug 1633342 comment 41:

Looking at shutdownhang | mozilla::TaskController::GetRunnableForMTTask | nsThread::Shutdown | mozilla::net::nsSocketTransportService::ShutdownThread.

In all the reports I clicked on, the SocketThread is stuck while shutting down the SSL Cert threadpool.

Looking at the shutdown function, it seems we shut down the threads in the order we created them, waiting for each thread to finish before moving on to the next. In the first three reports I clicked on, SSL Cert #1 is still alive. I assume it is processing some long-running event when the shutdown event comes in, so we never get to process the shutdown event.

In two cases it is stuck in nsNSSComponent::BlockUntilLoadableCertsLoaded(); in the other case, mozilla::psm::NSSCertDBTrustDomain::GetCertTrust seems to be stuck.

We might want to consider using ShutdownWithTimeout in StopSSLServerCertVerificationThreads?
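
To make the idea of a bounded wait concrete, here is a minimal sketch in standard C++. It is not how ShutdownWithTimeout is implemented in Gecko: the function name WaitForWorkersWithTimeout is hypothetical, and each worker is modelled as a std::future that becomes ready when the worker finishes. All workers share one deadline, so a stuck first worker cannot block the caller indefinitely.

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper, not a Gecko API. Waits for every worker in creation
// order, but all waits share a single deadline. Returns how many workers
// finished in time; the caller decides what to do about the rest.
std::size_t WaitForWorkersWithTimeout(std::vector<std::future<void>>& workers,
                                      std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  std::size_t finished = 0;
  for (auto& worker : workers) {
    // wait_until returns immediately for workers that are already done,
    // and gives up at the deadline for workers that are stuck.
    if (worker.wait_until(deadline) == std::future_status::ready) {
      ++finished;
    }
  }
  return finished;
}
```

A timeout only bounds how long the caller waits. It does not stop the stuck worker, so anything that worker still holds remains in use afterwards.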

Crash Signature: [@ shutdownhang | mozilla::TaskController::GetRunnableForMTTask | nsThread::Shutdown | mozilla::net::nsSocketTransportService::ShutdownThread ]

The severity field is not set for this bug.
:keeler, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dkeeler)

There are a couple of things going on here. The first is third-party PKCS#11 modules. As long as we support loading third-party modules, they'll continue to give users a bad experience. Our hope is that osclientcerts will make it so users don't have to load these modules, but we're not quite there yet. The other issue is that NSS is essentially not thread-safe. Any time we use NSS resources on these other threads, we could hit a deadlock, race condition, etc., so bug 1664048 is working on avoiding NSS types in PSM. There will be some cases where NSS types can't be avoided, so we'll have to do something like proxy that work to the socket thread (the socket thread has to use NSS resources anyway, since that's how we do TLS).
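
As a rough illustration of proxying work to a single thread, the sketch below uses only the C++ standard library. SerialWorker and Dispatch are hypothetical names, not PSM or XPCOM APIs. The point is that every job touching the non-thread-safe resource runs on one dedicated thread, and callers get the result back through a future.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-threaded executor: every dispatched job runs on the
// same worker thread, in the order it was queued.
class SerialWorker {
 public:
  SerialWorker() : mThread([this] { Run(); }) {}

  ~SerialWorker() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mDone = true;
    }
    mCv.notify_one();
    mThread.join();  // remaining jobs are drained before the thread exits
  }

  // Queues f on the worker thread and returns a future for its result.
  template <typename F>
  auto Dispatch(F f) -> std::future<decltype(f())> {
    using Result = decltype(f());
    auto task = std::make_shared<std::packaged_task<Result()>>(std::move(f));
    std::future<Result> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mQueue.push([task] { (*task)(); });
    }
    mCv.notify_one();
    return result;
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mMutex);
        mCv.wait(lock, [this] { return mDone || !mQueue.empty(); });
        if (mQueue.empty()) {
          return;  // shutting down and nothing left to run
        }
        job = std::move(mQueue.front());
        mQueue.pop();
      }
      job();  // run outside the lock so callers can keep dispatching
    }
  }

  std::mutex mMutex;
  std::condition_variable mCv;
  std::queue<std::function<void()>> mQueue;
  bool mDone = false;
  std::thread mThread;  // declared last so it starts after the other members
};
```

A caller on another thread would write something like `worker.Dispatch([] { return 41 + 1; }).get()` and block only until that one job has run.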

Severity: -- → S3
Depends on: 1664048, osclientcerts
Flags: needinfo?(dkeeler)
Priority: -- → P3
Whiteboard: [psm-backlog]

Oh, also, we can't just skip shutting down those threads, because then when we shut NSS down, it'll fail because those resources are still in use.

Adjusting the signature in preparation for bug 1794587.

Crash Signature: [@ shutdownhang | mozilla::TaskController::GetRunnableForMTTask | nsThread::Shutdown | mozilla::net::nsSocketTransportService::ShutdownThread ] → [@ shutdownhang | mozilla::TaskController::GetRunnableForMTTask | nsThread::Shutdown | mozilla::net::nsSocketTransportService::ShutdownThread ] [@ shutdownhang | nsThread::Shutdown | mozilla::net::nsSocketTransportService::ShutdownThread]

Certificate verification can take a while, which is why it runs in a separate
thread pool. At shutdown, the thread pool gets joined. To make this fast,
certificate verification tasks should check for shutdown before doing
time-consuming operations and return early if appropriate.
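
The pattern described above can be sketched in standard C++ as follows. This is not the code from the patch: gShuttingDown, VerifyCertificateSketch, and RunPoolAndShutDown are hypothetical names, and the sleep stands in for a slow verification step. Each task checks a shared flag before every expensive step, so joining the pool at shutdown only waits for the step already in progress.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical process-wide flag; not a real PSM or XPCOM API.
std::atomic<bool> gShuttingDown{false};

enum class VerifyResult { Success, AbortedForShutdown };

// Stand-in for a verification task made of several slow steps.
VerifyResult VerifyCertificateSketch(int slowSteps) {
  assert(slowSteps >= 0);
  for (int i = 0; i < slowSteps; ++i) {
    // Check before each time-consuming step and return early if needed.
    if (gShuttingDown.load(std::memory_order_acquire)) {
      return VerifyResult::AbortedForShutdown;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // pretend work
  }
  return VerifyResult::Success;
}

// Starts `workers` tasks, signals shutdown, then joins every thread.
// Returns how many tasks bailed out early instead of running to completion.
int RunPoolAndShutDown(int workers, int slowSteps) {
  std::atomic<int> aborted{0};
  std::vector<std::thread> pool;
  for (int i = 0; i < workers; ++i) {
    pool.emplace_back([&aborted, slowSteps] {
      if (VerifyCertificateSketch(slowSteps) ==
          VerifyResult::AbortedForShutdown) {
        ++aborted;
      }
    });
  }
  gShuttingDown.store(true, std::memory_order_release);
  for (auto& thread : pool) {
    thread.join();  // returns quickly because tasks notice the flag
  }
  return aborted.load();
}
```

A flag like this only helps between steps. A task that is already blocked inside a single long call, such as a third-party module that never returns, will not see it.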

Assignee: nobody → dkeeler
Status: NEW → ASSIGNED
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6c8edd3b4ff7
stop slow certificate verification tasks when the app is shutting down r=jschanck
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 114 Branch