cert_storage blocks the parent process main thread
Categories
(Core :: Security: PSM, defect, P1)
Tracking
()
People
(Reporter: florian, Assigned: keeler)
Details
(Keywords: perf, perf:responsiveness, Whiteboard: [bhr:cert_storage][psm-assigned])
Attachments
(1 file)
This currently represents more than 6% of the total hang time reported through BHR for the nightly user base, making it one of the top five hangs (http://queze.net/bhr/test/#row=1&filter=cert_storage).
The main thread seems to be blocked trying to get a lock (RtlAcquireSRWLockExclusive
).
The two most common stacks look like this:
RtlAcquireSRWLockExclusive ntdll
cert_storage::{{impl}}::done<tuple<>,closure-0>(cert_storage::SecurityStateTask<tuple<>, closure-0>*) xul
moz_task::{{impl}}::allocate::Run(xpcom::interfaces::idl::nsIRunnable*) xul
mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex &> const&) xul
mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:120:7'>::Run() xul
nsThread::ProcessNextEvent(bool, bool*) xul
and
RtlAcquireSRWLockExclusive ntdll
cert_storage::{{impl}}::allocate::AddCRLiteStash(xpcom::interfaces::idl::nsICertStorage*, thin_vec::ThinVec<u8>*, xpcom::interfaces::idl::nsICertStorageCallback*) xul
XPTC__InvokebyIndex xul
static XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) xul
XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) xul
(unresolved)
gre/modules/psm/RemoteSecuritySettings.jsm:797
I was on PTO last week, and when I looked at BHR data about a week ago, I don't remember seeing hangs involving cert_storage in the top hangs. Has something related to this code changed recently that would either have introduced these hangs, or significantly increased their frequency?
The certList.addCRLiteStash
call at https://searchfox.org/mozilla-central/rev/c37038c592a352eda0f5e77dfb58c4929bf8bcd3/security/manager/ssl/RemoteSecuritySettings.jsm#797 seems to have been added in May 2020. Was it there but somehow disabled until last week?
Assignee | ||
Comment 1•4 years ago
|
||
Do these reports include stack traces from other threads? That would be super helpful in figuring out what's going on here.
Reporter | ||
Comment 2•4 years ago
|
||
BHR collects a stack of the main thread when it has been unresponsive for more than 128ms, it doesn't collect stacks for other threads. If we find a way to reproduce, we could capture a profile of multiple threads, but I don't know steps to reproduce.
Updated•4 years ago
|
Assignee | ||
Comment 3•4 years ago
|
||
Is there a way to tell if the changes from bug 1677399 had an effect on this?
Reporter | ||
Comment 4•4 years ago
|
||
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #3)
Is there a way to tell if the changes from bug 1677399 had an effect on this?
Yes, but it takes about 3 days for BHR (Telemetry) data to be received.
Today http://queze.net/bhr/test/ shows data from the 20201123 Nightlies, and this bug is now the 8th most frequent hang, with about 1% of the total hang time. Yesterday the data from the 20201122 Nightlies showed this bug as the 3rd most frequent hang, with 2.13% of the total hang time. A few days ago, this was the most common hang, with more than 10% of the total hang time.
20201120094511 was the first Nightly with the patch from bug 1677399, so the timeframe is correct to say it could have had an impact, but I can't say for sure.
Reporter | ||
Comment 5•4 years ago
|
||
When looking today at BHR data (from the 20201124 Nightly), this bug is no longer visible.
Assignee | ||
Comment 6•4 years ago
|
||
The stacks I'm seeing all involve base64::decode
, so one easy improvement would be to move the decoding step off the main thread.
Assignee | ||
Comment 7•4 years ago
|
||
Telemetry indicated that setting various security state (in particular, CRLite
state) was causing main thread hangs due to base64 decoding. This patch
rearranges cert_storage slightly to do these decodings off the main thread.
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Comment 9•4 years ago
|
||
bugherder |
Updated•3 years ago
|
Description
•