Closed Bug 1435141 Opened 7 years ago Closed 6 years ago

Main thread IO in the parent process in nsNSSComponent::HasUserCertsInstalled

Categories

(Core :: Networking, defect, P3)

All
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla67
Performance Impact medium
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- fixed

People

(Reporter: mstange, Assigned: valentin)

References

Details

(Keywords: perf, perf:responsiveness, regression, Whiteboard: [necko-triaged])

Attachments

(2 files)

In this profile, the check caused 64ms of main thread jank: https://perfht.ml/2s6qPdp
Flags: needinfo?(valentin.gosu)
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
Whiteboard: [necko-triaged]
Are there any downsides to calling nsNSSComponent::HasUserCertsInstalled(bool& result) and nsNSSComponent::HasActiveSmartCards(bool& result) off-main-thread? I would like to use a MozPromise to check for certs off-main-thread, and CanEnableSpeculativeConnect would return false if the promise didn't complete, or actual value if it did.
Flags: needinfo?(dkeeler)
As of bug 1421084, I believe we can relax some of the main-thread-only requirements of nsNSSComponent, so yes, I think this would be doable.
Flags: needinfo?(dkeeler)
Priority: -- → P3
Comment on attachment 8950797 [details] Bug 1435141 - Check for user certificates on a background thread to avoid main thread IO https://reviewboard.mozilla.org/r/220038/#review225862 The nsNSSComponent.cpp changes look fine with some minor comments. I'm not a necko peer, so you should get one to review the netwerk/ changes. ::: security/manager/ssl/nsNSSComponent.cpp (Diff revision 1) > } > > nsresult > nsNSSComponent::HasActiveSmartCards(bool& result) > { > - MOZ_ASSERT(NS_IsMainThread(), "Main thread only"); We should also remove the comment at https://searchfox.org/mozilla-central/rev/d03ad8843e3bf2e856126bc53b0475c595e5183b/security/manager/ssl/nsNSSComponent.h#83 (and again at line 135) since these are no longer main-thread-only. ::: security/manager/ssl/nsNSSComponent.cpp:1048 (Diff revision 1) > - return NS_ERROR_NOT_SAME_THREAD; > - } > - > #ifndef MOZ_NO_SMART_CARDS > MutexAutoLock nsNSSComponentLock(mMutex); > I guess technically we should add a mInitialized check here.
Attachment #8950797 - Flags: review?(dkeeler) → review+
Comment on attachment 8950797 [details] Bug 1435141 - Check for user certificates on a background thread to avoid main thread IO https://reviewboard.mozilla.org/r/220038/#review225870 ::: security/manager/ssl/nsNSSComponent.h:134 (Diff revision 2) > #endif > > NS_IMETHOD BlockUntilLoadableRootsLoaded() override; > NS_IMETHOD CheckForSmartCardChanges() override; > > // Main thread only This one too please :)
Comment on attachment 8950797 [details] Bug 1435141 - Check for user certificates on a background thread to avoid main thread IO https://reviewboard.mozilla.org/r/220038/#review225870 > This one too please :) Heh - beat me to it.
Comment on attachment 8950797 [details] Bug 1435141 - Check for user certificates on a background thread to avoid main thread IO https://reviewboard.mozilla.org/r/220038/#review226180 ::: netwerk/protocol/http/nsHttpHandler.cpp:439 (Diff revision 3) > } > mIOService = new nsMainThreadPtrHolder<nsIIOService>( > "nsHttpHandler::mIOService", service); > > + rv = NS_NewNamedThread("nsHttpHandler", > + getter_AddRefs(mBackgroundThread)); could you use a lazy thread? this will just waste resources forever having just a single startup function a bit better name would be nice too, how about "HTTP Handler Background" ? ::: netwerk/protocol/http/nsHttpHandler.cpp:443 (Diff revision 3) > + rv = NS_NewNamedThread("nsHttpHandler", > + getter_AddRefs(mBackgroundThread)); > + if (NS_FAILED(rv)) { > + return rv; > + } > + hmm... little bit hard to fail here. you do a null check later on it. ::: netwerk/protocol/http/nsHttpHandler.cpp:594 (Diff revision 3) > nsCOMPtr<nsIParentalControlsService> pc = do_CreateInstance("@mozilla.org/parental-controls-service;1"); > if (pc) { > pc->GetParentalControlsEnabled(&mParentalControlEnabled); > } > + > + MaybeEnableSpeculativeConnect(); could you do this a bit sooner? ::: netwerk/protocol/http/nsHttpHandler.cpp:2479 (Diff revision 3) > - // Check if any 3rd party PKCS#11 module are installed, as they may produce > + // Check if any 3rd party PKCS#11 module are installed, as they may produce > - // client certificates > + // client certificates > - bool activeSmartCards = false; > + bool activeSmartCards = false; > - nsresult rv = component->HasActiveSmartCards(activeSmartCards); > + nsresult rv = component->HasActiveSmartCards(activeSmartCards); > - if (NS_FAILED(rv) || activeSmartCards) { > + if (NS_FAILED(rv) || activeSmartCards) { > - return false; > + return GenericPromise::CreateAndResolve(false, __func__); btw, if we fail to find out (NS_FAILED(rv)) do we really want to disable? ::: netwerk/protocol/http/nsHttpHandler.cpp:2515 (Diff revision 3) > + > + // Component must be initialized on the main thread. > + nsCOMPtr<nsINSSComponent> component(do_GetService(PSM_COMPONENT_CONTRACTID)); > + if (!component) { > + NS_WARNING("nsHttpHandler::MaybeEnableSpeculativeConnect() no component"); > + return; enable specul conn here then. ::: netwerk/protocol/http/nsHttpHandler.cpp:2526 (Diff revision 3) > + __func__, > + [component]() { return CanEnableSpeculativeConnect(component); }); > + > + // Update mSpeculativeConnect on the main thread. > + mSpeculativePromise->Then(GetMainThreadSerialEventTarget(), __func__, > + [self](bool aVal) { self->mSpeculativeConnectEnabled = false; }, = aVal;
Attachment #8950797 - Flags: review?(honzab.moz) → review+
Comment on attachment 8950797 [details] Bug 1435141 - Check for user certificates on a background thread to avoid main thread IO https://reviewboard.mozilla.org/r/220038/#review226180 > could you use a lazy thread? this will just waste resources forever having just a single startup function > > a bit better name would be nice too, how about "HTTP Handler Background" ? Didn't know about LazyIdleThread. Really cool. Thanks for the heads up. The reason I didn't give the thread a longer name is that NS_NewNamedThread accepts a max of 16 characters. LazyIdleThread doesn't have this limitation. > btw, if we fail to find out (NS_FAILED(rv)) do we really want to disable? HasActiveSmartCards doesn't fail unless the component isn't initialized. Unlikely to be an issue. > enable specul conn here then. Fair enough. > = aVal; Ooops!
The try run showed some failures. It seems some tests hang in nsCOMPtr<nsINSSComponent> component(do_GetService(PSM_COMPONENT_CONTRACTID)); with the terminal showing the following message: PID 28092 | [28092, Main Thread] WARNING: NSS will be initialized without a profile directory. Some things may not work as expected.: file /home/icecold/mozilla-central/security/manager/ssl/nsNSSComponent.cpp, line 1744 I'll try to touch up the patch to work around the issue.
(In reply to Valentin Gosu [:valentin] from comment #12) > The try run showed some failures. It seems some tests hang in > nsCOMPtr<nsINSSComponent> component(do_GetService(PSM_COMPONENT_CONTRACTID)); > with the terminal showing the following message: > PID 28092 | [28092, Main Thread] WARNING: NSS will be initialized without a > profile directory. Some things may not work as expected.: file > /home/icecold/mozilla-central/security/manager/ssl/nsNSSComponent.cpp, line > 1744 Usually this message means the test is running without a profile directory, or it doesn't have read/write access to the profile directory.
Blocks: 1445965
Whiteboard: [necko-triaged] → [necko-triaged][qf]
Whiteboard: [necko-triaged][qf] → [necko-triaged][qf:p1:f64]

Here's another profile where this consumes about 300ms on start-up: https://perfht.ml/2VLGcTa

Whiteboard: [necko-triaged][qf:p1:f64] → [necko-triaged][qf:p2:responsiveness]

Here is a profile of nsNSSComponent::HasUserCertsInstalled with main thread I/O markers: https://perfht.ml/2CX20Tc

I'm surprised we need to do that many read and stat calls. The stat calls are almost all on cert9.db-wal and cert9.db-journal. Would it be possible to have a copy of this file in memory to access it quickly without doing so much disk I/O?

See also bug 1478148. While it would be beneficial to improve that implementation, there are significant architectural constraints that make it difficult.

(In reply to Valentin Gosu [:valentin] from comment #23)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=204a7b073ff84b30cca8768fcc0a46f9c8d0d7ac

In my latest try patch it seems we no longer hang without a profile like before, but sometimes we get:
MOZ_CRASH(NSS_Shutdown failed)
I'm not really sure why that would happen. Could just holding the nssComponent mutex cause NSS shutdown to fail?

Flags: needinfo?(dkeeler)
See Also: → 1478148

My guess is it has something to do with that CanEnableSpeculativeConnect doesn't have an owning reference to the nsINSSComponent - there's nothing keeping that alive, from what I can tell, although the documentation says the thread will be joined at xpcom shutdown, which should happen strictly before trying to call NSS_Shutdown, so I don't really see how that would prevent NSS from shutting down successfully (I would just expect some uaf crash). It could be that that code isn't threadsafe in NSS and we're merely exercising an existing bug. I've tried to narrow it down to either checking for smartcards or checking for client certs, but only checking for one or the other doesn't seem to result in failures (although maybe I just haven't tried enough retriggers). I'll keep looking, but I don't see anything in this patch that would cause a new NSS shutdown failure (the most common cause of these is not releasing some NSS resource).

Flags: needinfo?(dkeeler)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #27)

My guess is it has something to do with that CanEnableSpeculativeConnect doesn't have an owning reference to the nsINSSComponent - there's nothing keeping that alive, from what I can tell, although the documentation says the thread will be joined at xpcom shutdown, which should happen strictly before trying to call NSS_Shutdown, so I don't really see how that would prevent NSS from shutting down successfully (I would just expect some uaf crash). It could be that that code isn't threadsafe in NSS and we're merely exercising an existing bug. I've tried to narrow it down to either checking for smartcards or checking for client certs, but only checking for one or the other doesn't seem to result in failures (although maybe I just haven't tried enough retriggers). I'll keep looking, but I don't see anything in this patch that would cause a new NSS shutdown failure (the most common cause of these is not releasing some NSS resource).

The lambda that we pass to the thread holds a ref to the nsINSSComponent - it may not even run by the time we shutdown. That could be the cause.

Looks like I just needed to run more tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=91bdbc705eddb6fd6270b8f9004764049bc6895a indicates there may be a leak in CERT_FindUserCertsByUsage (maybe caused by a race?). Another thing I noticed is that NSS_IsInitialized() isn't protected by any kind of synchronization - we essentially race on the variable that backs it, so if nsINSSComponent is being held alive after xpcom shutdown somehow, then that check is essentially useless (but again I don't see how this could be happening because we've supposedly joined all threads at that point).

Stepping back, this mechanism is a bit fragile, as we've seen. Would it be possible to re-work things slightly so that when we're speculatively connecting, if the server asks for a client certificate, we just close the connection? (NSS doesn't directly expose a way to do this, I think, so we would have to essentially make a note that this happened in nsNSSSocketInfo that necko checks and then manually closes the connection.)

Flags: needinfo?(valentin.gosu)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #29)

Looks like I just needed to run more tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=91bdbc705eddb6fd6270b8f9004764049bc6895a indicates there may be a leak in CERT_FindUserCertsByUsage (maybe caused by a race?). Another thing I noticed is that NSS_IsInitialized() isn't protected by any kind of synchronization - we essentially race on the variable that backs it, so if nsINSSComponent is being held alive after xpcom shutdown somehow, then that check is essentially useless (but again I don't see how this could be happening because we've supposedly joined all threads at that point).

Is CERT_FindUserCertsByUsage supposed to be thread-safe? Because this happens in all of my try runs, so it might be very racy.

Stepping back, this mechanism is a bit fragile, as we've seen. Would it be possible to re-work things slightly so that when we're speculatively connecting, if the server asks for a client certificate, we just close the connection? (NSS doesn't directly expose a way to do this, I think, so we would have to essentially make a note that this happened in nsNSSSocketInfo that necko checks and then manually closes the connection.)

Would that actually work? Looking at bug 910207, it seems like something like this was suggested before, but it didn't work.
I think the problem was that by the time we actually connect, that speculative connection might already be used by a HTTP request.

Flags: needinfo?(valentin.gosu)

It looks like what was missing from that original implementation was the part where necko says "ok, the server wanted a client certificate so we killed the speculative connection, and now that the user has actually gone to this site, we need to connect again and allow the client certificate dialog".

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #31)

It looks like what was missing from that original implementation was the part where necko says "ok, the server wanted a client certificate so we killed the speculative connection, and now that the user has actually gone to this site, we need to connect again and allow the client certificate dialog".

I think if we had a way to pass a callback into NSS that can be used to choose if to show the cert chooser or fail the connection that would be great. I suspect that's not as easy as it sounds :)

See Also: → 1533045

Dana, do you think the patch could still be used/useful?
What could I do to work around the issues in comment 29?

Flags: needinfo?(dkeeler)

It looks like that patch has bitrotted some, but I imagine it's salvageable. In particular, nsNSSComponent::mNSSInitialized doesn't exist any more, so you shouldn't need to acquire nsNSSComponent::mMutex at all. Also, the leaks may be fixed by first calling nsNSSComponent::BlockUntilLoadableRootsLoaded(); in each of HasUserCertsInstalled and HasActiveSmartCards (just to be safe).

Flags: needinfo?(dkeeler)
Blocks: 1533045
Attachment #9049150 - Attachment description: Bug 1435141 - Check for user certificates on a background thread to avoid main thread IO r=keeler!,mayhemer! → Bug 1435141 - Check for user certificates on a background thread to avoid main thread IO r=keeler!
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/06e9cd204aac Check for user certificates on a background thread to avoid main thread IO r=keeler
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Depends on: 1538631
Performance Impact: --- → P2
Whiteboard: [necko-triaged][qf:p2:responsiveness] → [necko-triaged]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: