Closed
Bug 1363038
Opened 7 years ago
Closed 7 years ago
nsUrlClassifierDBService shutdown deadlock
Categories
(Toolkit :: Safe Browsing, enhancement, P2)
Toolkit
Safe Browsing
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: mcmanus, Assigned: tnguyen)
References
Details
(Whiteboard: #sbv4-m9)
Attachments
(1 file)
shutdown() does a synchronous dispatch to the url classifier worker thread SyncRunnable::DispatchToThread(gDbBackgroundThread, r); meanwhile, on the worker thread in openDB() mCryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv); and if you dig into nsNSSComponents.cpp 101 (from that createInstance stack) you will find that when an nss instance is created off the main thread it does a synchronous dispatch to the main thread to ensure initialization (even if it is indeed initialized) so we deadlock if these things are done simultaneously. There is a shared state variable, gShuttingDownThread, that looks like it might be meant to guard against some of this but without any locking it is easy to see how it could be racy. I can trivially reproduce this by grabbing a ref to the classifier service in nsHttpHandler::Init() and then running some tests that never use http.. basically the startup of the thread is racing against its shutdown the test completes so quickly. toolkit/components/crashmonitor/test/unit/test_init.js seems to repro the problem easily. but even without my change to handler, this seems like a latent bug that has been closed WONTFIX a couple times lacking fresh data, a repro, or a theory.
Comment 1•7 years ago
|
||
I was aware of the fact that mCryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv); will need to SyncRunnable::Dispatch to the main thread. However, only the very first "do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID)" on non-main thread will need to do SyncRunnable::Dispatch to the main thread. Maybe that's why I haven't run into this potential deadlock. I did see this kind of deadlocks in other place but have got around it. As for gShuttingDownThread, I would say it's just a "hint" for making shutdown more quickly. Racing on gShuttingDownThread is totally expected and it should lead to any issue. I'll try if I can reproduce it tomorrow. Thanks for bring this up!
Updated•7 years ago
|
Assignee: nobody → hchang
Status: NEW → ASSIGNED
Priority: -- → P1
Comment 2•7 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #0) > shutdown() does a synchronous dispatch to the url classifier worker thread > > SyncRunnable::DispatchToThread(gDbBackgroundThread, r); > > meanwhile, on the worker thread in openDB() > > mCryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv); > > and if you dig into nsNSSComponents.cpp 101 (from that createInstance stack) From what I dug a while ago, there is no "creation thread restriction" on NS_CRYPTO_HASH_CONTRACTID [1]. Instead, the enforcement in at [2], which is only enforced for the very first time. [1] http://searchfox.org/mozilla-central/rev/224cc663d54085994a4871ef464b7662e0721e83/security/manager/ssl/nsNSSModule.cpp#191 [2] http://searchfox.org/mozilla-central/rev/6580dd9a4eb0224773897388dba2ddf5ed7f4216/security/manager/ssl/nsNSSComponent.cpp#101 [3] http://searchfox.org/mozilla-central/rev/224cc663d54085994a4871ef464b7662e0721e83/toolkit/components/url-classifier/SafeBrowsing.jsm#454 > you will find that when an nss instance is created off the main thread it > does a synchronous dispatch to the main thread to ensure initialization > (even if it is indeed initialized) > > so we deadlock if these things are done simultaneously. > > There is a shared state variable, gShuttingDownThread, that looks like it > might be meant to guard against some of this but without any locking it is > easy to see how it could be racy. > > I can trivially reproduce this by grabbing a ref to the classifier service > in nsHttpHandler::Init() and then running some tests that never use http.. > basically the startup of the thread is racing against its shutdown the test > completes so quickly. toolkit/components/crashmonitor/test/unit/test_init.js > seems to repro the problem easily. > > but even without my change to handler, this seems like a latent bug that has > been closed WONTFIX a couple times lacking fresh data, a repro, or a theory. This should be introduced by bug 1339050, where I significantly changed the shutdown process.
Updated•7 years ago
|
Priority: P1 → P3
Assignee | ||
Updated•7 years ago
|
Assignee: hchang → tnguyen
Assignee | ||
Comment 3•7 years ago
|
||
(In reply to Henry Chang [:hchang] from comment #1) > I was aware of the fact that > > mCryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv); > > will need to SyncRunnable::Dispatch to the main thread. > > However, only the very first "do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID)" > on non-main thread will need to do SyncRunnable::Dispatch to the main thread. > Maybe that's why I haven't run into this potential deadlock. I did see this > kind of deadlocks in other place but have got around it. > > As for gShuttingDownThread, I would say it's just a "hint" for making > shutdown > more quickly. Racing on gShuttingDownThread is totally expected and it should > lead to any issue. > > I'll try if I can reproduce it tomorrow. Thanks for bring this up! I am trying to catch the idea and follow what Henry has done in this bug. I see the case in this bug, we still use SyncRunnable::Dispatch EnsureNSSInitializedChromeOrContent off main thread, even we are already sure nss is already initialized in mainthread. Hi dkeeler, do we really need to send SyncRunnable::Dispatch in this case? Is it risky to remove that dispatching if nss is initialized? I guess SyncRunnable::Dispatch EnsureNSSInitializedChromeOrContent will be useful to trigger NSS initialization from background at the first time (and after that we will use cached variable http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/security/manager/ssl/nsNSSComponent.cpp#90)
Flags: needinfo?(dkeeler)
I would start by removing the mCryptoHash members of nsUrlClassifierDBServiceWorker and ProtocolParser. The hasher is only used in SafebrowsingHash::FromPlaintext, and since that usage consists of a complete set of Init, Update, and Finish, there's no reason to keep it around. I would then either add some synchronization to ensure that the background thread can't enter that section if the browser is in shutdown or add something before the shutdown that ensures NSS has been initialized (e.g. an explicit call to EnsureNSSInitializedChromeOrContent) (but honestly that seems like a waste if all we're doing is ensuring that we don't call that code at the wrong time).
Flags: needinfo?(dkeeler)
Assignee | ||
Comment 5•7 years ago
|
||
> I would start by removing the mCryptoHash members of > nsUrlClassifierDBServiceWorker and ProtocolParser. The hasher is only used > in SafebrowsingHash::FromPlaintext, and since that usage consists of a > complete set of Init, Update, and Finish, there's no reason to keep it Thanks David for the prompt reply. If I understand your idea correctly, we are going to move the hash related to mainthread. It seems break work flow http://searchfox.org/mozilla-central/rev/6482c8a5fa5c7446e82ef187d1a1faff49e3379e/toolkit/components/url-classifier/Classifier.cpp#473 We may have to change in many interfaces, and do much more stuffs in main thread to pass correct argument to worker thread later. We may lose a perf impact, I am not sure if there's any risk. > around. I would then either add some synchronization to ensure that the > background thread can't enter that section if We had ensured background thread could enter if ff is going to shutdown phase. http://searchfox.org/mozilla-central/rev/6482c8a5fa5c7446e82ef187d1a1faff49e3379e/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#936,955 > add something before the shutdown that ensures NSS has been initialized > (e.g. an explicit call to EnsureNSSInitializedChromeOrContent) It does not seem to work because shutdown() is triggered in main thread. The problem is, no matter EnsureNSSInitializedChromeOrContent made sure of NSS initialized in main thread, EnsureNSSInitializedChromeOrConten off thread still trigger SyncRunnable::Dispatch (only for the first time of every worker thread). That is the same with comment 3.
Assignee | ||
Comment 6•7 years ago
|
||
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #5) > > It does not seem to work because shutdown() is triggered in main thread. The > problem is, no matter EnsureNSSInitializedChromeOrContent made sure of NSS > initialized in main thread, EnsureNSSInitializedChromeOrConten off thread > still trigger SyncRunnable::Dispatch (only for the first time of every > worker thread). That is the same with comment 3. The simpler idea is not dispatching from background to main thread when the worker has not been initialized yet (actually it will be bailed out and does not do anything)
Assignee | ||
Comment 7•7 years ago
|
||
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #6) > (In reply to Thomas Nguyen[:tnguyen] ni plz from comment #5) > > > > It does not seem to work because shutdown() is triggered in main thread. The > > problem is, no matter EnsureNSSInitializedChromeOrContent made sure of NSS > > initialized in main thread, EnsureNSSInitializedChromeOrConten off thread > > still trigger SyncRunnable::Dispatch (only for the first time of every > > worker thread). That is the same with comment 3. > The simpler idea is not dispatching from background to main thread when the > worker has not been initialized yet (actually it will be bailed out and does > not do anything) I could easily reproduce the deadlock by running toolkit/components/crashmonitor/test/unit/test_init.js Indeed, that was the race when we are shutting down too quickly when mClassifier is still nullptr
Comment hidden (mozreview-request) |
Assignee | ||
Comment 9•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=52c47be10708cc713d0c9ccc7a3b51f5c4798389&selectedJob=123181406
Assignee | ||
Updated•7 years ago
|
Attachment #8897268 -
Flags: review?(hchang)
Assignee | ||
Comment 10•7 years ago
|
||
I removed mCryptoHash in some places which contain full set of Init, Update, Finish and added a condition check in shutdown() and tested on my local machine (we can easily reproduce the deadlock as comment 0) Henry, could you please take a look?
Assignee | ||
Updated•7 years ago
|
Priority: P3 → P2
Whiteboard: #sbv4-m9
Assignee | ||
Comment 11•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0ef65e48d611922ccb7dcdebbf473a2f1d5479d7
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #11) > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=0ef65e48d611922ccb7dcdebbf473a2f1d5479d7 Running 20 tests on top of backed out commit in Bug 1345058 and it looks good
Assignee | ||
Comment 13•7 years ago
|
||
As discussed with Henry, we may care about the case FlushAndDisableAsyncUpdate does not work expectly. It will not happen in real life but may happen in our test, for example when ResetDatabase and ReloadDatabase are called (those are only used for test purpose). http://searchfox.org/mozilla-central/rev/13148faaa91a1c823a7d68563d9995480e714979/toolkit/components/url-classifier/tests/browser/classifierHelper.js#135 Running full try to see if we could pass all the tests https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a22868a29707589998c0433947789b4b4817fa3
Assignee | ||
Comment 14•7 years ago
|
||
It looks good, except wpt failures https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a22868a29707589998c0433947789b4b4817fa3&selectedJob=123446172 That's related to ipc PrincipalFlashClassifier::Result, should be fixed in Bug 1345058
Assignee | ||
Comment 15•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=23f8c1bc52187b3c05e131f57aa6344f6d1eda32
Assignee | ||
Comment 16•7 years ago
|
||
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #15) > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=23f8c1bc52187b3c05e131f57aa6344f6d1eda32 Try looks good. I did not see any racing problem between shutting down againts reseting/reloading database
Comment 17•7 years ago
|
||
mozreview-review |
Comment on attachment 8897268 [details] Bug 1363038 - Remove synchronous dispatch when worker is racing against shutdown https://reviewboard.mozilla.org/r/168552/#review176110 Given that what I am worried about is not happening [1], I am positive to this patch. However, I feel like what we have to do is just leave the IsDBOpened() function and keep mCryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv); as it in OpenDb(). Also, we probably shouldn't null out mCryptoHash in CloseDb(). Thomas, could you give that a try? Thanks! [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1363038#c16 ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:942 (Diff revision 1) > > nsresult rv; > - mCryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv); > + { > + // Force PSM loading > + nsCOMPtr<nsICryptoHash> dummy = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv); > - NS_ENSURE_SUCCESS(rv, rv); > + NS_ENSURE_SUCCESS(rv, rv); I am thinking if we can just do mCryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv); here and leave the IsDBOpened() check in Shutdown() so that we don't need to bother removing mCryptoHash everywhere.
Assignee | ||
Comment 18•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b3f6b9b62ad9227f52714affc312597fef845d05
Assignee | ||
Comment 19•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ad0fd284302d4ac612ff4048bbaf3c607debb21d
Assignee | ||
Comment 20•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8897268 [details] Bug 1363038 - Remove synchronous dispatch when worker is racing against shutdown https://reviewboard.mozilla.org/r/168552/#review176110 FF may crash when trying to release worker and mCryptoHash. I guess we may change to use weak ref, but it may not be obligated here. So I just add IsDBOpened to resolve the race issue. > I am thinking if we can just do > > mCryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv); > > here and leave the IsDBOpened() check in Shutdown() so that we don't need to bother removing mCryptoHash everywhere. Sure, will file and do that in another bug if needed
Comment hidden (mozreview-request) |
Assignee | ||
Comment 22•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&tochange=b3f6b9b62ad9227f52714affc312597fef845d05&fromchange=3908649a27d1f9b0ae4bc9d9be396ebfe80703ff
Comment 23•7 years ago
|
||
mozreview-review |
Comment on attachment 8897268 [details] Bug 1363038 - Remove synchronous dispatch when worker is racing against shutdown https://reviewboard.mozilla.org/r/168552/#review176694 r=me given the try result looks good. We can back this simple patch out anytime if what I'm worried (nsIUrlClassifierDBService.resetDatabase/reloadDatabase) occurs. Thanks Thomas :)
Attachment #8897268 -
Flags: review?(hchang) → review+
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 24•7 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/autoland/rev/af78a23cda9c Remove synchronous dispatch when worker is racing against shutdown r=hchang
Keywords: checkin-needed
Comment 25•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/af78a23cda9c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Regressions: 1681646
You need to log in
before you can comment on or make changes to this bug.
Description
•