Race condition between updates to the Mozilla lists and ClassifyLocalWithTables() leads to deadlock

RESOLVED FIXED in Firefox 52

Status

()

defect
P1
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: francois, Assigned: hchang)

Tracking

({regression})

Trunk
mozilla53
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox51- unaffected, firefox52+ fixed, firefox53 fixed)

Details

Attachments

(3 attachments)

Reporter

Description

3 years ago
I am seeing frequent hangs on Nightly (at least once a week, usually once a day). Here's an example using Nightly 2016-11-02 on Linux64:

listmanager: 13:17:47 GMT-0700 (PDT): checkForUpdates with https://shavar.services.mozilla.com/downloads?client=navclient-auto-ffox&appver=52.0a&pver=2.2
listmanager: 13:17:47 GMT-0700 (PDT): this.tablesData: {
  "googpub-phish-proto": {
    "updateUrl": "https://safebrowsing.googleapis.com/v4/threatListUpdates:fetch?$ct=application/x-protobuf&key=[trimmed-google-api-key]",
    "gethashUrl": "https://safebrowsing.googleapis.com/v4/fullHashes:find?$req=%REQUEST_BASE64%&$ct=application/x-protobuf&key=[trimmed-google-api-key]",
    "provider": "google4"
  },
...
}
listmanager: 13:17:47 GMT-0700 (PDT): existing chunks: test-trackwhite-simple;a:1
mozstd-trackwhite-digest256;a:1472162826
goog-badbinurl-shavar;a:111809-114434:s:114307-114314,114316-114319...
test-track-simple;a:1-2
base-track-digest256;a:1471874828
test-malware-simple;a:1
goog-malware-shavar;a:242837-251875:s:238338-238711,238713-239044,239046-239130...
goog-unwanted-shavar;a:63014-80389:s:56502,56507,56509,56511-56513...
goog-phish-shavar;a:458907-463758:s:286633-286890,286892-286931,286933-287066...
test-unwanted-simple;a:1
mozplugin-block-digest256;a:1471849627
test-block-simple;a:1
test-phish-simple;a:1
goog-malware-proto;Cg0IARAGGAEiAzAwMTABEJeoARoCGAYoGJZz:ZgS+9xtHHaRD/Ys1tbna7jLNHmx/SmpAh2flW1kuyUU=
googpub-phish-proto;Cg0IAhAGGAEiAzAwMTABEIGkARoCGAakvame:jGKcM/6uh+Uf3cGHvqPwffSLwaxDEMjPg701Nz48vZ4=
goog-unwanted-proto;Cg0IAxAGGAEiAzAwMTABEJmPARoCGAaluotS:V4NLJ9Kg9S9TtqEoNO+JSWJg/JM0KlvxolDDOS9iTNs=

listmanager: 13:17:47 GMT-0700 (PDT): update request: {
  "tableList": "base-track-digest256,mozstd-trackwhite-digest256,mozplugin-block-digest256",
  "tableNames": {},
  "requestPayload": "mozstd-trackwhite-digest256;a:1472162826\nbase-track-digest256;a:1471874828\nmozplugin-block-digest256;a:1471849627\n",
  "isPostRequest": true
}

listmanager: 13:17:47 GMT-0700 (PDT): makeUpdateRequestForEntry_: requestPayload mozstd-trackwhite-digest256;a:1472162826
base-track-digest256;a:1471874828
mozplugin-block-digest256;a:1471849627
 update: https://shavar.services.mozilla.com/downloads?client=navclient-auto-ffox&appver=52.0a&pver=2.2 tablelist: base-track-digest256,mozstd-trackwhite-digest256,mozplugin-block-digest256


After attaching to the firefox process using gdb, I got the following backtrace:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fa2d20d7b4c in PR_WaitCondVar () from /home/francois/devel/nightly/libnspr4.so
#2  0x00007fa2c541b6f5 in mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, bool) ()
   from /home/francois/devel/nightly/libxul.so
#3  0x00007fa2c541b7b7 in mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, nsIRunnable*, bool) ()
   from /home/francois/devel/nightly/libxul.so
#4  0x00007fa2c6b60eee in UrlClassifierDBServiceWorkerProxy::DoLocalLookup(nsACString_internal const&, nsACString_internal const&, nsTArray<mozilla::safebrowsing::LookupResult>*) ()
   from /home/francois/devel/nightly/libxul.so
#5  0x00007fa2c6b667a3 in nsUrlClassifierDBService::ClassifyLocalWithTables(nsIURI*, nsACString_internal const&, nsACString_internal&) () from /home/francois/devel/nightly/libxul.so
#6  0x00007fa2c7be26e6 in mozilla::net::nsHttpChannel::BeginConnect() ()
   from /home/francois/devel/nightly/libxul.so
#7  0x00007fa2c7be2ecc in mozilla::net::nsHttpChannel::OnProxyAvailable(nsICancelable*, nsIChannel*, nsIProxyInfo*, nsresult) () from /home/francois/devel/nightly/libxul.so
#8  0x00007fa2c7b8e756 in mozilla::net::nsAsyncResolveRequest::DoCallback() ()
   from /home/francois/devel/nightly/libxul.so
#9  0x00007fa2c7b8ecef in mozilla::net::nsProtocolProxyService::AsyncResolveInternal(nsIChannel*, unsigned int, nsIProtocolProxyCallback*, nsICancelable**, bool) () from /home/francois/devel/nightly/libxul.so
#10 0x00007fa2c7bd02fe in mozilla::net::nsHttpChannel::ResolveProxy() ()
   from /home/francois/devel/nightly/libxul.so
#11 0x00007fa2c7be3171 in mozilla::net::nsHttpChannel::AsyncOpen(nsIStreamListener*, nsISupports*) ()
   from /home/francois/devel/nightly/libxul.so
#12 0x00007fa2c7be32c2 in mozilla::net::nsHttpChannel::AsyncOpen2(nsIStreamListener*) ()
   from /home/francois/devel/nightly/libxul.so
#13 0x00007fa2c55c7c2a in mozilla::net::HttpChannelParent::InvokeAsyncOpen(nsresult) ()
   from /home/francois/devel/nightly/libxul.so
#14 0x00007fa2c55c8d71 in mozilla::net::HttpChannelParent::DoAsyncOpen(mozilla::ipc::URIParams const&, mozilla::ipc::OptionalURIParams const&, mozilla::ipc::OptionalURIParams const&, mozilla::ipc::OptionalURIParams const&, unsigned int const&, mozilla::ipc::OptionalURIParams const&, mozilla::ipc::OptionalURIParams const&, unsigned int const&, nsTArray<mozilla::net::RequestHeaderTuple> const&, nsCString const&, mozilla::ipc::OptionalIPCStream const&, bool const&, unsigned short const&, unsigned int const&, unsigned char const&, bool const&, bool const&, unsigned int const&, bool const&, unsigned long const&, nsCString const&, bool const&, nsCString const&, bool const&, bool const&, mozilla::net::OptionalLoadInfoArgs const&, mozilla::net::OptionalHttpResponseHead const&, nsCString const&, unsigned int const&, nsCString const&, mozilla::net::OptionalCorsPreflightArgs const&, unsigned int const&, bool const&, bool const&, bool const&, nsCString const&, nsCString const&, nsCString const&) () from /home/francois/devel/nightly/libxul.so
#15 0x00007fa2c55c929c in mozilla::net::HttpChannelParent::Init(mozilla::net::HttpChannelCreationArgs const&) () from /home/francois/devel/nightly/libxul.so
#16 0x00007fa2c56e7e6d in mozilla::net::PNeckoParent::OnMessageReceived(IPC::Message const&) ()
   from /home/francois/devel/nightly/libxul.so
#17 0x00007fa2c5818fb8 in mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) ()
   from /home/francois/devel/nightly/libxul.so
#18 0x00007fa2c72d7067 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) ()
   from /home/francois/devel/nightly/libxul.so
#19 0x00007fa2c72d89fa in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) ()
   from /home/francois/devel/nightly/libxul.so
#20 0x00007fa2c72dafac in mozilla::ipc::MessageChannel::MessageTask::Run() ()
   from /home/francois/devel/nightly/libxul.so
#21 0x00007fa2c725fd6b in nsThread::ProcessNextEvent(bool, bool*) ()
   from /home/francois/devel/nightly/libxul.so
#22 0x00007fa2c727569c in NS_ProcessNextEvent(nsIThread*, bool) ()
   from /home/francois/devel/nightly/libxul.so
#23 0x00007fa2c72d6c7d in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ()
   from /home/francois/devel/nightly/libxul.so
#24 0x00007fa2c7c5b6d3 in MessageLoop::Run() () from /home/francois/devel/nightly/libxul.so
#25 0x00007fa2c6614b4a in nsBaseAppShell::Run() () from /home/francois/devel/nightly/libxul.so
#26 0x00007fa2c6b4636b in nsAppStartup::Run() () from /home/francois/devel/nightly/libxul.so
#27 0x00007fa2c6b8287c in XREMain::XRE_mainRun() () from /home/francois/devel/nightly/libxul.so
#28 0x00007fa2c6b82b5d in XREMain::XRE_main(int, char**, nsXREAppData const*) ()
   from /home/francois/devel/nightly/libxul.so
#29 0x00007fa2c6b82dc6 in XRE_main () from /home/francois/devel/nightly/libxul.so
#30 0x000000000040fca1 in do_main(int, char**, char**, nsIFile*) ()
#31 0x000000000040c08a in main ()

It looks like a deadlock between an update of the TP list and a call to ClassifyLocalWithTables().
Reporter

Comment 1

3 years ago
gcp, have you looked into deadlocks like this before? I'm not sure what the best way to debug this is.

I don't yet have a way to reproduce this reliably, but I imagine I could instrument whatever locks we have on the URL Classifier DB and PR_LOG() them?
Flags: needinfo?(gpascutto)
Reporter

Updated

3 years ago
Blocks: 1305486
Hi francois,
Do you have the backtrace of safebrowsing background thread ?
Comment hidden (typo)

Comment 4

3 years ago
SyncRunnable() can be used only if we're certain that the target will never block on sending an event to the source thread.
I saw that ClassifyLocalWithTables, main thread try to dispatch to worker thread by sync-dispatching in [2]
And updating list, worker thread also tries to dispatch to main thread [1]

Any possible deadlock here?
[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierProxies.cpp#142
[2] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/Classifier.cpp#169
Assignee

Comment 5

3 years ago
(In reply to Thomas Nguyen[:tnguyen] (use ni? plz) from comment #4)
> SyncRunnable() can be used only if we're certain that the target will never
> block on sending an event to the source thread.
> I saw that ClassifyLocalWithTables, main thread try to dispatch to worker
> thread by sync-dispatching in [2]
> And updating list, worker thread also tries to dispatch to main thread [1]
> 
> Any possible deadlock here?
> [1]
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-
> classifier/nsUrlClassifierProxies.cpp#142
> [2]
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-
> classifier/Classifier.cpp#169

Makes sense to me!


[On the main thread]

nsChannelClassifier::IsTrackerWhitelisted()
  nsUrlClassifierDBService::ClassifyLocalWithTables()
    UrlClassifierDBServiceWorkerProxy::DoLocalLookup()
      (being blocked the background thread)


[On background thread]

nsUrlClassifierDBServiceWorker::DoLocalLookup
  Classifier::Check()
    Classifier::GetLookupCache()
      LookupCache::LookupCache()
        LookupCache::UpdateRootDirHandle()
          Classifier::GetPrivateStoreDirectory()
            Classifier::GetProviderByTableName()
              (being blocked by the main thread)

Ideally, what we should fix is to avoid the main thread from being blocked
by the background thread. However, since it's not trivial to make 
nsChannelClassifier::IsTrackerWhitelisted a sync call, I think what we have
to fix is "Classifier::GetProviderByTableName" ...

Updated

3 years ago
See Also: → 1288633
Assignee

Updated

3 years ago
Assignee: nobody → hchang
Assignee

Updated

3 years ago
Attachment #8807469 - Flags: review?(francois)

Updated

3 years ago
Flags: needinfo?(gpascutto)

Updated

3 years ago
Blocks: 1288633
Assignee

Comment 7

3 years ago
That facts that we can't change are:

1) On the main thread, nsChannelClassifier::IsTrackerWhitelisted() is a sync call 
   which must be blocked by the SB worker thread.
2) On the worker thread, nsUrlClassifierDBServiceWorker::DoLocalLookup() *might* wait 
   the main thread for fetching preferences. (The actual point is the first time we 
   create LookupCache.)

I took the approach to avoid (2) since avoiding (1) seems impossible without entirely
changing the architecture. So, to avoid (2), the solution could be

a) Find some places (still on the worker thread but where we are certain that the main 
   thread is not waiting for the us) to read the preferences.
b) Read the preferences on the main thread and propagate to where they are needed.

Obviously (b) is more practical and simple and that's what the patch is doing:

1) Read the prefs and build the "table name => provider" mapping in [1]. It's guaranteed
   early enough and on the main thread.
2) Propagate the built map to Classifer in all OpenDb().
3) Once Classifer has the "table name to provider" mapping, it can propagate the corresponding
   provider to all LookupCache and HashStore whenever needed.

[1] https://dxr.mozilla.org/mozilla-central/rev/3b80868f7a8fe0361918a814fbbbfb9308ae0c0a/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#158
Comment hidden (mozreview-request)
Reporter

Comment 9

3 years ago
mozreview-review
Comment on attachment 8807469 [details]
Bug 1315097 - Build the provider dictionary on the main thread to be used everywhere.

https://reviewboard.mozilla.org/r/90606/#review90548

Your approach looks pretty sound to me. Doing the pref lookup once at startup seems much better than every time we need to convert a list to a provider.

I would however like gcp to take a good look at this too to make sure he doesn't see anything wrong with this.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:225
(Diff revision 2)
>  {
>    mGethashNoise = aGethashNoise;
>    mCacheDir = aCacheDir;
> +  nsresult rv = ReadProvidersFromPrefs(mProviderDict);
> +  if (NS_FAILED(rv)) {
> +    LOG(("Failed to build provider dictionary."));

Maybe we should add an assert here too. Things will be pretty broken if we can't read providers.
Attachment #8807469 - Flags: review?(francois) → review+
Reporter

Updated

3 years ago
Attachment #8807469 - Flags: review?(gpascutto)
Reporter

Comment 10

3 years ago
Henry, could you please ensure that this gets uplifted to 51 once it lands?

As far as I can tell, we landed GetProviderByTableName() in bug 1254763 which means it has already made it to aurora.
Flags: needinfo?(hchang)
Keywords: regression

Comment 11

3 years ago
mozreview-review
Comment on attachment 8807469 [details]
Bug 1315097 - Build the provider dictionary on the main thread to be used everywhere.

https://reviewboard.mozilla.org/r/90606/#review90554

Obviously I don't object to this as I've actually recommended this as an alternate approach in the original bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1254763#c14

But note the remark below about observing pref changes. I think the table-generation call is in the wrong place.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:223
(Diff revision 2)
> -nsUrlClassifierDBServiceWorker::Init(uint32_t aGethashNoise,
> +nsUrlClassifierDBServiceWorker::Init(uint32_t aGethashNoise, nsCOMPtr<nsIFile> aCacheDir)
> -                                     nsCOMPtr<nsIFile> aCacheDir)
>  {
>    mGethashNoise = aGethashNoise;
>    mCacheDir = aCacheDir;
> +  nsresult rv = ReadProvidersFromPrefs(mProviderDict);

I think you need to call this in "nsUrlClassifierDBService::ReadTablesFromPrefs()" too for cases where we change the prefs at runtime.

In fact, maybe that's the place where the call should live in the first place, given that nsUrlClassifierDBService will call it before it calls this?
Attachment #8807469 - Flags: review?(gpascutto) → review+
Reporter

Comment 12

3 years ago
(In reply to Dimi Lee[:dimi][:dlee] from comment #2)
> Do you have the backtrace of safebrowsing background thread ?

Here it is:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f3b332d803e in PR_WaitCondVar () from /home/francois/devel/nightly/libnspr4.so
#2  0x00007f3b266aae3f in mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, bool) ()
   from /home/francois/devel/nightly/libxul.so
#3  0x00007f3b266aaf01 in mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, nsIRunnable*, bool) ()
   from /home/francois/devel/nightly/libxul.so
#4  0x00007f3b290ff019 in mozilla::safebrowsing::Classifier::GetPrivateStoreDirectory(nsIFile*, nsACString_internal const&, nsIFile**) () from /home/francois/devel/nightly/libxul.so
#5  0x00007f3b290ff269 in mozilla::safebrowsing::LookupCache::UpdateRootDirHandle(nsIFile*) ()
   from /home/francois/devel/nightly/libxul.so
#6  0x00007f3b29101b96 in mozilla::safebrowsing::Classifier::SetupPathNames() ()
   from /home/francois/devel/nightly/libxul.so
#7  0x00007f3b27da098e in mozilla::safebrowsing::Classifier::RemoveBackupTables() ()
   from /home/francois/devel/nightly/libxul.so
#8  0x00007f3b29108133 in mozilla::safebrowsing::Classifier::ApplyUpdates(nsTArray<mozilla::safebrowsing::TableUpdate*>*) () from /home/francois/devel/nightly/libxul.so
#9  0x00007f3b27da6d55 in nsUrlClassifierDBServiceWorker::FinishUpdate() ()
   from /home/francois/devel/nightly/libxul.so
#10 0x00007f3b290fea27 in mozilla::detail::RunnableMethodImpl<nsresult (nsUrlClassifierDBServiceWorker::*)(), true, false>::Run() () from /home/francois/devel/nightly/libxul.so
#11 0x00007f3b2849a77b in nsThread::ProcessNextEvent(bool, bool*) ()
   from /home/francois/devel/nightly/libxul.so
#12 0x00007f3b284af5bc in NS_ProcessNextEvent(nsIThread*, bool) ()
   from /home/francois/devel/nightly/libxul.so
#13 0x00007f3b2850bc81 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)
    () from /home/francois/devel/nightly/libxul.so
#14 0x00007f3b28e51c33 in MessageLoop::Run() () from /home/francois/devel/nightly/libxul.so
#15 0x00007f3b28d6136d in nsThread::ThreadFunc(void*) () from /home/francois/devel/nightly/libxul.so
#16 0x00007f3b332d8d7c in _pt_root () from /home/francois/devel/nightly/libnspr4.so
#17 0x00007f3b349370a4 in start_thread (arg=0x7f3afb4ff700) at pthread_create.c:309
#18 0x00007f3b33a3e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Assignee

Comment 13

3 years ago
mozreview-review-reply
Comment on attachment 8807469 [details]
Bug 1315097 - Build the provider dictionary on the main thread to be used everywhere.

https://reviewboard.mozilla.org/r/90606/#review90554

> I think you need to call this in "nsUrlClassifierDBService::ReadTablesFromPrefs()" too for cases where we change the prefs at runtime.
> 
> In fact, maybe that's the place where the call should live in the first place, given that nsUrlClassifierDBService will call it before it calls this?

My first approach was exactly as what you suggested:

1) Read in DBService::Init and listen to the prefs change.
2) Either copy or transfer the ownership to DBServiceWorker for propagating to Classifer/LookupCache/HashStore
3) On pref change, generate the mapping again
4) Propagate the up-to-date pref to Classifier/LookupCache/HashStore

However, step (4) seems non-trivial since the Classifier/LookupCache/HashStore are singletons. If the prefs change, we can easily update the
mapping owned by DBServive but hard to propagate to Classifier/LookupCache/HashStore since the they (Classifer/HashStore/LookupCache) mostly consume the mapping on the worker thread. If we choose only update the mapping on the main thread with propagating to the worker thread, things may be a little inconsistent. Do you think which is worth doing?
Assignee

Comment 14

3 years ago
This seems to be what we guess in comment 5. Pretty confident now that the patch will solve the race condition!

(In reply to François Marier [:francois] from comment #12)
> (In reply to Dimi Lee[:dimi][:dlee] from comment #2)
> > Do you have the backtrace of safebrowsing background thread ?
> 
> Here it is:
> 
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x00007f3b332d803e in PR_WaitCondVar () from
> /home/francois/devel/nightly/libnspr4.so
> #2  0x00007f3b266aae3f in
> mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, bool) ()
>    from /home/francois/devel/nightly/libxul.so
> #3  0x00007f3b266aaf01 in
> mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, nsIRunnable*, bool)
> ()
>    from /home/francois/devel/nightly/libxul.so
> #4  0x00007f3b290ff019 in
> mozilla::safebrowsing::Classifier::GetPrivateStoreDirectory(nsIFile*,
> nsACString_internal const&, nsIFile**) () from
> /home/francois/devel/nightly/libxul.so
> #5  0x00007f3b290ff269 in
> mozilla::safebrowsing::LookupCache::UpdateRootDirHandle(nsIFile*) ()
>    from /home/francois/devel/nightly/libxul.so
> #6  0x00007f3b29101b96 in
> mozilla::safebrowsing::Classifier::SetupPathNames() ()
>    from /home/francois/devel/nightly/libxul.so
> #7  0x00007f3b27da098e in
> mozilla::safebrowsing::Classifier::RemoveBackupTables() ()
>    from /home/francois/devel/nightly/libxul.so
> #8  0x00007f3b29108133 in
> mozilla::safebrowsing::Classifier::ApplyUpdates(nsTArray<mozilla::
> safebrowsing::TableUpdate*>*) () from /home/francois/devel/nightly/libxul.so
> #9  0x00007f3b27da6d55 in nsUrlClassifierDBServiceWorker::FinishUpdate() ()
>    from /home/francois/devel/nightly/libxul.so
> #10 0x00007f3b290fea27 in mozilla::detail::RunnableMethodImpl<nsresult
> (nsUrlClassifierDBServiceWorker::*)(), true, false>::Run() () from
> /home/francois/devel/nightly/libxul.so
> #11 0x00007f3b2849a77b in nsThread::ProcessNextEvent(bool, bool*) ()
>    from /home/francois/devel/nightly/libxul.so
> #12 0x00007f3b284af5bc in NS_ProcessNextEvent(nsIThread*, bool) ()
>    from /home/francois/devel/nightly/libxul.so
> #13 0x00007f3b2850bc81 in
> mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)
>     () from /home/francois/devel/nightly/libxul.so
> #14 0x00007f3b28e51c33 in MessageLoop::Run() () from
> /home/francois/devel/nightly/libxul.so
> #15 0x00007f3b28d6136d in nsThread::ThreadFunc(void*) () from
> /home/francois/devel/nightly/libxul.so
> #16 0x00007f3b332d8d7c in _pt_root () from
> /home/francois/devel/nightly/libnspr4.so
> #17 0x00007f3b349370a4 in start_thread (arg=0x7f3afb4ff700) at
> pthread_create.c:309
> #18 0x00007f3b33a3e62d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Flags: needinfo?(hchang)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 17

3 years ago
Francois, gcp,

I revised my approach because of the following reasons:

1) We need to rebuild the mapping on prefs change (comment 11)
2) There's a coming requirement that StreamUpdater also needs the provider mapping.

So, instead of having a persistent mapping in DBServiceWorker and Classifier,
only one mapping is maintained in nsUrlClassifierUtils in the new patch and
can be lookup up by nsIUrlClassifierUtils. Given that nsIUrlClassifierUtils
is a thread-safe XPCOM service [1], nsIUrlClassifierUtils is a suitable place
to put this mapping.

One of the issue to maintaining the mapping in nsIUrlClassifierUtils is
we need to ensure the provider lookup thread-safe. So, a mutex associated
with the mapping is added to nsUrlClassifierUtils. 

Another issue is we also need to ensure the first use of nsIUrlClassifierUtils
is on the main thread. This is guaranteed by explicitly loading it in 
nsUrlClassifierDBService::Init(), where no work-thread operations 
would occur before.

Actually the revised patch is still imperfect. The per-table LookupCache
will have persistent |mProvider| even if the pref changes. I don't think
this is that bad because otherwise we may need to reload the LookupCache
from other per-provider folder, which might bring other issue.

(To be honest, I don't have an answer to "if we really need to be able to change 
all of these at runtime" [2]?)

Could you have another look again especially the changes in nsUrlClassifierUtils.cpp?
Thanks!


[1] https://dxr.mozilla.org/mozilla-central/rev/336759fad4621dfcd0a3293840edbed67018accd/toolkit/components/url-classifier/nsUrlClassifierUtils.h#47
[2] https://dxr.mozilla.org/mozilla-central/rev/336759fad4621dfcd0a3293840edbed67018accd/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1304
Comment hidden (mozreview-request)
Assignee

Updated

3 years ago
Flags: needinfo?(gpascutto)
Flags: needinfo?(francois)

Comment 19

3 years ago
mozreview-review
Comment on attachment 8807469 [details]
Bug 1315097 - Build the provider dictionary on the main thread to be used everywhere.

https://reviewboard.mozilla.org/r/90606/#review93184

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1262
(Diff revisions 2 - 5)
>    nsresult rv;
>    nsCOMPtr<nsICryptoHash> dummy = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv);
>    NS_ENSURE_SUCCESS(rv, rv);
>  
> +  {
> +    // Force nsIUrlClassifierUtils loading on main thread.

Maybe scope the above PSM loading in the same manner, for consistency.

::: toolkit/components/url-classifier/tests/gtest/TestLookupCacheV4.cpp:35
(Diff revision 5)
>    nsCOMPtr<nsIFile> file;
>    NS_GetSpecialDirectory(NS_APP_USER_PROFILE_50_DIR, getter_AddRefs(file));
>  
>    file->AppendNative(GTEST_SAFEBROWSING_DIR);
>  
> -  UniquePtr<LookupCacheV4> cache = MakeUnique<LookupCacheV4>(GTEST_TABLE, file);
> +  UniquePtr<LookupCacheV4> cache = MakeUnique<LookupCacheV4>(GTEST_TABLE, EmptyCString(), file);

What's the EmptyCString? Feels like I'm missing a change to the constructor?
Reporter

Comment 20

3 years ago
(In reply to Henry Chang [:henry][:hchang] from comment #17)
> Could you have another look again especially the changes in
> nsUrlClassifierUtils.cpp?

Seems like a good approach.
Status: NEW → ASSIGNED
Flags: needinfo?(francois)
Assignee

Comment 21

3 years ago
Thanks you two! I'll be landing this patch right away after addressing gcp's comment 19.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 25

3 years ago
mozreview-review
Comment on attachment 8807469 [details]
Bug 1315097 - Build the provider dictionary on the main thread to be used everywhere.

https://reviewboard.mozilla.org/r/90606/#review93524

::: toolkit/components/url-classifier/Classifier.cpp:716
(Diff revisions 5 - 8)
> +#ifdef MOZ_SAFEBROWSING_DUMP_FAILED_UPDATES
> +
> +already_AddRefed<nsIFile>
> +Classifier::GetFailedUpdateDirectroy()
> +{
> +  nsCString failedUpdatekDirName = STORE_DIRECTORY + nsCString("-failedupdate");

NS_LITERAL_CSTRING?

::: toolkit/components/url-classifier/Classifier.cpp:731
(Diff revisions 5 - 8)
> +}
> +
> +nsresult
> +Classifier::DumpRawTableUpdates(const nsACString& aRawUpdates)
> +{
> +  // DumpFailedUpdate() MUST be called first to create the

Presumably it could be a member variable then, that you can check for null. That would also get rid of duplicating the construction of its nsIFile both in DumpRawTableUpdates and DumpFailedUpdate.

::: toolkit/components/url-classifier/Classifier.cpp:741
(Diff revisions 5 - 8)
> +  nsCOMPtr<nsIFile> failedUpdatekDirectory = GetFailedUpdateDirectroy();
> +
> +  // Create tableupdate.bin and dump raw table update data.
> +  nsCOMPtr<nsIFile> rawTableUpdatesFile;
> +  nsCOMPtr<nsIOutputStream> outputStream;
> +  if (NS_FAILED(failedUpdatekDirectory->Clone(getter_AddRefs(rawTableUpdatesFile))) ||

I'm not a fan of this chaining of functions with side effects as a combined conditional for the if. Note that the log won't point out what exactly failed either.
Flags: needinfo?(gpascutto)

Comment 26

3 years ago
Pushed by hchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3e719c86ef24
Build the provider dictionary on the main thread to be used everywhere. r=francois,gcp

Comment 27

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/3e719c86ef24
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Assignee

Comment 30

3 years ago
Comment on attachment 8811600 [details] [diff] [review]
Bug1315097-uplift-for-beta51.patch

Approval Request Comment
[Feature/regressing bug #]: Bug 1315097
[User impact if declined]: A deadlock might occur
[Describe test coverage new/current, TreeHerder]: gtest
[Risks and why]: No
[String/UUID change made/needed]: No
Attachment #8811600 - Flags: approval-mozilla-beta?
Assignee

Comment 31

3 years ago
Comment on attachment 8811619 [details] [diff] [review]
Bug1315097-uplift-for-aurora52.patch

Approval Request Comment
[Feature/regressing bug #]: Bug 1315097 (regression bug)
[User impact if declined]: Deadlock may occur
[Describe test coverage new/current, TreeHerder]: gtest
[Risks and why]: No
[String/UUID change made/needed]: No
Attachment #8811619 - Flags: approval-mozilla-aurora?
Hi Henry,
First, the feature/regressing bug should not be the bug itself.
Second, because the patch is huge, the risk should not be "no".
Third, given 51 is in beta, I need better justification for beta, otherwise, I would rather let this patch ride the train on Aurora52.

Are there any reproduced steps for this issue? I would like to have QA to verify this.
Flags: needinfo?(hchang)
Assignee

Comment 33

3 years ago
(In reply to Gerry Chang [:gchang] from comment #32)
> Hi Henry,
> First, the feature/regressing bug should not be the bug itself.

This patch is to fix the regression caused by Bug 1254763. So should this field
be Bug 1254763.

> Second, because the patch is huge, the risk should not be "no".

The only risk I can think of is this patch might introduce another
deadlock since a mutex is use.

> Third, given 51 is in beta, I need better justification for beta, otherwise,
> I would rather let this patch ride the train on Aurora52.
> 
> Are there any reproduced steps for this issue? I would like to have QA to
> verify this.

Unfortunately there's no specific steps to reproduce. The only way to reproduce
is just to browser pages. Francois, do you remember any steps which is prone
to cause this issue?
Flags: needinfo?(hchang) → needinfo?(francois)
Reporter

Comment 34

3 years ago
(In reply to Henry Chang [:henry][:hchang] from comment #33)
> Unfortunately there's no specific steps to reproduce. The only way to
> reproduce
> is just to browser pages. Francois, do you remember any steps which is prone
> to cause this issue?

I have tracking protection enabled globally in my profile and in the course of normal day-to-day browsing, I hit this deadlock on average once or twice a day. It's not guaranteed to happen, sometimes it skips a day, but it's quite frequent and frequently I get another deadlock immediately after restarting Firefox.

So it's quite a bad regression for anybody using tracking protection (including anybody using Private Browsing I assume).
Flags: needinfo?(francois)
Comment on attachment 8811619 [details] [diff] [review]
Bug1315097-uplift-for-aurora52.patch

I'd like this to bake on Aurora a bit more before considering uplifting to
Beta51. Aurora52+.
Attachment #8811619 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Gerry is worried about the risk of uplifting to 51 (too many code changes).
And I talked to Henry offline.  It is fine that we don't uplift this fix to 51.
Henry, please help to elaborate the reason.
Flags: needinfo?(hchang)
Assignee

Comment 38

3 years ago
Re-checking the code that might cause deadlock, only when goog-*-proto tables are present may cause the deadlock [1]. So, the impact without this patch should be, when user modifies the preferences 
to add goog-*-proto tables, the browser may freeze due to deadlock. 

Ni'ing francois to see if he agrees not to uplift the patch to beta51.

[1] http://hg.mozilla.org/releases/mozilla-beta/file/tip/toolkit/components/url-classifier/Classifier.cpp#l180
Flags: needinfo?(hchang) → needinfo?(francois)
Reporter

Comment 39

3 years ago
(In reply to Henry Chang [:henry][:hchang] from comment #38)
> Re-checking the code that might cause deadlock, only when goog-*-proto
> tables are present may cause the deadlock [1]. So, the impact without this
> patch should be, when user modifies the preferences 
> to add goog-*-proto tables, the browser may freeze due to deadlock. 

You're right, I missed that. We don't need this on beta.
Flags: needinfo?(francois)
Reporter

Updated

3 years ago
Attachment #8811600 - Flags: approval-mozilla-beta?
Track 51- as 51 is unaffected.
Version: unspecified → Trunk
You need to log in before you can comment on or make changes to this bug.