Closed Bug 1361627 Opened 8 years ago Closed 8 years ago

UrlClassifierDBServiceWorkerProxy::DoLocalLookup does sync dispatch to off-main-thread that may take 8 seconds

Categories

(Toolkit :: Safe Browsing, enhancement, P1)

enhancement

Tracking

()

RESOLVED INVALID
Performance Impact ?

People

(Reporter: cpearce, Assigned: hchang)

References

Details

(Whiteboard: [bhr])

Attachments

(1 file)

Michael Layzell has added Background Hang Reporter telemetry which detects when the Gecko main thread is blocked for at least 8 seconds, and when that happens it reports a pseudo stack of the main thread. We have 5 reports in the report from 2017-04-29 report [1] that show that we're blocked in UrlClassifierDBServiceWorkerProxy::DoLocalLookup. This does a sync dispatch to another thread from the main thread. I'd guess the sync thread dispatch is taking longer than the author of the original code realized. I've attached the pseudo stacks grep'd out of [2] that show how we entered this function. We should *not* be doing sync thread dispatch, it generally leads to trouble, and especially we should not be blocking the main thread while doing it. Blocking the main thread can jank rendering, JS execution, etc. Note that synchronizing whatever structure is off-main-thread that you're blocking to access here via a lock can still result in a "hang" here, as whatever causes the 8 second hang here could equally be holding the lock while the main thread is blocked waiting to acquire the lock. One possible way to fix this would be to maintain a "mirror" of the off-main-thread state on the main thread. We have existing infrastructure to make this easy: http://searchfox.org/mozilla-central/source/xpcom/threads/StateMirroring.h [1] https://people-mozilla.org/~mlayzell/bhr/20170429.html [2] (all data) https://people-mozilla.org/~mlayzell/bhr/20170429/hangs.json
I am going to discuss this topic in two aspects 1) There is only one last use of UrlClassifierDBServiceWorkerProxy::DoLocalLookup and I am working on removing that in bug 1345058. In fact, I have removed two other (indirect) uses of DoLocalLookup. The fix is essentially to replace the sync API with the async one. 2) Since Bug 1339050 has been fixed, even if we are still using the sync API, the main thread blocking shouldn't have been 8 seconds. However, looking at the call stack [1], I am guessing the profiling result was not come out based on nightly build since nsHttpChannel::BeginConnect() no longer uses nsUrlClassifierDBService::ClassifyLocalWithTables. [1] https://people-mozilla.org/~mlayzell/bhr/20170429/585.html [2] http://searchfox.org/mozilla-central/rev/abe68d5dad139e376d5521ca1d4b7892e1e7f1ba/netwerk/protocol/http/nsHttpChannel.cpp#6012
Hi Chris, May I know what specific build the report came from? My fixes mentioned in comment 1 (at least bug 1339050 and nsHttpChannel::BeginConnect()) were landed not very long ago but should have been landed on nightly. Thanks :)
Flags: needinfo?(cpearce)
Assignee: nobody → hchang
Status: NEW → ASSIGNED
(In reply to Henry Chang [:henry][:hchang] from comment #2) > Hi Chris, > > May I know what specific build the report came from? My fixes mentioned in > comment 1 > (at least bug 1339050 and nsHttpChannel::BeginConnect()) were landed not > very long ago > but should have been landed on nightly. > > Thanks :) Hi Henry. Sorry, the BHR report dump doesn't include that. We'll get that for future reports. (In reply to Henry Chang [:henry][:hchang] from comment #1) > I am going to discuss this topic in two aspects > > 1) There is only one last use of > UrlClassifierDBServiceWorkerProxy::DoLocalLookup > and I am working on removing that in bug 1345058. In fact, I have > removed two other (indirect) uses of DoLocalLookup. The fix is essentially > to replace the sync API with the async one. I recommend you proceed with that refactoring regardless of whether BeginConnect() calls ClassifyLocalWithTables, as based on my experience with all the multi-threaded code in the media playback stack, sync thread dispatch is a source of problems. > 2) Since Bug 1339050 has been fixed, even if we are still using the sync API, > the main thread blocking shouldn't have been 8 seconds. > > However, looking at the call stack [1], I am guessing the profiling result > was > not come out based on nightly build since nsHttpChannel::BeginConnect() > no longer uses nsUrlClassifierDBService::ClassifyLocalWithTables. This makes sense. All of the call sites I have stacks for contain BeginConnect() calling nsUrlClassifierDBService::ClassifyLocalWithTables. I'll mark this as resolved for now, until we get more information.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(cpearce)
Resolution: --- → INVALID
(In reply to Chris Pearce (:cpearce) from comment #3) > (In reply to Henry Chang [:henry][:hchang] from comment #2) > > Hi Chris, > > > > May I know what specific build the report came from? My fixes mentioned in > > comment 1 > > (at least bug 1339050 and nsHttpChannel::BeginConnect()) were landed not > > very long ago > > but should have been landed on nightly. > > > > Thanks :) > > Hi Henry. Sorry, the BHR report dump doesn't include that. We'll get that > for future reports. > > > (In reply to Henry Chang [:henry][:hchang] from comment #1) > > I am going to discuss this topic in two aspects > > > > 1) There is only one last use of > > UrlClassifierDBServiceWorkerProxy::DoLocalLookup > > and I am working on removing that in bug 1345058. In fact, I have > > removed two other (indirect) uses of DoLocalLookup. The fix is essentially > > to replace the sync API with the async one. > > > I recommend you proceed with that refactoring regardless of whether > BeginConnect() calls ClassifyLocalWithTables, as based on my experience with > all the multi-threaded code in the media playback stack, sync thread > dispatch is a source of problems. > UrlClassifierDBServiceWorkerProxy::DoLocalLookup is going to be removed entirely once bug 1345058 is fixed so that it shouldn't be a problem in this case :) Thanks for your suggestion and sharing your experience!
Whiteboard: [qf] → [qf][bhr]
Performance Impact: --- → ?
Whiteboard: [qf][bhr] → [bhr]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: