Closed
Bug 1361627
Opened 8 years ago
Closed 8 years ago
UrlClassifierDBServiceWorkerProxy::DoLocalLookup does sync dispatch to off-main-thread that may take 8 seconds
Categories
(Toolkit :: Safe Browsing, enhancement, P1)
Toolkit
Safe Browsing
Tracking
()
RESOLVED
INVALID
Performance Impact | ? |
People
(Reporter: cpearce, Assigned: hchang)
References
Details
(Whiteboard: [bhr])
Attachments
(1 file)
11.97 KB,
text/plain
|
Details |
Michael Layzell has added Background Hang Reporter telemetry which detects when the Gecko main thread is blocked for at least 8 seconds, and when that happens it reports a pseudo stack of the main thread.
We have 5 reports in the report from 2017-04-29 report [1] that show that we're blocked in UrlClassifierDBServiceWorkerProxy::DoLocalLookup. This does a sync dispatch to another thread from the main thread.
I'd guess the sync thread dispatch is taking longer than the author of the original code realized.
I've attached the pseudo stacks grep'd out of [2] that show how we entered this function.
We should *not* be doing sync thread dispatch, it generally leads to trouble, and especially we should not be blocking the main thread while doing it. Blocking the main thread can jank rendering, JS execution, etc.
Note that synchronizing whatever structure is off-main-thread that you're blocking to access here via a lock can still result in a "hang" here, as whatever causes the 8 second hang here could equally be holding the lock while the main thread is blocked waiting to acquire the lock.
One possible way to fix this would be to maintain a "mirror" of the off-main-thread state on the main thread. We have existing infrastructure to make this easy:
http://searchfox.org/mozilla-central/source/xpcom/threads/StateMirroring.h
[1] https://people-mozilla.org/~mlayzell/bhr/20170429.html
[2] (all data) https://people-mozilla.org/~mlayzell/bhr/20170429/hangs.json
Assignee | ||
Comment 1•8 years ago
|
||
I am going to discuss this topic in two aspects
1) There is only one last use of UrlClassifierDBServiceWorkerProxy::DoLocalLookup
and I am working on removing that in bug 1345058. In fact, I have
removed two other (indirect) uses of DoLocalLookup. The fix is essentially
to replace the sync API with the async one.
2) Since Bug 1339050 has been fixed, even if we are still using the sync API,
the main thread blocking shouldn't have been 8 seconds.
However, looking at the call stack [1], I am guessing the profiling result was
not come out based on nightly build since nsHttpChannel::BeginConnect()
no longer uses nsUrlClassifierDBService::ClassifyLocalWithTables.
[1] https://people-mozilla.org/~mlayzell/bhr/20170429/585.html
[2] http://searchfox.org/mozilla-central/rev/abe68d5dad139e376d5521ca1d4b7892e1e7f1ba/netwerk/protocol/http/nsHttpChannel.cpp#6012
Assignee | ||
Comment 2•8 years ago
|
||
Hi Chris,
May I know what specific build the report came from? My fixes mentioned in comment 1
(at least bug 1339050 and nsHttpChannel::BeginConnect()) were landed not very long ago
but should have been landed on nightly.
Thanks :)
Flags: needinfo?(cpearce)
Updated•8 years ago
|
Assignee: nobody → hchang
Status: NEW → ASSIGNED
Reporter | ||
Comment 3•8 years ago
|
||
(In reply to Henry Chang [:henry][:hchang] from comment #2)
> Hi Chris,
>
> May I know what specific build the report came from? My fixes mentioned in
> comment 1
> (at least bug 1339050 and nsHttpChannel::BeginConnect()) were landed not
> very long ago
> but should have been landed on nightly.
>
> Thanks :)
Hi Henry. Sorry, the BHR report dump doesn't include that. We'll get that for future reports.
(In reply to Henry Chang [:henry][:hchang] from comment #1)
> I am going to discuss this topic in two aspects
>
> 1) There is only one last use of
> UrlClassifierDBServiceWorkerProxy::DoLocalLookup
> and I am working on removing that in bug 1345058. In fact, I have
> removed two other (indirect) uses of DoLocalLookup. The fix is essentially
> to replace the sync API with the async one.
I recommend you proceed with that refactoring regardless of whether BeginConnect() calls ClassifyLocalWithTables, as based on my experience with all the multi-threaded code in the media playback stack, sync thread dispatch is a source of problems.
> 2) Since Bug 1339050 has been fixed, even if we are still using the sync API,
> the main thread blocking shouldn't have been 8 seconds.
>
> However, looking at the call stack [1], I am guessing the profiling result
> was
> not come out based on nightly build since nsHttpChannel::BeginConnect()
> no longer uses nsUrlClassifierDBService::ClassifyLocalWithTables.
This makes sense. All of the call sites I have stacks for contain BeginConnect() calling nsUrlClassifierDBService::ClassifyLocalWithTables. I'll mark this as resolved for now, until we get more information.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(cpearce)
Resolution: --- → INVALID
Assignee | ||
Comment 4•8 years ago
|
||
(In reply to Chris Pearce (:cpearce) from comment #3)
> (In reply to Henry Chang [:henry][:hchang] from comment #2)
> > Hi Chris,
> >
> > May I know what specific build the report came from? My fixes mentioned in
> > comment 1
> > (at least bug 1339050 and nsHttpChannel::BeginConnect()) were landed not
> > very long ago
> > but should have been landed on nightly.
> >
> > Thanks :)
>
> Hi Henry. Sorry, the BHR report dump doesn't include that. We'll get that
> for future reports.
>
>
> (In reply to Henry Chang [:henry][:hchang] from comment #1)
> > I am going to discuss this topic in two aspects
> >
> > 1) There is only one last use of
> > UrlClassifierDBServiceWorkerProxy::DoLocalLookup
> > and I am working on removing that in bug 1345058. In fact, I have
> > removed two other (indirect) uses of DoLocalLookup. The fix is essentially
> > to replace the sync API with the async one.
>
>
> I recommend you proceed with that refactoring regardless of whether
> BeginConnect() calls ClassifyLocalWithTables, as based on my experience with
> all the multi-threaded code in the media playback stack, sync thread
> dispatch is a source of problems.
>
UrlClassifierDBServiceWorkerProxy::DoLocalLookup is going to be removed entirely
once bug 1345058 is fixed so that it shouldn't be a problem in this case :)
Thanks for your suggestion and sharing your experience!
Updated•8 years ago
|
Whiteboard: [qf] → [qf][bhr]
Updated•3 years ago
|
Performance Impact: --- → ?
Whiteboard: [qf][bhr] → [bhr]
You need to log in
before you can comment on or make changes to this bug.
Description
•