<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Reporter

Comment 2

•

9 years ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #1) > (In reply to :Ehsan Akhgari from comment #0) > > See this profile: https://clptr.io/2jFVXaI > > Francois, Gian-Carlo, do you have any idea what's going on here? Why do we > > need to call ClassifyLocalWithTables after classification is finished? > > If by "after classification is finished" you're referring to the naming of > ClassifyComplete, then take note that "Complete" here stands for "Complete > SHA256 hash" as opposed to "SHA 256 hash prefix", not necessarily the state > of the channel. Oh, I was totally misunderstanding the meaning of the name! Thanks for correcting me. > As far as I remember, the blocking was required because we don't even want > to start opening a connection to a tracking server, and this was understood > as having a clearly measurable pageload regression, which we ignored in > private browsing because skipping the trackers meant we load pages much > faster anyhow. > > I believe the background discussion is in > https://bugzilla.mozilla.org/show_bug.cgi?id=1100024 or related bugs. Hmm, so this is essentially pausing the UI thread on a network request, if I understand bug 1100024 comment 2 correctly. Is that true?

Comment 3

•

9 years ago

(In reply to :Ehsan Akhgari from comment #0) > Francois, Gian-Carlo, do you have any idea what's going on here? Why do we > need to call ClassifyLocalWithTables after classification is finished? Once we find a tracker, we need to apply our entity whitelist to determine whether or not it's actually a third-party tracker (e.g. google-analytics.com on google.com is not considered third-party, thanks to that entity list). That call however is entirely local and doesn't need to go out to the network. I wonder whether it might be blocking because we're in the middle of updating the list and we have locked the whitelist.

Flags: needinfo?(francois)

Reporter

Comment 4

•

9 years ago

(In reply to François Marier [:francois] from comment #3) > (In reply to :Ehsan Akhgari from comment #0) > > Francois, Gian-Carlo, do you have any idea what's going on here? Why do we > > need to call ClassifyLocalWithTables after classification is finished? > > Once we find a tracker, we need to apply our entity whitelist to determine > whether or not it's actually a third-party tracker (e.g. > google-analytics.com on google.com is not considered third-party, thanks to > that entity list). > > That call however is entirely local and doesn't need to go out to the > network. Good news! > I wonder whether it might be blocking because we're in the middle of > updating the list and we have locked the whitelist. That's possible... The profile unfortunately doesn't capture the worker thread so it's not possible to tell what was happening on that thread, but your hypothesis is the only one I can think of. We need to find an alternative solution though, especially if we want to ship bug 1334241 and anything depeneding on it. Is it possible to do the whitelist checking in the background as part of the initial classification? Or can we change things on the Necko side to start the classification before the connection is established, so that we can delay doing that until the TP classification is finished?

Comment 5

•

9 years ago

(In reply to :Ehsan Akhgari from comment #4) > We need to find an alternative solution though, especially if we want to > ship bug 1334241 and anything depeneding on it. Is it possible to do the > whitelist checking in the background as part of the initial classification? > Or can we change things on the Necko side to start the classification before > the connection is established, so that we can delay doing that until the TP > classification is finished? Maybe we could fire off IsTrackerWhitelisted() and set a flag on the channel based on that result just before returning from nsChannelClassifier::StartInternal() (http://searchfox.org/mozilla-central/rev/d20e4431d0cf40311afa797868bc5c58c54790a2/netwerk/base/nsChannelClassifier.cpp#421). We'd have to be careful about a potential race here if nsChannelClassifier::OnClassifyComplete() returns before we're done checking the whitelist. However, I'm not sure this would help. Whatever is causing ClassifyLocalWithTables() to block for 5 seconds(!) is going to stall everything no matter where it's called. Can you think of anything else we could instrument to detect where this might be blocked?

Comment hidden (mozreview-request)

Comment 7

•

9 years ago

(In reply to François Marier [:francois] from comment #6) > Created attachment 8833274 [details] > Bug 1334616 - Add telemetry probe to monitor ClassifyLocalWithTables() perf. Let's add a probe to detect any future performance regressions (or improvements).

Gian-Carlo Pascutto [:gcp]

Reporter

Comment 8

•

9 years ago

(In reply to François Marier [:francois] from comment #5) > (In reply to :Ehsan Akhgari from comment #4) > > We need to find an alternative solution though, especially if we want to > > ship bug 1334241 and anything depeneding on it. Is it possible to do the > > whitelist checking in the background as part of the initial classification? > > Or can we change things on the Necko side to start the classification before > > the connection is established, so that we can delay doing that until the TP > > classification is finished? > > Maybe we could fire off IsTrackerWhitelisted() and set a flag on the channel > based on that result just before returning from > nsChannelClassifier::StartInternal() > (http://searchfox.org/mozilla-central/rev/ > d20e4431d0cf40311afa797868bc5c58c54790a2/netwerk/base/nsChannelClassifier. > cpp#421). We'd have to be careful about a potential race here if > nsChannelClassifier::OnClassifyComplete() returns before we're done checking > the whitelist. > > However, I'm not sure this would help. Whatever is causing > ClassifyLocalWithTables() to block for 5 seconds(!) is going to stall > everything no matter where it's called. Stalling a channel load for 5 seconds is bad, sure, but it's much better than locking the parent process's main thread, which means the entire application will become unresponsive. In general doing synchronous operations like this on the parent process's main thread isn't OK because it's not possible to predict how long the operation takes. > Can you think of anything else we could instrument to detect where this > might be blocked? Unfortunately not. The url-classifier code is completely foreign to me and I don't know much about how it works... :/

Comment 9

•

9 years ago

mozreview-review

Comment on attachment 8833274 [details] Bug 1334616 - Add telemetry probe to monitor ClassifyLocalWithTables() perf. https://reviewboard.mozilla.org/r/109530/#review111168 ::: toolkit/components/telemetry/Histograms.json:3985 (Diff revision 1) > "high": 120000, > "n_buckets": 30, > "bug_numbers": [1315893], > "description": "Time spent per classifier update (ms), keyed by the name of the provider." > }, > + "URLCLASSIFIER_CLWT_TIME": { A less cryptic name please. _CLASSIFYLOCAL_ is OK as that is a trivial wrapper for CLWT. ::: toolkit/components/telemetry/Histograms.json:3989 (Diff revision 1) > }, > + "URLCLASSIFIER_CLWT_TIME": { > + "alert_emails": ["safebrowsing-telemetry@mozilla.org"], > + "expires_in_version": "never", > + "kind": "exponential", > + "high": 5000, 5 seconds is what was reported, it's not necessarily the worst out there. Don't we want to know what is the worst? Given that this is exponential with plenty of buckets, you can make the maximum 30s or a minute.

Attachment #8833274 - Flags: review?(gpascutto) → review+

Assignee

Updated

•

9 years ago

Assignee: nobody → hchang

Comment hidden (mozreview-request)

Pulsebot

Comment 11

•

9 years ago

Pushed by fmarier@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2e9eab35d1ea Add telemetry probe to monitor ClassifyLocalWithTables() perf. r=gcp

Phil Ringnalda (:philor)

Comment 12

•

9 years ago

Backed out in https://hg.mozilla.org/integration/autoland/rev/6db058fe1d63 for https://treeherder.mozilla.org/logviewer.html#?job_id=75042863&repo=autoland

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Updated

•

9 years ago

Comment 14

•

9 years ago

Pushed by fmarier@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d2fafcf20ceb Add telemetry probe to monitor ClassifyLocalWithTables() perf. r=gcp

Ethan Tseng [:ethan]

Updated

•

9 years ago

Status: NEW → ASSIGNED

Priority: -- → P1

Comment 15

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d2fafcf20ceb

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox54: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla54

Georg Fritzsche [:gfritzsche]

Comment 16

•

9 years ago

Reopening because that patch is just the first step, Henry will be working on an actual fix.

Status: RESOLVED → REOPENED

status-firefox54: fixed → ---

Resolution: FIXED → ---

Target Milestone: mozilla54 → ---

Comment 17

•

9 years ago

Was there any data collection review [1] on the new histogram? That is required for all new probes. 1: https://wiki.mozilla.org/Firefox/Data_Collection

Comment 18

•

9 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #17) > Was there any data collection review [1] on the new histogram? Benjamin, I meant to ask for your datareview here but forgot to flag you before landing the patch. Does this look good to you? We want to measure this for two reasons: 1. so we can confirm that we have fixed the bug 2. so that we don't have a perf regression in the future

Flags: needinfo?(benjamin)

Assignee

Updated

•

9 years ago

Depends on: 1338017

Benjamin Smedberg

Comment 19

•

9 years ago

Yeah this is fine, the automated alerting should work for this as a scalar time metric.

Flags: needinfo?(benjamin)

Reporter

Updated

•

9 years ago

Blocks: UIJank

Assignee

Comment 20

•

9 years ago

I have a prototype at [1], which I've initially verified can do lookup and update "almost" concurrently. How I verify it is just to add a infinite loop at [2]. If update has to be entirely mutual exclusive to lookup, no channel can be opened once an update request is issued. :Ehsan, could you suggest how to verify if the original issue of this bug is fixed by [1]? We believe Bug 1338017 is "one of" the root causes but are not sure if it's the only one. It looks like you ran a profiler with "privacy.trackingprotection.annotate_channels=true". Is that the only configuration? How often do you see this kind of long delay? Do I have to profile a long while to reproduce or it happens all the time? Thanks for your help in advance :) [1] https://github.com/elefant/gecko-dev/commits/sb-background-db-update [2] http://searchfox.org/mozilla-central/rev/ea31c4b64f34a29415a69fb768f8051495547315/toolkit/components/url-classifier/Classifier.cpp#573

Flags: needinfo?(ehsan)

Reporter

Comment 21

•

9 years ago

(In reply to Henry Chang [:henry][:hchang] from comment #20) > I have a prototype at [1], which I've initially verified can do lookup > and update "almost" concurrently. How I verify it is just to add a infinite > loop at [2]. If update has to be entirely mutual exclusive to lookup, > no channel can be opened once an update request is issued. Fantastic! > :Ehsan, could you suggest how to verify if the original issue > of this bug is fixed by [1]? We believe Bug 1338017 is "one of" > the root causes but are not sure if it's the only one. > > It looks like you ran a profiler with > > "privacy.trackingprotection.annotate_channels=true". Yes, and this pref landed on Nightly as the default in bug 1334241, so you shouldn't need to set any prefs now. > Is that the only configuration? How often do you see this kind of > long delay? Do I have to profile a long while to reproduce or it > happens all the time? Unfortunately I don't know of a way to reproduce the jank in the profile in comment 0. AFAIK we can't even necessarily verify this on Talos since we disable safebrowsing updates on Talos. I think the best that we can do is to land the fix and watch out the telemetry probe as the fix gets to the beta channel and then decide whether this is fast enough or whether more work needs to be done. Sorry I don't have a better answer.

Flags: needinfo?(ehsan)

Assignee

Comment 22

•

8 years ago

Attached patch AsyncWhitelisting.patch (obsolete) — Details — Splinter Review

Assignee

Comment 23

•

8 years ago

Attached patch AsyncWhitelisting.patch — Details — Splinter Review

Attachment #8839422 - Attachment is obsolete: true

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 24

•

8 years ago

(In reply to François Marier [:francois] from comment #5) > (In reply to :Ehsan Akhgari from comment #4) > > We need to find an alternative solution though, especially if we want to > > ship bug 1334241 and anything depeneding on it. Is it possible to do the > > whitelist checking in the background as part of the initial classification? > > Or can we change things on the Necko side to start the classification before > > the connection is established, so that we can delay doing that until the TP > > classification is finished? > > Maybe we could fire off IsTrackerWhitelisted() and set a flag on the channel > based on that result just before returning from > nsChannelClassifier::StartInternal() > (http://searchfox.org/mozilla-central/rev/ > d20e4431d0cf40311afa797868bc5c58c54790a2/netwerk/base/nsChannelClassifier. > cpp#421). We'd have to be careful about a potential race here if > nsChannelClassifier::OnClassifyComplete() returns before we're done checking > the whitelist. > I am still worried about the synchronous operation on the main thread so I experimentally changed the whitelist checking to async in attachment 8839424 [details] [diff] [review]. (Probably I didn't "flip" the result correctly but it should not be far from the answer) Francois, gcp, do you have any concern or comment regarding this kind of approach? I haven't tested the patch yet but wanna know if I am missing something which forces us to do it synchronously.

Flags: needinfo?(gpascutto)

Flags: needinfo?(francois)

Comment 25

•

8 years ago

I don't know enough about nsChannelClassifier to give useful feedback here.

Flags: needinfo?(gpascutto)

Reporter

Comment 27

•

8 years ago

Bug 1334616 has steps to reproduce, and also a profile with the URL Classifier thread included: https://perfht.ml/2miKkrd The URL Classifier thread is busy in HashStore::WriteFile().

Assignee

Comment 28

•

8 years ago

(In reply to Henry Chang [:henry][:hchang] from comment #24) > (In reply to François Marier [:francois] from comment #5) > > (In reply to :Ehsan Akhgari from comment #4) > > > We need to find an alternative solution though, especially if we want to > > > ship bug 1334241 and anything depeneding on it. Is it possible to do the > > > whitelist checking in the background as part of the initial classification? > > > Or can we change things on the Necko side to start the classification before > > > the connection is established, so that we can delay doing that until the TP > > > classification is finished? > > > > Maybe we could fire off IsTrackerWhitelisted() and set a flag on the channel > > based on that result just before returning from > > nsChannelClassifier::StartInternal() > > (http://searchfox.org/mozilla-central/rev/ > > d20e4431d0cf40311afa797868bc5c58c54790a2/netwerk/base/nsChannelClassifier. > > cpp#421). We'd have to be careful about a potential race here if > > nsChannelClassifier::OnClassifyComplete() returns before we're done checking > > the whitelist. > > > > I am still worried about the synchronous operation on the main thread so > I experimentally changed the whitelist checking to async in attachment > 8839424 [details] [diff] [review]. > (Probably I didn't "flip" the result correctly but it should not be far from > the answer) > > Francois, gcp, do you have any concern or comment regarding this kind of > approach? > I haven't tested the patch yet but wanna know if I am missing something which > forces us to do it synchronously. The patch implementation is totally wrong. It will block itself. I'll just file a new bug for this topic and provide revised patch.

Flags: needinfo?(francois)

Assignee

Updated

•

8 years ago

Depends on: 1341506

Reporter

Comment 29

•

8 years ago

(In reply to :Ehsan Akhgari from comment #27) > Bug 1334616 has steps to reproduce Oops, this should have been bug 1325054.

Assignee

Comment 30

•

8 years ago

Hi Ehsan, As Bug 1341506 has landed IsTrackerWhitelisted() becomes a async call. Should we close this bug and continue to work on Bug 1338017 for the background DB update? Thanks :)

Flags: needinfo?(ehsan)