2.95 - 41.01% tp6-amazon / tp6-facebook / tp6-imdb / tp6-microsoft / tp6-wikia / tp6-yandex regression on push 5d8e428324c662d96fd47677733880ae546266d4 (Sun Dec 23 2018)
Categories
(Core :: Privacy: Anti-Tracking, defect, P1)
Tracking

| Tracking flag | Status |
|---|---|
| firefox-esr60 | unaffected |
| firefox67 | wontfix |
| firefox68 | fixed |

People
(Reporter: jmaher, Assigned: ehsan.akhgari)
Details
(Keywords: perf, regression; Whiteboard: [privacy65][anti-tracking])
Comment 1 (Reporter) • 6 years ago
Comment 2 (Assignee) • 6 years ago
Comment 3 (Reporter) • 6 years ago
Comment 4 (Assignee) • 6 years ago
Comment 5 (Reporter) • 6 years ago
Comment 6 (Assignee) • 6 years ago
Updated • 6 years ago
Updated • 6 years ago
Comment 7 (Assignee) • 6 years ago
Comment 8 (Assignee) • 6 years ago
Comment 9 (Reporter) • 6 years ago
Comment 10 (Reporter) • 6 years ago
Comment 11 (Assignee) • 6 years ago
Comment 12 (Assignee) • 6 years ago
Here is a status update on what I've tried so far.
Bug 1514340 makes things a bit better; for example, we no longer need to encode the certificate along with the notifications we send to the parent process, but the JS code we execute still shows up in the profile. :-(
To mitigate the impact of that a little, I tried to delay the dispatching of OnContentBlockingEvent() notifications in the nsBrowserStatusFilter object, similar to how status and progress notifications are delay-batched there. Sadly, that approach isn't going to work either, for two reasons: a) OnContentBlockingEvent() actually depends on its nsIWebProgress* argument, so we can't get away with just passing null instead, and doing this properly would require some rather complex logic to batch notifications on a per-nsIWebProgress-object basis. More importantly, b) the profiles I was looking at don't dispatch enough of these notifications for batching to be really effective, even with my quick and dirty attempt, which batched every notification no matter which nsIWebProgress/nsIRequest object it was dispatched for. The best result I got was suppressing just one of these notifications, so this wasn't a very useful optimization anyway.
This basically leaves us with bug 1510569, which is a humongous amount of work... So I am trying to come up with a strategy that lets me avoid doing all of that work. My hope is that now that OnContentBlockingEvent is a separate nsIWebProgressListener notification, we can port only that one to C++ and leave the rest for someone else to fix...
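To illustrate why per-object batching gets complicated, here is a minimal sketch of the delay-batching idea in plain C++. The BatchedContentBlockingNotifier type, its keys, and the manual Flush() call are all invented for illustration; this is not the actual nsBrowserStatusFilter code, which would also need timer handling and the real nsIWebProgress/event types.

```cpp
// Hypothetical sketch of delay-batching content blocking notifications
// per nsIWebProgress object. Names are illustrative only; this is not
// the real nsBrowserStatusFilter implementation.
#include <cstdint>
#include <functional>
#include <iostream>
#include <map>

struct WebProgressKey {
  // Stand-in for an nsIWebProgress* identity.
  const void* object;
  bool operator<(const WebProgressKey& other) const { return object < other.object; }
};

class BatchedContentBlockingNotifier {
 public:
  using FlushCallback =
      std::function<void(const void* webProgress, uint32_t combinedEvents)>;

  explicit BatchedContentBlockingNotifier(FlushCallback aCallback)
      : mCallback(std::move(aCallback)) {}

  // Instead of forwarding each OnContentBlockingEvent() immediately,
  // OR the event bits into a pending entry for this nsIWebProgress.
  void QueueEvent(const void* aWebProgress, uint32_t aEventBits) {
    mPending[WebProgressKey{aWebProgress}] |= aEventBits;
    // A real implementation would arm a short one-shot timer here and
    // flush when it fires; in this sketch we flush manually.
  }

  // Deliver one combined notification per nsIWebProgress object.
  void Flush() {
    for (const auto& entry : mPending) {
      mCallback(entry.first.object, entry.second);
    }
    mPending.clear();
  }

 private:
  std::map<WebProgressKey, uint32_t> mPending;
  FlushCallback mCallback;
};

int main() {
  BatchedContentBlockingNotifier notifier(
      [](const void* webProgress, uint32_t events) {
        std::cout << "flush " << webProgress << " events=0x" << std::hex
                  << events << std::dec << "\n";
      });

  int docShellA = 0, docShellB = 0;  // stand-ins for two nsIWebProgress objects
  notifier.QueueEvent(&docShellA, 0x1);
  notifier.QueueEvent(&docShellA, 0x2);  // coalesced with the previous event
  notifier.QueueEvent(&docShellB, 0x4);
  notifier.Flush();  // two notifications instead of three
  return 0;
}
```

Even in this simplified form, the batching only pays off when multiple events for the same nsIWebProgress arrive within one flush window, which is exactly what the profiles above suggest does not happen often enough.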
Updated • 6 years ago
Comment 13 (Assignee) • 6 years ago
One case where this regresses performance is when a page has a ton of images. This simple test case loads 10,000 images: https://pastoral-salesman-1.glitch.me/
The profiles of that test case easily show the impact of bug 1518491.
Comment 14 (Assignee) • 6 years ago
I think I finally uncovered the underlying reason behind the regressions we are seeing here. Interestingly, I was led down an incorrect path from the beginning of my investigation until Friday, when I first started to pay attention to which network resources are loaded in the raptor tests with network.cookie.cookieBehavior set to 0 and 4. It turns out that the regressions occur because the affected pages load a different set of resources depending on the configuration:
| Test | Requests (cookieBehavior = 0) | Requests (cookieBehavior = 1) | Requests (cookieBehavior = 4) |
|---|---|---|---|
| raptor-tp6-microsoft-firefox | 507 | 538 | 630 |
| raptor-tp6-yandex-firefox | 84 | 84 | 85 |
| raptor-tp6-imdb-firefox | 65 | 65 | 65 |
| raptor-tp6-wikia-firefox | 63 | 64 | 63 |
| raptor-tp6-facebook-firefox | 94 | 94 | 94 |
| raptor-tp6-amazon-firefox | 147 | 147-163 | 153 |
I captured these numbers using devtools by manually running ./mach raptor-test locally. As can be seen, in some of the tests the number of resources loaded depends on whether third-party cookies are blocked and exactly how. The other notable difference was that some tests generated a lot of 404 requests (requests that did not have a recorded response), but looking at the URLs of the captured resources also showed that pages may actually load different resources with the different values of the pref.
For example, profiling the live yandex.ru site with cookieBehavior set to 0 (https://perfht.ml/2RqlWIe) vs cookieBehavior set to 4 (https://perfht.ml/2RqlYjk) shows an example of the site loading different resources during its page load in the two modes.
The takeaway is that while this bug prompted a very valuable investigation into the performance of this feature, the raptor regressions here are a combination of the browser processing a different workload and the browser being a little less efficient due to the dependencies captured here. Fixing all of the dependencies shouldn't be expected to fix the regressions captured here, since the pages being tested can and do change their behavior when their third-party tracking resources can access storage versus when they cannot. That being said, there is still more work that I would like to do here in order to finish all of the performance improvements I was planning, which this investigation has uncovered.
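For reference, the cookieBehavior values compared above correspond to the cookie-blocking policies on nsICookieService. The sketch below restates them as a standalone enum; the enum name and the printed descriptions are my own wording, but the numeric values should match the BEHAVIOR_* constants.

```cpp
// Standalone restatement of the network.cookie.cookieBehavior values
// referenced in the table above. The enum and descriptions are
// illustrative; the numeric values follow nsICookieService's
// BEHAVIOR_* constants as I understand them.
#include <cstdint>
#include <initializer_list>
#include <iostream>

enum class CookieBehavior : uint32_t {
  AcceptAll = 0,      // BEHAVIOR_ACCEPT: no third-party cookie blocking
  RejectForeign = 1,  // BEHAVIOR_REJECT_FOREIGN: block all third-party cookies
  RejectAll = 2,      // BEHAVIOR_REJECT: block all cookies
  LimitForeign = 3,   // BEHAVIOR_LIMIT_FOREIGN: block third-party cookies from unvisited sites
  RejectTracker = 4,  // BEHAVIOR_REJECT_TRACKER: block storage access for known trackers
};

const char* Describe(CookieBehavior aBehavior) {
  switch (aBehavior) {
    case CookieBehavior::AcceptAll:     return "accept all cookies";
    case CookieBehavior::RejectForeign: return "reject all third-party cookies";
    case CookieBehavior::RejectAll:     return "reject all cookies";
    case CookieBehavior::LimitForeign:  return "reject third-party cookies from unvisited sites";
    case CookieBehavior::RejectTracker: return "reject storage access for known trackers";
  }
  return "unknown";
}

int main() {
  // The three configurations compared in the request-count table above.
  for (uint32_t value : {0u, 1u, 4u}) {
    std::cout << "cookieBehavior=" << value << ": "
              << Describe(static_cast<CookieBehavior>(value)) << "\n";
  }
  return 0;
}
```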
Comment 15 (Assignee) • 6 years ago
So I have some more interesting updates to share here.
Firstly, this is where we currently stand, with all of the fixes that have landed so far: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1b73a3b810a4b8edab6451c106a866429fc59d95&newProject=try&newRevision=15313fe8f75a63db2264ea8391c8f9342c6bb7a5&framework=10&showOnlyImportant=1 Things are still not looking good, although I suspect bug 1270059 has changed the baselines here quite significantly; I'm not sure how meaningful it is to compare the geometric means of the subtests before and after that change any more.
I decided to follow an approach of eliminating possible culprits by pushing trial patches to the try server that progressively disabled code hidden behind the checks for network.cookie.cookieBehavior being equal to 4, especially now that I have discovered bug 1523705.
Lo and behold, this patch https://hg.mozilla.org/try/rev/80f83ae8e6284dd7c71ac6772c92a06697f43e1f erases all of the regressions: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1b73a3b810a4b8edab6451c106a866429fc59d95&newProject=try&newRevision=a08c3e5f54a7c14017a1c259181496ecb34666c0&framework=10&showOnlyImportant=1. In other words, all of the regressions come from loading images in this codepath: https://searchfox.org/mozilla-central/rev/78cd247b5d7a08832f87d786541d3e2204842e8e/toolkit/components/antitracking/AntiTrackingCommon.cpp#1341-1401
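A minimal sketch of the elimination strategy described above: hide each suspect cookieBehavior == 4 code path behind a temporary kill switch, flip one switch per try push, and compare the perf results. The flag and helper names here are hypothetical; the actual trial patches simply disabled blocks of the anti-tracking code directly.

```cpp
// Hypothetical illustration of the "disable one suspect path per try push"
// bisection strategy. Everything here (flag names, the image-check helper)
// is invented for illustration.
#include <cstdint>
#include <iostream>

constexpr uint32_t kBehaviorRejectTracker = 4;

// Flip to true in a trial push to take this code path out of the build.
constexpr bool kDisableSuspectImagePath = false;

bool ShouldRunTrackerImageCheck(uint32_t aCookieBehavior) {
  if (aCookieBehavior != kBehaviorRejectTracker) {
    return false;  // the extra work only exists behind cookieBehavior == 4
  }
  if (kDisableSuspectImagePath) {
    return false;  // trial build: pretend this path does not exist
  }
  return true;
}

int main() {
  std::cout << std::boolalpha
            << ShouldRunTrackerImageCheck(0) << "\n"   // false: not in reject-tracker mode
            << ShouldRunTrackerImageCheck(4) << "\n";  // true unless the kill switch is set
  return 0;
}
```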
Comment 16 (Assignee) • 6 years ago
http://bit.ly/2RYnF82 should be a profile that captures the issue in comment 15 (taken from this test case: https://pastoral-salesman-1.glitch.me/; beware that the test case loads 1 million images!)
Comment 17 • 6 years ago
Thanks for this debugging. Today I want to investigate why we have such a horrible regression in image loading with cookie behavior == 4.
Comment 18 • 6 years ago
This created a talos regression also:
== Change summary for alert #19108 (as of Fri, 01 Feb 2019 11:35:53 GMT) ==
Regressions:
2% tp5o_webext responsiveness linux64 opt e10s stylo 1.44 -> 1.46
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19108
Comment 19 (Assignee) • 6 years ago
(In reply to Florin Strugariu [:Bebe] from comment #18)
> This created a talos regression also:
> == Change summary for alert #19108 (as of Fri, 01 Feb 2019 11:35:53 GMT) ==
> Regressions:
> 2% tp5o_webext responsiveness linux64 opt e10s stylo 1.44 -> 1.46
> For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19108

Isn't that related to bug 1522630?
Comment 20 • 6 years ago
(In reply to :Ehsan Akhgari from comment #19)
> (In reply to Florin Strugariu [:Bebe] from comment #18)
> > This created a talos regression also:
> > == Change summary for alert #19108 (as of Fri, 01 Feb 2019 11:35:53 GMT) ==
> > Regressions:
> > 2% tp5o_webext responsiveness linux64 opt e10s stylo 1.44 -> 1.46
> > For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19108
> Isn't that related to bug 1522630?

You are right, it is actually related to bug 1522630.
Comment 21 (Assignee) • 6 years ago
This profile https://perfht.ml/2Sr96Jl is captured from a build after the patches in bug 1525208.
Comment 22 (Assignee) • 6 years ago
This profile https://perfht.ml/2D6AKBC is captured from a build after the patch in bug 1525356. Now you can see that nsContentUtils::IsThirdPartyWindowOrChannel() is no longer dominating the time spent under AntiTrackingCommon::MaybeIsFirstPartyStorageAccessGrantedFor().
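A generic way to take a repeated check like nsContentUtils::IsThirdPartyWindowOrChannel() off a hot path is to compute the answer once per load and cache it on the load itself. The sketch below shows that pattern with invented types; it is not necessarily what bug 1525356 actually does.

```cpp
// Generic memoization sketch: compute an expensive per-load predicate once
// and reuse it for every subresource of the same load. All types and names
// here are illustrative, not the actual Gecko implementation.
#include <iostream>
#include <optional>
#include <string>

struct LoadContext {
  std::string topLevelOrigin;
  std::string resourceOrigin;
  // Cached answer; empty until the first query.
  std::optional<bool> cachedIsThirdParty;
};

bool ComputeIsThirdParty(const LoadContext& aLoad) {
  // Stand-in for the expensive walk over the window/channel hierarchy
  // that an IsThirdPartyWindowOrChannel()-style check performs.
  return aLoad.resourceOrigin != aLoad.topLevelOrigin;
}

bool IsThirdPartyCached(LoadContext& aLoad) {
  if (!aLoad.cachedIsThirdParty) {
    aLoad.cachedIsThirdParty = ComputeIsThirdParty(aLoad);
  }
  return *aLoad.cachedIsThirdParty;
}

int main() {
  LoadContext imageLoad{"https://example.com", "https://tracker.example", std::nullopt};
  // The expensive computation runs once; subsequent image loads in the
  // same context reuse the cached answer.
  for (int i = 0; i < 3; ++i) {
    std::cout << IsThirdPartyCached(imageLoad) << "\n";
  }
  return 0;
}
```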
Updated • 6 years ago
Updated • 6 years ago
Comment 23 (Assignee) • 6 years ago
This profile https://perfht.ml/2DbZzfF is captured from a build after the patches in bug 1525502. You can see that CheckContentBlockingAllowList() is almost eliminated now. Of course, that is because of the specific nature of this test case, which is a best case for this type of cache.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=878491318f3e17f19dc0500046993c2110bad886&newProject=try&newRevision=9de01b6ce5c2dd3180a04386dade7b2bb2599b54&framework=10 shows the cumulative impact of all of the optimizations from comment 21 up to here, and as you can see, we still haven't managed to make a sufficient dent yet. :-(
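The cache in question is, roughly, a memo of the allow-list lookup keyed on the top-level page, so the thousands of image loads in the test case hit the cache instead of redoing the check. Below is a minimal sketch with invented names; it is not the actual bug 1525502 patch.

```cpp
// Sketch of a per-top-level-origin cache for an allow-list check, so that
// repeated subresource loads under the same page avoid recomputing it.
// Names are illustrative; this is not the actual bug 1525502 change.
#include <iostream>
#include <string>
#include <unordered_map>

class ContentBlockingAllowListCache {
 public:
  bool IsAllowListed(const std::string& aTopLevelOrigin) {
    auto it = mCache.find(aTopLevelOrigin);
    if (it != mCache.end()) {
      return it->second;  // cache hit: no expensive lookup needed
    }
    bool allowed = LookupAllowListEntry(aTopLevelOrigin);
    mCache.emplace(aTopLevelOrigin, allowed);
    return allowed;
  }

 private:
  // Stand-in for the real lookup keyed on the top-level document, which is
  // what dominated the earlier profiles.
  static bool LookupAllowListEntry(const std::string& aTopLevelOrigin) {
    std::cout << "expensive lookup for " << aTopLevelOrigin << "\n";
    return false;
  }

  std::unordered_map<std::string, bool> mCache;
};

int main() {
  ContentBlockingAllowListCache cache;
  // A page with many images: only the first check pays the lookup cost.
  for (int i = 0; i < 5; ++i) {
    cache.IsAllowListed("https://example.com");
  }
  return 0;
}
```

An image-heavy page like the test case is the best case for this: one real lookup followed by thousands of cache hits, which is what makes CheckContentBlockingAllowList() nearly disappear from the profile.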
Updated • 6 years ago
Comment 24 (Assignee) • 6 years ago
This regression will finally be completely fixed in bug 1548349.
Comment 25 (Assignee) • 6 years ago
Fixed by bug 1548349.
Updated • 6 years ago
Updated • 5 years ago