Closed Bug 1516552 Opened 5 years ago Closed 3 years ago

ETP Strict delays the visibility of all site content because of blocked GA/GTM resources

Categories

(Core :: Privacy: Anti-Tracking, defect, P3)

65 Branch
defect

Tracking

()

RESOLVED FIXED
86 Branch
Performance Impact medium
Tracking Status
firefox86 --- fixed

People

(Reporter: vchin, Assigned: mstange)

References

(Blocks 2 open bugs)

Details

(Keywords: perf:pageload)

Attachments

(1 file)

STR:
1. Nagivate to www.theverge.com

Profile:
https://perfht.ml/2SqkKB2

On MacOSX Nightly 66.0a1 (2018-12-23) (64-bit) it takes a long time 4.5s for the first contentful paint to happen. Compared to Chrome which only takes 1 second. The profile shows the snippet when enter is hit to navigate to the page to when contentful paint happens. It seems we're idle for a good chunk of that time?
Opening against layout for now. dholbert, thoughts?
Flags: needinfo?(dholbert)
Whiteboard: [qf:p1:pageload]
I don't see layout or styling taking any significant amount of time before that first paint, so this probably belongs in a different component.

A few observations:
 - the compositor process seems pretty continuously busy from ~3.5s to ~7s, even though nothing's been painted to the screen.
 - the content process is generally pretty quiet, but has a long paint at around 8s (a rasterize of 150ms in duration), and three kinda-long paints (65-77ms) earlier on.
 - the parent process seems to be spending a lot of time with "paint" in the backtrace, too.

This probably belongs in Graphics (or Graphics::Layers) since painting is what I'm seeing as the time-consuming thing here, and the compositor process churn seems maybe-worrisome.

--> Punting to Bas and mstange (who's away for a few weeks, hence also Bas in case he can get to it sooner.)
Component: Layout → Graphics
Flags: needinfo?(dholbert) → needinfo?(bas)
Flags: needinfo?(mstange)
(In reply to Daniel Holbert [:dholbert] from comment #2)
> I don't see layout or styling taking any significant amount of time before
> that first paint, so this probably belongs in a different component.
> 
> A few observations:
>  - the compositor process seems pretty continuously busy from ~3.5s to ~7s,
> even though nothing's been painted to the screen.
>  - the content process is generally pretty quiet, but has a long paint at
> around 8s (a rasterize of 150ms in duration), and three kinda-long paints
> (65-77ms) earlier on.
>  - the parent process seems to be spending a lot of time with "paint" in the
> backtrace, too.
> 
> This probably belongs in Graphics (or Graphics::Layers) since painting is
> what I'm seeing as the time-consuming thing here, and the compositor process
> churn seems maybe-worrisome.
> 
> --> Punting to Bas and mstange (who's away for a few weeks, hence also Bas
> in case he can get to it sooner.)

The compositor thread (there's no GPU process since this is OS X) is just busy re-uploading stuff here that's sent there in rapid transactions, this is also the work seen in the parent thread. We know uploading to be slow on OS X and it certainly isn't great, but that's -not- the reason for the long wait here.

If you look at the screenshots there is also -nothing showing- for all these paints that we're doing.

The long rasterizes, again, aren't great, but not wholly unexpected, if you look at what it's doing, it's basically spending 96% of its time clearing surfaces for newly created layers, perhaps the layerization here is suboptimal, which again, would be a layout issue. There certainly isn't anything left to do here for Gfx, it's basically a memset. I actually assume this is mostly page-faulting.

What's really concerning here though is that we spend several seconds just displaying black, no new obvious content comes in over the network, nor is any real work done in JS, and then all of a sudden a paint is generated that actually -does- draw the page, which seems to point at some sort of scheduling or invalidation issue. Again pointing towards something layout, or DOM-ish.

In short, yes, graphics takes some time here, but that time doesn't appear to be graphics' fault, nor is it the biggest concern here. It's the time we're -not- doing anything that's concerning.
Flags: needinfo?(bas) → needinfo?(dholbert)
Agreed that that the wait is the concerning bit. Not sure what we can do to investigate that -- mstange, do you know?

Personally I'm on Linux & I can't really reproduce this -- theverge.com starts painting content in ~1 second or less in Firefox Nightly on my linux box.  (Chrome feels slightly faster, but not drastically so.)

Bas, can you reproduce? (IIRC you're on Windows -- does it happen there?)

Vicky, does this happen every time for you? (even in a fresh profile with nothing much else running & no other tabs open? [just to rule out sources of background-CPU-usage during the "dark time" in the profile)
Flags: needinfo?(dholbert)
Flags: needinfo?(vchin)
(In reply to Daniel Holbert [:dholbert] from comment #4)
> Agreed that that the wait is the concerning bit. Not sure what we can do to
> investigate that -- mstange, do you know?
> 
> Personally I'm on Linux & I can't really reproduce this -- theverge.com
> starts painting content in ~1 second or less in Firefox Nightly on my linux
> box.  (Chrome feels slightly faster, but not drastically so.)
> 
> Bas, can you reproduce? (IIRC you're on Windows -- does it happen there?)
> 
> Vicky, does this happen every time for you? (even in a fresh profile with
> nothing much else running & no other tabs open? [just to rule out sources of
> background-CPU-usage during the "dark time" in the profile)

I don't have screenshots on Windows sadly so it's hard for me to tell how bad it is here, but indeed it doesn't seem to take more than 2 seconds. What I can tell as a difference between Firefox and Chrome, is that Firefox definitely draws 'all black' first, and then the actual content in some subsequent frame. Chrome draws all the content right away.

Chrome also -does- draw the page considerably faster on my machine (even though we're nowhere near as slow for me as described on the bug).

I'd say considering the compositor and content process main thread happily being scheduled here, as well as no gaps in profiler samples. Background CPU usage is highly unlikely. It could be, although unlikely, we're waiting for some other thread in Firefox I suppose.
I still get the 2 seconds of black screen on a new profile. It loads slightly faster on a new profile but it could be because I had the profiler running.
Flags: needinfo?(vchin)
(In reply to Bas Schouten (:bas.schouten) from comment #5)
> What I can tell as a difference between Firefox and Chrome, is that Firefox
> definitely draws 'all black' first, and then the actual content in some
> subsequent frame. Chrome draws all the content right away.

On my linux machine, Chrome draws "all black" first, just like Firefox does, FWIW.
OK, so I think content blocking may be involved here.

Specifically, starting with a fresh profile...
 (1) If I visit about:preferences#privacy and change Content Blocking to "strict", then I get a nearly 4-second blank black page before the first contentful paint.

 (2) If I reset Content Blocking to "Standard" or disable it entirely, then I see a ~0.5 second blank black page before the first contentful paint.

 (3) If I install uBlock origin, then the first contentful paint happens "instantly" - I don't see any black blank page at all.
Here's a profile of scenario (1), with strict content blocking enabled: https://perfht.ml/2VeYFHC
The page starts loading at 1 second (and paints a blank black background pretty quickly), and the content doesn't paint until ~5.5 seconds.

If I look at the DOM during the "blank page" time, I can see that they have:
 <html class="async-hide fonts-loaded">
...where .async-hide is an inline style rule which applies "opacity: 0 !important"

I also see this script in their HTML:
(function(a,s,y,n,c,h,i,d,e){s.className+=' '+y;h.start=1*new Date;
h.end=i=function(){s.className=s.className.replace(RegExp(' ?'+y),'')};
(a[n]=a[n]||[]).hide=h;setTimeout(function(){i();h.end=null},c);h.timeout=c;
})(window,document.documentElement,'async-hide','dataLayer',4000,
{'GTM-54FC4VZ':true});

It's somewhat obfuscated, but given the observed behavior, I'm guessing that JS adds this class (hiding all the content), and then waits 4 seconds before removing it. They probably have some other scripts that remove it sooner in the usual case, but they fallback to a setTimeout(...,4000) to be sure their content eventually shows up.  And perhaps Firefox's strict content blocking causes that "usual case" to break, by removing some link in their expected chain that leads up to removing the async-hide class.

So that explains why the page is dark for 4 seconds, probably; there's nothing to paint.  So the question is why Vicky is hitting that scenario.  Vicky, do you get better results (less than 4 seconds) if you disable content blocking? (by clicking the left edge of the URL bar and choosing "Turn off blocking for this site")
Flags: needinfo?(vchin)
This sounds like the artificial delay introduced by Google APIs and blocking `googletagmanager.com` causes the delay to take longer.

I can reproduce the delay if blocking `googletagmanager.com` with uMatrix while allowing first-party scripts to execute. The delay is a commonly reported issue when blocking `googletagmanager.com`, see https://github.com/uBlockOrigin/uMatrix-issues/issues/93#issuecomment-449982859
Thanks!  I do see references to googletagmanager.com in a saved snapshot of the page. And my observations (class name & timeout duration) in comment 9 seem to jive with this comment on the uBlockOrigin github issue:
> Google's optimisation was async-hide's opacity set with a delay of 4 seconds, wasn't it ?
https://github.com/uBlockOrigin/uMatrix-issues/issues/93#issuecomment-449980342

So, perhaps we should investigate if our "standard" content-blocking configuration (on Vicky's machine at least) is blocking that domain for some reason, & if we can suppress that from happening, or something. [EDIT: Turns out Vicky was only seeing the issue with "custom" blocking that was similar to "strict", per comment 13.]
Summary: Long wait 4.5s before contentful paint on theverge.com → Long wait 4.5s before contentful paint on theverge.com (due to their usage of "async-hide" class which they remove after some delay)
This issue was based on an issue originally reported by Ekr. This suggests that at least for this site it's relatively widespread. I had no idea this class of issues existed but it certainly seems worrying to me, especially since as far as I can tell we have no telemetry or any other data from out in the world to tell us how prevalent this is, and it will cause a significant difference in perceived page load performance compared to Chrome.

NI'ing bz, not because I expect him to act on this but I want to make sure we're aware of this 'class' of performance issue.
Flags: needinfo?(bzbarsky)
:dholbert is right. I thought I changed to standard content blocking but I was on custom blocking that was equivalent to strict blocking. The "blocking trackers in all windows" is what causes the long black screen paint. The content blocking list used in all windows is Disconnect.me basic protection. If i change that to only be in private windows then I only see a split second of black before the rest of the page paints.
Component: Graphics → Tracking Protection
Flags: needinfo?(vchin)
Product: Core → Firefox
Thanks, Vicky. It sounds like this isn't a problem in our default configuration, then -- only users that have opted in to stricter content-blocking.

It's still worth investigating what we can do here, though there's potentially a small number of (privacy/perf-conscious) users who may be affected - not our whole userbase.
(In reply to Daniel Holbert [:dholbert] from comment #9)
> Here's a profile of scenario (1), with strict content blocking enabled:
> https://perfht.ml/2VeYFHC
> The page starts loading at 1 second (and paints a blank black background
> pretty quickly), and the content doesn't paint until ~5.5 seconds.
> 
> If I look at the DOM during the "blank page" time, I can see that they have:
>  <html class="async-hide fonts-loaded">
> ...where .async-hide is an inline style rule which applies "opacity: 0
> !important"
> 
> I also see this script in their HTML:
> (function(a,s,y,n,c,h,i,d,e){s.className+=' '+y;h.start=1*new Date;
> h.end=i=function(){s.className=s.className.replace(RegExp(' ?'+y),'')};
> (a[n]=a[n]||[]).hide=h;setTimeout(function(){i();h.end=null},c);h.timeout=c;
> })(window,document.documentElement,'async-hide','dataLayer',4000,
> {'GTM-54FC4VZ':true});
> 
> It's somewhat obfuscated, but given the observed behavior, I'm guessing that
> JS adds this class (hiding all the content), and then waits 4 seconds before
> removing it. They probably have some other scripts that remove it sooner in
> the usual case, but they fallback to a setTimeout(...,4000) to be sure their
> content eventually shows up.  And perhaps Firefox's strict content blocking
> causes that "usual case" to break, by removing some link in their expected
> chain that leads up to removing the async-hide class.
> 
> So that explains why the page is dark for 4 seconds, probably; there's
> nothing to paint.  So the question is why Vicky is hitting that scenario. 
> Vicky, do you get better results (less than 4 seconds) if you disable
> content blocking? (by clicking the left edge of the URL bar and choosing
> "Turn off blocking for this site")

This is a known issue, it's caused by the interaction of Google Optimize and Google Analytics/Tag Manager (which we block using tracking protection), see https://bugzilla.mozilla.org/show_bug.cgi?id=1484713#c18 (note that bug 1484713 itself is about that issue, but in that issue I described what's going on.  There are other reports of similar issues on other sites on Bugzilla as well.

The solution to this bug would be to provide a surrogate replacement for the GA/GTM scripts, similar to privacy extensions like uBlock Origin and Privacy Badger.  We've talked about this in the anti-tracking project but looking at Bugzilla I see that there is no bug on file for it yet.  Should I repurpose this bug?  (If not, we should file another one and make it depend on it, although this is going to be a dupe of that one effectively.)

(In reply to Daniel Holbert [:dholbert] from comment #14)
> Thanks, Vicky. It sounds like this isn't a problem in our default
> configuration, then -- only users that have opted in to stricter
> content-blocking.

That's not exactly right, this is very much the default configuration of Firefox in private browsing, and has been so since Firefox 42, so it is a problem that users are facing right now.
BTW in that bug you can find two other sites affected: spiegel.de and kino.de.  Any site using GA/GTM + Google Optimize will be similarly affected right now in Private Browsing mode.
Flags: needinfo?(mstange)
Flags: needinfo?(bzbarsky)
Summary: Long wait 4.5s before contentful paint on theverge.com (due to their usage of "async-hide" class which they remove after some delay) → Add support for surrogate replacements for resources that are commonly blocked by tracking protection
Component: Tracking Protection → Privacy: Anti-Tracking
Product: Firefox → Core

Not that I know of, please file one!

Filed bug 1533102

See Also: → 1540917
See Also: 1540917
See Also: → 1561450
See Also: 1561450
Whiteboard: [qf:p1:pageload] → [qf:p2:pageload]
Blocks: tp-breakage
Type: defect → enhancement
See Also: → 1618108
Depends on: 1637329

This bug is simultaneously serving as a tracking bug for GA/GTM blocking issues and all of our implementation work for surrogate/shim script replacement. I've found this pretty confusing, and I suspect anyone new to the project would also find this confusing. I've decided to repurpose this bug back to focusing on the GTM/GA first paint breakage and have filed Bug 1648863 to track our overall shim script implementation work (which will address many types of breakage, not just GTM/GA).

Type: enhancement → defect
Summary: Add support for surrogate replacements for resources that are commonly blocked by tracking protection → ETP Strict delays the visibility of all site content because of blocked GA/GTM resources

I'm going to look into this. We have the ability to shim resources now, and we even already have a relevant shim at https://searchfox.org/mozilla-central/source/browser/extensions/webcompat/shims/google-analytics-tag-manager.js , but for some reason it's not working properly.

Assignee: nobody → mstange.moz
Status: NEW → ASSIGNED

The problem seems to be that both https://searchfox.org/mozilla-central/source/browser/extensions/webcompat/shims/google-analytics-tag-manager.js and https://searchfox.org/mozilla-central/source/browser/extensions/webcompat/shims/google-analytics.js only call window.dataLayer.hide.end() if window.ga is not defined. But on the pages that I've tested (theverge.com and t-mobile.cz), window.ga is already defined by another inline script in the main html page.
The uBlock Origin shim calls window.dataLayer.hide.end() even if window.ga is already defined.

This also removes the queue loop at the end, because as far as I can tell that code was never run.
The loop was only entered if gaQueue was falsy (check in the first line), and if gaQueue was a
function (check just before the loop), which is unlikely to be true at the same time.

The uBlock Origin loop does have that loop, but it doesn't call the original function; the loop
calls the shim function, because uBlock Origin replaces window.ga with its shim function even
if window.ga is already defined.

I have checked all the duplicates of this bug and the patch seems to fix all of them.

Pushed by twisniewski@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b2473d585fff
In the Google Analytics shim, call window.dataLayer.hide.end() even if window.ga is already defined. r=twisniewski,webcompat-reviewers
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch
Performance Impact: --- → P2
Keywords: perf:pageload
Whiteboard: [qf:p2:pageload]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: