Closed Bug 837166 Opened 11 years ago Closed 11 years ago

Move [Google Analytics] tag from footer to header in [SUMO]

Categories

(support.mozilla.org :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX
2013Q1

People

(Reporter: cmore, Assigned: rehandalal+mozilla)

Details

(Whiteboard: u=sumo-team c=general p=1)

The Google Analytics code on SUMO needs to be moved from the footer to the header so that GA has the ability to measure page speed and DOM timings.

Currently the code is in here:

//support.cdn.mozilla.net/media/js/common-min.js

It needs to be moved to somewhere in the <head> and preferably just before the closing </head>. This will not have an impact on site performance since the new GA tag is asynchronous.

You can either have an additional .js file referenced from the <head> or just place the code directly in the base template just before the closing </head>. It may make sense to just have the code in the base template to reduce an additional server call.
Placing the code directly in the template is undesirable because in the future we might switch to using CSP, and then inline scripts would not be allowed. This is why we bundled it with the rest of the JS to begin with.
(In reply to Mike Cooper [:mythmon] from comment #1)
> Placing the code directly in the template is undesirable because in the
> future we might switch to using CSP, and then inline scripts would not be
> allowed. This is why we bundled it with the rest of the JS to begin with.

wfm. :)

If you want, you could also make the GA UA account ID a variable in settings.py instead of hard-coding it in the snippet.
We are using jQuery to grab some custom variables to pass along. So this isn't as easy as it seems. It would require other changes to how we are doing custom variables and events.
Also, since it is asynchronous, I don't understand how that will help it do timings that it can't do now.
The data is less accurate when the tag is not in the <head>, but it will still work regardless of where it is. The following are a few answers on the rationale for the location:

"Insert the asynchronous snippet at the bottom of the <head> section of your pages, after any other scripts your page or template might use.

One of the main advantages of the asynchronous snippet is that you can position it at the top of the HTML document. This increases the likelihood that the tracking beacon will be sent before the user leaves the page. We've determined that on most pages, the optimal location for the asynchronous snippet is at the bottom of the <head> section, just before the closing </head> tag."

---

"Google explicitly recommends placing the new asynchronous code in the <head>. The new asynchronous code is non-blocking in two ways. First, it queues up the variables for the page in a global _gaq variable. That way, the data is prepared either way.

Then using javascript directly to write out the script as in the new async code is non-blocking (this direct inject method is the way to achieve asynchronous-ness, even in browsers that don't directly observe the async attribute). The rest of the site can continue to load if for some reason Google's servers are down or slow to respond. And that's only if the user doesn't have ga.js cached already, as many do, since ga.js is used on many, many popular websites.

The benefit of all this is that the earlier the ga.js loads and is able to transmit the _gaq object to Google, the more likely you'll be to capture ALL of your potential data, like the data of the users who click very quickly on your page. This is particularly important for 'big' websites that tend to have lots of regular users who follow quick-clicking habits."
Ricky, can you sketch out what we'd need to do in order to serve this from within <head>?
Priority: -- → P3
Whiteboard: u=sumo-team c=general p= s=2013.3
Target Milestone: --- → 2013Q1
This is a simple change, but I'm still not convinced it will be beneficial. It will increase load time of the page, so it might make the experience worse, but it might increase the amount of data we collect.

Unfortunately the only way to determine if this is a good change to make is to actually make the change. Kadir is going to discuss what we need to do in order to decide if this was the right change with Ibai.

Once we know we are ready to test this, it is a small change => 1pt.
Whiteboard: u=sumo-team c=general p= s=2013.3 → u=sumo-team c=general p=1 s=2013.3
(In reply to Mike Cooper [:mythmon] from comment #7)
> This is a simple change, but I'm still not convinced it will be beneficial.
> It will increase load time of the page, so it might make the experience
> worse, but it might increase the amount of data we collect.
> 
> Unfortunately the only way to determine if this is a good change to make is
> to actually make the change. Kadir is going to discuss what we need to do in
> order to decide if this was the right change with Ibai.
> 
> Once we know we are ready to test this, it is a small change => 1pt.

In relation to the UX of the page, how will this increase the load time of the page?
This will add an extra http request that must happen before the page will render (to load the GA bootstrap). This is the code that would normally be placed in a script tag, but we can't put a script tag directly on the page. Even if it were in a script tag, instead of an external file, it would need to be parsed before the page could render. I don't know how big of an impact this will be. It might not be enough to worry about, but I am concerned about regions that are already loading SUMO slowly such as Germany and Brazil. The only way to be sure is to do it.

As long as we are ready to analyze how this affects the data (ie: how beneficial is this), and ready to analyze how this affects page speed, I am ok to make the change with the option of rolling it back.
(In reply to Mike Cooper [:mythmon] from comment #9)
> This will add an extra http request that must happen before the page will
> render (to load the GA bootstrap). This is the code that would normally be
> placed in a script tag, but we can't put a script tag directly on the page.
> Even if it were in a script tag, instead of an external file, it would need
> to be parsed before the page could render. I don't know how big of an impact
> this will be. It might not be enough to worry about, but I am concerned
> about regions that are already loading SUMO slowly such as Germany and
> Brazil. The only way to be sure is to do it.
> 
> As long as we are ready to analyze how this affects the data (ie: how
> beneficial is this), and ready to analyze how this affects page speed, I am
> ok to make the change with the option of rolling it back.

Yes, there will be an extra HTTP request, but the rest of the page's HTTP requests and DOM rendering will not be tied to the final execution of the GA request. The GA request is now asynchronous, unlike Webtrends and previous non-asynchronous GA tags, so it can continue to load and execute without holding up the page itself. The ga.js file is about 41k.
So, I have some custom code for passing customVariables that grabs data from the <body>, so it can't really run in the <head>. It also depends on jQuery right now.

var extraPush = $('body').data('ga-push');

// Add any extra pushes from body[data-ga-push].
if (extraPush && extraPush.length) {
  for (var i = 0, l = extraPush.length; i < l; i++) {
    _gaq.push(extraPush[i]);
  }
}


We'd have to figure out a way of reimplementing that somehow.
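One possible direction, sketched here only as an illustration: the same data-ga-push lookup can be done without jQuery. The helper names `readExtraPushes` and `pushExtras` are hypothetical, and the sketch assumes the attribute holds a JSON array of GA commands, which is what jQuery's .data() would parse it into.

```javascript
// Hypothetical helper: read the extra GA commands from an element's
// data-ga-push attribute without jQuery. jQuery's .data() parses JSON
// automatically; here it is done by hand.
function readExtraPushes(el) {
  var raw = el && el.getAttribute ? el.getAttribute('data-ga-push') : null;
  if (!raw) {
    return [];
  }
  try {
    var parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : [];
  } catch (e) {
    // Malformed attribute: send nothing rather than break the page.
    return [];
  }
}

// Hypothetical helper: push each extra command onto the GA queue.
// `queue` only needs a push() method.
function pushExtras(queue, el) {
  var extras = readExtraPushes(el);
  for (var i = 0, l = extras.length; i < l; i++) {
    queue.push(extras[i]);
  }
  return extras.length;
}
```

In a browser this would be called as `pushExtras(_gaq, document.body)` once the <body> element exists, for example from a script near the end of the page.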
OS: Mac OS X → All
Hardware: x86 → All
Okay, let's use this sprint to figure out how to make this happen, and what effort would need to go into it.

The rationale from my side for going forward with this is: We don't know what we don't know. It might be that we have a lot more bounces than we are seeing today, and that could affect our decisions on optimizing Kitsune or the infrastructure Kitsune is running on. Since we'll see more usage of SUMO on mobile OSes, which are generally slower than desktop computers, the assumption is that whatever data loss we have on desktop (because the GA script is loaded so late) will be worse on mobile.
Assignee: nobody → rdalal
I would love love love not to do this.

We've implemented customVariables in a way that is incompatible with this change. We'd have to change that implementation and verify everything again.

I would argue we are collecting more than enough data. If there is something that we specifically need to measure that we can't do accurately right now, please let me know. There are probably better ways.
(In reply to Chris More [:cmore] from comment #10)
> Yes, there will be an extra HTTP request, but the rest of the page's HTTP
> requests and DOM rendering will not be tied to the final execution of the GA
> request. The GA request is now asynchronous, unlike Webtrends and previous
> non-asynchronous GA tags, so it can continue to load and execute without
> holding up the page itself. The ga.js file is about 41k.

I don't think this has been made clear, because there seems to be some conflation of two different things.

The GA snippet, the inline JS, creates an async script tag. Everything in the Google answers is in reference to this secondary JS, not the primary, inline script tag.

To label this a little better, let's run through very quickly what happens in the "normal" case that Google is talking about:

1. Browser starts downloading and parsing HTML document.
2. Browser reaches <script>...</script> tag (the inline snippet, let's call this ScriptA) in <head>.
3. Browser pauses DOM construction, parses, and executes the contents of the ScriptA, i.e.: ScriptA is parsed synchronously.
4. ScriptA creates a global variable (_gaq) and then creates a new <script> element (let's call this ScriptB) that *does* have the async attribute. As soon as it is added to the DOM (synchronously by ScriptA), ScriptB starts downloading asynchronously.
5. Sometime before onload, ScriptB is parsed and executed.

Chris is talking about ScriptB in comment 10. Google has absolutely no recommendations for how to include the ScriptA content in an external script.
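For reference, step 4 above can be sketched roughly as follows. This follows the shape of the standard asynchronous ga.js snippet, wrapped in a hypothetical function `scriptA` that takes `win` and `doc` as parameters (an assumption made only so the sketch can be exercised outside a browser). The account ID is a placeholder.

```javascript
// ScriptA: runs synchronously wherever it is placed. `win` and `doc`
// stand in for window and document.
function scriptA(win, doc, accountId) {
  // Create the global command queue and queue the page view.
  var _gaq = win._gaq = win._gaq || [];
  _gaq.push(['_setAccount', accountId]);
  _gaq.push(['_trackPageview']);

  // Create ScriptB with the async attribute and insert it into the DOM.
  // The download starts as soon as the element is inserted.
  var ga = doc.createElement('script');
  ga.type = 'text/javascript';
  ga.async = true;
  ga.src = ('https:' === doc.location.protocol ? 'https://ssl' : 'http://www') +
    '.google-analytics.com/ga.js';
  var first = doc.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(ga, first);
  return ga;
}
```

In a page this would be invoked as `scriptA(window, document, 'UA-XXXXX-X')`. Whether that call sits inline or at the top of an external bundle is exactly the question discussed in this bug.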

So the issue is, CSP doesn't allow us to have ScriptA inline without sacrificing the major benefit of CSP. (This is not unique to SUMO, this is something that all our sites will need to solve.) Inlining ScriptA is off the table, so lots of the Google answers and recommendations will not apply.
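To make the CSP constraint concrete, here is a minimal sketch of the kind of policy in question. The function name `buildScriptPolicy` and the host list are assumptions for illustration, not SUMO's actual configuration. The point is that a script-src directive without 'unsafe-inline' makes the browser refuse inline <script> blocks, which is why ScriptA cannot stay inline.

```javascript
// Build a Content-Security-Policy header value that allows scripts only
// from the page's own origin plus an explicit list of hosts. Because
// 'unsafe-inline' is not listed, inline <script> blocks are refused.
function buildScriptPolicy(extraHosts) {
  var sources = ["'self'"].concat(extraHosts || []);
  return 'script-src ' + sources.join(' ');
}

// Hypothetical host list: a static-media CDN and the GA script host.
var examplePolicy = buildScriptPolicy([
  'https://support.cdn.mozilla.net',
  'https://ssl.google-analytics.com'
]);
```

With a policy like this, an external bundle that contains ScriptA still loads, but the same code pasted inline into the template would be blocked.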

I did a few tests locally (happy to share the code) with a multi-threaded web server. I included a significant stylesheet and a dozen or so images, plus various external scripts.


Results
=======

Starting with ScriptA above </head>:
- With the async attribute, ScriptB executes last on the page.
- Without the async attribute, ScriptB blocks all the content below it.

With ScriptA above </body>:
- With the async attribute, ScriptB executes last.
- Without the async attribute, ScriptB still executes last.


TL;DR
=====

We can either block the page content to load ScriptA and get ScriptB to execute sooner, or not block on ScriptA, in which case ScriptB will run later.

The fastest we can make ScriptB execute without blocking is to put ScriptA in an IIFE at the top of a script *without* the async attribute at the bottom of the page.

(In reply to Ricky Rosario [:rrosario, :r1cky] from comment #11)
> var extraPush = $('body').data('ga-push');
> 
> // Add any extra pushes from body[data-ga-push].
> if (extraPush && extraPush.length) {
>   for (var i = 0, l = extraPush.length; i < l; i++) {
>     _gaq.push(extraPush[i]);
>   }
> }

However, this does *not* need to run before ScriptB.

When ScriptB executes, it runs all the commands in the _gaq array, and then replaces _gaq with an Analytics object which has its own .push method[1,2]. tl;dr - you can continue pushing commands to _gaq forever, it's fine. (This is actually how you're supposed to implement analytics for something like pjax.)

[1] http://stackoverflow.com/questions/2382428/when-and-how-does-asynchronous-google-analytics-execute/2386985#2386985
[2] http://stackoverflow.com/questions/6418100/is-gaq-push-google-analytics-really-asynchronous
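The queue hand-off described above can be illustrated with a simplified stand-in. This is not ga.js itself: `takeOverQueue` and the `execute` callback are hypothetical names, and the real tracker does considerably more. It only shows why commands pushed before and after ScriptB loads both get processed.

```javascript
// Simplified stand-in for what ScriptB does when it loads: run every
// queued command, then replace the array with an object whose push()
// runs commands immediately.
function takeOverQueue(win, execute) {
  var queued = win._gaq || [];
  for (var i = 0; i < queued.length; i++) {
    execute(queued[i]);
  }
  win._gaq = {
    push: function () {
      for (var j = 0; j < arguments.length; j++) {
        execute(arguments[j]);
      }
    }
  };
}
```

Code that calls `_gaq.push([...])` does not need to know which of the two states the queue is in, so the body[data-ga-push] commands can be pushed late without being lost.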


James' Take
===========

I'm not OK putting a blocking ScriptA into the <head>, or inlining ScriptA in the head. From these tests, the best we can do is concat ScriptA at the top of a script file above </body>. (Technically, without async is probably faster, but at that point it doesn't make a ton of difference.)
It's been almost a week with no response here. This bug is not in a state where we can code something, so I'm bumping it out of the sprint.
Whiteboard: u=sumo-team c=general p=1 s=2013.3 → u=sumo-team c=general p=1 s=2013.backlog
Since there hasn't been any response to my or Ricky's comments, I propose we WONTFIX this. Given the people out this week, I'm not going to do that right away, though.
I'm fine with whatever all of you want to do. We have the code in the head of www.mozilla.org, the site receives ~100,000,000 page views per week, and we have custom GA event tagging in the body of our pages.

If we wanted to, I could do an A/B test using Google Content Experiments, comparing load time and conversion rates between the two placements of the tag within the template (header vs footer).
(In reply to Chris More [:cmore] from comment #17)
> We have the code in the head
> of www.mozilla.org, the site receives ~100,000,000 page views per week, and
> we have custom GA event tagging in the body of our pages.

Again, no, this is NOT the same thing.

You have the code inline.

That is not an option for SUMO, or any site with CSP.
(In reply to James Socol [:jsocol, :james] from comment #18)
> (In reply to Chris More [:cmore] from comment #17)
> > We have the code in the head
> > of www.mozilla.org, the site receives ~100,000,000 page views per week, and
> > we have custom GA event tagging in the body of our pages.
> 
> Again, no, this is NOT the same thing.
> 
> You have the code inline.
> 
> That is not an option for SUMO, or any site with CSP.

Yes, I understand that it is not an option with CSP. We can WONTFIX this as GA will still work.
Per comment 19.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Here is Google's premium support response to this bug for future reference and future GA discussions. Initially, I mentioned that having it in the head helps with page timing data, which is inaccurate. I was initially told/read this on another forum and it was an inaccurate statement. There was some misinformation out on the web when the async tag was first released. Google provided the following information about 30 minutes ago.

--

The page timing is tracked using the `window.performance` variable. It should work regardless of where you place the code. But it only works for browsers that support that API. Still, it's a bad idea to have the tracking code at the bottom of the page for other reasons: it may keep Google Analytics from firing on long pages where the user navigates away before the page is completely rendered. This case is especially bad if it happens on the landing page, because the second page the user visits will become a landing page for GA and that visit will be recognized as a self-referral in analytics. We lose the traffic source.
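As an illustration of why tag placement does not affect these numbers, here is a minimal sketch that derives timings from a Navigation Timing object (`window.performance.timing` in browsers that support it). The function name `pageTimings` is hypothetical, and the thread does not say which fields GA actually reads; the ones below are an assumption.

```javascript
// Derive page timings from a Navigation Timing object. The timestamps
// are recorded by the browser itself, so they are the same no matter
// where on the page the reading script sits.
function pageTimings(timing) {
  if (!timing || !timing.navigationStart ||
      !timing.domContentLoadedEventEnd || !timing.loadEventEnd) {
    // API missing, or the load event has not finished yet.
    return null;
  }
  return {
    domReadyMs: timing.domContentLoadedEventEnd - timing.navigationStart,
    loadMs: timing.loadEventEnd - timing.navigationStart
  };
}
```

In a browser this would be called as `pageTimings(window.performance && window.performance.timing)` after the load event has completed.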

As long as you keep the tag up (inside <head> or at the beginning of <body>) you are safe regardless of whether you use async or not. But if you keep your pages small it doesn't make that much of a difference; even the tag at the bottom could load fast enough.

The recommended method strives for a fast firing and non-blocking tag. I'll give you two ideas that keep the same goals without having inline <script> tags on your site:

Bundle at the Top

Bundle the whole analytics tracking code into another script that lives in the <head> or at the top of the <body> for your site. It seems like this is exactly what you do today, except that you would move the bundle to the top, or create another bundle with just the scripts that need to live at the top. As you saw, it will cause the script to load a bit later than it currently does, because ScriptB won't start executing until ScriptA is fully loaded and executed. Still, this is probably the simpler method and should give you good enough results.

Load both scripts in parallel

Have a tag just for analytics and place it at the top. Load the ga.js file outside that tag. This will cause both files to load at the same time. So in theory this is the same number of requests as before, but now the code should fire earlier. The downside is that it will only load the ga.js script asynchronously on browsers that support the async property, as we need to remove the hacks for IE; on other browsers it will block the loading of the page briefly. Also, you need to explicitly load the correct protocol to match the one in your page (HTTP/HTTPS).

eg:

<script async src="mozilla_analytics.js"></script>
<script async src="//www.google-analytics.com/ga.js"></script>
contents of mozilla_analytics.js:

    var _gaq = _gaq || [];
    var extraPush = $("body").data("ga-push");
    _gaq.push(["_setAccount", "UA-36116321-2"]);
    if (extraPush && extraPush.length) {
      for (var i = 0, l = extraPush.length; i < l; i++) {
        _gaq.push(extraPush[i]);
      }
    }
    _gaq.push(["_trackPageview"]);

--
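The note about matching the protocol in the "load both scripts in parallel" option can be sketched as below. The helper names `gaScriptSrc` and `analyticsScriptTags` are hypothetical. The two hosts are the ones the standard asynchronous snippet chooses between. The bundle path is whatever the site uses.

```javascript
// Pick the ga.js URL that matches the page's protocol, mirroring the
// choice the standard async snippet makes at runtime.
function gaScriptSrc(protocol) {
  return (protocol === 'https:' ? 'https://ssl' : 'http://www') +
    '.google-analytics.com/ga.js';
}

// Render the two parallel <script async> tags for a page served over
// the given protocol. `bundleSrc` is the site's own analytics bundle.
function analyticsScriptTags(protocol, bundleSrc) {
  return [
    '<script async src="' + bundleSrc + '"></script>',
    '<script async src="' + gaScriptSrc(protocol) + '"></script>'
  ].join('\n');
}
```

This only covers the URL choice. Since the bundle shown in the response above reads from $("body"), it would still need jQuery and the <body> element to be available when it runs, and that is not guaranteed for an async script in the <head>.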
Bizarre. I can try to reach out to dcramer--unless someone already has.
Apologies, comment 22 was for a completely different bug. Thanks, Thunderbird Conversations.
Thanks a lot for the added information, Chris and James.

As far as I can tell the downside of our current implementation is that we might not be counting page views, because we don't get to load the GA script. That has two possible effects:

a. We underestimate bounces on landing pages.
b. We count subsequent pages as landing pages.

AFAIK the downside of putting the GA loading script inline is that it would not work with CSP. However, we are not using CSP yet.

I'd really like to know how big the effect of having the script load so late actually is. If we can rule it out as a source for further design decisions, we can keep our current implementation and also move to CSP. 

James, do you see an issue with testing this for a limited time to determine the effect size?

Chris, would you be able to help us determine the effect size?

Reopening, while we are discussing this further.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Flags: needinfo?(chrismore.bugzilla)
Flags: needinfo?(james)
(In reply to Kadir Topal [:atopal] from comment #24)
> James, do you see an issue with testing this for a limited time to determine
> the effect size?

Testing what, exactly? Turning on CSP is non-optional for a site like SUMO. That it's not on now really isn't OK. (Can we get bug 586485 into the backlog?)

From comment 14, there are two possibilities if we put a <script> tag (without inline content) in the <head>:

1) It does not have the async attribute and so it blocks the page. Then it adds another <script> tag (with ScriptB) that DOES have the async attribute. This causes ScriptB to execute sooner, almost--but not quite--as soon as if we had ScriptA inline.

Or

2) It does have the async attribute, so it loads, and runs, eventually, and adds another script tag with the async attribute, which loads, and runs eventually. That sequence of asynchronicity means ScriptB, at least in my tests, doesn't seem to execute any sooner than if it's at the foot.

In either case, we're adding an extra HTTP request, but at least in case #2 it's an asynchronous one.

If you just want to put ScriptA inline in the head for a week or so to measure the systematic bias of having it elsewhere, I don't see a problem with that, but it cannot stay there.
Flags: needinfo?(james)
(In reply to Kadir Topal [:atopal] from comment #24)
> Chris, would you be able to help us determine the effect size?
> 
> Reopening, while we are discussing this further.

This would be difficult to really measure well. Probably about the only way to really tell the effect of moving the tag location is to do an A/B test with either optimizely.com or Google Content Experiments (we use both) to measure if a page's bounce rate (or other metric) is statistically different. This is a decent amount of work for something that is a moot point with CSP on the horizon. Google explained that the data is more accurate in the <head>, but it will still work anywhere on the page.

The accuracy of the data only matters if we are making specific decisions based on the exact metrics for given pages. Google explained that the data is not inaccurate with the tag in the footer, but it is *more* accurate in the header. Since most work and optimizations with web analytics are based on trend changes, you can confidently assume that the less-accurate data coming from footer GA tags is consistent over time with a large sample size.

We had this issue when A/B testing the new https://www.mozilla.org/firefox/new/ page. Randomly, some GA virtual page views were being dropped and it appeared that the conversion rate was lower than normal. This was not a big deal because it was consistent over a large sample set and also consistent between the A and B versions. Errors like this are fine as long as they are consistent. We were looking for changes between the A and B over millions of views to get to 95% probability and it worked great.

Does that help?
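For readers unfamiliar with how such an A/B comparison is evaluated, here is one common approach, a two-proportion z-test, as a minimal sketch. The thread does not say which statistic Optimizely or Google Content Experiments used, so this is an assumption, and the function names are hypothetical.

```javascript
// Two-proportion z-test: is the difference between two observed rates
// (for example bounce or conversion rates of variants A and B) larger
// than sampling noise alone would explain?
function twoProportionZ(successA, totalA, successB, totalB) {
  var pA = successA / totalA;
  var pB = successB / totalB;
  var pooled = (successA + successB) / (totalA + totalB);
  var se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  return (pB - pA) / se;
}

// |z| > 1.96 corresponds to a two-sided 95% confidence level.
function isSignificantAt95(z) {
  return Math.abs(z) > 1.96;
}
```

With made-up numbers, `twoProportionZ(1000, 10000, 1100, 10000)` compares a 10% rate against an 11% rate over 10,000 views each. If both variants drop hits at roughly the same rate, the observed rates shrink together, which is the sense in which a consistent error does not invalidate the comparison.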
Flags: needinfo?(chrismore.bugzilla)
(In reply to Chris More [:cmore] from comment #26)
> Google explained that the data is more
> accurate in the <head>, but it will still work anywhere on the page.

As far as I can tell, this refers only to inlining the script (and indeed, if we were doing that, the <head> would be the best place for it). I haven't seen them comment on either of the cases described in comment 25.
(In reply to James Socol [:jsocol, :james] from comment #27)
> (In reply to Chris More [:cmore] from comment #26)
> > Google explained that the data is more
> > accurate in the <head>, but it will still work anywhere on the page.
> 
> As far as I can tell, this refers only to inlining the script (and indeed,
> if we were doing that, the <head> would be the best place for it). I haven't
> seen them comment on either of the cases described in comment 25.

Yes, true. Like I mentioned in comment 26, if it is as accurate or less accurate, it shouldn't matter over time when comparing changes in trends. We will probably be making changes to pages or improvements based on the percentage difference between metrics and not their absolute values.
Thanks for the clarifications, James and Chris.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → WONTFIX
Cleaning out the backlog.
Whiteboard: u=sumo-team c=general p=1 s=2013.backlog → u=sumo-team c=general p=1