Closed Bug 941703 Opened 11 years ago Closed 11 years ago

Implement Google Analytics on crash-stats.mozilla.org

Categories

(Websites :: Web Analytics, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lonnen, Assigned: garethc)

References

Details

full address:
https://crash-stats.mozilla.com/home/products/Firefox

the goal:
crash-stats has many features, and is expanding, but we have no insight into how people are using which features. We'd like to restore basic analytics gathering so we can make decisions based on data. We had it a few years ago, but got rid of it as part of the move to WebTrends (for which we were denied a seat).

requirements to track user interactions:
There are a few intra-page interactions we're interested in examining, but they aren't critical at the outset. Off the top of my head I can think of two particular interactions (one button on a page, one page with tabs) where we want insight.

who should have access:
Laura Thomson lthomson@mozilla.com
Lars Lohn lars@mozilla.com
Rob Helmer rhelmer@mozilla.com
Chris Lonnen lonnen@mozilla.com
Adrian Gaudebert agaudebert@mozilla.com
Brandon Savage bsavage@mozilla.com
Schalk Neethling sneethling@mozilla.com
Peter Bengtsson peterbe@mozilla.com

These are the module owners/peers of the product.
Blocks: 940560
Gareth: Can you get a tag here and add the people above to the profile? Note that the domain is .com. Thanks!
Assignee: nobody → garethcull.bugs
Any action on this?
Place this tag in the <head> of the page:

<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-35433268-50', 'mozilla.org');
  ga('send', 'pageview');

</script>
All users from comment 0 have been added; a few required different email addresses that match their Google Apps usernames.
Google has been emailed to enable Premium. Let me know when the tag is in place and I will verify data is coming in.
We should ship this on this upcoming Wednesday.
This is deployed in production. Data should be coming in now.
(In reply to Chris Lonnen :lonnen from comment #7)
> This is deployed in production. Data should be coming in now.

I adjusted the hostname filter to only include crash-stats.mozilla.com. It was .org before. The filters take a bit to update and we can check on it in a few hours.
Thanks!
I see real time data.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
The data looks really anemic. In the GA interface it lists http://crash-stats..., but most of our traffic is over https. Does that need to be changed?
(In reply to Chris Lonnen :lonnen from comment #11)
> The data looks really anemic. In the GA interface it lists
> http://crash-stats..., but most of our traffic is over https. Does that need
> to be changed?

I switched it to https in the interface, but it shouldn't matter. The protocol is only used when providing FQDNs and absolute URLs in the GA interface.
Hrm. Maybe we're really seeing this low level of traffic. It's orders of magnitude lower than what I see in the apache logs, but I've long suspected that was mostly bot traffic and scrapers/crawlers don't usually load and execute JS.

Is GA smart enough to discount selenium driver? Does it respect DNT?
(In reply to Chris Lonnen :lonnen from comment #13)
> Hrm. Maybe we're really seeing this low level of traffic. It's orders of
> magnitude lower than what I see in the apache logs, but I've long suspected
> that was mostly bot traffic and scrapers/crawlers don't usually load and
> execute JS.
> 
> Is GA smart enough to discount selenium driver? Does it respect DNT?

GA tries to remove as many bots and automated scripts as possible from its logs. Also, make sure you are looking at pageviews, not visits, when comparing to Apache logs. GA doesn't do anything different when DNT is enabled unless you wrap the JS tag in some conditional.
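
For reference, wrapping the tag in a DNT conditional could look roughly like the sketch below. The helper names `dntEnabled` and `loadUnlessDnt` are hypothetical and not part of analytics.js, and the set of property names and values checked is an assumption based on how browsers have commonly exposed the flag; this is not what was deployed for this bug.

```javascript
// Hypothetical helper (not part of analytics.js): true when the browser
// signals Do Not Track. Browsers have exposed the flag under different
// property names and values over time, so check the common variants.
function dntEnabled(nav, win) {
  var value = nav.doNotTrack || win.doNotTrack || nav.msDoNotTrack;
  return value === '1' || value === 'yes';
}

// Hypothetical wrapper: run the tag loader only when DNT is not set.
// Returns true if the loader ran, false if it was skipped.
function loadUnlessDnt(nav, win, loadTag) {
  if (dntEnabled(nav, win)) {
    return false;
  }
  loadTag();
  return true;
}

// In the page, loadTag would hold the GA snippet from this bug:
// loadUnlessDnt(navigator, window, function () {
//   /* analytics.js loader, then ga('create', ...) and ga('send', 'pageview') */
// });
```

Passing `navigator` and `window` in as arguments keeps the check easy to exercise outside a browser.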

What you are probably seeing is caused by GA's sampling rate. GA stays fast on high-traffic websites by showing a sample of the traffic in all of its graphs rather than processing every log entry. This is not a big deal on high-volume websites, because GA is good at estimating historical/daily/hourly trends and can accurately fill in the data between the samples. On a low-traffic website, it is not very accurate for raw numbers. Sampling still works fine on low-traffic websites when you just want to look at trends and percentages, but raw numbers will often be off.

You can download an unsampled version of any report by viewing the report, clicking Export near the top, and choosing Unsampled Report. It will take some time to process the raw log files, and you will be emailed a CSV of the raw data. I wrote a script that went to our websites 10,000 times. The number in the normal GA reports wasn't exactly 10,000, but when I downloaded the unsampled report, it was exactly 10,000. I also compared the unsampled data to Apache, and it was accurate as long as you remove all spiders.

Does that help?
Yes, that does. I'm seeing 10s of requests per day, where our Apache log traffic was saying thousands, which made me suspicious. Reviewing my findings from parsing the Apache logs, though, I see that nearly 75% of the Apache traffic was wget or urllib, and nearly 20% was nagios, with a long tail of mixed clients after that. Interesting findings.

thanks!
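
A breakdown of log traffic by client like the one above can be produced with something along these lines. This is only a sketch, not the script actually used: it assumes the Apache "combined" log format (user agent as the last double-quoted field), and the function names and bucket patterns are hypothetical.

```javascript
// Assumes Apache "combined" log format, where the user agent is the last
// double-quoted field on each line.
function userAgentOf(line) {
  var match = /"([^"]*)"\s*$/.exec(line);
  return match ? match[1] : '';
}

// Hypothetical buckets, matched case-insensitively against the user agent.
var CLIENT_PATTERNS = [
  ['wget', /wget/i],
  ['urllib', /urllib/i],
  ['nagios', /nagios/i]
];

// Returns the name of the first matching bucket, or 'other'.
function classify(userAgent) {
  for (var i = 0; i < CLIENT_PATTERNS.length; i++) {
    if (CLIENT_PATTERNS[i][1].test(userAgent)) {
      return CLIENT_PATTERNS[i][0];
    }
  }
  return 'other';
}

// Returns an object mapping each bucket to its percentage share of lines.
function clientShares(lines) {
  var counts = {};
  lines.forEach(function (line) {
    var bucket = classify(userAgentOf(line));
    counts[bucket] = (counts[bucket] || 0) + 1;
  });
  var shares = {};
  Object.keys(counts).forEach(function (bucket) {
    shares[bucket] = 100 * counts[bucket] / lines.length;
  });
  return shares;
}
```

Feeding it the lines of an access log (for example, the file contents split on newlines with blank lines dropped) gives each client's share of requests, which can then be set against GA's pageview counts.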