Closed Bug 742785 Opened 12 years ago Closed 12 years ago

Send data pings to metrics

Categories

(addons.mozilla.org Graveyard :: Code Quality, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: andy+bugzilla, Assigned: andy+bugzilla)

Details

Set up a library on AMO that lets us send data pings to the metrics team's cluster:

https://github.com/mozilla-metrics/bagheera

We will need to check with metrics to make sure we are sending the correct data.

This bug covers the general API, which is a fairly simple HTTP request, preferably made from a celery task. A UDP ping would be really nice, though.
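A minimal sketch of the HTTP ping, assuming a Bagheera-style endpoint that accepts a JSON body POSTed to `/submit/<namespace>/<id>`. The base URL, the `apps` namespace, and the function names are hypothetical, not taken from this bug. A fresh UUID per ping gives the server a unique document id. `build_ping_request` only builds the request; `send_ping` sends it synchronously, which is the part the bug proposes moving into a celery task.

```python
import json
import urllib.request
import uuid

# Hypothetical base URL for the metrics cluster; not taken from this bug.
METRICS_BASE_URL = "https://metrics.example.com"


def build_ping_request(payload, namespace="apps", base_url=METRICS_BASE_URL):
    """Build (but do not send) a POST carrying `payload` as JSON.

    Assumes a Bagheera-style route of /submit/<namespace>/<id>, where the id
    is a fresh UUID so each ping is stored as a separate document.
    """
    doc_id = str(uuid.uuid4())
    url = "%s/submit/%s/%s" % (base_url.rstrip("/"), namespace, doc_id)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_ping(payload, timeout=5):
    """Send the ping and return the HTTP status code (raises on network errors)."""
    req = build_ping_request(payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```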

Separate bugs will be filed for the places to ping from.
Blocks: 742789
Blocks: 742792
Why the move from processing log files to pinging?  Pinging gives us faster data but at the expense of reproducibility.  Our add-on metrics logging fails for various reasons and Daniel can go back and reprocess logs to fill in gaps in the statistics history.  What is your plan to refill gaps in the history when a network switch dies?
Daniel requested a ping. I was going to hand it off to celery and let celery do the work and handle the retries. We could log it instead, but then it feels like we would just be reinventing the wheel.
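Celery would supply both the queueing and the retries. As a dependency-free illustration of the retry half only, the sketch below retries a failing send with exponential backoff; `send_with_retries`, its parameters, and the delay values are hypothetical. In celery itself, the equivalent would be a bound task calling `self.retry()`. The `sleep` argument is injectable so the behaviour can be checked without actually waiting.

```python
import time


def send_with_retries(send, payload, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send(payload), retrying on exceptions with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once max_retries retries are exhausted.
    Hypothetical helper; a stand-in for what a celery task's retry provides.
    """
    attempt = 0
    while True:
        try:
            return send(payload)
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

If the metrics service is down for longer than the whole backoff window, the ping is still lost, which matches the trade-off discussed in the next comment.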
Almost all of the add-on metrics collection failures are due to load balancer config problems.

The problem with log file collection is that we are relying on the load balancers to provide metrics in addition to serving traffic.  IT does not generally have the same priorities for these two tasks.  Serving the traffic is the primary priority, and if they have to make changes on the fly to keep the site running well, they do so even if it impairs log file collection.

We also have a storage and retention issue with log file collection.  We need to move the log files from one data center to another and ensure we keep them around long enough to extract the necessary data from them.  In the case of Blocklist, the log files contain all requests for the AMO site in general, and we have to filter through them looking for the one specific set of requests we are interested in.  That could be changed by splitting those requests out onto more dedicated subdomains, but if we are talking about a write-only service for this request (i.e. no useful response other than OK), it feels cleaner to have a service dedicated to collecting and storing that data as efficiently as possible.

As for recovery, if the service is catastrophically down, we will either lose data or the data would be delayed until the client retries submitting it.

On the occasions we have had catastrophic AMO site outages, the result has been the same.  Data is just missing for that time period.
As for the data to send, I can send:

- the app primary key

- the app domain (most likely used for correlating with other services)

- something about the user (so we can correlate with BrowserID, Sync or something else)? Can we do that without causing any privacy issues?

- the user agent

- app price

- the event type: we've got a couple of different scenarios so far, install and purchase. Should those be sent in the JSON blob or as part of the URL?

Is there anything else?
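The JSON-versus-URL question can be compared side by side. A minimal sketch, assuming hypothetical field names (`app_id`, `domain`, `user_agent`, `price`, `event`) and a hypothetical `/ping` path for the items listed above: with `event_in_url=False` the event type travels inside the JSON blob; with `event_in_url=True` it becomes a URL path segment and the blob carries only the details.

```python
import json

# Event types mentioned so far; others may be added later.
EVENT_TYPES = ("install", "purchase")


def build_ping(event, app_id, domain, user_agent, price=None, event_in_url=False):
    """Return (path, json_body) for one ping.

    event_in_url=False: event type is a field in the JSON blob.
    event_in_url=True:  event type is a URL path segment.
    All field and path names here are hypothetical.
    """
    if event not in EVENT_TYPES:
        raise ValueError("unknown event type: %r" % event)
    data = {
        "app_id": app_id,          # the app's primary key
        "domain": domain,          # app domain, for correlating with other services
        "user_agent": user_agent,
    }
    if price is not None:
        data["price"] = str(price)  # sent as a string to avoid float rounding
    if event_in_url:
        path = "/ping/%s" % event
    else:
        path = "/ping"
        data["event"] = event
    return path, json.dumps(data, sort_keys=True)
```

Putting the event in the URL lets the server route or partition by event without parsing the body; putting it in the JSON keeps a single endpoint. Both are reasonable; this sketch does not record which one was chosen.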
I don't think we need to send any identifying information about the user.  The user agent should be fine, since metrics can already access that.  Additional fields could be locale, device (phone/tablet/desktop), maybe screen size, and the app object (for in-app purchases?)... CCing fligtar/ragavan/gkoberger: now's your chance to get something tracked.
Aren't device, screen size, etc. parseable from the user agent?
In theory.  I was wondering if we had anything more accurate on AMO when they clicked download.  I suppose not.
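The user agent only gets part of the way. A rough heuristic, assuming Firefox-style UA strings where the parenthesised platform section contains a `Tablet` or `Mobile` token (as Firefox for Android's does): it can guess the device class, but screen size is not part of a UA string at all and would have to be reported by the client. `guess_device` is a hypothetical helper, not an AMO function.

```python
import re


def guess_device(user_agent):
    """Guess 'tablet', 'phone' or 'desktop' from a user-agent string.

    Hypothetical heuristic: looks for Firefox-style 'Tablet' / 'Mobile'
    tokens inside the first parenthesised platform section. Other browsers
    may not follow this convention, and screen size cannot be recovered
    from the UA.
    """
    match = re.search(r"\(([^)]*)\)", user_agent or "")
    tokens = [t.strip() for t in match.group(1).split(";")] if match else []
    if "Tablet" in tokens:
        return "tablet"
    if "Mobile" in tokens:
        return "phone"
    return "desktop"
```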
Assignee: nobody → amckay
https://github.com/mozilla/zamboni/commit/c2b6fd

Let's find an endpoint to test against.
I'm having trouble understanding this bug, but I think it's about AMO sending a download notification ping to a metrics server? Is this only for apps, or for both apps and add-ons?

We would want to track: timestamp, src, browser, device, os/platform, locale

We'll eventually want to analyze things like whether the user was already logged in, whether it's a re-download or initial purchase, etc. but we don't need them yet.
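The six fields listed in that comment could be pinned down as a small record type. A sketch only: the class name `DownloadPing`, the field names (with `src` kept as written above), and the choice of an ISO 8601 UTC timestamp are assumptions, not a schema agreed with metrics. The later fields (logged-in state, re-download vs. initial purchase) are deliberately left out.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now_iso():
    # Seconds precision with an explicit UTC offset, e.g. '...T18:30:00+00:00'.
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat()


@dataclass
class DownloadPing:
    """Hypothetical record for one download notification."""
    src: str           # where the download was started from
    browser: str
    device: str        # e.g. 'phone', 'tablet', 'desktop'
    os: str            # os/platform
    locale: str
    timestamp: str = field(default_factory=_utc_now_iso)

    def to_json(self):
        # Stable key order makes pings easy to diff and test.
        return json.dumps(asdict(self), sort_keys=True)
```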
To my knowledge, at the moment, we are only looking at this for the new Apps stuff.  Of course, I'd be happy to eventually use the same mechanism for add-ons as well.

I'd like to request one more review on this, though. Rather than asking for random pieces of data that we wish to collect, let's write down the analytic questions we would like to answer and then tie each one to the data points it needs.

For instance:

1. How many new app downloads do we get per day?
   Broken down by:
   a. locale
   b. product/version/channel
   c. download source
   d. device
   e. os/platform
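Question 1 maps directly onto a grouped count. A minimal sketch, assuming each ping has already been reduced to a dict with an ISO 8601 UTC `timestamp` plus the breakdown fields; the record shape and the name `downloads_per_day` are hypothetical. Counting only *new* downloads would additionally need a field distinguishing re-downloads, which the thread defers for now.

```python
from collections import Counter


def downloads_per_day(pings, breakdown):
    """Count downloads per (day, breakdown value).

    `pings` is an iterable of dicts with an ISO 8601 'timestamp' (assumed
    already in UTC) and a `breakdown` key such as 'locale', 'device' or
    'src'. Pings missing that key are counted under 'unknown'.
    """
    counts = Counter()
    for ping in pings:
        day = ping["timestamp"][:10]          # 'YYYY-MM-DD'
        counts[(day, ping.get(breakdown) or "unknown")] += 1
    return counts
```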

If we capture these questions we wish to ask, then we will know that we aren't collecting unnecessary data and that we aren't missing data for questions we know we want to ask.
Filed bug 744897 for getting into metrics.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard