Closed Bug 1293750 Opened 8 years ago Closed 3 years ago

Replicate telemetry.m.o analytics using mozilla-only tech

Categories

(Data Platform and Tools :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ddurst, Unassigned)

Details

We don't really have a good way of knowing how much use links from t.m.o get; one downside is in dashboard maintenance: we're looking at potentially rewriting some dashboards that may or may not be actively used.

Adding analytics to track at least anchor clicks from t.m.o would allow us to determine what links are actively accessed (and we could retire others that aren't).
Also being discussed here: mozilla/telemetry-dashboard#300
Priority: -- → P3
Hey, :whd, how exactly does TMO _work_? We're thinking about analytics which may have some hosting impact... but we don't actually know how TMO is hosted.
Flags: needinfo?(whd)
TMO is hosted as a static website (s3), fronted by a CDN (cloudfront). Specifically, telemetry.mozilla.org is a CNAME for d65ojvdli7jd8.cloudfront.net, a distribution which has as its origin telemetry-mozilla-org.s3-website-us-west-2.amazonaws.com. This is actually contrary to what I remember (I was expecting the origin to be github-pages), because it means that this s3 bucket is populated somehow from github pages. I'm not actually sure what the population mechanism is at the moment, but for the purposes of your query it may be irrelevant. I can dig into this more if needed.
Flags: needinfo?(whd)
I took a quick stab at how we might instrument this:
https://github.com/mozilla/telemetry-dashboard/compare/gh-pages...georgf:analytics

There are two parts here:
1) Instrument page views on every page within telemetry-dashboard.
2) Add tracking for outbound links on the TMO homepage.

We should definitely do 1).
2) is useful because most outbound links on the TMO homepage point outside of telemetry-dashboard, so page views alone would not capture them. Instrumenting outbound links gives us a proper understanding of what people actually use from the homepage.

I haven't used GA before, so I'm pretty unclear on whether the way I did it is the best way to go. But I confirmed the basic approach works for my test account.
Also, I can't find info on Mana about our enterprise GA account.
Assignee: nobody → chutten
Priority: P3 → P1
Shot in the dark, but... :jezdez, would you happen to know how we do front-end analytics these days? Or know who would know?
Flags: needinfo?(jezdez)
Gareth Cull and Peter German are your Google Analytics contact points, if you want that. All they need to know is the FQDN and who needs access.
Status: NEW → ASSIGNED
This may end up being an excellent first consumer for the Generic Ingestion Pipeline. :frank's looking at writing up a schema, and I'll see about writing a little JS to use it. I think we're basing the schema on Telemetry Events (https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/events.html) and the client-side JS on sendBeacon (https://caniuse.com/#feat=beacon)
Flags: needinfo?(jezdez)
Step one scratchpad WIP pre-alpha: https://github.com/chutten/telemetry-dashboard/commit/a2b80f87aa15f50ad4ee7f25668bd3e85ec77544

Then it's a matter of document.querySelectorAll('a', () => { if (!this.host.startsWith('telemetry.mozilla.org')) { window.ma.send('click', 'external_link', this.host) } }); or whatever
(Er, I missed an Array.from(...).forEach(a => a.addEventListener('click', ...)) in various parts. Exercise for the reader).
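Putting the two comments above together, a minimal sketch of the outbound-link hook might look like the following. It assumes the wrapper exposes something shaped like window.ma.send(method, category, value), as the snippet above implies; isExternalHost and instrumentOutboundLinks are hypothetical names, and skipping links with an empty host (e.g. mailto:) is an added assumption.

```javascript
// Hypothetical helper: true when a link's host is outside telemetry.mozilla.org.
// Links with an empty host (mailto:, javascript:) are skipped; that is an
// assumption of this sketch, not something decided in this bug.
function isExternalHost(host) {
  return host !== '' && !host.startsWith('telemetry.mozilla.org');
}

// Attach a click listener to every external anchor found in `doc`.
// `send` stands in for window.ma.send (assumed signature).
// Returns the number of anchors that were instrumented.
function instrumentOutboundLinks(doc, send) {
  const anchors = Array.from(doc.querySelectorAll('a[href]'));
  const external = anchors.filter(a => isExternalHost(a.host));
  external.forEach(a => {
    a.addEventListener('click', () => send('click', 'external_link', a.host));
  });
  return external.length;
}

// In a browser, wire it up only when the DOM and the wrapper are present.
if (typeof document !== 'undefined' && typeof window !== 'undefined' && window.ma) {
  instrumentOutboundLinks(document, (...args) => window.ma.send(...args));
}
```

Passing `send` in as a parameter keeps the hook testable without a real page and without committing to the final shape of ma.js.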
The submission URL should be:

https://pipeline-incoming.stage.mozaws.net/submit/<namespace>/<doctype>/<docversion>/<docid>

I propose that we use a namespace of "mozza" (a cheesy name for moz analytics), a doctype of "event", and a version of 1 to start with. The docid should be a client-generated UUID (we can hide this detail inside the JS wrapper).
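As a rough illustration of the proposal above, the JS wrapper could build the submission URL along these lines. generateDocId and buildSubmissionUrl are hypothetical names; the sketch prefers crypto.randomUUID where the runtime provides it and otherwise falls back to a Math.random-based version 4 UUID, which is not cryptographically strong.

```javascript
// Endpoint prefix taken from the comment above.
const SUBMIT_BASE = 'https://pipeline-incoming.stage.mozaws.net/submit';

// Generate a version 4 UUID for use as the docid.
// Uses crypto.randomUUID when available; the fallback is for illustration only.
function generateDocId() {
  if (typeof crypto !== 'undefined' && typeof crypto.randomUUID === 'function') {
    return crypto.randomUUID();
  }
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, c => {
    const r = Math.floor(Math.random() * 16);
    const v = c === 'x' ? r : (r & 0x3) | 0x8; // 'y' carries the variant bits
    return v.toString(16);
  });
}

// Build <base>/<namespace>/<doctype>/<docversion>/<docid>.
// A fresh docid is generated unless the caller supplies one.
function buildSubmissionUrl(namespace, doctype, version, docId = generateDocId()) {
  const parts = [namespace, doctype, String(version), docId].map(encodeURIComponent);
  return [SUBMIT_BASE, ...parts].join('/');
}
```

With the values proposed here, a call would be buildSubmissionUrl('mozza', 'event', 1), with the docid filled in by the wrapper.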

We can land the schema in the mozilla-pipeline-schemas repo[1], and then we'll need to coordinate with :whd to do a deploy to stage.

[1] https://github.com/mozilla-services/mozilla-pipeline-schemas
I'm unclear about at what frequency we expect docid to change. Is it supposed to be per-submission? per-session? per-client? Or is it something we include in ma.js as a constant for anyone who's using ma.js? (in which case, should it change if there's a new version?)
Flags: needinfo?(mreid)
We should probably include some metadata in this ping. This might not be terribly useful right now but will be helpful long-term. Here are some that might be relevant:
- Browser + version
- OS + version
- permissions

We will have the following server-side:
- GeoCountry
- GeoCity

Lastly, how should we deal with opt-out? Perhaps we can cover that later.
(In reply to Chris H-C :chutten from comment #11)
> I'm unclear about at what frequency we expect docid to change. Is it
> supposed to be per-submission? per-session? per-client? Or is it something
> we include in ma.js as a constant for anyone who's using ma.js? (in which
> case, should it change if there's a new version?)

docId will not change much. Basically, for anyone sending this ping, it will have the docId of "event". This could mean separate apps or websites.
That's a doctype of "event" not a "docid", right?

As for opt-out, I made sure to check DNT in my initial code: https://github.com/chutten/telemetry-dashboard/commit/a2b80f87aa15f50ad4ee7f25668bd3e85ec77544
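For readers who don't want to open the commit, a DNT guard could look roughly like this. It is a sketch and not the code from that commit; hasDoNotTrack and sendUnlessDnt are hypothetical names, and accepting "yes" alongside "1" is an assumption to cover older browsers that, as far as I know, reported that value.

```javascript
// True when the given navigator-like object signals Do Not Track.
// navigator.doNotTrack is "1" when the user has enabled DNT; "yes" is
// accepted as well on the assumption that some older browsers used it.
function hasDoNotTrack(nav) {
  const value = nav ? nav.doNotTrack : null;
  return value === '1' || value === 'yes';
}

// Call `send(...args)` only when DNT is not set.
// Returns true if the event was handed to `send`, false if it was dropped.
function sendUnlessDnt(nav, send, ...args) {
  if (hasDoNotTrack(nav)) {
    return false;
  }
  send(...args);
  return true;
}
```

In a page this would be called as sendUnlessDnt(navigator, ...), so that nothing leaves the browser for users who have DNT enabled.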
(In reply to Chris H-C :chutten from comment #14)
> That's a doctype of "event" not a "docid", right?
> 
> As for opt-out, I made sure to check DNT in my initial code:
> https://github.com/chutten/telemetry-dashboard/commit/
> a2b80f87aa15f50ad4ee7f25668bd3e85ec77544

Ah, I misread. Every ping sent should have a unique docId. Good call on DNT.
Yes, each ping should have a unique docId (though if we build in a "retry" it should remain stable for all attempts for a given submission). It should be considered a globally unique identifier for a particular instance of a document type.
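To make the "stable across retries" point concrete, here is a small sketch under a few assumptions: submitWithRetry is a hypothetical name, and the transport is injected as sendFn(url, body) returning true on success. In a browser that could be (u, b) => navigator.sendBeacon(u, b), bearing in mind that sendBeacon's return value only says the request was queued, not that it was delivered.

```javascript
// Try to submit one document up to `maxAttempts` times.
// The URL (and therefore the docId) is computed once and reused for every
// attempt, so duplicates from retries can be recognised server-side.
function submitWithRetry(baseUrl, docId, payload, sendFn, maxAttempts = 3) {
  const url = `${baseUrl}/${encodeURIComponent(docId)}`;
  const body = JSON.stringify(payload);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (sendFn(url, body)) {
      return { ok: true, attempts: attempt, url };
    }
  }
  return { ok: false, attempts: maxAttempts, url };
}
```

A new submission would get a new docId from the caller; only the retries of one submission share it.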
Flags: needinfo?(mreid)
Schemas have been added at https://github.com/mozilla-services/mozilla-pipeline-schemas. The only required fields are property="tmo" and event, which must contain [timestamp, category, method, action]. The parquet schema does have a required Date header.
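A client-side payload matching that description might be assembled as below. This is only a sketch: the schema in the repo is authoritative, buildEventPing is a hypothetical name, and both the array encoding of event and the millisecond timestamp are assumptions.

```javascript
// Build a ping body with the two required fields described above.
// Assumptions: `event` is an array in [timestamp, category, method, action]
// order, and `timestamp` is milliseconds since the epoch.
function buildEventPing(category, method, action, timestamp = Date.now()) {
  for (const [name, value] of Object.entries({ category, method, action })) {
    if (typeof value !== 'string' || value.length === 0) {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  return {
    property: 'tmo',
    event: [timestamp, category, method, action],
  };
}
```

For an outbound click this could be called as buildEventPing('external_link', 'click', host), with the result serialized using JSON.stringify before submission.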
(In reply to Wesley Dawson [:whd] from comment #3)

> I'm not actually sure what the population mechanism is at the moment but for
> the purposes of your query it may be irrelevant. I can dig into this more if
> needed.

Because this was brought up again this week, I did the extra digging and determined that the s3 bucket is populated using the ops jenkins infrastructure (https://deploy.mozaws.net/job/telemetry.mozilla.org/ for those with access). The job is a cron that runs the following in the context of the gh-pages branch:

aws s3 sync ./ s3://telemetry-mozilla-org/ --exclude '.git*' --delete

Hopefully that clears things up.
See Also: → 1430879
Rewording to cover the work actually tracked in this bug.
Priority: P1 → P2
Summary: Add analytics to telemetry.m.o → Replicate telemetry.m.o analytics using mozilla-only tech
Priority: P2 → P3
Product: Webtools → Data Platform and Tools

Seems unlikely I'll get to this any time soon :|

Assignee: chutten → nobody
Status: ASSIGNED → NEW

TMO (telemetry.mozilla.org) dashboards are deprecated in favor of GLAM. Future work and enhancements are focused on GLAM - with the goal to address or surpass previous TMO use cases.

We are closing out older internal requests related to TMO - and are investing in expanding GLAM use cases.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Component: Telemetry Dashboards (TMO) → General