FxA Metrics ETL: Make device-commands events available for querying in redash
Categories
(Data Platform and Tools :: General, task, P2)
Tracking
(Not tracked)
People
(Reporter: rfkelly, Assigned: klukas)
References
Details
(Whiteboard: [SACI])
Attachments
(1 file)
(I'm filing this by analogy to Bug 1633918 but please don't hesitate to redirect me as appropriate)
For monitoring the quality of the "send tab to device" feature, we currently have a redash dashboard here that is build using client-submitted metrics from the sync ping. To get a more reliable sense of whether and how sending tabs is succeeding, we'd like to add some server-generated metrics to this dashboard. I have a PR to emit them on the FxA server here and they're documented here.
In order to add them to our dashboard, I believe we need an ETL job that extracts these events from the raw server logs, applies the HMAC anonymization logic to any user-linked identifiers, and then makes them available in a table queryable from redash. Leif, that's based on your recent suggestions in the send-tab metrics google doc, does it sound right to you?
I'm quite happy to have a go at putting together the necessary queries here, based on existing FxA jobs in the bigquery-etl repo. But any advice or guidance would of course be appreciated.
Reporter | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Reporter | ||
Comment 1•4 years ago
|
||
These metrics have started to become available in bigquery; noting the query that I used to find them for reference:
WITH funnel_events AS (
SELECT
TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,
jsonPayload.type,
jsonPayload.fields.uid,
jsonPayload.fields.index,
jsonPayload.fields.command,
jsonPayload.fields.target,
jsonPayload.fields.targetOS,
jsonPayload.fields.targetType,
jsonPayload.fields.sender,
jsonPayload.fields.senderOS,
jsonPayload.fields.senderType,
FROM `moz-fx-fxa-prod-0712.fxa_prod_logs.docker_fxa_auth_2020*`
WHERE jsonPayload.type IN ("device.command.invoked", "device.command.notified", "device.command.retrieved")
-- This date is when the metrics were deployed and stable in prod;
-- there is some data for earlier dates but it's from a failed deployment so we don't count it.
AND _TABLE_SUFFIX >= '0708'
),
:klukas, any chance you can recommend a good existing example script in the bigquery-etl repo, that I could use as a starting point to poke at extracting these events over to redash?
Assignee | ||
Comment 2•4 years ago
|
||
I'll plan to take a crack at this today. Your sample query here give me the bulk of what I need to productionize the query. You'll need a data engineer involved anyway to provision the table.
Comment 3•4 years ago
|
||
Assignee | ||
Comment 4•4 years ago
|
||
See your example query translated into something appropriate for bigquery-etl in https://github.com/mozilla/bigquery-etl/pull/1150
Feel free to comment there or use it as a basis for your own PR. It's a bit difficult of a workflow, since the HMAC piece requires privs available only to Airflow, so you'll need to comment that piece out if you want to iterate on the query in the BQ console.
Assignee | ||
Comment 5•4 years ago
|
||
This new table/query is now scheduled and backfilled.
Reporter | ||
Comment 6•4 years ago
|
||
Thank you! Linking for completeness, I was able to use this data to make some nice graphs:
:tcsc, could you please incorporate these into your new send-tab dashboard for easy future reference? I had a look but I don't think I have edit permissions.
Comment 7•4 years ago
|
||
I can't because those aren't published, but you should have edit permissions now.
Description
•