Closed Bug 1628740 Opened 5 years ago Closed 4 years ago

FxA Metrics ETL: Bulk export of send tab metrics to Amplitude

Categories

(Data Platform and Tools :: General, task, P1)

task
Points:
3

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: klukas, Unassigned)

References

Details

Attachments

(3 files)

See description in https://docs.google.com/document/d/1UNulczGgBSepy0V1YISAfEuczbtvXaR59HB7xwTCFCk/edit#heading=h.yku9rl5thbdk

We need to export "tab sent" and "tab received" events to Amplitude based on sync telemetry.

The existing schema for the Amplitude events is:

User     fxa_uid        string    
Event    device_id      string    
User     ua_browser     string    
User     ua_version     string    
Event    flow_id        string                               

In BigQuery, we have the telemetry.sync ping view which includes fields payload.uid (which should be the hashed fxa_uid), payload.device_id, and user agent info parsed under metadata.user_agent, though we may want to check on format of those fields and whether they match previous expectations. See https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/data/sync-ping.html for more details on the ping structure.

The flow_id will be extracted from payload.events where it appears in the object map as key "flowID".

This may be a fairly straightforward task of creating a query to write out these events 1:1 to a derived table with schema matching the above, and then tying that table into the same nightly bulk export machinery we use for other Amplitude sends.

Send tab metrics are currently broken, so it's not clear at this point how we can validate the output.

The following query prototype parses out most of what we need:

WITH
  events AS (
  SELECT
    *,
    udf.deanonymize_event(event).*
  FROM
    `moz-fx-data-shared-prod.telemetry_stable.sync_v4`
  CROSS JOIN
    UNNEST(payload.events) AS event
  WHERE
    DATE(submission_timestamp) = "2020-01-09" )
SELECT
  payload.uid,
  --payload.device_id,
  --insert_id,
  CASE event_object
  WHEN 'processcommand' THEN 'tab_received'
  WHEN 'sendcommand' THEN 'tab_sent'
  END AS event_type,
  metadata.user_agent.browser AS ua_browser,
  metadata.user_agent.version AS ua_version,
  event_object,
  udf.get_key(event_map_values, 'flowID') AS flow_id,
  submission_timestamp,
  TIMESTAMP_SECONDS(SAFE_CAST(SUBSTR(udf.get_key(event_map_values, 'serverTime'), 1, 10) AS INT64)) AS server_time,
FROM
  events
WHERE
  event_method = 'displayURI'
LIMIT
  1000

To finish this off, we need to include some more metadata fields and we'll need to put the user_properties and event_properties into JSON-ified form. These changes should be very similar to the fennec ios events view.

Attached file send_tab_events query
I think this is very close to a complete query.

I'm checking with Rob about Amplitude event budget.

The view that defines the schema here is now committed. It's doing sampling at 10% for now in order to save on Amplitude budget.

We have an Airflow PR in review to schedule daily dumps of send tab data to AWS S3. At this point, we need to figure out where in Amplitude this data should go. If we have an appropriate dev project, we should try that first so we can verify the schema, etc. before pushing to the prod project.

Events are now flowing to the dev project. I'm awaiting validation from FxA folks to see if we need to make any tweaks to the format before we go ahead and start populating prod.

Questions to evaluate:

  • Are fields being populated as expected?
  • Is volume approximately as expected (understanding that we're only sending a 10% sample)?
  • Do we need to include messages lacking the payload.uid and payload.deviceID fields?
  • Do the hashed fxa UIDs match up with the hashed UIDs coming from other sources?
Priority: P2 → P1

Validation with DS happened in https://jira.mozilla.com/browse/DS-642 and we were able to resolve outstanding questions.

These events are now flowing to the prod FxA project in Amplitude.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: