Manage event volume of new "fxa_activity - oauth_access_token_created" event
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
People
(Reporter: jhirsch, Assigned: jhirsch)
Details
Attachments
(3 files, 1 obsolete file)
This is a followup from bug 1632635.
Sync is in the process of moving from using browserid assertions to oauth access tokens, and this change is already completed for mobile. We've lost visibility into the number of devices used by these users, as the daily device info was recorded in the fxa_activity - cert_signed
events fired when a browserid assertion was signed by the auth-server.[1]
I'm just about to land[2] a new amplitude event in FxA to restore the missing device usage data, by instead recording an event when oauth access tokens are created, and appending the device data to that event.
The new event name is fxa_activity - oauth_access_token_created
.
This should hit production next week. In parallel, ops will lower the FxA oauth access token TTL from 2 days to 6 hours, which is the same TTL as the old browserid assertions, so we'll have the same need for per-user rollups that we had with the old cert_signed
events.
It looks like a bigquery-etl PR will be needed, and one in cloudops-infra as well. I'll open those now and attach to this bug.
[1] https://github.com/mozilla/fxa/issues/5143
[2] https://github.com/mozilla/fxa/pull/5820
Assignee | ||
Comment 1•4 years ago
|
||
Hope it's okay to assign this bug to you, Frank :-)
I'm planning to do the work, I'll just need reviews in case I've missed something.
We'll want to leave the fxa_activity - cert_signed
rollups in place until desktop switches from browserid to oauth at some point in the future.
Comment 2•4 years ago
|
||
Assignee | ||
Comment 3•4 years ago
|
||
Assignee | ||
Comment 4•4 years ago
|
||
Assignee | ||
Comment 5•4 years ago
|
||
Comment 6•4 years ago
|
||
Updated•4 years ago
|
Updated•4 years ago
|
Comment 7•4 years ago
•
|
||
The patch in #c5 has been applied in production
Comment 8•4 years ago
|
||
Comment 9•4 years ago
|
||
Jared, assigning this bug back to you since it looks like you already completed it. Let me know if I'm missing something.
Assignee | ||
Comment 10•4 years ago
|
||
Thanks, Frank! Looks like the two pull requests have landed, so I think this is good to close.
Assignee | ||
Comment 11•4 years ago
|
||
Hey Frank, is it possible to verify that a non-zero number of these new events are arriving and are getting added to the rollup counts?
Comment 13•4 years ago
•
|
||
From
SELECT
jsonPayload.fields.event_type,
COUNT(*)
FROM
`moz-fx-fxa-prod-0712.fxa_prod_logs.docker_fxa_auth_20200817`
GROUP BY
1
I see 9219819 "fxa_activity - oauth_access_token_created" events in yesterday's source table. So these events are definitely arriving, and I based on the current logic in the query they should definitely be included in the rollups.
Comment 14•4 years ago
|
||
Sorry for the delay here - the best we can do is what :klukas queried. We don't have any count of what the events are in Amplitude. Any particular reason you think they might not be getting sent over?
Assignee | ||
Comment 15•4 years ago
|
||
Alex is seeing some unexpectedly low usage numbers in Amplitude. I think he's on PTO this week, but I'll redirect the needinfo his way so he can share a link to an amplitude chart
Comment 16•4 years ago
•
|
||
When Firefox for iOS moved over to OAuth in May, our multi-device roll-up numbers tanked in Amplitude. Other metrics changes did happen then (like the activity roll-up this bug focused on) so perhaps one of the others also impacted this.
At the beginning of August, as Android users have started to move from Fennec to Fenix (OAuth), we've seen our multi-device roll-up number slowly drop too.
With desktop moving to OAuth last week, we're seeing the roll-up number drop even further.
https://analytics.amplitude.com/mozilla-corp/chart/n0byr37
Leif might have other ideas.
Comment 17•4 years ago
•
|
||
I think the drop in Alex's chart relates to this issue.
That issue is now closed, but seems (?) also to maybe have been blocked on this issue, which is still open.
Jared, in FXA-1780 (former link above) you write:
- Update the FxA server metrics code to derive the missing sync usage events from the
fxa_activity - access_token_created
event, generated when an oauth token is created, instead of the no longer reliablefxa_activity - cert_signed
event, which was generated when a browserid assertion was signed.
Did we ever do this? If not, I think it would explain why our sync_active_devices_*
metrics are tanking,
Comment 18•4 years ago
|
||
FWIW we could update the Amplitude export to add a new user property that tells us which events the meta-event is derived from. It wouldn't take long to figure out then which activity event was affecting the change.
Assignee | ||
Comment 19•4 years ago
•
|
||
(In reply to Leif Oines [:loines] from comment #17)
Jared, in FXA-1780 (former link above) you write:
- Update the FxA server metrics code to derive the missing sync usage events from the
fxa_activity - access_token_created
event, generated when an oauth token is created, instead of the no longer reliablefxa_activity - cert_signed
event, which was generated when a browserid assertion was signed.Did we ever do this? If not, I think it would explain why our
sync_active_devices_*
metrics are tanking,
Yes, at least I think so:
- the
fxa_activity - oauth_access_token_created
event was added to FxA in https://github.com/mozilla/fxa/pull/5820, meaning these events are in the FxA server logs that are routed into BigQuery - the sync device counts are generated by bigquery-etl code, and this very bug contains the patch that added the new event to those rollups (comment #3) as well as a patch that filters this high-volume event out of amplitude, as we've done for the other oauth events (comment #5)
- in comment #13, klukas confirmed that these new events are in the FxA logs in BigQuery and should be present in the rollups.
Assignee | ||
Comment 20•4 years ago
|
||
(In reply to Frank Bertsch [:frank] from comment #18)
FWIW we could update the Amplitude export to add a new user property that tells us which events the meta-event is derived from. It wouldn't take long to figure out then which activity event was affecting the change.
Sure, that sounds like a good idea to me. Is this an easy change to make?
Comment 21•4 years ago
|
||
Sure, that sounds like a good idea to me. Is this an easy change to make?
It should be relatively straightforward. Let me take a look.
Comment 22•4 years ago
|
||
Comment 23•4 years ago
|
||
Thanks for the info jared.
So the desktop browser stopped using browser_id assertions in version 80, which definitely means we will not be getting the fxa_activity - cert_signed
events going forward. However, this should not have caused the drop we are seeing since virtually all these clients should also be generating the fxa_activity - oauth_access_token_created
event as well, and in theory with the same device_id (?).
Taking a look at this query the majority of oauth_access_token_created events have a sync_active_devices_day
count of 0. Which is wrong, since the device that generated the event should be counted as active. I think this may be the source of the problem.
Jared, you said that these device counts are computed by the bigquery_etl code, but I think its just taking the max
of the value that is provided by the event, which is computed by the FxA server.
Comment 24•4 years ago
|
||
Jared, I merged the above PR, so on the next data load we will have a new event property called rollup_events
that is an array of the distinct events that make up that active event.
Jared, you said that these device counts are computed by the bigquery_etl code, but I think its just taking the
max
of the value that is provided by the event, which is computed by the FxA server.
I can confirm, this is coming from the FxA server.
Assignee | ||
Comment 25•4 years ago
|
||
Hmm, I wonder if we're not getting a list of devices because of the weird magic of the metricsContext, an issue that came up just this week: https://github.com/mozilla/fxa/issues/6347.
Vijay, you were just poking around at this code. Any idea if this new event needs to pull device counts out of the metricsContext to work?
Comment 26•4 years ago
|
||
Any idea if this new event needs to pull device counts out of the metricsContext to work?
I believe we should be doing this already. Quick test of https://github.com/mozilla/fxa/issues/6347 and I see the following amplitude events logged locally
first sync login
16|auth | 2020-09-11T11:50:46: fxa-auth-server.INFO: amplitudeEvent {"op":"amplitudeEvent","event_type":"fxa_activity - oauth_access_token_created","time":1599839446263,"user_id":"e7fc3c71d8ff45b684160819cd114c64","app_version":"187.0","language":"en-US","os_name":"Mac OS X","os_version":"10.15","event_properties":{},"user_properties":{"ua_browser":"Firefox","ua_version":"80.0","sync_device_count":1,"sync_active_devices_day":1,"sync_active_devices_week":1,"sync_active_devices_month":1}}
16|auth | 2020-09-11T11:50:46: fxa-auth-server.INFO: amplitudeEvent {"op":"amplitudeEvent","event_type":"fxa_activity - cert_signed","time":1599839446269,"user_id":"e7fc3c71d8ff45b684160819cd114c64","app_version":"187.0","language":"en-US","os_name":"Mac OS X","os_version":"10.15","event_properties":{},"user_properties":{"ua_browser":"Firefox","ua_version":"80.0","sync_device_count":1,"sync_active_devices_day":1,"sync_active_devices_week":1,"sync_active_devices_month":1}}
which shows sync_device_count=1
in the user properties.
second sync login on new profile
16|auth | 2020-09-11T11:52:32: fxa-auth-server.INFO: amplitudeEvent {"op":"amplitudeEvent","event_type":"fxa_activity - oauth_access_token_created","time":1599839552406,"user_id":"e7fc3c71d8ff45b684160819cd114c64","app_version":"187.0","language":"en-US","os_name":"Mac OS X","os_version":"10.15","event_properties":{},"user_properties":{"ua_browser":"Firefox","ua_version":"80.0","sync_device_count":2,"sync_active_devices_day":2,"sync_active_devices_week":2,"sync_active_devices_month":2}}
16|auth | 2020-09-11T11:52:32: fxa-auth-server.INFO: amplitudeEvent {"op":"amplitudeEvent","event_type":"fxa_activity - cert_signed","time":1599839552411,"user_id":"e7fc3c71d8ff45b684160819cd114c64","app_version":"187.0","language":"en-US","os_name":"Mac OS X","os_version":"10.15","event_properties":{},"user_properties":{"ua_browser":"Firefox","ua_version":"80.0","sync_device_count":2,"sync_active_devices_day":2,"sync_active_devices_week":2,"sync_active_devices_month":2}}
which shows sync_device_count=2
in the user properties.
From the looks of the log, the counts appear to be correct?
Assignee | ||
Comment 28•4 years ago
|
||
I'm investigating this currently, and it seems like an issue on the FxA server side.
I've got sync working with local FxA, and I'm observing a number of different amplitude events logged with inaccurate sync device counts--in particular, seeing a lot of 0 values.
Closing.
Assignee | ||
Comment 29•4 years ago
|
||
In testing the sync sign in flow locally, it looks like the oauth_access_token_created
event is fired twice in close succession, with different values for the sync device counts, and the properties of the event vary between the two.
The first time the event is fired, the current login isn't counted towards the device totals. So, for a first login of the day, sync_device_count
and sync_active_devices_day
will both be 0
.
A few seconds later, the event is fired a second time, with the sync device counts correctly incremented by one.
The first event also has some properties missing from the second: device_id
, session_id
, event_properties = {service: 'sync'}
, and the user_properties
field has flow_id
and $append: { fxa_services_used: 'sync' }
. The second event has event_properties = {}
, and the other listed properties are simply not present.
Could these differences explain the sync device counts being unusually low? For instance, I noticed the rollup query includes a JSON_EXTRACT_SCALAR... IGNORE NULLS
statement for the event_properties.service
field. How exactly does IGNORE NULLS
work in a case like this? Could the event with no {service: 'sync'}
be getting dropped from the rollup?
Here are two pretty-printed example events from local testing, again, these are both fired in the course of one login to desktop sync:
first event:
{
"op": "amplitudeEvent",
"event_type": "fxa_activity - oauth_access_token_created",
"time": 1601421751982,
"user_id": "1f63e61e519648a88fb8661c60b1fb28",
"device_id": "3e436ac79c894f29932839e474149281",
"session_id": 1601421741925,
"app_version": "189.0",
"language": "en-US",
"os_name": "Mac OS X",
"os_version": "10.15",
"event_properties": {
"service": "sync"
},
"user_properties": {
"flow_id": "2de9c8bceed3723c3bd921668e5b74f525910a322a5c2afe3d35d089dac5c339",
"ua_browser": "Firefox",
"ua_version": "83.0",
"$append": {
"fxa_services_used": "sync"
},
"sync_device_count": 0,
"sync_active_devices_day": 0,
"sync_active_devices_week": 0,
"sync_active_devices_month": 0
}
}
second event:
{
"op": "amplitudeEvent",
"event_type": "fxa_activity - oauth_access_token_created",
"time": 1601421752248,
"user_id": "1f63e61e519648a88fb8661c60b1fb28",
"app_version": "189.0",
"language": "en-US",
"os_name": "Mac OS X",
"os_version": "10.15",
"event_properties": {},
"user_properties": {
"ua_browser": "Firefox",
"ua_version": "83.0",
"sync_device_count": 1,
"sync_active_devices_day": 1,
"sync_active_devices_week": 1,
"sync_active_devices_month": 1
}
}
Comment 30•4 years ago
|
||
Hey Jared,
To answer your question - no, the IGNORE NULL
only ignores nulls for that column, it does not drop that row from other aggregates.
To verify this, I created two queries, and both show what the data in Amplitude shows.
This viz shows the number of clients reporting more than 1 of each aggregate in the derived table we export to Amplitude.
This viz is the same query, but on the raw log data.
I believe this shows that the data issue is happening before it even hits the logs.
Assignee | ||
Comment 31•4 years ago
|
||
Thanks for checking, Frank! The sync device count story turns out to be a little complicated[1], but a fix for a related metrics bug landed last week[2] and hopefully that will resolve the sync device undercounting.
[1] https://github.com/mozilla/fxa/issues/6578
[2] https://github.com/mozilla/fxa/pull/6581
Assignee | ||
Updated•4 years ago
|
Description
•