Reduce "fxa_activity - cert_signed" event volume to one per-user per-day
Categories
(Data Platform and Tools :: General, task, P1)
Tracking
(Not tracked)
People
(Reporter: frank, Assigned: frank)
Details
Attachments
(3 files)
As an org we're sending more events to Amplitude than we expected when we signed the contract last year. This bug is specifically for FxA, and understanding the "cert_signed" event, which accounts for ~23% of the total number of events we send to Amplitude.
Leif, can you describe what this event is, and whether there's a way to reduce the volume at which it's sent?
Comment 1•4 years ago
The event is fired periodically for authenticated and verified sync users on desktop, Fennec, and iOS. It is fired whenever a client hits this endpoint on the FxA auth server. See also this.
My understanding (not an engineer) is that Sync periodically needs to refresh this certificate in order to receive a token from the sync token server that will allow it to continue syncing.
This is one of only two server-side events that are reliably generated and that will let us infer whether a client is still syncing. The other events are the creation and checking of oauth tokens, which are of even higher volume and which we already filter out before they are sent to amplitude. As such it is the primary event that contributes to measures of FxA/Sync DAU and MAU (a user that syncs at least once on a given day should almost always generate one of these events). I know Alex is critically reliant on it for daily monitoring of our metrics.
The FxA team is aware that this is a costly event due to its volume. We are actively exploring ways to sample it, see bug 1592123.
CC Jared and Alex
Comment 2•4 years ago
Hey Jared and Jon, we tentatively came up with the idea of sending a single `cert_signed` event per user at EOD instead of in real time. I have no knowledge of the current pipeline to Amplitude, but Leif has indicated that it's tailed server logs -> Pub/Sub -> Amplitude HTTP API.
Do either of you have any idea if this is feasible? It would essentially move processing of these events to a batch job, aggregating per client. My guess is this is too much data to possibly do on one node, so we'd need something like BQ as an intermediate to do that processing. Once we've aggregated, we could use the same pipeline you already have and push the data to Pub/Sub and then to Amplitude, hopefully resulting in the same schema with minimal work.
We also need to be clear about which metrics and plots this may affect for any real-time or daily visualizations you all use. Any single-event counts should be unaffected, but general user counting wouldn't be available until EOD. Similarly, retention would appear low until these events are sent.
Comment 3•4 years ago
The idea outlined in Comment 2 would reduce the number of events monthly from ~1.4B to ~250M, a reduction of about 1.15B, or about a 19% reduction in the total events we send to Amplitude.
Comment 4•4 years ago
The other events are the creation and checking of oauth tokens, which are of even higher volume and which we already filter out before they are sent to amplitude.
I wanted to chime in here to mention that the fact that we filter these out in Desktop is shaping up to be a blocker for some client-side work the sync team is doing, because we want to move away from using BrowserID assertions (which generate `cert_signed` events) and towards OAuth tokens (which generate oauth-related events). See e.g. Bug 1591312 where we had to back out some work that was heading in that direction, because it would have meant some users disappearing from our MAU.
From my perspective, an ideal solution would send a single activity event per user at EOD, encompassing both `cert_signed` and `token_created` events. I don't think we particularly need the ability to distinguish which event it was, only that some FxA-related activity was registered for that user on that day.
Comment 5•4 years ago
This sounds like it would be almost trivial to implement. All FxA event logs are already sent to BigQuery via a Stackdriver integration, and Airflow has permission to query those tables. Our FxA KPI reporting is based on a series of queries that hit these Stackdriver-created tables, and it sounds desirable that we'd send exactly the same set of users to Amplitude as what we consider active for KPIs.
The difficult thing here is that FxA IDs are hashed using an HMAC key before being sent to Amplitude. The logs in BigQuery have the raw IDs. This is a problem we keep running into in discussing various FxA metrics tasks.
It may be worth at this point considering whether we could make the HMAC key available for Airflow to access. If we did that, we could build a Docker container that would be able to do the same HMAC hashing that the FxA pipeline does. It would pull the list of active IDs for the day via BQ query to the Stackdriver log tables, HMAC them, and write the results to S3 for Amplitude to ingest. We then might be able to reuse that pattern for some other tasks in the FxA metrics migration.
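As a sketch of the hashing step described above, the HMAC could look like the following (the digest algorithm and the placeholder key are assumptions; the actual FxA pipeline's choices aren't specified here):

```python
import hashlib
import hmac

def hash_fxa_uid(raw_uid: str, key: bytes) -> str:
    """HMAC a raw FxA user id so it can match the hashed ids in Amplitude.
    SHA-256 is an assumption; the real pipeline's digest may differ."""
    return hmac.new(key, raw_uid.encode("utf-8"), hashlib.sha256).hexdigest()

# Placeholder key for illustration only; the real key stays with ops/Airflow.
hashed = hash_fxa_uid("0123456789abcdef", b"placeholder-key")
print(len(hashed))  # 64 hex characters for SHA-256
```

The same function applied with the same key always yields the same hashed id, which is what lets a batch job produce ids that line up with what the existing pipeline already sent.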
Comment 6•4 years ago
It may be worth at this point considering whether we could make the HMAC key available for Airflow to access. If we did that, we could build a Docker container that would be able to do the same HMAC hashing that the FxA pipeline does. It would pull the list of active IDs for the day via BQ query to the Stackdriver log tables, HMAC them, and write the results to S3 for Amplitude to ingest. We then might be able to reuse that pattern for some other tasks in the FxA metrics migration.
If having the HMAC available to Airflow is a blocker at all, we can push to pub/sub from BQ and use their existing pipeline to load the data. Who do we need to get permission from to enable that key use on our end?
Comment 7•4 years ago
If having the HMAC available to Airflow is a blocker at all, we can push to pub/sub from BQ and use their existing pipeline to load the data.
We have been essentially relying on that pattern for some other pieces of the FxA migration work. This may indeed be an option.
Who do we need to get permission from to enable that key use on our end?
jbuck may have some good context on that, and perhaps :rfkelly. I really don't know how to reason about the risk surrounding giving enhanced access to that key.
Comment 8•4 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #4)
From my perspective, an ideal solution would send a single activity event per user at EOD, encompassing both `cert_signed` and `token_created` events. I don't think we particularly need the ability to distinguish which event it was, only that some FxA-related activity was registered for that user on that day.
+1 we should definitely do this
Comment 9•4 years ago
From my perspective, an ideal solution would send a single activity event per user at EOD, encompassing both `cert_signed` and `token_created` events. I don't think we particularly need the ability to distinguish which event it was, only that some FxA-related activity was registered for that user on that day.
+1 we should definitely do this
Are we already getting the `token_created` events from Stackdriver? If so that would be almost no additional work on top of what we're already looking at here.
Comment 10•4 years ago
If having the HMAC available to Airflow is a blocker at all, we can push to pub/sub from BQ and use their existing pipeline to load the data.
We have been essentially relying on that pattern for some other pieces of the FxA migration work. This may indeed be an option.
Who do we need to get permission from to enable that key use on our end?
jbuck may have some good context on that, and perhaps :rfkelly. I really don't know how to reason about the risk surrounding giving enhanced access to that key.
Ryan, do you have any context on the issues about making the HMAC key available to our Airflow instance?
Comment 11•4 years ago
Updating the bug title to more accurately represent current conversation.
Comment 12•4 years ago
(In reply to Frank Bertsch [:frank] from comment #9)
From my perspective, an ideal solution would send a single activity event per user at EOD, encompassing both `cert_signed` and `token_created` events. I don't think we particularly need the ability to distinguish which event it was, only that some FxA-related activity was registered for that user on that day.
+1 we should definitely do this
Are we already getting the `token_created` events from Stackdriver? If so that would be almost no additional work on top of what we're already looking at here.
Yes they are already there. We just ignore them when sending to amplitude currently, so we would just need to remove that filter.
Comment 13•4 years ago
Hey all, we're hoping to move quickly on this, so responses are appreciated. The current plan is the following:
- Create a new table, derived from the FxA data in BQ, that groups by user-days and filters to `cert_signed` and `token_created` [0]. For every active day (derived from the `timestamp` field [1]), we will derive a single event for every user, with name `fxa_activity - active`. We will omit the event properties `oauth_client_id` and `service` from the events.
- Create a job to send this data to an FxA vacuum, where it will be loaded into the FxA project. This requires working with Amplitude to get that set up.
- Once we've confirmed the user count numbers for the new `fxa_activity - active` event, we can have the FxA pipeline stop sending the `cert_signed` event. There will be some overlap in time where both are sent, but that is acceptable from an analysis perspective.
Note: This plan can probably be acted on quickly but requires us to hash the user ids in the same way as the current FxA pipeline does. We are still waiting on confirmation from the FxA team on whether that is possible.
[0] This filtering isn't strictly required. We could use all events and send a single "activity" per-user per-day, encompassing any activity.
[1] `Timestamp` is a bit nebulous. Looking over the tables, I see a `timestamp` field, in addition to a `receiveTimestamp` field. There is a slight delay from `timestamp` -> `receiveTimestamp`. Because the table is partitioned on `timestamp`, I want to ensure we won't miss any activity when `timestamp` and `receiveTimestamp` occur on different dates, where a day boundary occurs between the two.
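The user-day rollup in the first step above can be sketched in miniature (an illustrative stand-in for the BQ query; the row shape and field names are assumptions, not the real Stackdriver schema):

```python
# One "fxa_activity - active" event per user per day, filtered to the
# two activity event types. Toy rows stand in for the log table.
ACTIVITY_EVENTS = {"cert_signed", "token_created"}

rows = [
    {"user_id": "u1", "event_type": "cert_signed", "day": "2020-04-23"},
    {"user_id": "u1", "event_type": "token_created", "day": "2020-04-23"},
    {"user_id": "u2", "event_type": "token_created", "day": "2020-04-23"},
    {"user_id": "u3", "event_type": "some_other_event", "day": "2020-04-23"},
]

# Deduplicate to (user, day) pairs, i.e. GROUP BY user_id, day in BQ terms.
active = sorted({(r["user_id"], r["day"])
                 for r in rows if r["event_type"] in ACTIVITY_EVENTS})
events = [{"user_id": uid, "day": day, "event_type": "fxa_activity - active"}
          for uid, day in active]
print(len(events))  # 2: u1 and u2 each get exactly one rollup event
```

However many raw events a user generates in a day, exactly one rollup event comes out, which is where the volume reduction comes from.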
Comment 14•4 years ago
Couple of questions:
- There is also an `fxa_activity - access_token_checked` amplitude event. Can we add that to the list of events that get sent to the vacuum? This may not be strictly necessary, as most clients that generate this event also generate `fxa_activity - access_token_created`, but I believe there are some cases where a client might only generate the `checked` event on a given day, which would cause them not to be counted towards DAU if we omit it. I will double check to see how often this happens (client sends one event but not the other).
- Is there a plan for adding back in the `service` and `oauth_client_id` event properties? E.g. could we take the set of all the unique values associated with the activity events for a given user-day, and send them as arrays under the `service`/`oauth_client_id` event properties for the rollup event? They are kind of important for segmenting DAU by service. The docs seem to indicate that this is at least possible for the HTTP API: https://help.amplitude.com/hc/en-us/articles/204771828-HTTP-API (see the example for `event_properties`). So for example, if a user generated a cert signed event for sync and an access token event for Monitor, we would do something like `{"service": ["sync", "fx-monitor"], "oauth_client_id": ["802d56ef2a9af9fa"]}` (note the `sync` service does not have an `oauth_client_id`).
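The set-aggregation proposed in the second question might look like this (illustrative Python; the per-event row shape is an assumption):

```python
# Set-aggregate event properties for one user-day, reproducing the
# example payload above; note sync carries no oauth_client_id.
user_day_events = [
    {"event_type": "cert_signed", "service": "sync", "oauth_client_id": None},
    {"event_type": "access_token_created", "service": "fx-monitor",
     "oauth_client_id": "802d56ef2a9af9fa"},
]

event_properties = {
    "service": sorted({e["service"] for e in user_day_events
                       if e["service"] is not None}),
    "oauth_client_id": sorted({e["oauth_client_id"] for e in user_day_events
                               if e["oauth_client_id"] is not None}),
}
print(event_properties)
# {'service': ['fx-monitor', 'sync'], 'oauth_client_id': ['802d56ef2a9af9fa']}
```

Taking the unique set and dropping nulls keeps the arrays small while preserving the per-service segmentation the rollup event would otherwise lose.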
Comment 15•4 years ago
- There is also an `fxa_activity - access_token_checked` amplitude event. Can we add that to the list of events that get sent to the vacuum? This may not be strictly necessary, as most clients that generate this event also generate `fxa_activity - access_token_created`, but I believe there are some cases where a client might only generate the `checked` event on a given day, which would cause them not to be counted towards DAU if we omit it. I will double check to see how often this happens (client sends one event but not the other).
Definitely. I mentioned in [0] that we could actually remove any filtering, so that this `active` event would encompass any activity. I'm not sure how that would play with the event_properties discussed below, though (if e.g. those events are sending different services).
- Is there a plan for adding back in the `service` and `oauth_client_id` event properties? E.g. could we take the set of all the unique values associated with the activity events for a given user-day, and send them as arrays under the `service`/`oauth_client_id` event properties for the rollup event? They are kind of important for segmenting DAU by service. The docs seem to indicate that this is at least possible for the HTTP API: https://help.amplitude.com/hc/en-us/articles/204771828-HTTP-API (see the example for `event_properties`). So for example, if a user generated a cert signed event for sync and an access token event for Monitor, we would do something like `{"service": ["sync", "fx-monitor"], "oauth_client_id": ["802d56ef2a9af9fa"]}`
We can definitely add these back in. It sounds like we would aggregate all services and all `oauth_client_id`s, taking the unique set for each. Does this sound like the right approach, Leif?
Comment 16•4 years ago
I don't think we need to fret about the event_type -> service mappings; all services except sync can occur with both types of `access_token` events, and the vast majority of `cert_signed` events are just sync. By aggregating the services and event properties we lose information about what event type was originally associated with what service, but that is not important for analysis.
Your intuition about how to aggregate the service and oauth_client_id fields is correct.
Comment 17•4 years ago
I was too fast to submit my last comment:
I think it would be useful to also aggregate the following user properties in a similar way: `sync_active_devices_*` (day, week, month) and `sync_device_count`.
If possible we should also aggregate `fxa_services_used` and then update it using `$postInsert` as documented here. Although :jbuck & :_6a68, this `postInsert` function is only available using the identify API; would that be a problem for us?
Ideally we would also do something similar for `OS`, `Language`, and `Country`, but reading between the lines here it doesn't appear we'll be able to do that, which is a shame. It means that we will no longer be able to use the activity events to know, for example, which OS users were active on in a given day (e.g. if they were active on both mobile and desktop). We can still use other events to answer these types of questions, but it's not ideal. Maybe as a fast follow we could introduce a new event property like `os_used_on_day` or `activity_event_os_array` and set-aggregate like above.
Comment 18•4 years ago
(In reply to Frank Bertsch [:frank] from comment #13)
- Once we've confirmed the user count numbers for the new `fxa_activity - active` event, we can have the FxA pipeline stop sending the `cert_signed` event. There will be some overlap in time where both are sent, but that is acceptable from an analysis perspective.
I can stop the flow of the original events whenever - just need to change the filter being used on the FxA side
Note: This plan can probably be acted on quickly but requires us to hash the user ids in the same way as the current FxA pipeline does. We are still waiting on confirmation from the FxA team on whether that is possible.
I can provide the HMAC key to you; can you talk about access control once it's been loaded into the Airflow cluster? I know HMACs are one-way, but if the key is only visible to ops folks that would be ideal.
[1] `Timestamp` is a bit nebulous. Looking over the tables, I see a `timestamp` field, in addition to a `receiveTimestamp` field. There is a slight delay from `timestamp` -> `receiveTimestamp`. Because the table is partitioned on `timestamp`, I want to ensure we won't miss any activity when `timestamp` and `receiveTimestamp` occur on different dates, where a day boundary occurs between the two.
In fxa-amplitude-send we use the `jsonPayload.Fields.time` field when sending data to Amplitude, which I think corresponds to the `timestamp` field.
Comment 19•4 years ago
I can stop the flow of the original events whenever - just need to change the filter being used on the FxA side
Perfect, we'll plan on that once this work is ready.
I can provide the HMAC key to you, can you talk about access control once it's been loaded into the Airflow cluster? I know HMAC's are one-way, but if the key is only visible to ops folks that would be ideal.
Yes, it should be. You can get in contact with Harold (cc'ed him here) to get the key added to Airflow. Once there, it is not even visible to admins if stored as a secret, and we can still pass it in as a param to the query.
In fxa-amplitude-send we use the `jsonPayload.Fields.time` field when sending data to Amplitude, which I think corresponds to the `timestamp` field.
Great, we'll continue to do this.
Comment 20•4 years ago
(In reply to Leif Oines [:loines] from comment #17)
I was too fast to submit my last comment:
I think it would be useful to also aggregate the following user properties in a similar way: `sync_active_devices_*` (day, week, month) and `sync_device_count`.
Leif, are these user properties filled in currently from the `cert_signed` event? If so, we will indeed need to send those along with the events.
If possible we should also aggregate `fxa_services_used` and then update it using `$postInsert` as documented here. Although :jbuck & :_6a68, this `postInsert` function is only available using the identify API; would that be a problem for us?
We will be using what they call a "vacuum", which is essentially an uploaded CSV that they import. I'm not sure offhand what they do/do not support w.r.t. user properties, but we can request they make `$postInsert` available there. When we reach out about creating this vacuum we can ask about those options.
Ideally we would also do something similar for `OS`, `Language`, and `Country`, but reading between the lines here it doesn't appear we'll be able to do that, which is a shame. It means that we will no longer be able to use the activity events to know, for example, which OS users were active on in a given day (e.g. if they were active on both mobile and desktop). We can still use other events to answer these types of questions, but it's not ideal. Maybe as a fast follow we could introduce a new event property like `os_used_on_day` or `activity_event_os_array` and set-aggregate like above.
Are these questions that are already answered with the `cert_signed` ping? If so, we don't want to lose them. I believe we can do exactly what you mentioned earlier: take the unique set of e.g. OSes.
Comment 21•4 years ago
(In reply to Frank Bertsch [:frank] from comment #20)
(In reply to Leif Oines [:loines] from comment #17)
Leif, are these user properties filled in currently from the `cert_signed` event? If so, we will indeed need to send those along with the events.
Are these questions that are already answered with the `cert_signed` ping? If so, we don't want to lose them. I believe we can do exactly what you mentioned earlier: take the unique set of e.g. OSes.
Yes, they are sent with the cert_signed event. You can use this BigQuery query as a reference for what is sent in the `event_properties` and `user_properties` fields. I believe the value for os is derived from the `jsonPayload.fields.os_name` column. Country and Language are also there.
Edit: Note that we are using `$append` for `fxa_services_used` here but we should really be using `$postInsert` per amplitude's advice (we just haven't made the change yet)
Comment 22•4 years ago
Ryan, do you have any context on the issues about making the HMAC key available to our Airflow instance?
:jbuck will have better context on this than I do. My main question is: who has the ability to calculate HMACs using this key? (Which is a slightly different question from "who has the ability to access this key?") The threats to be concerned about here are:
- Given a raw FxA userid, who is able to calculate the corresponding hashed userid in amplitude?
- Given a hashed userid from amplitude, who is able to try to brute-force-guess the corresponding raw FxA userid?
Ideally the answer to both of these questions is "only a restricted set of operational staff at Mozilla". I've no objection to making that set bigger, but I wouldn't want to allow e.g. anyone at Mozilla to calculate HMACs using this key.
I believe there are some cases where a client might only generate the checked event on a given day, which would cause them not to be counted towards DAU if we omit it. I will double check to see how often this happens
This definitely happens, because some of our OAuth tokens live for longer than 1 day. Including `checked` sounds valuable to me.
My opinions on aggregating services used etc are accurately represented by Leif's comments above, so I won't repeat any of it here apart from "+1".
Comment 23•4 years ago
Ryan, the usual path is that we create a table that has the exact events we want to send to Amplitude. Currently that means anyone with access to Telemetry data will have access to both the unhashed and hashed userids, though with no link between them. We could lock down the hashed userids table, if that would alleviate any issues on your end.
The HMAC should only be available to Airflow jobs and ops.
Comment 24•4 years ago
We could lock down the hashed userids table, if that would alleviate any issues on your end.
If it is feasible to lock down that table, please do so. Thanks!
Comment 25•4 years ago
Currently that means anyone with access to Telemetry data will have access to both the unhashed and hashed userids
Unhashed FxA user IDs do not exist anywhere in telemetry data. The existing imports of FxA data that we do via Airflow read from the fxa-prod project (which has the unhashed IDs) but hash the IDs as part of the query, so that the resulting tables that live in the `shared-prod` project do not contain raw FxA IDs.
HMAC-hashed FxA UIDs already exist in `shared-prod` as they are passed in the `sync` ping (it's unclear whether these are hashed with the same key as the events sent to Amplitude).
So, I don't see any issue with telemetry users having access to the HMAC-hashed UIDs.
Comment 26•4 years ago
Unhashed FxA user IDs do not exist anywhere in telemetry data. The existing imports of FxA data that we do via Airflow read from the fxa-prod project (which has the unhashed IDs) but hash the IDs as part of the query, so that the resulting tables that live in the `shared-prod` project do not contain raw FxA IDs.
Thanks for clarifying that, Jeff. This also means we can't use those tables for the Amplitude import. Given this situation I agree that limiting access to the hashed data isn't a big concern.
Updated•4 years ago
Comment 27•4 years ago
Frank and I met today and I agreed to provide a spec for 1. how we should aggregate the user and event properties for the rollup event and 2. which operations we should use when sending the event to amplitude. Here goes:
| name | event or user property | aggregation | special amplitude operation (if needed) |
|---|---|---|---|
| service | event | array | none |
| oauth_client_id | event | array | none |
| fxa_services_used | user | array | $postInsert (we are changing this from $append) |
| sync_device_count | user | max | none |
| sync_active_devices_day | user | max | none |
| sync_active_devices_week | user | max | none |
| sync_active_devices_month | user | max | none |
| OS (os_name in the logs)* | user | mode | none |
| OS Version (os_version in the logs)* | user | mode | none |
| Language* | user | mode | none |
| ua_version | user | mode | none |
| ua_browser | user | mode | none |
| Version (app_version in the logs; this is the version of the FxA server) | user | max | none |
| Country and Region* | user | mode | none |
edited to reflect comments below.
*I'm unsure if we can actually send these properties to amplitude as arrays. lmk if that ends up being a problem. I guess we can use mode if we don't have much of a choice.
I think that's all of the properties that are relevant to the `fxa_activity - *` events. As I said, we should work under the assumption that all event types can take all of these properties, even if that's not true at the moment (many of them will sometimes be null). I also believe that if you don't specify an operation then it defaults to `$set`, which is what we want, but maybe we should verify that.
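The spec above could be turned into a user-properties payload roughly like this (a hedged sketch: array properties go through `$postInsert`, numeric ones take the max, and the "mode" properties keep their most common value; the helper names and the shape of `agg` are illustrative assumptions, not the real job's code):

```python
def mode(values):
    """Most common value in the list (ties broken arbitrarily)."""
    return max(set(values), key=values.count)

def build_user_properties(agg):
    # agg holds the per-user-day lists of raw values for each property.
    return {
        "$postInsert": {"fxa_services_used": sorted(set(agg["fxa_services_used"]))},
        "$set": {
            "sync_device_count": max(agg["sync_device_count"]),
            "sync_active_devices_day": max(agg["sync_active_devices_day"]),
            "OS": mode(agg["os"]),
            "Language": mode(agg["language"]),
        },
    }

agg = {
    "fxa_services_used": ["sync", "fx-monitor", "sync"],
    "sync_device_count": [2, 3],
    "sync_active_devices_day": [1, 2],
    "os": ["Windows", "Windows", "Mac OS X"],
    "language": ["en-US", "en-US", "de"],
}
props = build_user_properties(agg)
print(props["$set"]["sync_device_count"], props["$set"]["Language"])  # 3 en-US
```

Properties not listed under an operation are simply omitted, which (per the discussion above) leaves Amplitude using the most recent previously set value.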
Comment 28•4 years ago
Thanks for providing that list, Leif. What is currently done for `os`, `os_version`, `language`, and `country`/`region`? Are they currently just set as the latest value from that user?
Comment 29•4 years ago
Yes they are. So I suppose you're right, it doesn't make sense to send those as arrays. Amplitude does the magic of pulling the correct value for the time interval of your chart. Since we are sending just one event per day now, there will be no way to establish multiple values of those properties per user. I guess that means we use mode for those. I'll edit my chart to reflect this.
Comment 30•4 years ago
Leif, why don't we add an array version of said fields as well? We can use the set append operation.
Comment 31•4 years ago
That works for me. Could be something along the lines of e.g. os_used_day
Comment 32•4 years ago
I've gotten confirmation from Amplitude that we can use the full Identify API capabilities with the vacuum ingestion system, so the above user props should be no issue. Here's what's left to do:
- :jbuck to give :hwoo the HMAC key, who will make it available to Airflow jobs
- :frank to write the export job for events and user properties
- Amplitude needs to add that vacuum endpoint to the FxA project
It will be good to first test this on the dev FxA project.
Comment 33•4 years ago
I have sent the HMAC keys for stage and prod to :hwoo
Comment 34•4 years ago
added to airflow vars as fxa_amplitude_hmac_secret_key_*
Comment 35•4 years ago
Draft PR for what we'd be exporting to Amplitude is available here.
Comment 36•4 years ago
I'm noticing what may be surprises, and I want to check in with the FxA folks:
Ryan:
~6.6% of users don't report any `cert_signed` events, but do report an `access_token_checked` or `access_token_created`. Is this expected? I was under the impression that right now all users were sending `cert_signed`. Are these users counted in Amplitude through some other event?
Leif:
Of the users with no `cert_signed` events, their `user_properties` are missing all fields except `fxa_services_used`. Any idea how we want to handle this? Should we try and get this added? For now we could send `null` for those properties.
Comment 37•4 years ago
It's definitely NOT the case that all users will generate cert signed. Users of Sync and maybe a small number of other services do, but the rest will generate only the oauth `access_token` events. For the purposes of MAU/DAU we count users who generate ANY `fxa_activity - *` event (there is a "derived" event within amplitude that lumps all of these together).
Not all of the user properties make sense for services that generate the `access_token` events. For example, if a user uses Monitor and NOT sync, the `sync_active_devices` properties should not be set at all (this is only a property of sync users). Once `cert_signed` goes away, however, FxA WILL need to migrate those sync-specific user properties to be set by the `access_token` events. So for now, I think we should allow either event type to set the properties, but also allow the properties to be null (if we don't set the property for a given event, amplitude will continue to use the most recent value for that property, which is fine).
Comment 38•4 years ago
~6.6% of user don't report any cert_signed events, but do report a access_token_checked or access_token_created. Is this expected?
This sounds about right to me (assuming that it's looking at all users from all FxA-related products, many of which don't generate any `cert_signed` events).
Comment 39•4 years ago
Hi all, we've successfully launched the pipeline and I am testing data in the `FxAccts_Dev` project. I will be loading one day of `fxa_activity - active` and `$identify` events. I've noticed the client count numbers may end up being slightly higher than what we're currently seeing in Amplitude, so we may need to take a look at which events are causing that.
In addition to the fields that Leif laid out in comment 27, we've added `os_used_week` and `os_used_month`. These are aggregated on our end, and it is straightforward to add more user properties that are aggregated in a similar way across various time periods.
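The `os_used_week` / `os_used_month` aggregation described here could be sketched as follows (the dates, window logic, and per-day input shape are illustrative assumptions about how the job works):

```python
from datetime import date, timedelta

# Per-day sets of OSes a user was seen on (toy data).
daily_os = {
    date(2020, 4, 20): {"Windows"},
    date(2020, 4, 22): {"Android"},
    date(2020, 4, 23): {"Windows", "Mac OS X"},
}

def os_used(daily, end, days):
    """Union the per-day OS sets over a trailing window ending on `end`."""
    start = end - timedelta(days=days - 1)
    seen = set()
    for day, oses in daily.items():
        if start <= day <= end:
            seen |= oses
    return sorted(seen)

print(os_used(daily_os, date(2020, 4, 23), 7))
# ['Android', 'Mac OS X', 'Windows']
```

The same function with `days=30` would produce the month-level array, which is what makes adding further time windows straightforward.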
Comment 40•4 years ago
Comment 41•4 years ago
Thanks so much for all your work on this Frank, here's what I'm noticing:
- The "official" amplitude user properties except for User ID (see attached screenshot to see what I'm referring to) are null. However, I am seeing non-null values for custom event properties `LANGUAGE`, `country`, and `app_version` (the latter looks like we should just use the official `Version` property). I am also seeing `user_country`, `user_locale` etc., but they are all null. For user properties that we are not using array-agg on, is it possible to start using the "official" versions?
- I'm also seeing user properties `fxa_uid`, `fxa_uid.data` and `fxa_uid.type`; I'm not sure what those are (possibly some of these properties are just an artifact of your testing, in which case feel free to ignore me).
- The aggregated `os_used`, `sync_devices_used`, and `fxa_services_used` properties seem to be working, great!
- I queried the auth server logs for 2020-04-23 for COUNT(DISTINCT user_id) and got a number that was 1.08% higher than amplitude is showing. I cast the timezone to be PDT to match what amplitude uses. PM me on slack if you want the query/raw numbers. Unsure how I would follow up on this though, maybe you have ideas.
Comment 42•4 years ago
Note that I'm having the same problem with custom vs. "official" properties right now in trying to implement sync send_tab events. I'm chatting with Amplitude folks and we can hopefully apply the same solution there and here.
Comment 43•4 years ago
- The "official" amplitude user properties except for User ID (see attached screenshot to see what I'm referring to) are null. However, I am seeing non-null values for custom event properties `LANGUAGE`, `country`, and `app_version` (the latter looks like we should just use the official `Version` property). I am also seeing `user_country`, `user_locale` etc., but they are all null. For user properties that we are not using array-agg on, is it possible to start using the "official" versions?
Let's see what happens with the other import, but we should be able to move those to top-level columns, as we do for e.g. the Fenix import, to get them available.
- I'm also seeing user properties `fxa_uid`, `fxa_uid.data` and `fxa_uid.type`; I'm not sure what those are (possibly some of these properties are just an artifact of your testing, in which case feel free to ignore me).
I bet those are from some historical data in `FxAccts_Dev`. Do they have the associated `fxa_activity - active` events?
- The aggregated `os_used`, `sync_devices_used`, and `fxa_services_used` properties seem to be working, great!
Great!
- I queried the auth server logs for 2020-04-23 for COUNT(DISTINCT user_id) and got a number that was 1.08% higher than amplitude is showing. I cast the timezone to be PDT to match what amplitude uses. PM me on slack if you want the query/raw numbers. Unsure how I would follow up on this though, maybe you have ideas.
There may be something odd going on around timestamps. I use a UTC 00:00:00 timestamp to load the data; it looks like I should be using a PDT one? That may help make the data match.
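The day-boundary shift in question can be seen directly; a minimal sketch, assuming the Amplitude project displays times in Pacific time:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A batch timestamped at UTC midnight of the target day...
loaded_at = datetime(2020, 4, 24, 0, 0, tzinfo=timezone.utc)

# ...displays as 17:00 on the *previous* day in a PDT-based Amplitude project.
shown_as = loaded_at.astimezone(ZoneInfo("America/Los_Angeles"))
print(shown_as.isoformat())  # → 2020-04-23T17:00:00-07:00
```

So a daily aggregate loaded with a UTC-midnight timestamp ends up attributed to the prior Pacific calendar day.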
Comment 44•4 years ago
(In reply to Frank Bertsch [:frank] from comment #43)
I bet those are from some historical data in FxAccts_Dev. Do they have the associated fxa_activity - active events?
Ah yep, I think that's right.
There may be something odd going on around timestamps. I use a UTC 00:00:00 timestamp to load the data; it looks like I should be using a PDT one? That may help make the data match.
Actually, maybe what happened here is that you loaded the data from 2020-04-24 relative to UTC and timestamped it as 2020-04-24 00:00:00, but that ended up getting shifted to 2020-04-23 17:00:00 when displayed in Amplitude, since the FxA project is set to be relative to PDT (I wish we would change this, tbh, but I think too many people are used to it now). When I look at the numbers from the 24th relative to UTC from the server logs, I get a closer number, off by only +0.008%, which I think is good enough for government work.
Assignee
Comment 45•4 years ago
Hey all, we fixed the tz offset and the data is now loaded in the correct day. Amplitude has confirmed that we need top-level properties for their "official" user properties, so we'll update that and then do a small test against prod to confirm that user IDs are matching. I'm planning on sending just a few users (O(10)) to prod to see that they already exist there. If that works, we should be good to open the gates on the new events and deprecate the cert_signed and co. events.
Comment 46•4 years ago
Comment 47•4 years ago
Assignee
Comment 48•4 years ago
Still waiting on final verification from Amplitude about the version top-level field. In the meantime, we've also updated our ETL to use Pacific-based days rather than UTC. I'll need to backfill the dataset and then test against the FxA Dev project again. Once we're happy with those results, we'll be ready to send these events to prod.
Assignee
Comment 49•4 years ago
We have updated the config and successfully added the Amplitude version property. We are ready to ingest into prod.
Assignee
Comment 50•4 years ago
We've deployed the change to prod and we are currently ingesting both the new fxa_activity - active event and the old events we will be replacing. We have two days of data in; an initial comparison can be found here.
Assignee
Comment 51•4 years ago
Alex, Leif, I want to get sign-off from you both before we pull the plug on the cert_signed and oauth access_token events. If you have any questions or run into issues, let me know.
Comment 52•4 years ago
Looking good to me so far. I was thinking of maybe keeping the old events through the weekend to see if the weekend dip in DAU was substantially different from what we'd seen in the past, but maybe that's not necessary. If Alex is OK with pulling the plug on the old events earlier, then that's fine with me.
Frank, it was my understanding that FxA (:jbuck) would have to do this? Or were you going to do it on your end? Doesn't matter who does it, just want to make sure we're on the same page.
Assignee
Comment 53•4 years ago
Frank, it was my understanding that FxA (:jbuck) would have to do this? Or were you going to do it on your end? Doesn't matter who does it, just want to make sure we're on the same page.
You are correct, :jbuck will need to turn them off. He indicated it's fast and easy on his end, probably updating that config you pointed me to.
Assignee
Comment 54•4 years ago
Leif, if you and Alex are okay with turning off the old events sooner rather than later, we can always do future comparison analysis on the BQ data. If there is a serious issue we can also backfill.
Comment 55•4 years ago
Let's go ahead and turn off the old events. Things look good on my end, and I think they looked good to Alex yesterday.
Assignee
Comment 56•4 years ago
Great. Jbuck, can you do the honors? We need to disable the "fxa_activity - cert_signed", "fxa_activity - access_token_checked", and "fxa_activity - access_token_created" events.
Assignee
Updated•4 years ago
Comment 57•4 years ago
New filter has been applied in production: https://github.com/mozilla-services/cloudops-infra/pull/2147
Assignee
Comment 58•4 years ago
New filter has been applied in production: https://github.com/mozilla-services/cloudops-infra/pull/2147
We can close this out! New events are flowing in daily and we've cut off the old ones.
Comment 59•4 years ago
Looking at the data in Amplitude, I am a little concerned that we might not be de-duplicating these correctly: we have recorded a large number of fxa_activity - active events in the past 30 days. Reopening to investigate further.
Updated•4 years ago
Assignee
Comment 60•4 years ago
(In reply to Jared Hirsch [:_6a68] [:jhirsch] (Needinfo please) from comment #59)
Looking at the data in Amplitude, I am a little concerned that we might not be de-duplicating these correctly: we have recorded a large number of fxa_activity - active events in the past 30 days. Reopening to investigate further.
Current event count looks correct for one event per user per day. Divide the total by 30 to get ~DAU for FxA. Let me know if I'm missing something.
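The sanity check Frank describes can be sketched as follows (hypothetical activity rows for illustration; in practice the counts come from the deduplicated BQ dataset):

```python
from collections import defaultdict

# Hypothetical raw activity rows: (user_id, pacific_day).
raw = [
    ("u1", "2020-04-23"), ("u1", "2020-04-23"),  # duplicate within a day
    ("u2", "2020-04-23"),
    ("u1", "2020-04-24"),
]

# Collapse to one "fxa_activity - active" event per user per day.
deduped = set(raw)

# With one event per user per day, the daily event count equals DAU,
# so total events / number of days ~= average DAU.
daily = defaultdict(int)
for _, day in deduped:
    daily[day] += 1

avg_dau = sum(daily.values()) / len(daily)
print(sorted(daily.items()))  # → [('2020-04-23', 2), ('2020-04-24', 1)]
print(avg_dau)                # → 1.5
```

A large total over 30 days is therefore expected; it is the total divided by 30, not the raw count, that should match DAU.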
Comment 62•4 years ago
If it helps, here are the total events I see in Amplitude. I see the drop:
https://analytics.amplitude.com/mozilla-corp/chart/new/rtsvgi6
Comment 63•4 years ago
That is indeed helpful. Thanks, Alex!