Further ETL Work for Messaging System Reinstrumentation
Categories
(Data Platform and Tools :: General, task, P1)
Tracking
(Not tracked)
People
(Reporter: chutten, Assigned: mwilliams)
In addition to the work done in https://mozilla-hub.atlassian.net/browse/DENG-1207 to provide Glean-fueled firefox_desktop.onboarding_v2 and firefox_desktop.snippets_v2 tables which can supply now-validated data to the messaging_system.onboarding and messaging_system.snippets views, we have found that there appears to be quite a lot of ETL done not against the messaging_system.onboarding view, but directly against the messaging_system.onboarding_v1 table: (data catalogue link)
This wouldn't normally stand in the way of being able to migrate analyses and their authors to the new data, but some of that downstream ETL involves some non-trivial derivation (I'm looking at you: Event Counts Explore).
This bug is for investigating the current state of messaging_system ETL (specifically onboarding, as no one's really using snippets right now, though snippets could probably use an eyeball, too), finding out how best to provide ETL to support existing analysis use cases, and then enacting that best option.
We hope that we can rederive the downstream ETL against the onboarding view instead of directly against the onboarding_v1 table. That way we can then update the view to use the new onboarding_v2 data and no downstream analyses will need to be updated (but we will still notify the authors of the change). That'd be nifty.
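The view indirection described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the single event column and the sample rows are hypothetical, and the real tables live in BigQuery with a far richer schema. The point is that a downstream query which reads the onboarding view keeps working, unmodified, once the view is repointed from onboarding_v1 to onboarding_v2.

```python
import sqlite3


def downstream_event_count(conn: sqlite3.Connection) -> int:
    # Downstream ETL reads only the view, never a versioned table.
    return conn.execute("SELECT COUNT(*) FROM onboarding").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, heavily simplified schemas for illustration only.
    CREATE TABLE onboarding_v1 (event TEXT);
    CREATE TABLE onboarding_v2 (event TEXT);
    INSERT INTO onboarding_v1 VALUES ('CLICK'), ('IMPRESSION');
    INSERT INTO onboarding_v2 VALUES ('CLICK'), ('IMPRESSION'), ('DISMISS');
    CREATE VIEW onboarding AS SELECT event FROM onboarding_v1;
    """
)
before = downstream_event_count(conn)

# Repoint the view at the new table; the downstream query is untouched.
conn.executescript(
    """
    DROP VIEW onboarding;
    CREATE VIEW onboarding AS SELECT event FROM onboarding_v2;
    """
)
after = downstream_event_count(conn)
print(before, after)
```

A query written directly against onboarding_v1 would keep returning the old rows after the swap, which is why the downstream ETL needs to be rederived against the view first.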
Comment 1•9 months ago (Assignee)
This looks reasonable based on conversations with relud and Glenda, and some investigation.
Broad status is that we already have firefox_desktop.onboarding_v2 and its view firefox_desktop.onboarding, and they match the messaging_system counterparts, so we should be safe to migrate things that point to the old view to the new view instead.
Additionally, the only change the view makes vs the table is to do some renaming / simplification to the structure of the metadata field, and looking into the queries that create the views that derive from the table shows that they seem to not be affected.
Comment 2•7 months ago (Assignee)
Update after merging ETL changes:
Good
- the looker view onboarding_v1 (which drives the User Journey / Event Counts explore) looks good to me
- normalized_onboarding_events view (using the new data but still in the messaging_system_derived dataset) also looks good
Bad
- onboarding_users_daily_v1 table looks like it's still on the old data
- review_checker_microsurvey_v1 table looks like it's still on the old data
Still investigating the latter two tables.
Comment 3•7 months ago
I do see various user_journey/event_counts-based looks now using firefox_desktop.onboarding, and the main thing I noticed was the lack of sample_id, but removing that filter gets things working.
Comment 4•7 months ago (Assignee)
:mardak -- Thanks for raising this. The sample_id is derived from client_id, which is not available here, so the lack of sample_id is expected here.
If this is something you're interested in adding, we'd need to identify an alternative field from which to derive the sample_id, and then file a ticket for Glean to add the feature to configure this field.
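For context, a minimal sketch of what deriving a sample_id from some other field could look like. It assumes the derivation is a CRC32 hash of the identifier taken modulo 100, which is how Mozilla's telemetry documentation describes sample_id for client_id; the function name and the example identifier are hypothetical, and this is not the pipeline's actual code.

```python
import zlib


def sample_id_from(identifier: str, buckets: int = 100) -> int:
    # Assumption: sample_id = crc32(identifier) % 100, mirroring the
    # documented client_id-based derivation. `identifier` stands in for
    # whichever alternative field would be chosen.
    return zlib.crc32(identifier.encode("utf-8")) % buckets


# Hypothetical identifier value, for illustration only.
example = sample_id_from("00000000-0000-0000-0000-000000000000")
print(example)
```

Whatever field is chosen would need to be stable per client and reasonably uniformly distributed, otherwise filtering on sample_id would no longer give a consistent, representative sample.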
This has now been live for 2 weeks without any other negative feedback noted, so I'm going to mark this as resolved. Any future issues can be dealt with separately via new bugs.