Closed Bug 1857273 Opened 9 months ago Closed 7 months ago

Further ETL Work for Messaging System Reinstrumentation

Categories

(Data Platform and Tools :: General, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: chutten, Assigned: mwilliams)

In addition to the work done in https://mozilla-hub.atlassian.net/browse/DENG-1207 to provide the Glean-fueled firefox_desktop.onboarding_v2 and firefox_desktop.snippets_v2 tables, which can supply now-validated data to the messaging_system.onboarding and messaging_system.snippets views, we have found that quite a lot of ETL runs not against the messaging_system.onboarding view but directly against the messaging_system.onboarding_v1 table: (data catalogue link)

This wouldn't normally stand in the way of being able to migrate analyses and their authors to the new data, but some of that downstream ETL involves some non-trivial derivation (I'm looking at you: Event Counts Explore).

This bug is for investigating the current state of messaging_system ETL (specifically onboarding, as no one's really using snippets right now, though snippets could probably use an eyeball, too), finding out how best to provide ETL to support existing analysis use cases, and then enacting that best option.

We hope that we can rederive the downstream ETL against the onboarding view instead of directly on the onboarding_v1 table. That way we can then update the view to use the new onboarding_v2 data and no downstream analyses will need to be updated (but we will still notify the authors of the change). That'd be nifty.
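
To make the "point downstream ETL at the view" idea concrete, here is a minimal Python sketch that flags query text referencing the onboarding_v1 table directly and rewrites the reference to the view. The table and view names come from this bug; the helper names (`uses_table_directly`, `point_at_view`) are hypothetical, and the regex approach is an assumption for illustration only. It is not how the actual migration was done, and it would also rewrite matches inside SQL comments or string literals.

```python
import re

# Identifiers taken from the bug text. Real queries may prefix them with a
# project name or wrap them in backticks; both cases are handled below.
TABLE = "messaging_system.onboarding_v1"
VIEW = "messaging_system.onboarding"

# Match the table name only as a whole identifier, so that a longer name such
# as "messaging_system.onboarding_v1_extra" is left alone.
_TABLE_RE = re.compile(r"(?<!\w)" + re.escape(TABLE) + r"(?!\w)")


def uses_table_directly(sql: str) -> bool:
    """Return True if the query text references the onboarding_v1 table."""
    return _TABLE_RE.search(sql) is not None


def point_at_view(sql: str) -> str:
    """Rewrite direct references to the table so they use the view instead."""
    return _TABLE_RE.sub(VIEW, sql)
```

A query rewritten this way keeps reading the old data until the view itself is switched to onboarding_v2, which is the property the paragraph above relies on.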

This looks reasonable based on conversations with relud and Glenda, and some investigation.

Broad status is that we already have firefox_desktop.onboarding_v2 and its view firefox_desktop.onboarding, and they match the messaging_system counterparts, so we should be safe to migrate things that point to the old view to the new view instead.

Additionally, the only change the view makes vs the table is some renaming / simplification of the structure of the metadata field, and the queries that create the views derived from the table do not appear to be affected by it.

Update after merging ETL changes:

Good

  • the Looker view onboarding_v1 (which drives the User Journey / Event Counts explore) looks good to me
  • normalized_onboarding_events view (using the new data but still in the messaging_system_derived dataset) also looks good

Bad

  • onboarding_users_daily_v1 table looks like it's still on the old data
  • review_checker_microsurvey_v1 table looks like it's still on the old data

Still investigating the latter two tables.

I do see various user_journey/event_counts-based looks now using firefox_desktop.onboarding, and the main thing I noticed was the lack of sample_id, but removing that filter gets things working.

:mardak -- Thanks for raising this. The sample_id is derived from client_id, which is not available here, so the lack of sample_id is expected here.

If this is something you're interested in adding, we'd need to identify an alternative field from which to derive the sample_id, and then file a ticket for Glean to add the feature to configure this field.
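
For anyone weighing up an alternative field, a rough sketch of what deriving a sample_id from an identifier could look like. This assumes the usual CRC32-modulo-100 bucketing, which is not stated in this bug; the function name and the `buckets` parameter are hypothetical, and the identifier passed in stands in for whatever alternative field would be chosen.

```python
import zlib


def sample_id(identifier: str, buckets: int = 100) -> int:
    """Map an identifier to a stable bucket in the range [0, buckets).

    Assumption: CRC32 of the UTF-8 bytes, modulo the bucket count. The same
    identifier always lands in the same bucket, which is what makes
    sample_id usable as a sampling filter.
    """
    return zlib.crc32(identifier.encode("utf-8")) % buckets
```

Any field used this way needs to be stable per client, otherwise the same client would drift between buckets over time.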

This has now been live for 2 weeks without any other negative feedback noted, so I'm going to mark this as resolved. Any future issues can be dealt with separately via new bugs.

Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED