Open Bug 1741487 Opened 3 years ago Updated 2 years ago

Update stable table view generation to rename url2 -> url and text2 -> text, etc.

Categories

(Data Platform and Tools :: General, enhancement, P3)


Tracking

(Not tracked)

ASSIGNED

People

(Reporter: klukas, Assigned: akomar)


Details

(Whiteboard: [dataplatform])

Attachments

(3 files)

See https://bugzilla.mozilla.org/show_bug.cgi?id=1737656#c13 for more detail.

Many Glean ping stable tables contain incorrect fields metrics.url, metrics.text, metrics.jwe, and metrics.labeled_rate.

We can remove these from user-facing views by updating the generate_views machinery in bigquery-etl. There's already a special case that cleans up fenix metrics specifically, and the logic here would be similar.

Logic would basically be:

  • If there is a field with any of the above names, remove it
  • If a field metrics.url2 exists, rename it to metrics.url, same for the other 3 types
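The two rules above can be sketched as a small function over the field names in a stable table's metrics struct. This is a hypothetical illustration, not the actual generate_views code. The function name is invented for this example. It also assumes that the corrected fields for the other 3 types are named text2, jwe2, and labeled_rate2.

```python
# Hypothetical sketch of the cleanup rule; not the bigquery-etl implementation.
from typing import Dict, Iterable, List, Tuple

# Metric types whose un-suffixed field in the stable tables is incorrect.
# Assumption: each corrected field uses the same name with a "2" suffix.
BAD_METRIC_TYPES = ("url", "text", "jwe", "labeled_rate")


def plan_metrics_fields(field_names: Iterable[str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (fields_to_drop, renames) for the fields of a metrics struct.

    fields_to_drop: incorrect un-suffixed fields present in the struct.
    renames: maps each suffixed field present (e.g. url2) to its new name (url).
    """
    present = set(field_names)
    to_drop = [name for name in BAD_METRIC_TYPES if name in present]
    renames = {f"{name}2": name for name in BAD_METRIC_TYPES if f"{name}2" in present}
    return to_drop, renames


if __name__ == "__main__":
    drop, renames = plan_metrics_fields(["boolean", "url", "url2", "text2"])
    print(drop)     # ['url']
    print(renames)  # {'url2': 'url', 'text2': 'text'}
```

Dropping the incorrect field before renaming matters. A table that has both url and url2 would otherwise end up with two columns named url.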

Given that url2 and text2 fields have existed for a while now in some rally pings, we likely should implement this in two steps. In the first step, we would include the contents of metrics.url2 in both the metrics.url2 and metrics.url positions, so that the metrics can be accessed using either name. Then, we'll announce the change to rally folks and make sure they have a bit of time to adapt before removing metrics.url2 since they may have existing queries depending on that name.

Assignee: nobody → fbertsch
Priority: P2 → P1
Whiteboard: [data-platform-infra-wg] → [dataplatform]
Assignee: fbertsch → nobody
Priority: P1 → P3


:whd how are things with Rally data, is anyone still using it? Do you think we still need to follow the original plan outlined above?

Flags: needinfo?(whd)
Assignee: nobody → akomarzewski
Status: NEW → ASSIGNED

(In reply to Arkadiusz Komarzewski [:akomar] from comment #2)
> :whd how are things with Rally data, is anyone still using it? Do you think we still need to follow the original plan outlined above?

It looks like we should follow the original plan here. The url metric type is already used in some applications, and the url2 field exists in their respective schemas:

➜  mozilla-pipeline-schemas git:(generated-schemas) ✗ grep -r "url2" schemas
schemas/org-mozilla-ios-firefox/topsites-impression/topsites-impression.1.bq:        "name": "url2",
schemas/mdn-yari/page/page.1.bq:        "name": "url2",
schemas/mdn-yari/action/action.1.bq:        "name": "url2",
schemas/org-mozilla-ios-firefoxbeta/topsites-impression/topsites-impression.1.bq:        "name": "url2",
schemas/org-mozilla-firefox/metrics/metrics.1.bq:        "name": "url2",
schemas/org-mozilla-firefox/topsites-impression/topsites-impression.1.bq:        "name": "url2",
schemas/org-mozilla-firefox/cookie-banner-report-site/cookie-banner-report-site.1.bq:        "name": "url2",
schemas/org-mozilla-focus-beta/cookie-banner-report-site/cookie-banner-report-site.1.bq:        "name": "url2",
schemas/rally-attention-stream/youtube-video-recommendations/youtube-video-recommendations.1.bq:        "name": "url2",
schemas/rally-attention-stream/youtube-ads/youtube-ads.1.bq:        "name": "url2",
schemas/rally-attention-stream/user-journey/user-journey.1.bq:        "name": "url2",
schemas/rally-attention-stream/youtube-video-details/youtube-video-details.1.bq:        "name": "url2",
schemas/rally-attention-stream/advertisements/advertisements.1.bq:        "name": "url2",
schemas/rally-attention-stream/article-contents/article-contents.1.bq:        "name": "url2",
schemas/rally-attention-stream/tracking-pixel/tracking-pixel.1.bq:        "name": "url2",
schemas/org-mozilla-focus-nightly/cookie-banner-report-site/cookie-banner-report-site.1.bq:        "name": "url2",
schemas/org-mozilla-focus/cookie-banner-report-site/cookie-banner-report-site.1.bq:        "name": "url2",
schemas/pine/metrics/metrics.1.bq:        "name": "url2",
schemas/org-mozilla-fennec-aurora/metrics/metrics.1.bq:        "name": "url2",
schemas/org-mozilla-fennec-aurora/topsites-impression/topsites-impression.1.bq:        "name": "url2",
schemas/org-mozilla-fennec-aurora/cookie-banner-report-site/cookie-banner-report-site.1.bq:        "name": "url2",
schemas/org-mozilla-klar/cookie-banner-report-site/cookie-banner-report-site.1.bq:        "name": "url2",
schemas/firefox-desktop/metrics/metrics.1.bq:        "name": "url2",
schemas/rally-markup-fb-pixel-hunt/fbpixelhunt-journey/fbpixelhunt-journey.1.bq:        "name": "url2",
schemas/rally-markup-fb-pixel-hunt/fbpixelhunt-event/fbpixelhunt-event.1.bq:        "name": "url2",
schemas/rally-markup-fb-pixel-hunt/fbpixelhunt-pixel/fbpixelhunt-pixel.1.bq:        "name": "url2",
schemas/org-mozilla-firefox-beta/metrics/metrics.1.bq:        "name": "url2",
schemas/org-mozilla-firefox-beta/topsites-impression/topsites-impression.1.bq:        "name": "url2",
schemas/org-mozilla-firefox-beta/cookie-banner-report-site/cookie-banner-report-site.1.bq:        "name": "url2",
schemas/org-mozilla-fenix/metrics/metrics.1.bq:        "name": "url2",
schemas/org-mozilla-fenix/topsites-impression/topsites-impression.1.bq:        "name": "url2",
schemas/org-mozilla-fenix/cookie-banner-report-site/cookie-banner-report-site.1.bq:        "name": "url2",
schemas/org-mozilla-ios-fennec/topsites-impression/topsites-impression.1.bq:        "name": "url2",
schemas/org-mozilla-fenix-nightly/metrics/metrics.1.bq:        "name": "url2",
schemas/org-mozilla-fenix-nightly/topsites-impression/topsites-impression.1.bq:        "name": "url2",
schemas/org-mozilla-fenix-nightly/cookie-banner-report-site/cookie-banner-report-site.1.bq:        "name": "url2",
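The grep above matches any line containing url2, so it does not show where in the schema the field sits. A rough alternative is to parse the generated .bq schemas and print each matching field's dotted path. This is a hypothetical helper with illustrative function names. It assumes each .bq file is a JSON array of BigQuery field definitions with a "name" key and optional nested "fields".

```python
# Sketch: find .bq schema files containing a field with a given name, and
# report its full dotted path (e.g. metrics.url2). Assumes the .bq files are
# JSON arrays of BigQuery field definitions.
import json
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_field_paths(fields: List[dict], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every field, depth-first."""
    for field in fields:
        path = f"{prefix}{field['name']}"
        yield path
        # RECORD fields carry their children under "fields".
        yield from iter_field_paths(field.get("fields", []), prefix=path + ".")


def find_field(schema_root: str, field_name: str) -> Iterator[Tuple[str, str]]:
    """Yield (file, dotted path) for every field whose last component matches."""
    for bq_file in sorted(Path(schema_root).rglob("*.bq")):
        schema = json.loads(bq_file.read_text())
        for path in iter_field_paths(schema):
            if path.split(".")[-1] == field_name:
                yield str(bq_file), path


if __name__ == "__main__":
    for file_name, path in find_field("schemas", "url2"):
        print(file_name, path)
```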

The next steps seem to be:

  1. Modify the view generation logic to automatically alias the metrics.url2 field as metrics.url, and likewise for the other fields (metrics.text, metrics.jwe, and metrics.labeled_rate). Both names will then return exactly the same values. This should let us avoid downstream impact, since the url field should not currently be in use.
  2. Announce the deprecation of the url2 field, and give anything that uses it time to update its references to the url field (and the other un-suffixed fields) instead.
  3. Remove the fields with the 2 suffix from the views.
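The generated views could express steps 1 and 3 with a BigQuery `SELECT * REPLACE` that rebuilds the metrics struct. The sketch below renders such a query. `metrics_view_sql`, its `phase` parameter, and the exact SQL shape are assumptions made for illustration. They have not been checked against what generate_views actually emits. Step 2 is only an announcement, so it needs no SQL change.

```python
# Hypothetical sketch: render a view query for steps 1 and 3 of the plan.
# Not the bigquery-etl implementation; the SQL shape is illustrative.
from typing import Iterable

BAD_METRIC_TYPES = ("url", "text", "jwe", "labeled_rate")


def metrics_view_sql(table: str, metric_fields: Iterable[str], phase: int) -> str:
    """Build a view query that fixes up the metrics struct.

    phase 1: drop the incorrect fields, keep each <type>2 field, and also
             expose its contents under the un-suffixed name.
    phase 3: additionally drop the <type>2 fields, so the data is exposed
             only under the un-suffixed name.
    """
    present = set(metric_fields)
    suffixed = [t for t in BAD_METRIC_TYPES if f"{t}2" in present]
    # Only exclude fields that exist; BigQuery rejects EXCEPT on missing columns.
    excluded = [t for t in BAD_METRIC_TYPES if t in present]
    if phase == 3:
        excluded += [f"{t}2" for t in suffixed]
    elif phase != 1:
        raise ValueError("phase must be 1 or 3")
    if not excluded and not suffixed:
        return f"SELECT * FROM `{table}`"
    except_clause = f" EXCEPT ({', '.join(excluded)})" if excluded else ""
    aliases = "".join(f", metrics.{t}2 AS {t}" for t in suffixed)
    return (
        "SELECT * REPLACE ("
        f"(SELECT AS STRUCT metrics.*{except_clause}{aliases}) AS metrics"
        f") FROM `{table}`"
    )
```

Consider a table whose metrics struct has both url and url2. Phase 1 renders `metrics.* EXCEPT (url), metrics.url2 AS url`, so url2 stays visible. Phase 3 renders `EXCEPT (url, url2)` with the same alias, so only url remains.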
Flags: needinfo?(whd)

The above PR addresses point 1 from Comment 3.

We will wait until the above PR is merged and confirm that the change is working as intended. Once we're confident everything is in order, we will move forward with step 2 of the plan outlined above.

PR opened to carry out step one of the plan outlined above:
https://github.com/mozilla/bigquery-etl/pull/4029
