Closed Bug 1852630 Opened 1 year ago Closed 1 year ago

Rename main remainder v4 to main v5

Categories

(Data Platform and Tools :: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: relud, Assigned: relud)

Details

Attachments

(4 files)

because :rmiller does not want to use the name main remainder

also includes renaming saved-session-remainder and first-shutdown-remainder similarly.

changing the names of the tables will cause automation to delete the tables named "remainder" and point default views at the new v5 tables. deleting the tables is fine, as downstream etl has not yet been migrated. i will manually override the default views in bigquery-etl to continue pointing at the v4 tables until I can confirm all use counter analysis is migrated to use counter tables.

additionally, i will take advantage of this opportunity to remove 5 histograms from the remainder tables that were moved to use counter tables.

ni :rmiller to elaborate on the why part of this

Flags: needinfo?(rmiller)

Since this involves two special cases (table deletes and main ping clones), DSRE needs to be involved in the deployment here. I know that when we deployed the original main ping clones there was some issue with schema updates, and I will try to track down the exact steps we used to work around it. I may not have documented them well since it seemed unlikely that we'd be creating even more main ping clones, but here we are.

main_v5 seems like the right choice for such a significant change to our main ping, but i'm not attached to it if that name in particular introduces problems.

As far as creating more main ping clones... this isn't intended to be a clone, this is intended to (eventually, relatively soon) replace the existing main ping altogether. The original plan was to split the use counters out from telemetry.main into their own separate ping, but given that we've recently learned that our use counter data is mostly useless (irony alert!), and that I hear murmurings of taking this opportunity to switch to using Glean for use counters, we may be able to delete that portion of the data altogether.

But that's out of scope for this bug. If main_v5 isn't a viable option, I'd settle for main_core.

Flags: needinfo?(rmiller)

I think this series of PRs as structured will remove the old tables at the same time as the new tables are created. This isn't something that can be safely propagated through the standard deployment pipeline, so we need to employ some state rm operations to make this work. Since we're deleting tables anyway, https://mozilla-hub.atlassian.net/browse/DSRE-125 applies and manual operations will be necessary. Here's the deploy plan for tomorrow:

  1. Disable automation
  2. Merge https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/785
  3. Merge https://github.com/mozilla/mozilla-schema-generator/pull/252/files
    Allows remainder pings to be deleted
  4. Run mozilla-schema-generator from https://workflow.telemetry.mozilla.org/dags/probe_scraper/grid
    Updates https://github.com/mozilla-services/mozilla-pipeline-schemas/tree/generated-schemas
  5. In a local checkout of generated-schemas, minify the schema so that initial table deploy will succeed:
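#!/bin/bash
# Replace each v5 schema with a compact one-line copy; keep the original as <name>.full.
cd schemas/telemetry
for i in {main,first-shutdown,saved-session}/*.5.bq; do
  echo "$i"
  jq -c -r . < "$i" > "${i}.mini"
  mv "$i" "${i}.full"
  mv "${i}.mini" "$i"
done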
  6. In stage, terraform state rm relevant resources:
terraform state rm 'module.bigquery.google_bigquery_table_iam_binding.namespace_table_acls["telemetry/live/main-remainder/4/roles/bigquery.dataViewer"]'
terraform state rm 'module.bigquery.google_bigquery_table_iam_binding.namespace_table_acls["telemetry/stable/main-remainder/4/roles/bigquery.dataViewer"]'
terraform state rm 'module.bigquery.google_bigquery_table.namespace_tables["telemetry/live/first-shutdown-remainder/4"]'
terraform state rm 'module.bigquery.google_bigquery_table.namespace_tables["telemetry/live/main-remainder/4"]'
terraform state rm 'module.bigquery.google_bigquery_table.namespace_tables["telemetry/live/saved-session-remainder/4"]'
terraform state rm 'module.bigquery.google_bigquery_table.namespace_tables["telemetry/stable/first-shutdown-remainder/4"]'
terraform state rm 'module.bigquery.google_bigquery_table.namespace_tables["telemetry/stable/main-remainder/4"]'
terraform state rm 'module.bigquery.google_bigquery_table.namespace_tables["telemetry/stable/saved-session-remainder/4"]'
terraform state rm 'module.bigquery.google_bigquery_table.payload_bytes_decoded["telemetry/telemetry/first-shutdown-remainder/4"]'
terraform state rm 'module.bigquery.google_bigquery_table.payload_bytes_decoded["telemetry/telemetry/main-remainder/4"]'
terraform state rm 'module.bigquery.google_bigquery_table.payload_bytes_decoded["telemetry/telemetry/saved-session-remainder/4"]'
terraform state rm 'module.bigquery.null_resource.null_resource_schemas["telemetry/live/main-remainder/4"]'
terraform state rm 'module.bigquery.null_resource.null_resource_schemas["telemetry/stable/main-remainder/4"]'
  7. Locally deploy the new v5 pings via 'https://github.com/mozilla-services/cloudops-infra/pull/5137/commits/d7da68b500a022e807621da87c206797589abc70' and the minified schemas:

Since this is reusing main (honestly, what are the odds we would need to control schema deployment methodology at docversion granularity?), targeted applies will need to be run to avoid attempting to deploy standard main pings via schemas:

terraform apply -target='module.bigquery.google_bigquery_table.namespace_tables["telemetry/live/main/5"]'
terraform apply -target='module.bigquery.google_bigquery_table.namespace_tables["telemetry/live/first-shutdown/main/5"]'
terraform apply -target='module.bigquery.google_bigquery_table.namespace_tables["telemetry/live/saved-session/main/5"]'
terraform apply -target='module.bigquery.google_bigquery_table.namespace_tables["telemetry/stable/main/5"]'
terraform apply -target='module.bigquery.google_bigquery_table.namespace_tables["telemetry/stable/first-shutdown/main/5"]'
terraform apply -target='module.bigquery.google_bigquery_table.namespace_tables["telemetry/stable/saved-session/main/5"]'
  8. Deploy the null_resource schemas and other remaining pieces via https://ops-master.jenkinsv2.prod.mozaws.net/job/gcp-pipelines/job/data-shared/job/bigquery-stage/ from the head of https://github.com/mozilla-services/cloudops-infra/pull/5137

  9. Once complete the above should propagate to beam-stage and ingestion-sink-stage

  10. Repeat steps 6-9 above for prod

  11. Merge https://github.com/mozilla/bigquery-etl/pull/4276 / https://github.com/mozilla/bigquery-etl/pull/4353 / https://github.com/mozilla/telemetry-airflow/pull/1825 to update various ETL to use the new tables

  12. bq rm the remainder tables once ingestion systems are guaranteed to be done processing remainder pings:

for project in moz-fx-data-shar-nonprod-efed moz-fx-data-shared-prod; do
  for doctype in main saved_session first_shutdown ; do
    bq --project_id $project rm payload_bytes_decoded.telemetry_telemetry__${doctype}_remainder_v4
    for dataset_type in live stable; do 
      bq --project_id $project rm telemetry_${dataset_type}.${doctype}_remainder_v4
    done
  done
done
  13. Merge the cloudops-infra PR above and re-enable automation
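
For step 6, the state rm commands could also be derived from the state itself rather than typed out by hand. The following is a minimal sketch, not what was run for this deploy: `remainder_state_rm_commands` is a hypothetical helper name, and it assumes every resource to drop has `-remainder/4` somewhere in its address, as in the list above. It only prints commands so they can be reviewed before anything is removed from state.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original deploy plan).
# Reads resource addresses, one per line, on stdin and prints a
# `terraform state rm` command for each address containing "-remainder/4".
# Nothing is removed from state; the output is meant for review.
remainder_state_rm_commands() {
  grep -F -- '-remainder/4' | while IFS= read -r address; do
    # Single quotes keep the ["..."] index intact when the command is pasted.
    printf "terraform state rm '%s'\n" "$address"
  done
}

# Usage, assuming terraform is already initialized against the stage state:
#   terraform state list | remainder_state_rm_commands
```

Printing instead of executing keeps the manual review step that the plan relies on while automation is disabled.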

Steps 1-10 above are complete, though propagation to beam/sink probably won't complete by UTC 28th, so the first full day with main v4 parity will be the 29th.

I am working on getting CI to pass for step 11, will update when those are merged or ready to merge.

I forgot that production schema deploys also kick off a views deployment, which, because https://github.com/mozilla/bigquery-etl/pull/4276 hasn't landed, might have attempted to update the telemetry.main views. I've cancelled the job and disabled deploys for now. It looks like the job didn't update moz-fx-data-shared-prod.telemetry.main or mozdata.telemetry.main, and we might even have special cases for those views anyway. I'll re-enable the job once the bqetl PRs have been merged, or otherwise disable views deployment temporarily in https://github.com/mozilla-services/cloudops-infra/pull/5137.

step 11 is complete

I deleted the tables in step 12 and the PR will land today so I think this rename is complete.
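
For anyone re-checking the step 12 deletes, the set of tables the loop targets can be enumerated without deleting anything. This is a sketch only: `list_remainder_tables` is a hypothetical name, the project IDs and table naming are copied from the loop in the deploy plan, and the `project:dataset.table` form is used so each line could be passed to `bq show` to confirm a table is gone.

```shell
#!/bin/bash
# Hypothetical dry-run helper: prints every table the step 12 loop targets.
# Project IDs and naming are copied from the deploy plan; nothing is deleted.
list_remainder_tables() {
  local project doctype dataset_type
  for project in moz-fx-data-shar-nonprod-efed moz-fx-data-shared-prod; do
    for doctype in main saved_session first_shutdown; do
      # One payload_bytes_decoded table per doctype...
      echo "${project}:payload_bytes_decoded.telemetry_telemetry__${doctype}_remainder_v4"
      # ...plus one live and one stable table.
      for dataset_type in live stable; do
        echo "${project}:telemetry_${dataset_type}.${doctype}_remainder_v4"
      done
    done
  done
}

list_remainder_tables
```

That is 2 projects x 3 doctypes x 3 tables, so 18 lines in total.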

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED