Closed Bug 1671933 (mozdata-project) Opened 4 years ago Closed 4 years ago

[Meta] mozdata project as new home for user queries

Categories

(Data Platform and Tools Graveyard :: Operations, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: klukas, Assigned: klukas)

References

Details

Tracking bug for implementing the Project Layout for BigQuery Users (mozdata) proposal.

Alias: mozdata-project
Depends on: 1671934
Depends on: 1688341

Here's a query that compares view listings in mozdata to derived-datasets:

EXECUTE IMMEDIATE
  (
  SELECT
    "create temp table project_derived AS " || STRING_AGG(FORMAT("""select * from `moz-fx-data-derived-datasets`.%s.INFORMATION_SCHEMA.TABLES""", schema_name), "\nunion all\n")
  FROM
    `moz-fx-data-derived-datasets`.INFORMATION_SCHEMA.SCHEMATA);
EXECUTE IMMEDIATE
  (
  SELECT
    "create temp table project_mozdata AS " || STRING_AGG(FORMAT("""select * from `mozdata`.%s.INFORMATION_SCHEMA.TABLES""", schema_name), "\nunion all\n")
  FROM
    `mozdata`.INFORMATION_SCHEMA.SCHEMATA);
SELECT
  table_schema,
  table_name,
  project_derived.table_type = project_mozdata.table_type AS matched,
  project_derived.table_type AS derived,
  project_mozdata.table_type AS mozdata
FROM
  project_derived
FULL JOIN
  project_mozdata
USING
  (table_schema,
    table_name)
WHERE
  table_schema NOT IN ('analysis',
    'backfill')
ORDER BY
  matched,
  table_schema,
  table_name

From that list, there are missing items in mozdata for:

  • blpadi -> static tables that will need to be copied Done
  • static -> static tables; we should update the machinery/instructions to make sure we publish these to mozdata as well
  • monitoring_derived -> these are deprecated views that can be torn down per https://github.com/mozilla/bigquery-etl/pull/1677
  • taskclusteretl -> these look valid and should probably be ported over, but they aren't defined in bigquery-etl
  • telemetry -> there are many views here which look to have been either created by hand or were deleted from bigquery-etl but never deleted from moz-fx-data-derived-datasets

I copied all the blpadi data:

bq cp moz-fx-data-derived-datasets:blpadi.adi_by_region mozdata:blpadi.adi_by_region
bq cp moz-fx-data-derived-datasets:blpadi.adi_dim_backfill mozdata:blpadi.adi_dim_backfill
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date mozdata:blpadi.adi_dimensional_by_date
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2008 mozdata:blpadi.adi_dimensional_by_date_2008
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2009 mozdata:blpadi.adi_dimensional_by_date_2009
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2010 mozdata:blpadi.adi_dimensional_by_date_2010
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2011 mozdata:blpadi.adi_dimensional_by_date_2011
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2012 mozdata:blpadi.adi_dimensional_by_date_2012
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2013 mozdata:blpadi.adi_dimensional_by_date_2013
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2014 mozdata:blpadi.adi_dimensional_by_date_2014
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2015 mozdata:blpadi.adi_dimensional_by_date_2015
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2016 mozdata:blpadi.adi_dimensional_by_date_2016
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2017 mozdata:blpadi.adi_dimensional_by_date_2017
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2018 mozdata:blpadi.adi_dimensional_by_date_2018
bq cp moz-fx-data-derived-datasets:blpadi.location mozdata:blpadi.location
bq cp moz-fx-data-derived-datasets:blpadi.user_locales mozdata:blpadi.user_locales

I've now published static tables to mozdata.

taskclusteretl

I'll set this up, I think it should probably be syndicated but will require a new option for syndicating a subset of tables. taskclusteretl.task_resolution is invalid since its source table doesn't exist anymore.

I ran ./script/list_broken_views --project-id moz-fx-data-derived-datasets which found the following invalid views which I cleaned up:

  bq rm -f moz-fx-data-derived-datasets:telemetry.active_profiles_v1
  bq rm -f moz-fx-data-derived-datasets:telemetry.android_anr_report
  bq rm -f moz-fx-data-derived-datasets:telemetry.churn
  bq rm -f moz-fx-data-derived-datasets:telemetry.firefox_kpi_dashboard
  bq rm -f moz-fx-data-derived-datasets:telemetry.remote_content_uptake
  bq rm -f moz-fx-data-derived-datasets:telemetry.retention
  bq rm -f moz-fx-data-derived-datasets:telemetry.sync_log
  bq rm -f moz-fx-data-derived-datasets:telemetry.sync_log_device_activity
  bq rm -f moz-fx-data-derived-datasets:telemetry.sync_log_device_counts
  bq rm -f moz-fx-data-derived-datasets:xfocsp_error_report.xfocsp_error_report

Also via ./script/list_broken_views --project-id moz-fx-data-shared-prod:

  bq rm -f moz-fx-data-shared-prod:analysis.clients_last_seen_v2
  bq rm -f moz-fx-data-shared-prod:analysis.monitoring_ingested_ping_counts
  bq rm -f moz-fx-data-shared-prod:org_mozilla_firefox.incline_executive_v1
  bq rm -f moz-fx-data-shared-prod:telemetry.firefox_kpi_dashboard

Added several missing telemetry views in https://github.com/mozilla/bigquery-etl/pull/1720

At this point, I see basically just taskclusteretl remaining. I'm going to move ahead with writing up a Google Doc with draft communication to end users, and start getting review on that from DE, SRE, and some DS folks.

I've started a Google doc for drafting communication about the mozdata transition. It's not yet complete, but review would be welcome as it's evolving.

https://github.com/mozilla-services/cloudops-infra/pull/2840 should add taskclusteretl, and will likely be deployed tomorrow (EDIT: deployed).

I've sent out the first message in the series of user comms for this change:
https://mail.mozilla.org/pipermail/fx-data-dev/2021-January/000396.html

The next message is scheduled for Monday.

Sent out second announcment: https://mail.mozilla.org/pipermail/fx-data-dev/2021-February/000397.html

I'll send out a reminder next Tuesday. Planned cutover day for STMO and analysis dataset is the following week (Feb 16)

Send out a reminder today. We're on track for STMO switchover next week.

Sent out a reminder today. We're going to do the cutover tomorrow. I've prepared a gist for copying the analysis tables.

We've copied over tables, switched STMO, and removed write access to the old analysis datasets. This transition is complete!

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Product: Data Platform and Tools → Data Platform and Tools Graveyard
You need to log in before you can comment on or make changes to this bug.