[Meta] mozdata project as new home for user queries
Categories
(Data Platform and Tools Graveyard :: Operations, enhancement, P3)
Tracking
(Not tracked)
People
(Reporter: klukas, Assigned: klukas)
References
Details
Tracking bug for implementing the Project Layout for BigQuery Users (mozdata) proposal.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 1•4 years ago
•
|
||
Here's a query that compares view listings in mozdata to derived-datasets:
EXECUTE IMMEDIATE
(
SELECT
"create temp table project_derived AS " || STRING_AGG(FORMAT("""select * from `moz-fx-data-derived-datasets`.%s.INFORMATION_SCHEMA.TABLES""", schema_name), "\nunion all\n")
FROM
`moz-fx-data-derived-datasets`.INFORMATION_SCHEMA.SCHEMATA);
EXECUTE IMMEDIATE
(
SELECT
"create temp table project_mozdata AS " || STRING_AGG(FORMAT("""select * from `mozdata`.%s.INFORMATION_SCHEMA.TABLES""", schema_name), "\nunion all\n")
FROM
`mozdata`.INFORMATION_SCHEMA.SCHEMATA);
SELECT
table_schema,
table_name,
project_derived.table_type = project_mozdata.table_type AS matched,
project_derived.table_type AS derived,
project_mozdata.table_type AS mozdata
FROM
project_derived
FULL JOIN
project_mozdata
USING
(table_schema,
table_name)
WHERE
table_schema NOT IN ('analysis',
'backfill')
ORDER BY
matched,
table_schema,
table_name
From that list, there are missing items in mozdata for:
blpadi -> static tables that will need to be copiedDone- static -> static tables; we should update the machinery/instructions to make sure we publish these to mozdata as well
- monitoring_derived -> these are deprecated views that can be torn down per https://github.com/mozilla/bigquery-etl/pull/1677
- taskclusteretl -> these look valid and should probably be ported over, but they aren't defined in bigquery-etl
- telemetry -> there are many views here which look to have been either created by hand or were deleted from bigquery-etl but never deleted from moz-fx-data-derived-datasets
Assignee | ||
Comment 2•4 years ago
|
||
I copied all the blpadi data:
bq cp moz-fx-data-derived-datasets:blpadi.adi_by_region mozdata:blpadi.adi_by_region
bq cp moz-fx-data-derived-datasets:blpadi.adi_dim_backfill mozdata:blpadi.adi_dim_backfill
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date mozdata:blpadi.adi_dimensional_by_date
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2008 mozdata:blpadi.adi_dimensional_by_date_2008
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2009 mozdata:blpadi.adi_dimensional_by_date_2009
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2010 mozdata:blpadi.adi_dimensional_by_date_2010
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2011 mozdata:blpadi.adi_dimensional_by_date_2011
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2012 mozdata:blpadi.adi_dimensional_by_date_2012
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2013 mozdata:blpadi.adi_dimensional_by_date_2013
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2014 mozdata:blpadi.adi_dimensional_by_date_2014
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2015 mozdata:blpadi.adi_dimensional_by_date_2015
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2016 mozdata:blpadi.adi_dimensional_by_date_2016
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2017 mozdata:blpadi.adi_dimensional_by_date_2017
bq cp moz-fx-data-derived-datasets:blpadi.adi_dimensional_by_date_2018 mozdata:blpadi.adi_dimensional_by_date_2018
bq cp moz-fx-data-derived-datasets:blpadi.location mozdata:blpadi.location
bq cp moz-fx-data-derived-datasets:blpadi.user_locales mozdata:blpadi.user_locales
Assignee | ||
Comment 3•4 years ago
|
||
I've now published static tables to mozdata
.
Comment 4•4 years ago
|
||
taskclusteretl
I'll set this up, I think it should probably be syndicated but will require a new option for syndicating a subset of tables. taskclusteretl.task_resolution
is invalid since its source table doesn't exist anymore.
Assignee | ||
Comment 5•4 years ago
|
||
I ran ./script/list_broken_views --project-id moz-fx-data-derived-datasets
which found the following invalid views which I cleaned up:
bq rm -f moz-fx-data-derived-datasets:telemetry.active_profiles_v1
bq rm -f moz-fx-data-derived-datasets:telemetry.android_anr_report
bq rm -f moz-fx-data-derived-datasets:telemetry.churn
bq rm -f moz-fx-data-derived-datasets:telemetry.firefox_kpi_dashboard
bq rm -f moz-fx-data-derived-datasets:telemetry.remote_content_uptake
bq rm -f moz-fx-data-derived-datasets:telemetry.retention
bq rm -f moz-fx-data-derived-datasets:telemetry.sync_log
bq rm -f moz-fx-data-derived-datasets:telemetry.sync_log_device_activity
bq rm -f moz-fx-data-derived-datasets:telemetry.sync_log_device_counts
bq rm -f moz-fx-data-derived-datasets:xfocsp_error_report.xfocsp_error_report
Also via ./script/list_broken_views --project-id moz-fx-data-shared-prod
:
bq rm -f moz-fx-data-shared-prod:analysis.clients_last_seen_v2
bq rm -f moz-fx-data-shared-prod:analysis.monitoring_ingested_ping_counts
bq rm -f moz-fx-data-shared-prod:org_mozilla_firefox.incline_executive_v1
bq rm -f moz-fx-data-shared-prod:telemetry.firefox_kpi_dashboard
Assignee | ||
Comment 6•4 years ago
|
||
Added several missing telemetry
views in https://github.com/mozilla/bigquery-etl/pull/1720
At this point, I see basically just taskclusteretl
remaining. I'm going to move ahead with writing up a Google Doc with draft communication to end users, and start getting review on that from DE, SRE, and some DS folks.
Assignee | ||
Comment 7•4 years ago
|
||
I've started a Google doc for drafting communication about the mozdata transition. It's not yet complete, but review would be welcome as it's evolving.
Comment 8•4 years ago
•
|
||
https://github.com/mozilla-services/cloudops-infra/pull/2840 should add taskclusteretl
, and will likely be deployed tomorrow (EDIT: deployed).
Assignee | ||
Comment 9•4 years ago
|
||
I've sent out the first message in the series of user comms for this change:
https://mail.mozilla.org/pipermail/fx-data-dev/2021-January/000396.html
The next message is scheduled for Monday.
Assignee | ||
Comment 10•4 years ago
|
||
Sent out second announcment: https://mail.mozilla.org/pipermail/fx-data-dev/2021-February/000397.html
I'll send out a reminder next Tuesday. Planned cutover day for STMO and analysis dataset is the following week (Feb 16)
Assignee | ||
Comment 11•4 years ago
|
||
Send out a reminder today. We're on track for STMO switchover next week.
Assignee | ||
Comment 12•4 years ago
|
||
Sent out a reminder today. We're going to do the cutover tomorrow. I've prepared a gist for copying the analysis tables.
Assignee | ||
Comment 13•4 years ago
|
||
We've copied over tables, switched STMO, and removed write access to the old analysis datasets. This transition is complete!
Updated•2 years ago
|
Description
•