Bug 1455383
Opened 7 years ago
Closed 7 years ago
Move core ping d2p output to (submission_date_s3, app_name, os) partitioning
Categories: Data Platform and Tools :: General (enhancement, P1)
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: frank; Assignee: relud
Details
We should also move the existing Parquet output over. This will have to be done as a Spark job, since os, for example, is not an existing partition column.
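Because os was never a partition key of the existing output, the existing files can't simply be renamed; every row has to be read and rewritten under a new directory. A minimal sketch of the target layout, assuming Hive-style key=value directories (the convention Spark's DataFrameWriter.partitionBy produces, e.g. df.write.partitionBy("submission_date_s3", "app_name", "os").parquet(path)). The helper names below are hypothetical, and the real backfill was a Spark job, not this code:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Partition columns named in this bug, in directory-nesting order.
PARTITION_COLUMNS = ("submission_date_s3", "app_name", "os")


def partition_path(row: Mapping[str, str]) -> str:
    """Return the Hive-style directory (key=value/...) a row lands in."""
    return "/".join(f"{col}={row[col]}" for col in PARTITION_COLUMNS)


def group_by_partition(
    rows: Iterable[Mapping[str, str]],
) -> Dict[str, List[Mapping[str, str]]]:
    """Bucket rows by target partition directory, as a partitioned write would."""
    groups: Dict[str, List[Mapping[str, str]]] = defaultdict(list)
    for row in rows:
        groups[partition_path(row)].append(row)
    return dict(groups)
```

Since os is now part of the path, two rows from the same day and app but different operating systems end up in different directories, which is what makes per-OS queries cheap to prune.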
Updated by Reporter • 7 years ago
Component: Datasets: General → Datasets: Mobile
Updated by Reporter • 7 years ago
Points: --- → 3
Priority: -- → P3
Updated by Assignee • 7 years ago
Assignee: nobody → dthorn
Points: 3 → 2
Priority: P3 → P2
Comment 1 (Assignee) • 7 years ago
Will bump to P1 on Friday.
Comment 2 (Assignee) • 7 years ago
Plan:
1. Backfill from v2 to v3, up to 'yesterday' (relative to step 3).
2. Notify fx-data-dev@mozilla.org of a cutover period.
3. Merge and deploy https://github.com/mozilla-services/puppet-config/pull/2726 and https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/149
4. Backfill the partial day once the deploy from step 3 completes.
5. At the same time as step 4, update the Athena/Presto tables to v3.
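The plan splits the backfill around the deploy day: full days before it in step 1, and the deploy day itself (only partially covered by the old output) in step 4. A minimal sketch of that split, assuming submission_date_s3 values use the YYYYMMDD form; backfill_days is a hypothetical helper, and the start date in any call is illustrative:

```python
from datetime import date, timedelta
from typing import List, Tuple


def backfill_days(start: date, cutover: date) -> Tuple[List[str], str]:
    """Split backfill work around the deploy (cutover) day.

    Returns (full_days, partial_day): step 1 covers every full day from
    `start` up to the day before `cutover`; step 4 covers `cutover` itself
    once the deploy completes. Dates are formatted as YYYYMMDD (assumed
    submission_date_s3 format).
    """
    if cutover <= start:
        raise ValueError("cutover must be after start")
    full_days = []
    day = start
    while day < cutover:
        full_days.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return full_days, cutover.strftime("%Y%m%d")
```

For example, with a deploy on May 1, 2018, backfill_days(date(2018, 4, 28), date(2018, 5, 1)) yields the three full days 20180428 to 20180430 for step 1 and 20180501 as the partial day for step 4.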
Comment 3 • 7 years ago
Since these appear to be purely output schema changes, I'd prefer to run multiple outputs for a while to avoid a cutover period.
My understanding is that, unless we explicitly configure p2h otherwise, step (5) will happen automatically as soon as the new version of the dataset becomes available, even in partial form. If the goal is for the unversioned pointer to point to the latest version only once it's fully backfilled, we will need to manage that explicitly.
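The difference between the two pointer policies can be made concrete with a toy model. This is not p2h's code; it is a sketch that assumes each dataset version is known along with a flag saying whether its backfill is complete, and both function names are hypothetical:

```python
from typing import Dict, Optional


def automatic_pointer(versions: Dict[int, bool]) -> Optional[int]:
    """Latest version with any data at all, complete or not.

    Models the automatic behavior described above: the unversioned
    pointer moves as soon as a newer version appears, even partially.
    """
    return max(versions) if versions else None


def complete_pointer(versions: Dict[int, bool]) -> Optional[int]:
    """Latest version whose backfill is flagged complete.

    Models the policy that would have to be managed explicitly.
    """
    complete = [version for version, done in versions.items() if done]
    return max(complete) if complete else None
```

With v2 complete and v3 still backfilling ({2: True, 3: False}), the automatic policy already points at v3 while the explicit one stays on v2; that gap is the window during which unversioned queries would see partial data.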
Comment 4 (Assignee) • 7 years ago
Running multiple outputs sounds like a good plan; I'll update my PRs for multiple outputs.
I'm going to backfill into the telemetry-backfill bucket so as not to trigger automatic table detection. The unversioned table pointer will then be updated when we start the second output, at which point I will copy the backfill over.
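Staging in telemetry-backfill works because the object keys keep the same layout, so copying the backfill over amounts to swapping the bucket on each key. A minimal sketch, assuming keys are identical in both buckets and that the live bucket name is passed in (it isn't named in this bug); promote_uri and the example key prefix are hypothetical. The copy itself would normally be done with a tool such as aws s3 sync rather than by hand:

```python
from urllib.parse import urlparse

# Staging bucket named in the comment above.
BACKFILL_BUCKET = "telemetry-backfill"


def promote_uri(backfill_uri: str, live_bucket: str) -> str:
    """Map an object in the backfill bucket to the same key in the live bucket."""
    parsed = urlparse(backfill_uri)
    if parsed.scheme != "s3" or parsed.netloc != BACKFILL_BUCKET:
        raise ValueError(f"not a {BACKFILL_BUCKET} URI: {backfill_uri}")
    key = parsed.path.lstrip("/")  # keep the partition directories unchanged
    return f"s3://{live_bucket}/{key}"
```

Keeping the partition path untouched means the copied files appear under the same submission_date_s3=/app_name=/os= directories the second output writes to.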
Comment 5 (Assignee) • 7 years ago
This will be a backwards-incompatible change, as the column channel will become metadata.normalizedChannel.
I will notify fx-data-dev@mozilla.org of the upcoming change and ask people who depend on the column to switch to the versioned table. Then I'll track down recurring queries in STMO that will break and make sure they get updated.
:frank, does that sound good to you?
Flags: needinfo?(fbertsch)
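For consumers, the break is in where the value lives: a top-level channel column in v2 versus a nested metadata.normalizedChannel field in v3. A minimal sketch of a reader that tolerates both shapes during the overlap, assuming rows arrive as plain Python dicts; get_channel is a hypothetical helper, not part of any pipeline library:

```python
from typing import Any, Mapping, Optional


def get_channel(row: Mapping[str, Any]) -> Optional[str]:
    """Read the release channel from a v2- or v3-shaped core ping row."""
    metadata = row.get("metadata")
    if isinstance(metadata, dict) and "normalizedChannel" in metadata:
        return metadata["normalizedChannel"]  # v3: nested under metadata
    return row.get("channel")  # v2: top-level column (None if absent)
```

Pinning a query to the versioned v2 table avoids needing this at all, which is why the notice asks dependents to switch to the versioned table.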
Comment 6 • 7 years ago
I'm currently aiming to deploy these changes Tuesday morning (May 1).
Updated by Assignee • 7 years ago
Priority: P2 → P1
Comment 7 (Assignee) • 7 years ago
STMO query used to find and update queries to avoid breakage: https://sql.telemetry.mozilla.org/queries/52792/source
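The linked STMO query's actual logic isn't reproduced here. As a rough sketch of the same idea, assuming query texts are available as a dict of id to SQL and that the unversioned table is called telemetry_core_parquet (an assumed name, passed in as a parameter), a query is flagged when it references the unversioned table and a bare channel column. Word boundaries keep a versioned table such as telemetry_core_parquet_v2 from matching, and normalizedChannel is not mistaken for channel:

```python
import re
from typing import Dict, List


def queries_needing_update(queries: Dict[int, str], table: str) -> List[int]:
    """Return ids of queries that use `table` and a bare `channel` column."""
    # \b stops "telemetry_core_parquet" from matching "telemetry_core_parquet_v2",
    # since "_" is a word character.
    table_re = re.compile(rf"\b{re.escape(table)}\b", re.IGNORECASE)
    channel_re = re.compile(r"\bchannel\b", re.IGNORECASE)
    return sorted(
        query_id
        for query_id, sql in queries.items()
        if table_re.search(sql) and channel_re.search(sql)
    )
```

This is a text heuristic, so it can over-match (for example, channel in a comment or string literal); the flagged list is a starting point for manual review rather than a final answer.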
Updated by Assignee • 7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated by Reporter • 7 years ago
Flags: needinfo?(fbertsch)
Updated • 3 years ago
Component: Datasets: Mobile → General