Multiple tasks failing in bqetl_fxa_events Airflow DAG on 2022-06-23
Categories
(Data Platform and Tools :: General, defect)
Tracking
(Not tracked)
People
(Reporter: akomar, Assigned: klukas)
Details
(Whiteboard: [airflow-triage])
Multiple tasks failing in bqetl_fxa_events Airflow DAG on 2022-06-23
Assignee | ||
Comment 1•2 years ago
|
||
Invalid schema update. Field
jsonPayload.fields.metricsoptoutat has changed type from STRING to FLOAT
This has happened several times before and I have usually contacted :jbuck for intervention upstream. We sometimes have had to modify the queries to skip these fields, but sometimes he has been able to get the tables recreated with the correct type.
This failure case exists because the fxa logs sink is set up with the older sharded table approach rather than using date-partitioned tables. So it's possible to have schema drift between different days.
?ni :relud - You have the most context on this fxa ETL besides me. Would you be willing to be named owner for these queries and the overall DAG going forward?
?ni :jbuck - It looks like moz-fx-fxa-prod-0712.fxa_prod_logs.docker_fxa_auth_*
had a type regression. Can you take a look at what's going on upstream?
Comment 2•2 years ago
|
||
Alright, looks like it only broke on a single day so I'll fix tomorrow's schema today, then fix today's schema tomorrow.
Should I switch the BQ sink to a partitioned table? Seems like it'd fix this particular problem anyways...
Assignee | ||
Comment 3•2 years ago
|
||
Alright, looks like it only broke on a single day so I'll fix tomorrow's schema today, then fix today's schema tomorrow.
Sounds great. Thank you!
Should I switch the BQ sink to a partitioned table? Seems like it'd fix this particular problem anyways
I do think this would be a good long-term stability improvement, but we shouldn't do it piecemeal. We should transition all the logs together, since it will require coordinated changes to the queries to reference the new names and change how we reference dates.
If we can have both old and new sinks running concurrently for a few days, that would be ideal for a seamless transition.
Assignee | ||
Comment 5•2 years ago
|
||
(In reply to Daniel Thorn [:relud] from comment #4)
yeah, i can be the owner on this going forward
Comment 6•2 years ago
|
||
(In reply to Jeff Klukas [:klukas] (UTC-4) from comment #3)
If we can have both old and new sinks running concurrently for a few days, that would be ideal for a seamless transition.
Oh duh, right, I can just setup parallel sinks. I'll file a bug to get this migration started.
As for tomorrow's schema, I've created the table with the fixed schema.
Assignee | ||
Comment 7•2 years ago
|
||
This failed again last night, but this time with:
Cannot read field of type FLOAT64 as STRING Field: jsonPayload.fields.metricsoptoutat
:jbuck - Can you take another look?
Comment 8•2 years ago
|
||
:klukas - okay, I can fix the 23rd table now. Because I need to look this up every time, here's the query I'm going to run
SELECT
* REPLACE (
(
SELECT AS STRUCT
jsonPayload.* REPLACE (
(
SELECT AS STRUCT
jsonPayload.fields.* REPLACE (
CAST(jsonPayload.fields.metricsoptoutat AS STRING) AS metricsoptoutat
)
) AS fields
)
) AS jsonPayload
)
FROM
`moz-fx-fxa-prod-0712.fxa_prod_logs.docker_fxa_auth_20220623`
:klukas - this query has been run now, can you trigger the backfill and see if everything works correctly?
Assignee | ||
Comment 9•2 years ago
|
||
Thank you!
Looks like the affected queries ran successfully. Just waiting on all the downstream tasks to finish now.
Assignee | ||
Comment 10•2 years ago
|
||
Downstream ETL has all finished successfully except for https://workflow.telemetry.mozilla.org/tree?dag_id=bqetl_mozilla_vpn_site_metrics where we are still failing the "empty check", which I don't understand. ?ni :relud on that.
Assignee | ||
Comment 12•2 years ago
|
||
(In reply to Daniel Thorn [:relud] from comment #11)
that's unrelated to fxa events
Then we're done here. Thanks all!
Comment 13•2 years ago
|
||
I filed https://mozilla-hub.atlassian.net/browse/SVCSE-644 for follow-up with partitioned tables
Updated•2 years ago
|
Description
•