Closed Bug 1770823 Opened 3 years ago Closed 3 years ago

Airflow task taar_weekly.dataflow_import_avro_to_bigtable failing on 2022-05-22

Categories

(Data Platform and Tools :: General, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: linh, Assigned: epavlov)

Details

(Whiteboard: [airflow-triage])

The Airflow task taar_weekly.dataflow_import_avro_to_bigtable failed on 2022-05-22. See error log here.

The Airflow task taar_weekly.dataflow_import_avro_to_bigtable failed on 2022-07-10.
The Airflow task taar_weekly.taar_lite https://workflow.telemetry.mozilla.org/tree?dag_id=taar_daily

The issue seems to be: pyspark.sql.utils.AnalysisException: 'Path does not exist: gs://moz-fx-data-derived-datasets-parquet/clients_daily/v6/submission_date_s3=20220720;'

Is gs://moz-fx-data-derived-datasets-parquet/clients_daily/v6 still a relevant data source, or was it deprecated and should we migrate to something else?
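
For triage, it can help to list which daily partitions a job expects and which of them are missing. The following is a minimal sketch, assuming the partition layout visible in the error message (submission_date_s3=YYYYMMDD under the clients_daily/v6 prefix). The helper names partition_path and missing_partitions are hypothetical, and the set of existing paths would have to come from an actual bucket listing, which is not shown here.

```python
from datetime import date, timedelta

# Prefix copied from the error message; the layout is an assumption.
BASE = "gs://moz-fx-data-derived-datasets-parquet/clients_daily/v6"


def partition_path(day: date, base: str = BASE) -> str:
    """Build the expected parquet partition path for one day (hypothetical helper)."""
    return f"{base}/submission_date_s3={day:%Y%m%d}"


def missing_partitions(start: date, end: date, existing: set) -> list:
    """Return expected paths for each day in [start, end] that are not in `existing`."""
    n_days = (end - start).days
    wanted = [partition_path(start + timedelta(days=i)) for i in range(n_days + 1)]
    return [path for path in wanted if path not in existing]
```

Calling partition_path(date(2022, 7, 20)) reproduces the path from the exception above, so a check like this could run before the Spark read and fail with a clearer message.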

If you can migrate to reading mozdata.telemetry.clients_daily (or moz-fx-data-shared-prod.telemetry_derived.clients_daily_v6), that would be good.

The parquet export for gs://moz-fx-data-derived-datasets-parquet/clients_daily/v6/submission_date_s3=20220720 seems to have started failing while I was on PTO. It's still maintained exclusively for this use case, and I haven't had a chance to investigate yet.
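
If the jobs were migrated to BigQuery, the per-day read might reduce to a query against the table named above. This is only a sketch under assumptions: it assumes the table exposes a submission_date DATE column for filtering, the function name clients_daily_query is hypothetical, and it only builds the SQL text; how the TAAR jobs would actually execute it is not covered here.

```python
from datetime import date

# Table name taken from the comment above.
TABLE = "moz-fx-data-shared-prod.telemetry_derived.clients_daily_v6"


def clients_daily_query(day: date, table: str = TABLE) -> str:
    """Build a one-day query string (hypothetical helper).

    Assumes a `submission_date` DATE column; verify against the real schema.
    """
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE submission_date = DATE '{day.isoformat()}'"
    )
```

Filtering on a single date keeps the scan to one day of data, which mirrors reading a single submission_date_s3 partition from the parquet export.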

:relud is it just replacing the name or switching to reading from BigQuery?

If the latter, it would require changing and retesting three TAAR jobs to read from BigQuery instead of parquet in Spark, and there's nobody to work on that. I'm still the maintainer, but I'm at Pocket now, so I can only make quick fixes. Maybe data engineering has the resources to work on this? Or we can just stick to the old scheme with parquet.

Flags: needinfo?(dthorn)

definitely stick with parquet then

Flags: needinfo?(dthorn)
Component: Datasets: General → General

parquet exports are fixed now

Job is now working. Evgeny, do we need to backfill the failed tasks, or can we mark this complete?

Flags: needinfo?(epavlov)
Priority: -- → P1

We don't need to backfill, closing

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(epavlov)
Resolution: --- → FIXED