Closed Bug 1923176 Opened 1 year ago Closed 1 year ago

Airflow task socorro_import.crash_report_parquet failed for exec_date 2024-10-07

Categories

(Data Platform and Tools :: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mwilliams, Assigned: srose)

Details

(Whiteboard: [airflow-triage])

Airflow task socorro_import.crash_report_parquet failed for exec_date 2024-10-07

Task link:
https://workflow.telemetry.mozilla.org/dags/socorro_import/grid?dag_run_id=scheduled__2024-10-06T00%3A00%3A00%2B00%3A00&task_id=crash_report_parquet&tab=logs

Log extract: (nothing useful)

Assignee: nobody → srose
Status: NEW → ASSIGNED

socorro_import.crash_report_parquet is a sub-DAG, so to find the actual error you have to click "Zoom into SubDag" on that task instance's details pane.

The actual task that failed is socorro_import.crash_report_parquet.run_dataproc_pyspark.

Task link:
https://workflow.telemetry.mozilla.org/dags/socorro_import.crash_report_parquet/grid?execution_date=2024-10-06T00%3A00%3A00%2B00%3A00&dag_run_id=scheduled__2024-10-06T00%3A00%3A00%2B00%3A00&task_id=run_dataproc_pyspark&tab=logs
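For anyone who wants to jump straight to a sub-DAG task's logs without clicking through, here is a rough sketch of how the link above appears to be put together. The function name and parameters are hypothetical; the URL layout is inferred only from the two links in this bug and is not a documented Airflow API, so it may not hold for other Airflow versions.

```python
from urllib.parse import urlencode

# Base URL taken from the links in this bug.
BASE_URL = "https://workflow.telemetry.mozilla.org"


def subdag_task_log_url(parent_dag, subdag_task, task_id, logical_date):
    """Build a grid-view log link for a task inside a sub-DAG.

    Hypothetical helper: assumes the sub-DAG's dag_id is
    "<parent_dag>.<subdag_task>" and that the run is a scheduled run,
    as in the links quoted in this bug.
    """
    subdag_id = f"{parent_dag}.{subdag_task}"
    query = urlencode(
        {
            "execution_date": logical_date,
            "dag_run_id": f"scheduled__{logical_date}",
            "task_id": task_id,
            "tab": "logs",
        }
    )
    return f"{BASE_URL}/dags/{subdag_id}/grid?{query}"


url = subdag_task_log_url(
    "socorro_import",
    "crash_report_parquet",
    "run_dataproc_pyspark",
    "2024-10-06T00:00:00+00:00",
)
print(url)
```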

Log extract:

google.api_core.exceptions.NotFound: 404 Not found: Cluster projects/airflow-dataproc/regions/us-west1/clusters/socorro-import-dataproc-cluster

This is odd, because the preceding socorro_import.crash_report_parquet.create_dataproc_cluster task reported that it had successfully created that cluster. However, I noticed a 12.6-hour delay between create_dataproc_cluster finishing and run_dataproc_pyspark starting (likely due to the Airflow negative-pool-slots issue), so my guess is that the cluster was auto-deleted in the meantime.
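To make that guess easier to check during triage, here is a minimal sketch that compares the gap between the two tasks against a cluster lifetime limit. Only the 12.6-hour gap comes from this bug. The timestamps, the 6-hour limit, and the function name are hypothetical placeholders; the real auto-delete setting for this cluster was not confirmed here.

```python
from datetime import datetime, timedelta, timezone


def cluster_probably_expired(create_finished, job_started, max_lifetime):
    """Return True if the job started after the cluster's assumed lifetime ended."""
    return (job_started - create_finished) > max_lifetime


# Hypothetical timestamp; only the 12.6-hour gap is taken from this bug.
create_finished = datetime(2024, 10, 7, 0, 30, tzinfo=timezone.utc)
job_started = create_finished + timedelta(hours=12.6)

# Hypothetical auto-delete limit; substitute the cluster's real setting.
assumed_lifetime = timedelta(hours=6)

if cluster_probably_expired(create_finished, job_started, assumed_lifetime):
    print("Gap exceeds the assumed cluster lifetime; a 404 on job submit is plausible.")
else:
    print("Gap is within the assumed cluster lifetime; look for another cause.")
```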

Re-running the socorro_import.crash_report_parquet sub-DAG succeeded.

Note that clearing the crash_report_parquet task instance in the socorro_import DAG with downstream+recursive selected wasn't sufficient to trigger the re-run; I also had to clear the sub-DAG run from the zoomed-in sub-DAG view.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED