Closed Bug 1777705 Opened 2 years ago Closed 2 years ago

Airflow task bhr_collection.bhr_collection.run_dataproc_pyspark failing on 2022-06-30, 07:00:00

Categories

(Data Platform and Tools :: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kik, Assigned: alexical)

Details

(Whiteboard: [airflow-triage])

Error log extract:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1165, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1283, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1313, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/google/cloud/operators/dataproc.py", line 1627, in execute
    super().execute(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/google/cloud/operators/dataproc.py", line 1067, in execute
    project_id=self.project_id, job=self.job["job"], region=self.region
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/google/common/hooks/base_google.py", line 425, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/google/cloud/hooks/dataproc.py", line 948, in submit_job
    metadata=metadata,
  File "/usr/local/lib/python3.7/site-packages/google/cloud/dataproc_v1beta2/services/job_controller/client.py", line 413, in submit_job
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.NotFound: 404 Not found: Cluster projects/airflow-dataproc/regions/us-west1/clusters/bhr-collection-2022-06-30
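
The 404 means the submit step addressed a cluster that no longer existed; since the cluster name is date-suffixed, the likely sequence is that the ephemeral cluster had already been torn down (by a teardown task, or by cleanup after the first failure) before this task retried. Below is a minimal sketch of the usual create/submit/delete pattern, assuming the stock Google provider operators; the DAG structure, config values, and GCS path are illustrative, not taken from the real bhr_collection DAG:

# Hypothetical reconstruction of the create/submit/delete pattern; names and
# config values are illustrative, not taken from the real bhr_collection DAG.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)

PROJECT = "airflow-dataproc"
REGION = "us-west1"
CLUSTER = "bhr-collection-{{ ds }}"  # date-suffixed, matching the cluster name in the 404

with DAG(
    "bhr_collection_sketch",
    start_date=datetime(2022, 6, 1),
    schedule_interval="0 5 * * *",
    catchup=False,
) as dag:
    create = DataprocCreateClusterOperator(
        task_id="create_dataproc_cluster",
        project_id=PROJECT,
        region=REGION,
        cluster_name=CLUSTER,
        cluster_config={
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-8"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-8"},
        },
    )
    run = DataprocSubmitJobOperator(
        task_id="run_dataproc_pyspark",
        project_id=PROJECT,
        region=REGION,
        job={
            "placement": {"cluster_name": CLUSTER},
            "pyspark_job": {"main_python_file_uri": "gs://example-bucket/bhr_collection.py"},
        },
    )
    # trigger_rule="all_done" deletes the cluster even when the job fails, which is
    # exactly what makes a later retry of run_dataproc_pyspark hit a 404 NotFound.
    delete = DataprocDeleteClusterOperator(
        task_id="delete_dataproc_cluster",
        project_id=PROJECT,
        region=REGION,
        cluster_name=CLUSTER,
        trigger_rule="all_done",
    )

    create >> run >> delete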

Airflow task logs:
https://workflow.telemetry.mozilla.org/log?dag_id=bhr_collection.bhr_collection&task_id=run_dataproc_pyspark&execution_date=2022-06-30T05%3A00%3A00%2B00%3A00

Dataproc log link:
https://console.cloud.google.com/dataproc/jobs/bhr-collection_617fae4a/monitoring?region=us-west1&project=airflow-dataproc

Extract from the Dataproc logs:

Processing modules...
Processing modules took 0ms to complete
22/07/01 11:00:16 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 138.0 in stage 5.0 (TID 2141, bhr-collection-2022-06-30-w-1.c.airflow-dataproc.internal, executor 20): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:490)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:479)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:597)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:575)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:410)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1209)
	at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1215)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:188)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:414)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:582)
	... 16 more
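
The java.io.EOFException above is the JVM side of the pipe seeing the Python worker process die mid-read; on Dataproc a common cause is the worker being killed for exceeding its memory allowance. If memory pressure were confirmed, one mitigation would be to grant the Python workers more headroom via Spark properties on the submitted job. A hedged sketch, reusing the hypothetical job dict from the DAG sketch above (values are illustrative):

# Hypothetical: Spark properties one might attach to the pyspark_job above if the
# worker crash turned out to be memory pressure. Values are illustrative.
job = {
    "placement": {"cluster_name": CLUSTER},
    "pyspark_job": {
        "main_python_file_uri": "gs://example-bucket/bhr_collection.py",
        "properties": {
            # Off-heap room per executor for non-JVM processes, including Python workers.
            "spark.executor.memoryOverhead": "4g",
            # Cap PySpark worker memory so it spills to disk instead of being OOM-killed.
            "spark.executor.pyspark.memory": "4g",
        },
    },
}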
Assignee: nobody → dothayer
Component: Datasets: General → General

This looks to be resolved; the Airflow tasks were re-run successfully. Kik, Doug, if that doesn't seem correct, feel free to reopen.
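
For reference, a rerun like this can be done by clearing the failed task instance so the scheduler picks it up again. A minimal sketch using the Airflow 2.x CLI, with the dag_id and task_id taken from the log URL above (the actual rerun may well have been done through the web UI):

# Clear the 2022-06-30 run of the failed task; the scheduler then re-executes it.
airflow tasks clear bhr_collection.bhr_collection \
    --task-regex run_dataproc_pyspark \
    --start-date 2022-06-30 --end-date 2022-06-30 \
    --yes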

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED