Closed
Bug 1648327
Opened 5 years ago
Closed 5 years ago
Airflow times out while monitoring Cloud Dataflow jobs
Categories
(Data Platform and Tools Graveyard :: Operations, defect)
Tracking
(Not tracked)
RESOLVED INVALID
People
(Reporter: vng, Unassigned)
Details
Attachments
(1 file: 224.16 KB, image/png)
I'm getting a timeout in airflow/contrib/kubernetes/pod_launcher.py when monitoring a Dataflow job in Airflow:
[2020-06-25 01:51:37,212] {logging_mixin.py:112} INFO - [2020-06-25 01:51:37,212] {pod_launcher.py:125} INFO - WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['--iso-date=20200614', '--gcp-project=moz-fx-data-taar-nonprod-48b6', '--avro-gcs-bucket=moz-fx-data-taar-nonprod-48b6-stage-etl', '--bigtable-instance-id=taar-stage-202006', '--gcs-to-bigtable']
[2020-06-25 01:56:44,686] {taskinstance.py:1088} ERROR - ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/airflow/models/taskinstance.py", line 955, in _run_raw_task
result = task_copy.execute(context=context)
File "/app/dags/operators/gcp_container_operator.py", line 96, in execute
result = super(UpstreamGKEPodOperator, self).execute(context) # Moz specific
File "/app/dags/operators/backport/kubernetes_pod_operator_1_10_7.py", line 251, in execute
get_logs=self.get_logs)
File "/usr/local/lib/python2.7/site-packages/airflow/contrib/kubernetes/pod_launcher.py", line 117, in run_pod
return self._monitor_pod(pod, get_logs)
File "/usr/local/lib/python2.7/site-packages/airflow/contrib/kubernetes/pod_launcher.py", line 124, in _monitor_pod
for line in logs:
File "/usr/local/lib/python2.7/site-packages/urllib3/response.py", line 808, in __iter__
for chunk in self.stream(decode_content=True):
File "/usr/local/lib/python2.7/site-packages/urllib3/response.py", line 572, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/usr/local/lib/python2.7/site-packages/urllib3/response.py", line 793, in read_chunked
self._original_response.close()
File "/usr/local/lib/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python2.7/site-packages/urllib3/response.py", line 455, in _error_catcher
raise ProtocolError("Connection broken: %r" % e, e)
ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
[2020-06-25 01:56:44,697] {taskinstance.py:1119} INFO - Marking task as FAILED.
[2020-06-25 01:56:44,712] {logging_mixin.py:112} INFO - [2020-06-25 01:56:44,712] {log_email_backend.py:54} INFO -
Content-Type: multipart/mixed; boundary="===============5139698362747261446=="
MIME-Version: 1.0
Subject: Airflow alert: <TaskInstance:
taar_weekly.dataflow_import_avro_to_bigtable 2020-06-14T00:00:00+00:00
[failed]>
At the time the task failed, I checked and the Cloud Dataflow job taar-profile-load-20200614 (job id: 2020-06-24_18_51_37-17541505251849162564) was still running.
Retrying the task isn't appropriate: the Dataflow job hasn't actually failed, it is still executing.
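For context, the failure happens because pod_launcher.py iterates the streamed pod logs and lets the urllib3 ProtocolError propagate, failing the task even though the pod is healthy. A minimal sketch of a more tolerant log-consuming loop (this is an illustration, not Airflow's actual code; ProtocolError below is a local stand-in for urllib3.exceptions.ProtocolError, and the stream helpers are hypothetical):

```python
# Sketch: treat a mid-stream disconnect as "stream broken", not "task failed".
# A caller could then re-attach to the logs or fall back to polling pod phase.

class ProtocolError(Exception):
    """Local stand-in for urllib3.exceptions.ProtocolError."""

def consume_logs(log_stream):
    """Drain a pod log stream, swallowing a mid-stream disconnect.

    Returns (lines_read, stream_broken). A broken stream signals the
    caller to re-attach or poll pod status rather than mark the task FAILED.
    """
    lines, broken = [], False
    try:
        for line in log_stream:
            lines.append(line)
    except ProtocolError:
        broken = True  # connection dropped; the pod may still be running
    return lines, broken

def flaky_stream():
    """Simulates a log stream that dies the way the traceback shows."""
    yield "step 1 done"
    yield "step 2 done"
    raise ProtocolError("Connection broken: IncompleteRead(0 bytes read)")

lines, broken = consume_logs(flaky_stream())
print(lines, broken)  # ['step 1 done', 'step 2 done'] True
```

The point of the sketch is that the disconnect is recoverable information, not a terminal error, which is exactly what the stock pod_launcher loop gets wrong here.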
Reporter
Comment 1 • 5 years ago
This looks like: https://issues.apache.org/jira/browse/AIRFLOW-5571
Reporter
Comment 2 • 5 years ago
I'm just going to disable logs per https://github.com/mozilla/telemetry-airflow/issues/844
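Disabling logs means setting get_logs=False on the pod operator (get_logs is a real KubernetesPodOperator parameter; the task id and operator arguments below are placeholders, sketched from the traceback, not the actual DAG code):

```python
# Sketch of the workaround: with get_logs=False, pod_launcher never iterates
# the flaky log stream (the code path that raised ProtocolError), and job
# progress is checked in the Cloud Dataflow console instead.
import_avro = UpstreamGKEPodOperator(  # Moz-specific subclass seen in the traceback
    task_id="dataflow_import_avro_to_bigtable",
    get_logs=False,  # skip log streaming in _monitor_pod
    ...
)
```

The trade-off is losing pod logs in the Airflow UI; the Dataflow job's own logs remain available in GCP.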
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID
Comment 3 • 5 years ago
Yup, unfortunately this is the way right now. To be clear, this is a GKEPodOperator issue, not a Dataflow one.
Updated • 3 years ago
Product: Data Platform and Tools → Data Platform and Tools Graveyard