Open
Bug 1852266
Opened 2 years ago
Updated 1 year ago
GCP Airflow tasks are failing with common error
Categories
(Data Platform and Tools :: General, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: frank, Assigned: frank)
Details
This looks to be the result of a recent Airflow upgrade. We suspect that the Google-provided packages have a bug.
Error:
[2023-09-08, 12:11:47 UTC] {pod.py:907} ERROR - 'NoneType' object has no attribute 'metadata'
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 550, in execute_sync
    self.remote_pod = self.find_pod(self.pod.metadata.namespace, context=context)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 492, in find_pod
    raise AirflowException(f"More than one pod running with labels {label_selector}")
airflow.exceptions.AirflowException: More than one pod running with labels dag_id=copy_deduplicate,kubernetes_pod_operator=True,run_id=scheduled__2023-09-07T0100000000-51fa1e10e,task_id=copy_deduplicate_all,already_checked!=True,!airflow-worker
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 751, in patch_already_checked
    name=pod.metadata.name,
AttributeError: 'NoneType' object has no attribute 'metadata'
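For anyone reading the chained traceback: the `AttributeError` appears to be a secondary failure. The cleanup path dereferences a pod reference that was never set, because the lookup raised first. A minimal sketch of that pattern follows, assuming a simplified model. `DuplicatePodsError`, `make_pod`, `execute`, and the `guard` flag are hypothetical names for illustration and are not the provider's actual code.

```python
from types import SimpleNamespace


class DuplicatePodsError(Exception):
    """Hypothetical stand-in for the AirflowException raised by find_pod."""


def make_pod(name):
    # Hypothetical helper: an object shaped like a pod with metadata.name.
    return SimpleNamespace(metadata=SimpleNamespace(name=name))


def find_pod(pods):
    # Simplified lookup: at most one matching pod is expected.
    if len(pods) > 1:
        raise DuplicatePodsError(f"More than one pod running ({len(pods)} found)")
    return pods[0] if pods else None


def patch_already_checked(pod):
    # Dereferences pod unconditionally, like `name=pod.metadata.name` in the log.
    return pod.metadata.name


def execute(pods, guard=False):
    remote_pod = None  # stays None if find_pod raises
    try:
        remote_pod = find_pod(pods)
        return remote_pod.metadata.name if remote_pod else None
    except DuplicatePodsError:
        if guard and remote_pod is None:
            raise  # nothing to patch; keep the original error visible
        patch_already_checked(remote_pod)  # AttributeError on None masks it
        raise
```

With two matching pods, `execute(pods)` surfaces the `AttributeError`, while `execute(pods, guard=True)` surfaces the duplicate-pod error, which is the more useful signal here.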
Updated • 2 years ago
Assignee: nobody → mducharme
Comment 1 • 2 years ago
The error in the first log is a 401 Unauthorized when retrieving the pod logs:
[2023-09-08, 02:02:40 UTC] {pod.py:907} ERROR - (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'd32bdcdf-6f7d-4c4b-b6b3-740e1becd5e5', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 08 Sep 2023 02:02:40 GMT', 'Content-Length': '129'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 368, in consume_logs
    logs = self.read_pod_logs(
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 494, in read_pod_logs
    logs = self._client.read_namespaced_pod_log(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 23747, in read_namespaced_pod_log
    return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs) # noqa: E501
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 23866, in read_namespaced_pod_log_with_http_info
    return self.api_client.call_api(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 240, in GET
    return self.request("GET", url,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
Comment 2 • 2 years ago
Because the 401 is raised on the Airflow server side, the task fails but the pod keeps running. When Airflow then spins up the next pod, we get the duplicate-labels error, which accurately describes the situation: two pods with the same labels are running.
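The sequence can be modelled roughly as below. This is a toy in-memory sketch under stated assumptions, not Airflow or Kubernetes code: `cluster`, `run_attempt`, `Unauthorized`, and `DuplicatePods` are hypothetical names, and the label set is trimmed to two of the labels from the error message.

```python
class Unauthorized(Exception):
    """Hypothetical stand-in for the 401 raised while reading pod logs."""


class DuplicatePods(Exception):
    """Hypothetical stand-in for the 'More than one pod running' error."""


LABELS = {"dag_id": "copy_deduplicate", "task_id": "copy_deduplicate_all"}


def run_attempt(cluster, labels, log_read_fails):
    # Each attempt launches a pod carrying the same labels.
    cluster.append({"labels": dict(labels), "phase": "Running"})

    # Look up running pods by label; exactly one match is expected.
    matches = [
        p for p in cluster
        if p["labels"] == labels and p["phase"] == "Running"
    ]
    if len(matches) > 1:
        raise DuplicatePods(f"More than one pod running with labels {labels}")

    if log_read_fails:
        # The failure is on the client side only: the pod is left running.
        raise Unauthorized("(401) Unauthorized")

    matches[0]["phase"] = "Succeeded"


cluster = []
errors = []
for fails in (True, False):  # first attempt hits the 401, second is the next pod
    try:
        run_attempt(cluster, LABELS, log_read_fails=fails)
    except (Unauthorized, DuplicatePods) as exc:
        errors.append(type(exc).__name__)
```

In this model the first attempt fails with the 401 and leaves its pod behind, so the second attempt trips the duplicate check even though its own log read would have succeeded.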
Comment 3 • 2 years ago
We're rolling back to Airflow 2.5.3.
Comment 4 • 2 years ago
Airflow downgrade is complete. Restarting jobs now.
Comment 5 • 2 years ago
Taking over this ticket to backfill affected DAGs.
Assignee: mducharme → fbertsch