Closed Bug 1932180 Opened 1 year ago Closed 1 year ago

Airflow dag bqetl_google_search_console failed for exec_date 2024-11-18

Categories

(Data Platform and Tools :: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: benwu, Assigned: srose)

Details

(Whiteboard: [airflow-triage])

Attachments

(2 files)

The BigQueryTablePartitionExistenceSensor tasks in bqetl_google_search_console failed once or twice on 2024-11-18. This doesn't seem to be a recurring issue, and a retry worked, so it may have been a transient BigQuery API issue caused by a backend change.

Log extract:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/jobs/triggerer_job_runner.py", line 529, in cleanup_finished_triggers
    result = details["task"].result()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/jobs/triggerer_job_runner.py", line 601, in run_trigger
    async for event in trigger.run():
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/google/cloud/triggers/bigquery.py", line 752, in run
    job_id = await hook.create_job_for_partition_get(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 3514, in create_job_for_partition_get
    job_query_resp = await job_client.query(query_request, cast(Session, session))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/gcloud/aio/bigquery/job.py", line 123, in query
    return await self._post_json(url, query_request, session, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/gcloud/aio/bigquery/bigquery.py", line 121, in _post_json
    resp = await s.post(url, data=payload, headers=headers,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/gcloud/aio/auth/session.py", line 161, in post
    await _raise_for_status(resp)
  File "/home/airflow/.local/lib/python3.11/site-packages/gcloud/aio/auth/session.py", line 123, in _raise_for_status
    raise aiohttp.ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request: {\n  "error": {\n    "code": 400,\n    "message": "Unrecognized name: table_id at [1:109]",\n    "errors": [\n      {\n        "message": "Unrecognized name: table_id at [1:109]",\n        "domain": "global",\n        "reason": "invalidQuery",\n        "location": "q",\n        "locationType": "parameter"\n      }\n    ],\n    "status": "INVALID_ARGUMENT"\n  }\n}\n', url=URL('https://www.googleapis.com/bigquery/v2/projects/moz-fx-data-marketing-prod/queries')
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

These failed again today with the same error, so this may be worth investigating. Retrying worked again.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: nobody → srose

This turned out to be caused by an Airflow bugfix that had bugs of its own. I've submitted a fix for those bugs, but there's no telling when it will be merged, when it will be included in an apache-airflow-providers-google release, or when we'll be able to upgrade telemetry-airflow to that release. In the meantime, I've switched our BigQueryTablePartitionExistenceSensor tasks to run in reschedule mode instead of deferrable mode, which should prevent further failures from that bug for the time being.
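
For reference, here is a rough sketch of what a switch from deferrable mode to reschedule mode can look like. It assumes the sensor's `deferrable` flag and the standard sensor arguments `mode` and `poke_interval`. The dataset name, table name, task_id pattern, and poke interval are hypothetical placeholders, not the values used in telemetry-airflow, and the actual change there may differ.

```python
from datetime import timedelta


def partition_sensor_kwargs(table_id: str, partition_id: str) -> dict:
    """Build keyword arguments for a BigQueryTablePartitionExistenceSensor.

    The dataset name, task_id pattern, and poke interval are hypothetical.
    """
    return {
        "task_id": f"wait_for_{table_id}",
        "project_id": "moz-fx-data-marketing-prod",  # project seen in the error URL
        "dataset_id": "example_dataset",  # hypothetical
        "table_id": table_id,
        "partition_id": partition_id,
        # Stay off the triggerer code path that raised the 400 error.
        "deferrable": False,
        # Release the worker slot between checks instead of holding it.
        "mode": "reschedule",
        "poke_interval": timedelta(minutes=5).total_seconds(),  # hypothetical
    }


kwargs = partition_sensor_kwargs("example_table", "20241118")
# In a DAG file this would be passed along as:
#   BigQueryTablePartitionExistenceSensor(**kwargs)
```

Reschedule mode still polls from a worker, but the task gives up its slot between checks, so it is a reasonable stand-in for deferrable mode until the provider fix is available.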

However, looking at that buggy Airflow bugfix also revealed that we had been affected by the fairly egregious bug it was meant to fix: if the table partition didn't exist when the sensor first ran, then on its next check the sensor would accept any partition for the same date in any table in the same dataset. This caused us to miss Google Search Console data for some websites on 12 dates:

  • 2024-07-04 (getpocket.com, support.mozilla.org)
  • 2024-07-14 (addons.mozilla.org)
  • 2024-07-22 (developer.mozilla.org)
  • 2024-08-14 (developer.mozilla.org)
  • 2024-08-19 (support.mozilla.org)
  • 2024-09-02 (addons.mozilla.org, blog.mozilla.org)
  • 2024-09-03 (addons.mozilla.org, blog.mozilla.org)
  • 2024-09-09 (support.mozilla.org)
  • 2024-10-06 (addons.mozilla.org, blog.mozilla.org)
  • 2024-10-07 (addons.mozilla.org, blog.mozilla.org)
  • 2024-10-15 (developer.mozilla.org)
  • 2024-11-03 (developer.mozilla.org, getpocket.com, www.mozilla.org)

I manually re-ran the bqetl_google_search_console DAG to backfill data for those dates.
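
To illustrate the sensor bug described above, here is a simplified model of the two checks. It is not the provider's code: the real sensor queries BigQuery, whereas this sketch filters an in-memory list, and the `PartitionRow` type, function names, and table names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionRow:
    """One partition of one table in a dataset (hypothetical model)."""
    table_name: str
    partition_id: str


def partition_exists_dataset_wide(rows, table_id, partition_id):
    """Buggy check: table_id is ignored, so any table's partition matches."""
    return any(row.partition_id == partition_id for row in rows)


def partition_exists_for_table(rows, table_id, partition_id):
    """Intended check: the partition must belong to the requested table."""
    return any(
        row.table_name == table_id and row.partition_id == partition_id
        for row in rows
    )


# Only site_a's export has landed for 2024-11-03; site_b's is still missing.
rows = [PartitionRow("site_a_export", "20241103")]

buggy = partition_exists_dataset_wide(rows, "site_b_export", "20241103")
fixed = partition_exists_for_table(rows, "site_b_export", "20241103")
print(buggy, fixed)
```

With the dataset-wide check, the sensor for site_b succeeds as soon as site_a's partition exists, so downstream tasks run before site_b's data has arrived. That matches the pattern of missing per-site data listed above.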

Status: REOPENED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED