Closed Bug 1844886 Opened 2 years ago Closed 6 months ago

Airflow task bqetl_main_summary.client_probe_processes__v1 failed for 2023-07-21

Categories

(Data Platform and Tools :: General, defect)

defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: wichan, Unassigned)

Details

(Whiteboard: [airflow-triage])

Attachments

(1 file)


Task link:
https://prod.telemetry-airflow.prod.dataservices.mozgcp.net/dags/bqetl_main_summary/grid?dag_run_id=scheduled__2023-07-20T02%3A00%3A00%2B00%3A00&task_id=client_probe_processes__v1

Log extract:

[2023-07-22, 00:37:49 UTC] {pod_manager.py:235} INFO - Error in query string: Error processing job 'moz-fx-data-shared-
[2023-07-22, 00:37:49 UTC] {pod_manager.py:235} INFO - prod:bqjob_r43a55583b369dc7e_000001897b07fabe_1': Queries in UNION ALL have
[2023-07-22, 00:37:49 UTC] {pod_manager.py:235} INFO - mismatched column count; query 1 has 15 columns, query 2 has 16 columns; failed
[2023-07-22, 00:37:49 UTC] {pod_manager.py:235} INFO - to parse view 'moz-fx-data-shared-prod.telemetry.client_probe_counts' at [7:3]
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO - Traceback (most recent call last):
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "<string>", line 1, in <module>
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/app/bigquery_etl/cli/__init__.py", line 74, in cli
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     group(prog_name=prog_name)
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     return self.main(*args, **kwargs)
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     rv = self.invoke(ctx)
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     return _process_result(sub_ctx.command.invoke(sub_ctx))
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     return _process_result(sub_ctx.command.invoke(sub_ctx))
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     return ctx.invoke(self.callback, **ctx.params)
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     return __callback(*args, **kwargs)
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     return f(get_current_context(), *args, **kwargs)
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/app/bigquery_etl/cli/query.py", line 832, in run
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     _run_query(
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/app/bigquery_etl/cli/query.py", line 936, in _run_query
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     subprocess.check_call(["bq"] + query_arguments, stdin=query_stream)
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -   File "/usr/local/lib/python3.10/subprocess.py", line 369, in check_call
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO -     raise CalledProcessError(retcode, cmd)
[2023-07-22, 00:37:50 UTC] {pod_manager.py:235} INFO - subprocess.CalledProcessError: Command '['bq', 'query', '--dataset_id=telemetry_derived', '--project_id=moz-fx-data-shared-prod', '--destination_table=mozilla-public-data:telemetry_derived.client_probe_processes_v1']' returned non-zero exit status 1.
[2023-07-22, 00:37:52 UTC] {pod_manager.py:288} INFO - Pod client-probe-processes--v1-868f1goe has phase Running
[2023-07-22, 00:37:54 UTC] {kubernetes_pod.py:691} INFO - Skipping deleting pod: client-probe-processes--v1-868f1goe
[2023-07-22, 00:37:54 UTC] {taskinstance.py:1776} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/google/cloud/operators/kubernetes_engine.py", line 532, in execute
    result = super().execute(context)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 516, in execute
    return self.execute_sync(context)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 545, in execute_sync
    self.cleanup(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 671, in cleanup
    raise AirflowException(
airflow.exceptions.AirflowException: Pod client-probe-processes--v1-868f1goe returned a failure:

https://console.cloud.google.com/kubernetes/pod/us-west1/workloads-prod-v1/default/client-probe-processes--v1-868f1goe/details?project=moz-fx-data-airflow-gke-prod

This seems related to non-normalized aggregations now being available in GLAM. The histogram tables have a non_norm_aggregates column while the scalar tables don't, which would explain the 15 vs. 16 column mismatch in the UNION ALL behind the client_probe_counts view. Are there plans to add this column to the scalar tables?

Flags: needinfo?(efilho)
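One way to confirm this diagnosis is to compare the column lists of the two inputs to the UNION ALL. The sketch below is illustrative only: `check_union_all` is a hypothetical helper, and the column names are placeholders, not the real client_probe_counts schema. In practice the lists would come from each table's actual schema.

```python
from typing import Sequence


def check_union_all(left: Sequence[str], right: Sequence[str]) -> list[str]:
    """Return problems that would make `left UNION ALL right` fail or misalign.

    BigQuery matches UNION ALL inputs by position, so the column counts must
    agree; differing names are reported because they usually mean the
    positions no longer line up.
    """
    problems = []
    if len(left) != len(right):
        problems.append(
            f"mismatched column count; query 1 has {len(left)} columns, "
            f"query 2 has {len(right)} columns"
        )
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    if only_left:
        problems.append(f"only in query 1: {', '.join(only_left)}")
    if only_right:
        problems.append(f"only in query 2: {', '.join(only_right)}")
    return problems


# Hypothetical column lists standing in for the scalar and histogram inputs.
shared = ["os", "app_version", "metric", "aggregates"]
scalar_cols = list(shared)
histogram_cols = shared + ["non_norm_aggregates"]
for problem in check_union_all(scalar_cols, histogram_cols):
    print(problem)
```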

The normalization in question applies only to histograms, so scalars are already non-normalized in that respect. For GLAM's sake, what I'm currently doing is adding a non_norm_aggregates column before exporting the data. See this open PR:
https://github.com/mozilla/bigquery-etl/pull/4107/files

Flags: needinfo?(efilho)
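The export-time workaround can be pictured on a per-row basis as follows. This is a sketch, not the change in the linked PR, and it rests on two assumptions: that rows are plain dicts with an `aggregates` field, and that a scalar's non-normalized aggregates are simply its existing aggregates, since the histogram normalization never applies to scalars. `add_non_norm_aggregates` is a hypothetical name.

```python
import copy
from typing import Any


def add_non_norm_aggregates(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a scalar row with an added `non_norm_aggregates` field.

    Assumption: scalar aggregates never went through the histogram
    normalization, so the non-normalized values equal `aggregates`.
    """
    out = dict(row)  # shallow copy so the input row is left untouched
    # Deep-copy so later edits to one field do not leak into the other.
    out["non_norm_aggregates"] = copy.deepcopy(row["aggregates"])
    return out


# Hypothetical scalar row, for illustration only.
scalar_row = {"metric": "example_scalar", "aggregates": [{"key": "max", "value": 3.0}]}
print(add_non_norm_aggregates(scalar_row))
```

Doing this just before export keeps the stored scalar tables unchanged, which is why the mismatch could still surface in views that read those tables directly.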

I'm going to add the columns to the actual histogram and scalar tables instead of fixing the view query.
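Adding the column to the tables themselves could be done with BigQuery's `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` DDL. The helper below only builds the statements. The table names and the column type are hypothetical placeholders; the real type should match whatever the histogram tables already use.

```python
def add_column_ddl(tables: list[str], column: str, column_type: str) -> list[str]:
    """Build one BigQuery ALTER TABLE statement per table.

    IF NOT EXISTS makes each statement a no-op on tables that already have
    the column, so the same list can be run against every table.
    """
    return [
        f"ALTER TABLE `{table}` ADD COLUMN IF NOT EXISTS {column} {column_type}"
        for table in tables
    ]


# Hypothetical table names and column type, for illustration only.
for stmt in add_column_ddl(
    [
        "example-project.glam_example.histogram_aggregates",
        "example-project.glam_example.scalar_aggregates",
    ],
    "non_norm_aggregates",
    "ARRAY<STRUCT<key STRING, value FLOAT64>>",
):
    print(stmt)
```

BigQuery appends added columns to the end of the schema. Because UNION ALL matches by position, this only lines up if the column ends up in the same position in both inputs to the view.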

The job succeeded on its own when it ran a second time, and the tables have been updated so the UNION between them works. I'm considering this fixed.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

We can mark the failed runs as success.

Flags: needinfo?(efilho)
Status: REOPENED → RESOLVED
Closed: 2 years ago → 6 months ago
Resolution: --- → WORKSFORME