Closed Bug 1606361 Opened 6 years ago Closed 6 years ago

Update mozaggregator pipeline to fallback to avro dumps of BigQuery tables

Categories

(Data Platform and Tools :: General, task, P1)

Points: 2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

Attachments

(3 files)

The spark-bigquery connector is in beta and has proven unreliable at this stage, with 12 days of downtime as of today. We should implement a fallback route that processes data from avro dumps of the payload_bytes_decoded tables.

See https://bugzilla.mozilla.org/show_bug.cgi?id=1605442#c5 for details.
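One way to structure such a fallback is to try the BigQuery read first and drop back to the avro dump when it fails. This is a minimal, hypothetical sketch. `load_with_fallback`, `avro_dump_path`, and the stand-in readers are illustrative names, not part of mozaggregator. The GCS path layout is inferred from the gsutil listings in this bug. In a real job, each reader would wrap a Spark read rather than return a list.

```python
from typing import Callable, List, Sequence

# GCS prefix taken from the gsutil listings in this bug. The layout
# {prefix}/{project}/{YYYYMMDD}/{table}/ is inferred from those paths.
AVRO_PREFIX = "gs://amiyaguchi-dev/avro-mozaggregator"

Reader = Callable[[], List[dict]]


def avro_dump_path(project: str, ds_nodash: str, table: str) -> str:
    """Return the GCS directory holding the avro dump of one table for one day."""
    return f"{AVRO_PREFIX}/{project}/{ds_nodash}/{table}/"


def load_with_fallback(readers: Sequence[Reader]) -> List[dict]:
    """Return the result of the first reader that succeeds.

    Hypothetical helper: readers are tried in order, e.g. the
    spark-bigquery connector first and the avro dump second.
    """
    errors = []
    for read in readers:
        try:
            return read()
        except Exception as exc:  # any failure moves on to the next source
            errors.append(exc)
    raise RuntimeError(f"all data sources failed: {errors!r}")


# Usage with stand-in readers (no Spark or GCS access needed).
def read_bigquery() -> List[dict]:
    raise ConnectionError("spark-bigquery connector unavailable")


def read_avro() -> List[dict]:
    path = avro_dump_path("moz-fx-data-shared-prod", "20191101", "main_v4")
    return [{"source": "avro", "path": path}]


rows = load_with_fallback([read_bigquery, read_avro])
print(rows[0]["path"])
```

Keeping the sources as an ordered list makes it easy to add or reorder fallbacks without touching the aggregation code itself.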

Assignee: nobody → amiyaguchi
Points: --- → 2
Priority: -- → P1

After implementing the pathway for mobile_aggregates, which has also been affected by bug 1605442, I've noticed that this solution won't be sufficient to backfill the aggregates database.

There are currently 12 days that need to be backfilled. Per-table sizes from an earlier dump for 20191101:

$ gsutil ls -lh gs://amiyaguchi-dev/avro-mozaggregator/moz-fx-data-shared-prod/20191101/main_v4/ | tail -n1
TOTAL: 3416 objects, 2718035349986 bytes (2.47 TiB)

$ gsutil ls -lh gs://amiyaguchi-dev/avro-mozaggregator/moz-fx-data-shared-prod/20191101/mobile_metrics_v1/ | tail -n1
TOTAL: 34 objects, 34809544355 bytes (32.42 GiB)

$ gsutil ls -lh gs://amiyaguchi-dev/avro-mozaggregator/moz-fx-data-shared-prod/20191101/saved_session_v4/ | tail -n1
TOTAL: 66 objects, 68686737989 bytes (63.97 GiB)

Exports are subject to BigQuery export limits (10 TB), and a single day spans about 2.6 TiB across these three tables, so we cannot backfill more than 4 days of data for pre-release aggregates using this method.
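The estimate can be checked by summing the three gsutil totals for 20191101 and dividing the 10 TB export limit by that daily volume. This sketch assumes that one day's dump is representative of every day in the backfill, and it reads "10 TB" as decimal terabytes (10^12 bytes).

```python
# Byte totals copied from the gsutil listings for the 20191101 dump.
DUMP_BYTES = {
    "main_v4": 2_718_035_349_986,
    "mobile_metrics_v1": 34_809_544_355,
    "saved_session_v4": 68_686_737_989,
}

EXPORT_LIMIT_BYTES = 10 * 10**12  # 10 TB, assuming decimal terabytes
TIB = 2**40  # bytes per tebibyte, matching gsutil's -h units

bytes_per_day = sum(DUMP_BYTES.values())
days_per_export = EXPORT_LIMIT_BYTES / bytes_per_day  # fractional days
full_days = EXPORT_LIMIT_BYTES // bytes_per_day  # whole days that fit

print(f"{bytes_per_day / TIB:.2f} TiB per day")
print(f"{days_per_export:.2f} days fit in 10 TB -> {full_days} full days")
```

This prints 2.57 TiB per day and about 3.54 days, so only 3 full days fit. Reading the limit as 10 TiB instead gives about 3.90 days, which is still 3 full days and consistent with the fewer-than-4-days bound.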

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: Datasets: Telemetry Aggregates → General