For reference, the following commands were run in the testing session. I've been notified by Google Cloud Support on case #21629086 that the Dataproc team has been investigating, with updates on 2019-12-23 and 2019-12-26. I'm expecting to hear back again on 2019-12-31. Case management through the console has been unavailable since 2019-12-20, so I have been communicating via email.

To improve the robustness of the service, I'll be implementing an existing pathway for reading Avro dumps of the BigQuery `payload_bytes_decoded` tables instead of relying on the Storage API for reading directly from the service. I've tested the existing scripts with the following commands in the `python_mozaggregator` repo:

```bash
bin/export-avro.sh moz-fx-data-shar-nonprod-efed amiyaguchi-dev:avro_export gs://amiyaguchi-dev/avro-mozaggregator 2019-12-15

NUM_WORKERS=5 bin/dataproc.sh \
    mobile \
    --output gs://amiyaguchi-dev/mozaggregator/mobile_test/nonprod/20191215/ \
    --num-partitions 200 \
    --date 20191215 \
    --source avro \
    --avro-prefix gs://amiyaguchi-dev/avro-mozaggregator/moz-fx-data-shar-nonprod-efed
```

This results in the following listing:

```
$ gsutil ls -r gs://amiyaguchi-dev/mozaggregator/mobile_test/nonprod/20191215
gs://amiyaguchi-dev/mozaggregator/mobile_test/nonprod/20191215/:
gs://amiyaguchi-dev/mozaggregator/mobile_test/nonprod/20191215/
gs://amiyaguchi-dev/mozaggregator/mobile_test/nonprod/20191215/_SUCCESS

gs://amiyaguchi-dev/mozaggregator/mobile_test/nonprod/20191215/submission_date=20191215/:
gs://amiyaguchi-dev/mozaggregator/mobile_test/nonprod/20191215/submission_date=20191215/
gs://amiyaguchi-dev/mozaggregator/mobile_test/nonprod/20191215/submission_date=20191215/part-00000-e72b47cc-92e6-4620-a6e0-1844861aebee.c000.snappy.parquet
```

I'm planning on taking the following route to implement reading from Avro:

* [mozaggregator] modify the published docker image `mozilla/python_mozaggregator:latest` to include the google-cloud-sdk
* [mozaggregator] update `bin/export-avro.sh` to accept an argument for the specific table to export
* [airflow] add an upstream export job within `dags/mozaggregator_mobile.py` and update arguments to reflect the alternative processing pathway
* [airflow] add an upstream export job within `dags/mozaggregator_prerelease.py` and update arguments to reflect the alternative processing pathway
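The second bullet could be sketched roughly as below. This is a hypothetical illustration, not the repo's actual script: the function name `export_avro_tables`, the default table names, and the exact `bq extract` invocation are all assumptions, and the commands are echoed rather than executed so the sketch can be dry-run safely. The real `bin/export-avro.sh` also takes an intermediate dataset argument, which is omitted here for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: accept an optional table argument; fall back to a
# default table list when none is given. Table names are illustrative.
export_avro_tables() {
    local project=$1 bucket=$2 date=$3 table=${4:-}
    local tables t
    if [[ -n "$table" ]]; then
        tables=("$table")
    else
        tables=(telemetry_telemetry__main_v4 telemetry_telemetry__saved_session_v4)
    fi
    for t in "${tables[@]}"; do
        # echo the command instead of running it, so this is a dry run;
        # ${date//-/} strips dashes to form the partition decorator
        echo bq extract --destination_format=AVRO \
            "${project}:payload_bytes_decoded.${t}\$${date//-/}" \
            "${bucket}/${project}/${t}/${date}/*.avro"
    done
}
```

With a table argument the function exports only that table; without one it falls back to the full default list, preserving the current behavior of the script.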
Bug 1605442 Comment 5 Edit History