We need a bucket in which to store report outputs, intermediate analysis data, results, and other derived data sets. The bucket should be readable and writable by users of telemetry-dash.m.o (Spark in particular), as well as by the pipeline-dev-iam-access-IamRole-UVGYDHTV1VZD role in "new dev".
Would it be possible to expedite this? All of my v4 validation work requires consolidating v4 data by clientId, deduping, etc., which can take hours. Since telemetry self-serve analysis clusters are killed every 24 hours, I have to redo this *every day*, which means that on many days I can't get around to doing actual work. Until Mark has built the tools to provide consolidated v4 data, I really need a place to dump my cleaned data sets so that I only have to run these cleaning scripts once a week or every few days, and can operate on the cleaned data without reprocessing it every time. Can we aim to have this done this week?
Priority: -- → P1
The bucket net-mozaws-prod-us-west-2-pipeline-analysis has been created for this purpose. It should allow S3:GetBucketLocation, S3:ListBucket, S3:PutObject, S3:GetObject, and S3:DeleteObject from the telemetry-spark-emr role in old dev and the pipeline-dev-iam-access-IamRole-UVGYDHTV1VZD role in new dev, in addition to Saptarshi's IAM.
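For anyone who wants to see what those grants might look like as a bucket policy, here is a rough sketch in Python. It is not the policy actually attached to the bucket. The account IDs in the example ARNs are placeholders (the real ones are not given in this bug), and the helper name build_bucket_policy is made up for illustration. The sketch only shows how the five actions split between the bucket itself and the objects in it.

```python
import json

BUCKET = "net-mozaws-prod-us-west-2-pipeline-analysis"


def build_bucket_policy(bucket, principal_arns):
    """Build a bucket policy dict granting the five actions listed above.

    Hypothetical helper: an illustration, not the policy actually deployed.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    principals = {"AWS": list(principal_arns)}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "BucketLevelAccess",
                "Effect": "Allow",
                "Principal": principals,
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to keys under the bucket.
                "Sid": "ObjectLevelAccess",
                "Effect": "Allow",
                "Principal": principals,
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# Placeholder account IDs (assumptions); only the role names come from this bug.
EXAMPLE_PRINCIPALS = [
    "arn:aws:iam::111111111111:role/telemetry-spark-emr",
    "arn:aws:iam::222222222222:role/pipeline-dev-iam-access-IamRole-UVGYDHTV1VZD",
]

policy = build_bucket_policy(BUCKET, EXAMPLE_PRINCIPALS)
print(json.dumps(policy, indent=2))
```

Listing and location lookups are granted on the bucket ARN, while get/put/delete are granted on the `/*` object ARN. Mixing those up is a common reason a role can list a bucket but not read from it, or the reverse.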
For now, please prefix any temp/intermediate data with your username to avoid conflicts and help keep things organized. For example: s3://net-mozaws-prod-us-west-2-pipeline-analysis/mreid/awesome_data_set_3/...
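If it helps keep paths consistent, the username-prefix convention could be wrapped in a small helper along these lines. This is only a sketch: the function name user_s3_uri is hypothetical, and the s3:// scheme is copied from the example above (your Spark setup may want a different scheme).

```python
BUCKET = "net-mozaws-prod-us-west-2-pipeline-analysis"


def user_s3_uri(username, *parts, bucket=BUCKET):
    """Return an S3 URI for the given path parts under the user's own prefix.

    Hypothetical helper illustrating the username-prefix convention.
    """
    # Strip stray slashes so the joined key has exactly one "/" per level.
    cleaned = [p.strip("/") for p in (username, *parts)]
    if not cleaned[0]:
        raise ValueError("username must be non-empty")
    key = "/".join(p for p in cleaned if p)
    return f"s3://{bucket}/{key}"


print(user_s3_uri("mreid", "awesome_data_set_3"))
# s3://net-mozaws-prod-us-west-2-pipeline-analysis/mreid/awesome_data_set_3
```

Requiring the username as the first argument makes it harder to write to the top level of the shared bucket by accident.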
Brendan, please reopen if you have any issues using the bucket for intermediate storage.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED