Closed Bug 1476361 Opened 6 years ago Closed 6 years ago

New public-readable bucket writeable from databricks for federated learning study model results

Categories

(Data Platform and Tools Graveyard :: Operations, enhancement)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugzilla, Assigned: jason)

The federated learning streaming job will output a model as JSON every 30 minutes, which clients enrolled in the study will fetch periodically and use. Since the streaming job will run on Databricks, the bucket needs to be writeable from there as well as world-readable.
The job also needs permission to read from Kafka (currently we use the `databricks-ec2` role for this).
In Bug 1455725 we created an S3 bucket that is publicly accessible, plus a CDN endpoint. Can we use that instead?

If so:
S3 bucket = net-mozaws-prod-us-west-2-data-public
CDN Endpoint = https://public-data.telemetry.mozilla.org

You need to make sure you set the object acl to 'public-read'.

# tested from shared_serverless cluster
curl https://public-data.telemetry.mozilla.org/jthomas/testacl1.txt
yay%
https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/9388/command/9389 is the notebook I used to write the data if you need an example.
Fantastic, that'll work for us -- we'll put our stuff under the /awesomebar_study/ key
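For anyone without access to the notebook, here is a rough sketch of what a write with the 'public-read' object acl might look like from Python. It assumes boto3 is available on the cluster and that the cluster's role can write to the bucket. The file name `latest.json`, the sample model contents, and the helper names are made up for illustration and are not taken from the actual job.

```python
import json

BUCKET = "net-mozaws-prod-us-west-2-data-public"
CDN_ENDPOINT = "https://public-data.telemetry.mozilla.org"


def build_put_args(key, model):
    """Build the put_object arguments for a JSON model (hypothetical helper)."""
    return {
        "Bucket": BUCKET,
        "Key": key,
        "Body": json.dumps(model).encode("utf-8"),
        # Without this ACL the object is not readable through the CDN endpoint.
        "ACL": "public-read",
        "ContentType": "application/json",
    }


def public_url(key):
    """Return the CDN URL a client would fetch for the given key."""
    return f"{CDN_ENDPOINT}/{key.lstrip('/')}"


def upload_model(key, model):
    """Upload the model and return its public URL (needs AWS credentials)."""
    import boto3  # assumption: boto3 is installed on the cluster

    s3 = boto3.client("s3")
    s3.put_object(**build_put_args(key, model))
    return public_url(key)


# Example call (the key name and model are placeholders):
# upload_model("awesomebar_study/latest.json", {"weights": [0.1, 0.2]})
```

After an upload, the same curl check as above against the returned URL should confirm that the object is world-readable.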
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Assignee: nobody → jthomas
Product: Data Platform and Tools → Data Platform and Tools Graveyard