Closed Bug 1555162 Opened 5 years ago Closed 5 years ago

GCP infrastructure for libprio processing

Categories

(Data Platform and Tools Graveyard :: Operations, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whd, Assigned: hwoo)

For the upcoming prio study rollout we need approximately the following infrastructure:

  1. Ingestion Dataflow job for ETL on prio pings
  2. Two GCP projects, along with GCS buckets, KMS (for encrypted server private keys), and service accounts for managing and running Kubernetes clusters.
  3. Relevant permissions between service accounts to manage shared state.
  4. An Airflow job or similar to schedule execution of the processing job, which is currently expected to run as a weekly batch.
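
As a rough illustration of items (2) and (3), the sketch below lays out one possible set of IAM grants for the two projects. Every project ID, bucket name, key name, and service account name in it is hypothetical, and the split into a private and a shared bucket per project is an assumption rather than a settled design. Only the role names, the service account email format, and the KMS resource name format are standard GCP conventions. It builds a plan in memory and does not call any GCP API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PrioServer:
    """One of the two processing servers. All names are hypothetical."""
    project_id: str

    @property
    def service_account(self) -> str:
        # GCP service account emails have the form NAME@PROJECT_ID.iam.gserviceaccount.com.
        return f"prio-runner@{self.project_id}.iam.gserviceaccount.com"

    @property
    def private_bucket(self) -> str:
        # Assumed layout: data only this server may see.
        return f"gs://{self.project_id}-private"

    @property
    def shared_bucket(self) -> str:
        # Assumed layout: state this server publishes for its peer.
        return f"gs://{self.project_id}-shared"

    @property
    def kms_key(self) -> str:
        # Standard Cloud KMS resource name format, with made-up key ring and key names.
        return (f"projects/{self.project_id}/locations/global"
                "/keyRings/prio/cryptoKeys/server-private-key")


def bindings(own: PrioServer, peer: PrioServer) -> List[Tuple[str, str, str]]:
    """Return (resource, role, member) triples for resources in `own`'s project."""
    own_member = f"serviceAccount:{own.service_account}"
    peer_member = f"serviceAccount:{peer.service_account}"
    return [
        (own.private_bucket, "roles/storage.objectAdmin", own_member),
        (own.shared_bucket, "roles/storage.objectAdmin", own_member),
        # The only cross-project grant: the peer may read what this server publishes.
        (own.shared_bucket, "roles/storage.objectViewer", peer_member),
        # Only the owning server may decrypt its own private key.
        (own.kms_key, "roles/cloudkms.cryptoKeyDecrypter", own_member),
    ]


server_a = PrioServer("prio-server-a")
server_b = PrioServer("prio-server-b")
plan = bindings(server_a, server_b) + bindings(server_b, server_a)

for resource, role, member in plan:
    print(f"{resource}  {role}  {member}")
```

In this layout neither service account can touch the other project's private bucket or KMS key, so the shared buckets are the only channel between the two servers.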

Some investigation will be required into how IAM permissions are assigned to node pools and whether to use long-running clusters or EMR-style batch jobs.

Of these, (1) will happen as a matter of course during my normal ingestion work. The other pieces, including the investigation above, will need to be handled separately.

Some relevant notes are here; see also the related bugs for context: https://docs.google.com/document/d/1R_u8JqTAupp1Sk6Q7eeHo08GCKY_aB9-Nr6SuztpR6k/edit#

Assignee: nobody → hwoo
Priority: -- → P1
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Product: Data Platform and Tools → Data Platform and Tools Graveyard